PDFToImageContent
Convert PDF files to ImageContent objects for multimodal AI processing. Each converted page becomes a separate ImageContent object containing base64-encoded image data and associated metadata.
PDFToImageContent reads PDF files and converts the specified pages into images. The metadata of each resulting ImageContent object identifies the source file and the page number.
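To make the output shape concrete, here is a minimal sketch of a base64-encoded page image paired with its metadata. PageImageSketch, wrap_page, and the meta keys file_path and page_number are hypothetical names chosen for illustration; they are not the actual ImageContent class or its guaranteed fields.

```python
import base64
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PageImageSketch:
    """Hypothetical stand-in for an ImageContent-like object (not the real class)."""

    base64_image: str
    meta: Dict[str, Any] = field(default_factory=dict)


def wrap_page(image_bytes: bytes, file_path: str, page_number: int) -> PageImageSketch:
    # Encode the raw image bytes as an ASCII-safe base64 string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # The meta keys used here are assumptions made for this example.
    return PageImageSketch(encoded, {"file_path": file_path, "page_number": page_number})


# One object per converted page; placeholder bytes stand in for a rendered page.
page = wrap_page(b"\x89PNG fake image bytes", "report.pdf", 1)
```

Decoding base64_image with base64.b64decode returns the original image bytes, which is what a downstream consumer of the encoded data relies on.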
Key Features
- Converts PDF pages to base64-encoded ImageContent objects for multimodal AI models.
- Supports specific page selection and page ranges.
- Optional image resizing to reduce file size while maintaining aspect ratio.
- Configurable detail level for optimization with OpenAI vision models.
Configuration
- Drag the PDFToImageContent component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- Configure the component settings:
  - Set the Page Range to specify which pages to convert. Accepts page numbers and ranges, for example ['1-3', '5', '8']. If not set, all pages are converted.
  - Set the Detail level for images (auto, high, or low). This is passed to the created ImageContent objects and is only supported by OpenAI.
  - Set the Size to resize images to the specified dimensions (width, height) while maintaining aspect ratio. This reduces file size, memory usage, and processing time.
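As an illustration of how a Page Range value such as ['1-3', '5', '8'] maps to individual pages, here is a minimal sketch. The functions expand_page_range_sketch and keep_valid_pages are hypothetical helpers written for this example, not the component's own implementation.

```python
from typing import List, Union


def expand_page_range_sketch(page_range: List[Union[str, int]]) -> List[int]:
    """Expand page numbers and 'start-end' strings into a flat list of 1-based pages."""
    pages: List[int] = []
    for item in page_range:
        text = str(item).strip()
        if "-" in text:
            # A range such as '1-3' is inclusive on both ends.
            start, end = (int(part) for part in text.split("-", 1))
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(text))
    return pages


def keep_valid_pages(pages: List[int], num_pages: int) -> List[int]:
    """Drop page numbers that fall outside the document (assumed 1..num_pages)."""
    return [page for page in pages if 1 <= page <= num_pages]


pages = expand_page_range_sketch(["1-3", "5", "8"])
pages_in_short_pdf = keep_valid_pages(pages, num_pages=5)
```

Here pages covers pages 1, 2, 3, 5, and 8, and pages_in_short_pdf drops page 8 because the assumed document has only five pages.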
Connections
PDFToImageContent accepts a list of file paths or ByteStream objects through its sources input. It outputs a list of ImageContent objects, one per converted page.
It typically connects with:
- FilesInput or FileTypeRouter: supplies the PDF files to convert.
- ChatPromptBuilder: receives the converted page images to include in multimodal prompts.
Source Code
To check this component's source code, open pdf_to_image.py in the Haystack repository.
Usage Examples
Basic Configuration
PDFToImageContent:
  type: haystack.components.converters.image.PDFToImageContent
  init_parameters:
    detail: auto
    page_range:
      - 1-3
Using the Component in a Pipeline
This example shows a query pipeline that uses PDFToImageContent to convert uploaded PDF pages to images for multimodal AI processing. The pipeline extracts specified pages from PDF documents, converts them to ImageContent objects, and sends them to a vision-enabled chat model for analysis.
# haystack-pipeline
components:
  PDFToImageContent:
    type: haystack.components.converters.image.PDFToImageContent
    init_parameters:
      detail: auto
      size:
      page_range:
        - "1-3"
  ChatPromptBuilder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template:
        - role: system
          content: >-
            You are an AI assistant that analyzes PDF documents. Examine the
            provided PDF pages and answer questions about their content,
            layout, text, images, and any other visible elements.
        - role: user
          content: "{{ question }}"
          images: "{{ images }}"
      required_variables:
        - question
        - images
  OpenAIChatGenerator:
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: true
      model: gpt-4o
      generation_kwargs:
  OutputAdapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: "{{ replies[0].text }}"
      output_type: str
  DeepsetAnswerBuilder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern:
connections:
  - sender: PDFToImageContent.image_contents
    receiver: ChatPromptBuilder.images
  - sender: ChatPromptBuilder.prompt
    receiver: OpenAIChatGenerator.messages
  - sender: OpenAIChatGenerator.replies
    receiver: OutputAdapter.replies
  - sender: OutputAdapter.output
    receiver: DeepsetAnswerBuilder.replies
inputs:
  query:
    - ChatPromptBuilder.question
  files:
    - PDFToImageContent.sources
outputs:
  answers: DeepsetAnswerBuilder.answers
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | List of PDF file paths or ByteStream objects to convert. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the ImageContent objects. This value can be a list of dictionaries or a single dictionary. If it's a single dictionary, its content is added to the metadata of all produced ImageContent objects. If it's a list, its length must match the number of sources as they're zipped together. For ByteStream objects, their meta is added to the output ImageContent objects. |
| detail | Optional[Literal["auto", "high", "low"]] | None | Optional detail level of the image (only supported by OpenAI). This is passed to the created ImageContent objects. If not provided, the detail level is the one set in the constructor. |
| size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. If not provided, the size value is the one set in the constructor. |
| page_range | Optional[List[Union[str, int]]] | None | List of page numbers and page ranges to convert to images. If not provided, the page range is the one set in the pipeline configuration. |
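The fallback described in the table, where a value passed at run time takes precedence and the configured value is used otherwise, can be sketched as follows. ConverterSettingsSketch and its resolve method are hypothetical names for this example and do not reproduce the component's code.

```python
from typing import Any, Dict, List, Optional, Tuple, Union


class ConverterSettingsSketch:
    """Hypothetical holder for init-time settings with run-time overrides."""

    def __init__(
        self,
        detail: Optional[str] = None,
        size: Optional[Tuple[int, int]] = None,
        page_range: Optional[List[Union[str, int]]] = None,
    ) -> None:
        self.detail = detail
        self.size = size
        self.page_range = page_range

    def resolve(
        self,
        detail: Optional[str] = None,
        size: Optional[Tuple[int, int]] = None,
        page_range: Optional[List[Union[str, int]]] = None,
    ) -> Dict[str, Any]:
        # A run-time value wins; None means "use what was set at init".
        return {
            "detail": detail if detail is not None else self.detail,
            "size": size if size is not None else self.size,
            "page_range": page_range if page_range is not None else self.page_range,
        }


settings = ConverterSettingsSketch(detail="auto", page_range=["1-3"])
effective = settings.resolve(detail="high")
```

In this sketch, effective uses the run-time detail value while page_range falls back to the configured ["1-3"] and size stays None.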
Outputs
| Parameter | Type | Description |
|---|---|---|
| image_contents | List[ImageContent] | A list of ImageContent objects created from the PDF pages. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| detail | Optional[Literal["auto", "high", "low"]] | None | Optional detail level of the image (only supported by OpenAI). Possible values: "auto", "high", or "low". This is passed to the created ImageContent objects. |
| size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. This reduces file size, memory usage, and processing time. |
| page_range | Optional[List[Union[str, int]]] | None | List of page numbers and page ranges to convert to images. Page numbers start at 1. If None, all pages in the PDF are converted. Also accepts printable range strings, for example: ['1-3', '5', '8', '10-12']. |
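To show what fitting an image within (width, height) while maintaining aspect ratio means in numbers, here is a minimal sketch. The function fit_within is a hypothetical helper, and the choice never to upscale a smaller image is an assumption of this example rather than documented behavior.

```python
from typing import Tuple


def fit_within(original: Tuple[int, int], box: Tuple[int, int]) -> Tuple[int, int]:
    """Return the largest size that fits inside box and keeps the aspect ratio."""
    width, height = original
    max_width, max_height = box
    # Use the tighter of the two constraints; cap at 1.0 so small images
    # are left as they are (assumption: no upscaling).
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# A portrait page rendered at 1700x2200 pixels, limited to an 800x800 box.
resized = fit_within((1700, 2200), (800, 800))
```

The height is the limiting dimension here, so the page scales to 618 by 800 pixels, and the width shrinks by the same factor.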
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | List of PDF file paths or ByteStream objects to convert. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the ImageContent objects. This value can be a list of dictionaries or a single dictionary. If it's a single dictionary, its content is added to the metadata of all produced ImageContent objects. If it's a list, its length must match the number of sources as they're zipped together. For ByteStream objects, their meta is added to the output ImageContent objects. |
| detail | Optional[Literal["auto", "high", "low"]] | None | Optional detail level of the image (only supported by OpenAI). This is passed to the created ImageContent objects. If not provided, the detail level is the one set in the constructor. |
| size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. If not provided, the size value is the one set in the constructor. |
| page_range | Optional[List[Union[str, int]]] | None | List of page numbers and page ranges to convert to images. Page numbers start at 1. If None, all pages in the PDF are converted. Pages outside the valid range are skipped with a warning. Also accepts printable range strings, for example: ['1-3', '5', '8', '10-12']. |
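The meta rules in the table (a single dictionary applies to every source, a list must match the sources one to one) can be sketched like this. The function normalize_meta_sketch is a hypothetical helper written for this example; it illustrates the described behavior and is not the library's own function.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Dict[str, Any]


def normalize_meta_sketch(
    meta: Optional[Union[Meta, List[Meta]]], sources_count: int
) -> List[Meta]:
    """Return one metadata dictionary per source."""
    if meta is None:
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dictionary is copied for every source.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        raise ValueError("The length of the meta list must match the number of sources.")
    return [dict(item) for item in meta]


sources = ["invoice.pdf", "contract.pdf"]
shared = normalize_meta_sketch({"department": "legal"}, len(sources))
# Zip each source with its own metadata, as the table describes.
paired = list(zip(sources, normalize_meta_sketch([{"id": 1}, {"id": 2}], len(sources))))
```

Passing a list whose length differs from the number of sources raises a ValueError in this sketch, which mirrors the length requirement stated in the table.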