DocumentToImageContent

Extract visual content from image or PDF files and convert it into ImageContent objects you can use for multimodal AI tasks.

DocumentToImageContent processes documents whose metadata contains file paths pointing to image or PDF files. For images, it encodes the file directly. For PDFs, it extracts the page specified by the page_number metadata key and converts it to an image.
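As a rough illustration of the image branch described above, the following Python sketch resolves a file path against an optional root directory and base64-encodes the file. It is not the component's actual implementation: the helper name `encode_image_file` and the returned dictionary keys are hypothetical, and PDF page rendering is left out because it needs a PDF library.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def encode_image_file(file_path: str, root_path: Optional[str] = None) -> Optional[dict]:
    """Hypothetical helper: resolve the path, read the file, and base64-encode it."""
    # An empty root leaves file_path unchanged; an absolute file_path overrides the root.
    path = Path(root_path or "") / file_path
    mime_type, _ = mimetypes.guess_type(path.name)
    # This sketch only handles plain image files that exist on disk.
    # Anything else yields None instead of raising, mirroring the documented behavior.
    if mime_type is None or not mime_type.startswith("image/") or not path.is_file():
        return None
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"base64_image": encoded, "mime_type": mime_type}
```

Calling `encode_image_file("chart.png", root_path="/data/images")` would read `/data/images/chart.png` if it exists and return `None` otherwise.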

Key Features

  • Extracts and base64-encodes images from image files or specific PDF pages.
  • Accepts documents with file path metadata — no raw file input needed.
  • Optional image resizing to reduce file size while preserving aspect ratio.
  • Configurable detail level for optimization with OpenAI vision models.
  • Returns None for documents that cannot be processed, without failing the pipeline.

Configuration

  1. Drag the DocumentToImageContent component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. Configure the component settings:
    • Set the File Path Meta Field to specify which metadata key in each Document contains the file path. The default is file_path.
    • Optionally, set the Root Path to a base directory. File paths in document metadata are resolved relative to this path.
    • Set the Detail level for images (auto, high, or low). This is passed to the created ImageContent objects and is only supported by OpenAI.
    • Set the Size to resize images so they fit within the specified dimensions (width, height) while maintaining aspect ratio.
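To make the Size setting more concrete, here is a small Python sketch of how a "fit within (width, height) while keeping the aspect ratio" calculation can work. The function name `fit_within` is hypothetical, and the sketch assumes that images smaller than the target are not upscaled; the component's real resizing logic may differ in rounding and edge cases.

```python
from typing import Tuple


def fit_within(original: Tuple[int, int], bounds: Tuple[int, int]) -> Tuple[int, int]:
    """Scale (width, height) down to fit inside bounds while keeping the aspect ratio."""
    width, height = original
    max_width, max_height = bounds
    # Use the tighter of the two ratios; the 1.0 cap means small images are never upscaled
    # (an assumption of this sketch).
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within((2048, 1024), (512, 512)))  # (512, 256)
print(fit_within((300, 200), (512, 512)))    # (300, 200)
```

With a Size of (512, 512), a 2048×1024 image is limited by its width, so it ends up at 512×256 rather than being stretched to a square.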

Connections

DocumentToImageContent accepts a list of Document objects through its documents input. It outputs a list through image_contents in which each entry is either an ImageContent object or None, for a document that couldn't be processed.
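Because the output can contain None entries, downstream code may want to drop them before building a prompt. The Python sketch below shows one way to do that. It assumes the output list is aligned with the input documents (one entry per document, in order), and it uses plain strings as stand-ins for Document and ImageContent objects. The helper name `drop_unprocessed` is hypothetical.

```python
from typing import Any, List, Optional, Tuple


def drop_unprocessed(
    documents: List[Any], image_contents: List[Optional[Any]]
) -> Tuple[List[Any], List[Any]]:
    """Keep only the (document, image) pairs where an image was produced."""
    pairs = [(doc, img) for doc, img in zip(documents, image_contents) if img is not None]
    kept_docs = [doc for doc, _ in pairs]
    kept_images = [img for _, img in pairs]
    return kept_docs, kept_images


docs = ["chart.png", "missing.pdf", "photo.jpg"]  # stand-ins for Document objects
images = ["<image 1>", None, "<image 3>"]         # stand-ins for ImageContent objects
print(drop_unprocessed(docs, images))
# (['chart.png', 'photo.jpg'], ['<image 1>', '<image 3>'])
```

Filtering both lists together keeps each remaining image paired with the document it came from.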

It typically connects with:

  • Retrievers: supply the documents whose metadata points to relevant image or PDF files.
  • ChatPromptBuilder: receives the extracted ImageContent objects and includes them in multimodal prompts.

Source Code

To check this component's source code, open document_to_image.py in the Haystack repository.

Usage Examples

Basic Configuration

  document_to_image:
    type: haystack.components.converters.image.document_to_image.DocumentToImageContent
    init_parameters:
      file_path_meta_field: file_path
      root_path: /data/images
      detail: high
      size:
        - 512
        - 512

Pipeline Example

Here's an example of DocumentToImageContent used in a query pipeline. It extracts images from documents and sends them to a ChatPromptBuilder that includes them in the chat message for the model. Note that the model must support multimodal input.

# haystack-pipeline
components:
  document_to_image:
    type: haystack.components.converters.image.document_to_image.DocumentToImageContent
    init_parameters:
      file_path_meta_field: file_path
      root_path: "/data/images"
      detail: high
      size: [512, 512]

  prompt_builder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      # String chat template: one user message with the question and every extracted image.
      # Entries that are None (documents that couldn't be processed) are skipped.
      template: |
        {% message role="user" %}
        Analyze the following images and answer this question: {{ question }}
        {% for image in images %}
        {% if image %}{{ image | templatize_part }}{% endif %}
        {% endfor %}
        {% endmessage %}

  generator:
    # Chat generator, because ChatPromptBuilder outputs a list of ChatMessage objects.
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      model: gpt-4o  # any model that supports image input

  OpenSearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      custom_query:
      raise_on_failure: true
      efficient_filtering: true
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: Standard-Index-English
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:

  DeepsetNvidiaTextEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      model: intfloat/multilingual-e5-base
      prefix: ''
      suffix: ''
      truncate:
      normalize_embeddings: true
      timeout:
      backend_kwargs:

  DeepsetAnswerBuilder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:
      extract_xml_tags:

connections:
  - sender: DeepsetNvidiaTextEmbedder.embedding
    receiver: OpenSearchEmbeddingRetriever.query_embedding
  - sender: OpenSearchEmbeddingRetriever.documents
    receiver: document_to_image.documents
  - sender: document_to_image.image_contents
    receiver: prompt_builder.images
  - sender: prompt_builder.prompt
    receiver: generator.messages
  - sender: generator.replies
    receiver: DeepsetAnswerBuilder.replies

inputs:
  query:
    - prompt_builder.question
    - DeepsetNvidiaTextEmbedder.text
    - DeepsetAnswerBuilder.query

outputs:
  answers: DeepsetAnswerBuilder.answers

max_runs_per_component: 100

metadata: {}

Parameters

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[Document] | List of documents to extract images from with metadata containing file paths to image or PDF files. |

Outputs

| Parameter | Type | Description |
| --- | --- | --- |
| image_contents | List[Optional[ImageContent]] | A list of ImageContent objects extracted from the documents, or None for documents that couldn't be processed. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| file_path_meta_field | str | file_path | The metadata field in the Document that contains the file path to the image or PDF. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths. |
| detail | Optional[Literal] | None | Optional detail level of the image (only supported by OpenAI). Can be "auto", "high", or "low". This will be passed to the created ImageContent objects. |
| size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. This reduces file size, memory usage, and processing time. |

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[Document] | List of documents with metadata containing file paths to image or PDF files. |