DocumentToImageContent

Extract visual content from image files or PDF pages and convert it into ImageContent objects you can use for multimodal AI tasks.

Key Features

  • Converts documents with image or PDF file paths in their metadata into ImageContent objects
  • Supports both image files and PDF pages as input sources
  • Extracts specific PDF pages using the page_number metadata key
  • Optionally resizes images while maintaining aspect ratio to reduce file size and processing time
  • Configurable detail level ("auto", "high", or "low") that is passed to the created ImageContent objects (currently only supported by OpenAI models)
  • Returns None for documents that couldn't be processed instead of failing the pipeline
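
To make the metadata requirements above concrete, here is a minimal Python sketch of a per-document check. It is not the component's implementation: `plan_extraction` and `ExtractionPlan` are hypothetical names, documents are modelled as plain metadata dictionaries, and both the list of image extensions and the 1-based page numbering are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Assumed set of image extensions, for illustration only.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


@dataclass
class ExtractionPlan:
    """Hypothetical description of what to load for one document."""
    path: str
    page_number: Optional[int]  # set only for PDF sources


def plan_extraction(meta: dict, file_path_meta_field: str = "file_path") -> Optional[ExtractionPlan]:
    """Return a plan for one document, or None if it cannot be processed."""
    path = meta.get(file_path_meta_field)
    if not path:
        return None  # no file path stored in the metadata
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return ExtractionPlan(path=path, page_number=None)
    if suffix == ".pdf":
        page = meta.get("page_number")
        # Assumption: pages are numbered starting at 1.
        if isinstance(page, int) and page >= 1:
            return ExtractionPlan(path=path, page_number=page)
    return None  # unsupported file type, or PDF without a usable page_number


docs_meta = [
    {"file_path": "charts/revenue.png"},
    {"file_path": "reports/q3.pdf", "page_number": 4},
    {"file_path": "reports/q3.pdf"},  # PDF without page_number
    {"title": "no path here"},
]
plans = [plan_extraction(m) for m in docs_meta]
```

In this sketch the last two entries yield None rather than raising, which mirrors the behaviour listed above of returning None for documents that can't be processed.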

Configuration

  1. Drag the DocumentToImageContent component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. Configure the parameters as needed.

Connections

DocumentToImageContent receives a list of Document objects as input — typically from a Retriever that returns documents with file paths stored in their metadata. It outputs a list of ImageContent objects that you can send to ChatPromptBuilder to include images in the prompt for a vision-enabled model.

Usage Example

Pipeline Example

Here's an example of DocumentToImageContent used in a query pipeline. It extracts images from documents and sends them to a ChatPromptBuilder that includes them in the chat message for the model. Note that the model must support multimodal input.

components:
  document_to_image:
    type: haystack.components.converters.image.document_to_image.DocumentToImageContent
    init_parameters:
      file_path_meta_field: file_path
      root_path: "/data/images"
      detail: high
      size: [512, 512]

  prompt_builder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template: "- _role: user\n _content:\n - text: 'Analyze the following images and answer this question: {{question}}'\n 'image: {{images}}'\n"
  # ChatPromptBuilder outputs chat messages, so a chat generator is used here.
  generator:
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      model: gpt-4-vision-preview

  OpenSearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      custom_query:
      raise_on_failure: true
      efficient_filtering: true
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: Standard-Index-English
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
  DeepsetNvidiaTextEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      model: intfloat/multilingual-e5-base
      prefix: ''
      suffix: ''
      truncate:
      normalize_embeddings: true
      timeout:
      backend_kwargs:
  DeepsetAnswerBuilder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:
      extract_xml_tags:

connections:
  - sender: document_to_image.image_contents
    receiver: prompt_builder.images
  - sender: prompt_builder.prompt
    receiver: generator.messages
  - sender: OpenSearchEmbeddingRetriever.documents
    receiver: document_to_image.documents

  - sender: DeepsetNvidiaTextEmbedder.embedding
    receiver: OpenSearchEmbeddingRetriever.query_embedding
  - sender: generator.replies
    receiver: DeepsetAnswerBuilder.replies

inputs:
  query:
    - prompt_builder.question
    - DeepsetNvidiaTextEmbedder.text
    - DeepsetAnswerBuilder.query

outputs:
  answers: DeepsetAnswerBuilder.answers

max_runs_per_component: 100

metadata: {}

Parameters

Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to extract images from with metadata containing file paths to image or PDF files. |

Outputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| image_contents | List[Optional[ImageContent]] | | A list of ImageContent objects extracted from the documents, or None for documents that couldn't be processed. |
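
Because the output list can contain None, code that consumes it often needs to drop the failed entries while keeping track of which document each image came from. The following is a minimal sketch using plain strings as stand-ins for Document and ImageContent objects. `pair_successful` is a hypothetical helper, and the sketch assumes the output list has one entry per input document, in the same order.

```python
from typing import List, Optional, Tuple, TypeVar

D = TypeVar("D")  # stand-in for Document
I = TypeVar("I")  # stand-in for ImageContent


def pair_successful(documents: List[D], image_contents: List[Optional[I]]) -> List[Tuple[D, I]]:
    """Keep only the (document, image) pairs whose image is not None."""
    if len(documents) != len(image_contents):
        raise ValueError("expected one output entry per input document")
    return [(doc, img) for doc, img in zip(documents, image_contents) if img is not None]


# Strings stand in for the real objects; the second document failed to convert.
documents = ["doc-a", "doc-b", "doc-c"]
image_contents = ["image-a", None, "image-c"]
pairs = pair_successful(documents, image_contents)
```

Here `pairs` contains only the entries for `doc-a` and `doc-c`.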

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path_meta_field | str | file_path | The metadata field in the Document that contains the file path to the image or PDF. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths. |
| detail | Optional[Literal] | None | Optional detail level of the image (only supported by OpenAI). Can be "auto", "high", or "low". This will be passed to the created ImageContent objects. |
| size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. This reduces file size, memory usage, and processing time. |
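
The effect of root_path and size can be illustrated with a small Python sketch. It is an approximation under stated assumptions, not the component's code: `resolve_path` and `fit_within` are hypothetical helpers, paths are treated as POSIX paths, and the sketch assumes that images smaller than size are left unchanged rather than enlarged.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def resolve_path(file_path: str, root_path: Optional[str] = None) -> str:
    """Resolve a metadata file path against root_path, if one is set."""
    if root_path is None:
        return file_path  # treated as an absolute path
    return str(PurePosixPath(root_path) / file_path)


def fit_within(original: Tuple[int, int], size: Tuple[int, int]) -> Tuple[int, int]:
    """Scale (width, height) to fit inside size while keeping the aspect ratio."""
    width, height = original
    max_width, max_height = size
    # Assumption: never upscale, so the scale factor is capped at 1.0.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


resolved = resolve_path("charts/revenue.png", root_path="/data/images")
wide = fit_within((2048, 1024), (512, 512))   # limited by width
small = fit_within((300, 200), (512, 512))    # already fits
```

With `size: [512, 512]`, a 2048x1024 image would become 512x256 in this sketch: the more restrictive of the two scale factors is applied to both dimensions, so the 2:1 aspect ratio is preserved.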

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents with metadata containing file paths to image or PDF files. |