DocumentToImageContent
Extract visual content from images or PDFs and convert them into ImageContent objects you can use for multimodal AI tasks.
Key Features
- Converts documents that have an image or PDF file path in their metadata into ImageContent objects.
- Supports both image files and PDF pages as input sources.
- Extracts specific PDF pages using the page_number metadata key.
- Optionally resizes images while maintaining their aspect ratio to reduce file size and processing time.
- Lets you set the image detail level ("auto", "high", or "low"), which OpenAI models use to balance image fidelity against processing cost.
- Returns None for documents that couldn't be processed instead of failing the pipeline.
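The resize behavior can be illustrated with a small calculation: applying a single scale factor to both sides keeps the aspect ratio, and taking the smaller of the two per-side factors makes the result fit inside the requested (width, height) box. fit_within is a hypothetical helper written for illustration only. It assumes images are only ever scaled down (as Pillow's Image.thumbnail does), and the component's actual resizing code may round differently.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_size: Tuple[int, int]) -> Tuple[int, int]:
    """Hypothetical helper: new (width, height) that fits in max_size, same aspect ratio."""
    max_w, max_h = max_size
    # One factor for both sides preserves the aspect ratio; the smaller
    # per-side factor guarantees both sides fit. Capping at 1.0 means
    # images that already fit are never upscaled (an assumption).
    scale = min(max_w / width, max_h / height, 1.0)
    # Truncate to whole pixels, keeping at least 1 pixel per side.
    return max(1, int(width * scale)), max(1, int(height * scale))


print(fit_within(2048, 1536, (512, 512)))  # → (512, 384)
print(fit_within(300, 200, (512, 512)))    # → (300, 200)
```

A 4:3 landscape image ends up 512 pixels wide and proportionally shorter, while an image that already fits keeps its original size.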
Configuration
- Drag the DocumentToImageContent component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- Configure the parameters as needed.
Connections
DocumentToImageContent receives a list of Document objects as input, typically from a Retriever that returns documents with file paths stored in their metadata. It outputs a list of ImageContent objects that you can send to a ChatPromptBuilder to include the images in the prompt for a vision-capable model.
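As a rough illustration of the metadata this component expects from upstream components, the sketch below turns one document's metadata into a file path and an optional PDF page. resolve_source is a hypothetical helper, not part of Haystack. It assumes the file_path_meta_field, root_path, and page_number semantics described on this page and is not the component's actual implementation.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_source(
    meta: dict,
    root_path: Optional[str] = None,
    file_path_meta_field: str = "file_path",
) -> Optional[Tuple[Path, Optional[int]]]:
    """Hypothetical helper: find the file (and PDF page) a document points to.

    Returns None when the metadata has no file path (this sketch's choice;
    the real component may handle that case differently).
    """
    raw_path = meta.get(file_path_meta_field)
    if raw_path is None:
        return None
    path = Path(raw_path)
    # With root_path set, the stored path is resolved relative to it;
    # otherwise it is used as-is (expected to be absolute).
    if root_path is not None:
        path = Path(root_path) / path
    # Only PDFs need a page number; image files are converted whole.
    page = meta.get("page_number") if path.suffix.lower() == ".pdf" else None
    return path, page


# Metadata as a retriever might return it (values are illustrative).
pdf_meta = {"file_path": "reports/q3.pdf", "page_number": 2}
image_meta = {"file_path": "/tmp/photo.png"}

print(resolve_source(pdf_meta, root_path="/data/images"))
print(resolve_source(image_meta))
```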
Usage Example
Pipeline Example
Here's an example of DocumentToImageContent used in a query pipeline. It extracts images from documents and sends them to a ChatPromptBuilder that includes them in the chat message for the model. Note that the model must support multimodal input.
```yaml
components:
  document_to_image:
    type: haystack.components.converters.image.document_to_image.DocumentToImageContent
    init_parameters:
      file_path_meta_field: file_path
      root_path: "/data/images"
      detail: high
      size: [512, 512]
  prompt_builder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      # Skips None entries for documents that couldn't be converted
      template: |
        {% message role="user" %}
        Analyze the following images and answer this question: {{ question }}
        {% for image in images %}
        {% if image is not none %}{{ image | templatize_part }}{% endif %}
        {% endfor %}
        {% endmessage %}
  generator:
    # ChatPromptBuilder outputs chat messages, so a chat generator is required
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      model: gpt-4o
  OpenSearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      custom_query:
      raise_on_failure: true
      efficient_filtering: true
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: Standard-Index-English
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
  DeepsetNvidiaTextEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      model: intfloat/multilingual-e5-base
      prefix: ''
      suffix: ''
      truncate:
      normalize_embeddings: true
      timeout:
      backend_kwargs:
  DeepsetAnswerBuilder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:
      extract_xml_tags:
connections:
  - sender: document_to_image.image_contents
    receiver: prompt_builder.images
  - sender: prompt_builder.prompt
    receiver: generator.messages
  - sender: OpenSearchEmbeddingRetriever.documents
    receiver: document_to_image.documents
  - sender: DeepsetNvidiaTextEmbedder.embedding
    receiver: OpenSearchEmbeddingRetriever.query_embedding
  - sender: generator.replies
    receiver: DeepsetAnswerBuilder.replies
inputs:
  query:
    - prompt_builder.question
    - DeepsetNvidiaTextEmbedder.text
    - DeepsetAnswerBuilder.query
outputs:
  answers: DeepsetAnswerBuilder.answers
max_runs_per_component: 100
metadata: {}
```
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to extract images from. Each document's metadata must contain the file path to an image or PDF file. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| image_contents | List[Optional[ImageContent]] | | A list of ImageContent objects extracted from the documents, or None for documents that couldn't be processed. |
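Because the output contains None for documents that couldn't be processed, downstream code may need to separate successful conversions from failures. The sketch below assumes the output list is aligned one-to-one with the input documents, which the per-document None behavior suggests but which you should treat as an assumption. pair_images is a hypothetical helper, and plain strings stand in for Document and ImageContent objects.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

D = TypeVar("D")  # stand-in for Document
I = TypeVar("I")  # stand-in for ImageContent


def pair_images(
    documents: Sequence[D], image_contents: Sequence[Optional[I]]
) -> Tuple[List[Tuple[D, I]], List[D]]:
    """Hypothetical helper: split results into (document, image) pairs and failures."""
    if len(documents) != len(image_contents):
        raise ValueError("expected one output entry per input document")
    converted: List[Tuple[D, I]] = []
    skipped: List[D] = []
    for doc, image in zip(documents, image_contents):
        # None marks a document whose file couldn't be turned into an image.
        if image is None:
            skipped.append(doc)
        else:
            converted.append((doc, image))
    return converted, skipped


converted, skipped = pair_images(["doc-a", "doc-b", "doc-c"], ["img-a", None, "img-c"])
print(converted)  # → [('doc-a', 'img-a'), ('doc-c', 'img-c')]
print(skipped)    # → ['doc-b']
```

Keeping the skipped documents around makes it easy to log which files failed without interrupting the rest of the pipeline.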
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path_meta_field | str | file_path | The metadata field in the Document that contains the file path to the image or PDF. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths. |
| detail | Optional[Literal["auto", "high", "low"]] | None | Optional detail level of the image (only supported by OpenAI). Can be "auto", "high", or "low". This value is passed to the created ImageContent objects. |
| size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. This reduces file size, memory usage, and processing time. |
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents with metadata containing file paths to image or PDF files. |