DocumentToImageContent
Extract visual content from images or PDFs and convert them into ImageContent objects you can use for multimodal AI tasks.
Basic Information
- Type: `haystack.components.converters.image.DocumentToImageContent`
- Components it can connect to:
  - Retrievers: DocumentToImageContent can receive documents from a Retriever.
  - ChatPromptBuilder: DocumentToImageContent sends the extracted images to ChatPromptBuilder, which includes them in the instructions for the model.
  - Any component that outputs documents or accepts ImageContent as input.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | Required | List of documents to extract images from, with metadata containing file paths to image or PDF files. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| image_contents | List[Optional[ImageContent]] | | A list of ImageContent objects extracted from the documents, with None in place of documents that couldn't be processed. |
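Because failed documents produce None entries, code that consumes this output directly (outside a pipeline) may want to separate successes from failures. A minimal sketch with a hypothetical helper, `split_results`, which is not part of the component's API; it assumes the output list is position-aligned with the input documents, which the None placeholders suggest:

```python
def split_results(documents: list, image_contents: list) -> tuple:
    """Separate converted images from documents that failed to convert.

    Relies on image_contents lining up positionally with documents.
    """
    pairs = list(zip(documents, image_contents))
    converted = [(doc, ic) for doc, ic in pairs if ic is not None]
    failed = [doc for doc, ic in pairs if ic is None]
    return converted, failed
```

The `failed` list is useful for logging which source documents had missing files or unsupported formats.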
Overview
DocumentToImageContent processes a list of documents with file paths in their metadata pointing to images or PDFs. It extracts visual content from supported file formats.
Documents must have metadata containing:
- The file path key with a valid file path that exists when combined with the root path.
- A supported image format (MIME type must be one of the supported image types).
- For PDF files, a `page_number` key specifying which page to extract.
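The requirements above can be sketched as a small validation helper. This is a hypothetical illustration, not the component's actual code: the function name `is_convertible` and the exact set of supported MIME types are assumptions.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# Image MIME types assumed supported for this sketch; the component's
# real list may differ.
SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def is_convertible(meta: dict, file_path_meta_field: str = "file_path",
                   root_path: Optional[str] = None) -> bool:
    """Check whether a document's metadata satisfies the rules above."""
    file_path = meta.get(file_path_meta_field)
    if file_path is None:
        return False
    # Resolve the path relative to root_path if one is given.
    resolved = Path(root_path, file_path) if root_path else Path(file_path)
    if not resolved.exists():
        return False
    mime_type, _ = mimetypes.guess_type(resolved.name)
    if mime_type == "application/pdf":
        # PDFs additionally need a page number to know which page to render.
        return "page_number" in meta
    return mime_type in SUPPORTED_IMAGE_TYPES
```

Documents that fail these checks are the ones that come back as None in the `image_contents` output.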
When given an image, DocumentToImageContent encodes the file directly. When given a PDF, it extracts the page specified by the `page_number` metadata key and converts it into an image. You can optionally resize images through the `size` parameter (the aspect ratio is preserved) and set a detail level to optimize for different AI models.
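The fit-within-a-bounding-box arithmetic can be sketched as follows. This is an illustration of the idea, not the component's implementation, which may round differently or use an image library's own resizing:

```python
def fit_within(original: tuple, box: tuple) -> tuple:
    """Scale a (width, height) pair to fit inside box, keeping aspect ratio."""
    w, h = original
    max_w, max_h = box
    # One scale factor for both axes preserves the aspect ratio; capping at
    # 1.0 avoids upscaling in this sketch.
    scale = min(max_w / w, max_h / h, 1.0)
    return (round(w * scale), round(h * scale))
```

For example, a 1000×400 image resized with `size: [512, 512]` becomes 512×205: the width is the binding constraint, and the height shrinks by the same factor.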
Usage Example
Initializing the Component
```yaml
components:
  DocumentToImageContent:
    type: haystack.components.converters.image.DocumentToImageContent
    init_parameters:
      file_path_meta_field: file_path
      detail: high
      size: [800, 600]
```
Pipeline Example
Here's an example of DocumentToImageContent used in a query pipeline. A text embedder and an OpenSearch retriever fetch the documents, DocumentToImageContent extracts images from them, and ChatPromptBuilder includes the images in the chat messages for the model. Note that the model must support multimodal input.
```yaml
components:
  document_to_image:
    type: haystack.components.converters.image.document_to_image.DocumentToImageContent
    init_parameters:
      file_path_meta_field: file_path
      root_path: "/data/images"
      detail: high
      size: [512, 512]

  prompt_builder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template: |
        - _role: user
          _content:
            - text: "Analyze the following images and answer this question: {{question}}"
            - image: "{{images}}"

  generator:
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      model: gpt-4o

  OpenSearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      custom_query:
      raise_on_failure: true
      efficient_filtering: true
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: Standard-Index-English
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:

  DeepsetNvidiaTextEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      model: intfloat/multilingual-e5-base
      prefix: ''
      suffix: ''
      truncate:
      normalize_embeddings: true
      timeout:
      backend_kwargs:

  DeepsetAnswerBuilder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:
      extract_xml_tags:

connections:
  - sender: document_to_image.image_contents
    receiver: prompt_builder.images
  - sender: prompt_builder.prompt
    receiver: generator.messages
  - sender: OpenSearchEmbeddingRetriever.documents
    receiver: document_to_image.documents
  - sender: DeepsetNvidiaTextEmbedder.embedding
    receiver: OpenSearchEmbeddingRetriever.query_embedding
  - sender: generator.replies
    receiver: DeepsetAnswerBuilder.replies

inputs:
  query:
    - prompt_builder.question
    - DeepsetNvidiaTextEmbedder.text
    - DeepsetAnswerBuilder.query

outputs:
  answers: DeepsetAnswerBuilder.answers

max_runs_per_component: 100
metadata: {}
```
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path_meta_field | str | file_path | The metadata field in the Document that contains the file path to the image or PDF. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths. |
| detail | Optional[Literal["auto", "high", "low"]] | None | Optional detail level of the image (only supported by OpenAI). Can be "auto", "high", or "low". This is passed to the created ImageContent objects. |
| size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. This reduces file size, memory usage, and processing time. |
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | Required | List of documents with metadata containing file paths to image or PDF files. |