DocumentToImageContent
Extract visual content from images or PDFs and convert them into ImageContent objects you can use for multimodal AI tasks.
Basic Information
- Type: `haystack.components.converters.image.DocumentToImageContent`
- Components it can connect to:
  - Retrievers: DocumentToImageContent can receive documents from a Retriever.
  - ChatPromptBuilder: DocumentToImageContent sends the extracted images to ChatPromptBuilder, which includes them in the instructions for the model.
  - Any component that outputs documents or accepts ImageContent as input.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | Required | List of documents to extract images from, with metadata containing file paths to image or PDF files. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| image_contents | List[Optional[ImageContent]] | | A list of ImageContent objects extracted from the documents, with None in place of documents that couldn't be processed. |
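Because failed documents produce None entries, code that consumes this output directly (outside a pipeline) may want to separate successes from failures. A minimal sketch with a hypothetical helper, `split_results`, which is not part of the component's API; it assumes the output list is position-aligned with the input documents, which the None placeholders suggest:

```python
def split_results(documents: list, image_contents: list) -> tuple:
    """Separate converted images from documents that failed to convert.

    Relies on image_contents lining up positionally with documents.
    """
    pairs = list(zip(documents, image_contents))
    converted = [(doc, ic) for doc, ic in pairs if ic is not None]
    failed = [doc for doc, ic in pairs if ic is None]
    return converted, failed
```

The `failed` list is useful for logging which source documents had missing files or unsupported formats.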
Overview
DocumentToImageContent processes a list of documents with file paths in their metadata pointing to images or PDFs. It extracts visual content from supported file formats.
Documents must have metadata containing:
- The file path key with a valid file path that exists when combined with the root path.
- A supported image format (MIME type must be one of the supported image types).
- For PDF files, a `page_number` key specifying which page to extract.
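The requirements above can be sketched as a small validation helper. This is a hypothetical illustration, not the component's actual code: the function name `is_convertible` and the exact set of supported MIME types are assumptions.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# Image MIME types assumed supported for this sketch; the component's
# real list may differ.
SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def is_convertible(meta: dict, file_path_meta_field: str = "file_path",
                   root_path: Optional[str] = None) -> bool:
    """Check whether a document's metadata satisfies the rules above."""
    file_path = meta.get(file_path_meta_field)
    if file_path is None:
        return False
    # Resolve the path relative to root_path if one is given.
    resolved = Path(root_path, file_path) if root_path else Path(file_path)
    if not resolved.exists():
        return False
    mime_type, _ = mimetypes.guess_type(resolved.name)
    if mime_type == "application/pdf":
        # PDFs additionally need a page number to know which page to render.
        return "page_number" in meta
    return mime_type in SUPPORTED_IMAGE_TYPES
```

Documents that fail these checks are the ones that come back as None in the `image_contents` output.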
When given an image, DocumentToImageContent encodes the file directly. When given a PDF, it extracts the page specified by the `page_number` metadata key and converts it into an image. You can optionally resize images through the `size` parameter (the aspect ratio is preserved) and set a detail level to optimize for different AI models.
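The fit-within-a-bounding-box arithmetic can be sketched as follows. This is an illustration of the idea, not the component's implementation, which may round differently or use an image library's own resizing:

```python
def fit_within(original: tuple, box: tuple) -> tuple:
    """Scale a (width, height) pair to fit inside box, keeping aspect ratio."""
    w, h = original
    max_w, max_h = box
    # One scale factor for both axes preserves the aspect ratio; capping at
    # 1.0 avoids upscaling in this sketch.
    scale = min(max_w / w, max_h / h, 1.0)
    return (round(w * scale), round(h * scale))
```

For example, a 1000×400 image resized with `size: [512, 512]` becomes 512×205: the width is the binding constraint, and the height shrinks by the same factor.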
Usage Example
Initializing the Component
```yaml
components:
  DocumentToImageContent:
    type: haystack.components.converters.image.DocumentToImageContent
    init_parameters:
      file_path_meta_field: file_path
      detail: high
      size: [800, 600]
```
Pipeline Example
Here's an example of DocumentToImageContent used in a query pipeline. A text embedder and an OpenSearch retriever fetch the documents, DocumentToImageContent extracts images from them, and ChatPromptBuilder includes the images in the chat messages for the model. Note that the model must support multimodal input.
```yaml
components:
  document_to_image:
    type: haystack.components.converters.image.document_to_image.DocumentToImageContent
    init_parameters:
      file_path_meta_field: file_path
      root_path: "/data/images"
      detail: high
      size: [512, 512]

  prompt_builder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template: |
        - _role: user
          _content:
            - text: "Analyze the following images and answer this question: {{question}}"
            - image: "{{images}}"

  generator:
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      model: gpt-4o

  OpenSearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      custom_query:
      raise_on_failure: true
      efficient_filtering: true
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: Standard-Index-English
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:

  DeepsetNvidiaTextEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      model: intfloat/multilingual-e5-base
      prefix: ''
      suffix: ''
      truncate:
      normalize_embeddings: true
      timeout:
      backend_kwargs:

  DeepsetAnswerBuilder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:
      extract_xml_tags:

connections:
  - sender: document_to_image.image_contents
    receiver: prompt_builder.images
  - sender: prompt_builder.prompt
    receiver: generator.messages
  - sender: OpenSearchEmbeddingRetriever.documents
    receiver: document_to_image.documents
  - sender: DeepsetNvidiaTextEmbedder.embedding
    receiver: OpenSearchEmbeddingRetriever.query_embedding
  - sender: generator.replies
    receiver: DeepsetAnswerBuilder.replies

inputs:
  query:
    - prompt_builder.question
    - DeepsetNvidiaTextEmbedder.text
    - DeepsetAnswerBuilder.query

outputs:
  answers: DeepsetAnswerBuilder.answers

max_runs_per_component: 100
metadata: {}
```
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path_meta_field | str | file_path | The metadata field in the Document that contains the file path to the image or PDF. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths. |
| detail | Optional[Literal["auto", "high", "low"]] | None | Optional detail level of the image (only supported by OpenAI). Can be "auto", "high", or "low". This is passed to the created ImageContent objects. |
| size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. This reduces file size, memory usage, and processing time. |
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | Required | List of documents with metadata containing file paths to image or PDF files. |