ImageFileToDocument

Convert references to image files into empty Document objects with associated metadata.

Basic Information

Type: haystack.components.converters.image.ImageFileToDocument
Components it can connect with:
- FilesInput: ImageFileToDocument can receive file paths from FilesInput.
- DocumentSplitter: ImageFileToDocument can send documents to DocumentSplitter for further processing.

Inputs

Parameter	Type	Default	Description
sources	List[Union[str, Path, ByteStream]]		List of image file paths or ByteStream objects to convert.
meta	Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]	None	Optional metadata to attach to the documents. Pass either a single dictionary or a list of dictionaries. A single dictionary is added to the metadata of every produced document. A list is zipped with `sources`, so its length must match the number of sources. For ByteStream sources, the ByteStream's own `meta` is also added to the output documents.
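The single-dictionary versus list rule for `meta` can be sketched in plain Python. This is a minimal illustration of the behavior described in the table above, not Haystack's implementation. `normalize_meta` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical helper, not part of the Haystack API: it mimics the documented
# rules for the `meta` parameter of ImageFileToDocument.run().
MetaInput = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]


def normalize_meta(meta: MetaInput, sources_count: int) -> List[Dict[str, Any]]:
    """Return exactly one metadata dict per source."""
    if meta is None:
        # No metadata: every document starts with an empty dict.
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dict is copied onto every produced document.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        # A list is zipped with the sources, so the lengths must agree.
        raise ValueError(f"Got {len(meta)} metadata dicts for {sources_count} sources.")
    return [dict(m) for m in meta]


sources = ["cat.jpg", "dog.png"]
print(normalize_meta({"project": "pets"}, len(sources)))
# [{'project': 'pets'}, {'project': 'pets'}]
print(normalize_meta([{"label": "cat"}, {"label": "dog"}], len(sources)))
# [{'label': 'cat'}, {'label': 'dog'}]
```

Each dictionary is copied so that later changes to one document's metadata do not leak into the others.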

Outputs

Parameter	Type	Default	Description
documents	List[Document]		A list of Document objects with no content (`content` is None) and the associated metadata.

Overview

Use ImageFileToDocument in pipelines where image file paths must be wrapped in Document objects so that downstream components, such as the SentenceTransformersImageDocumentEmbedder or LLMDocumentContentExtractor, can process them.

ImageFileToDocument doesn't extract any content from the image files. Instead, it creates Document objects whose content is None and attaches metadata, such as the file path and any user-provided values.
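The following sketch models what the converter produces, under stated assumptions: `SketchByteStream`, `SketchDocument`, and `to_documents` are hypothetical stand-ins rather than Haystack classes, the metadata key `file_path` is assumed, paths are stored exactly as given, and user-provided values are assumed to override a ByteStream's own `meta` when keys collide.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Union


@dataclass
class SketchByteStream:
    """Hypothetical stand-in for a ByteStream: raw bytes plus metadata."""
    data: bytes
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a Document."""
    content: Optional[str]
    meta: Dict[str, Any]


def to_documents(
    sources: List[Union[str, Path, SketchByteStream]],
    user_meta: List[Dict[str, Any]],
) -> List[SketchDocument]:
    """Wrap each source in a document with no content (sketch only)."""
    if len(sources) != len(user_meta):
        raise ValueError("Expected one metadata dict per source.")
    documents = []
    for source, extra in zip(sources, user_meta):
        if isinstance(source, SketchByteStream):
            # A ByteStream carries its own metadata into the document.
            base_meta = dict(source.meta)
        else:
            # A path source contributes its file path (stored as given here).
            base_meta = {"file_path": str(source)}
        # Assumption: user-provided values win on key collisions.
        merged = {**base_meta, **extra}
        # The image bytes are never parsed, so content stays None.
        documents.append(SketchDocument(content=None, meta=merged))
    return documents


docs = to_documents(
    ["images/cat.jpg", SketchByteStream(b"\x89PNG", meta={"file_path": "dog.png", "origin": "upload"})],
    [{"label": "cat"}, {"label": "dog"}],
)
for doc in docs:
    print(doc.content, doc.meta)
# None {'file_path': 'images/cat.jpg', 'label': 'cat'}
# None {'file_path': 'dog.png', 'origin': 'upload', 'label': 'dog'}
```

Because `content` is None, only components that read the image through the metadata (for example, via the stored file path) can do anything useful with these documents.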

Usage Example

Initializing the Component

components:
  ImageFileToDocument:
    type: haystack.components.converters.image.file_to_document.ImageFileToDocument
    init_parameters:
      store_full_path: false

Using the Component in an Index

Here's an example of ImageFileToDocument used in an index. It converts image file paths into Document objects that can then be embedded and stored for later retrieval:

components:
  image_file_to_document:
    type: haystack.components.converters.image.file_to_document.ImageFileToDocument
    init_parameters:
      store_full_path: true

  SentenceTransformersImageDocumentEmbedder:
    type: haystack.components.embedders.sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder
    init_parameters:
      model: clip-ViT-B-32
      device:
      token:
      prefix: ''
      suffix: ''
      batch_size: 32
      progress_bar: true
      normalize_embeddings: false
      meta_fields_to_embed:
      embedding_separator: "\\n"

  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: image-index
          max_chunk_bytes: 104857600
          embedding_dim: 512
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE

connections:
- sender: image_file_to_document.documents
  receiver: SentenceTransformersImageDocumentEmbedder.documents
- sender: SentenceTransformersImageDocumentEmbedder.documents
  receiver: DocumentWriter.documents

inputs:
  files:
  - image_file_to_document.sources

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
store_full_path	bool	False	If `True`, stores the full path of the file in the metadata of the document. If `False`, stores only the file name.
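A minimal sketch of the `store_full_path` switch, assuming the value is stored under a `file_path` metadata key and that "file name" means the last path component as returned by `os.path.basename`. `file_path_meta` is a hypothetical helper, not part of Haystack.

```python
import os
from pathlib import Path
from typing import Union


def file_path_meta(source: Union[str, Path], store_full_path: bool = False) -> str:
    """Hypothetical helper: the value stored under meta["file_path"]."""
    path = str(source)
    # store_full_path=False (the default) keeps only the file name.
    return path if store_full_path else os.path.basename(path)


print(file_path_meta("data/images/cat.jpg"))                        # cat.jpg
print(file_path_meta("data/images/cat.jpg", store_full_path=True))  # data/images/cat.jpg
```

Keeping only the file name avoids exposing directory layouts in stored metadata, while the full path is useful when a downstream component has to open the image again.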

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
sources	List[Union[str, Path, ByteStream]]		List of image file paths or ByteStream objects to convert.
meta	Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]	None	Optional metadata to attach to the documents. Pass either a single dictionary or a list of dictionaries. A single dictionary is added to the metadata of every produced document. A list is zipped with `sources`, so its length must match the number of sources. For ByteStream sources, the ByteStream's own `meta` is also added to the output documents.
