
ImageFileToDocument

Convert references to image files into empty Document objects with associated metadata.

Key Features

  • Wraps image file paths in Haystack Document objects for use by downstream components
  • Creates Document objects with None content and file path stored in metadata
  • Compatible with image embedders such as SentenceTransformersImageDocumentEmbedder
  • Compatible with content extractors such as LLMDocumentContentExtractor
  • Attaches optional user-provided metadata to each resulting document
  • Controls whether to store the full file path or just the file name in document metadata

Configuration

  1. Drag the ImageFileToDocument component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. Configure the parameters as needed.

Connections

ImageFileToDocument receives image file paths or ByteStream objects as input, typically from FilesInput. It outputs a list of Document objects that carry the file metadata. You can send these documents to an image embedder, such as SentenceTransformersImageDocumentEmbedder, and then to DocumentWriter for storage.

Usage Example

Using the Component in an Index

Here's an example of ImageFileToDocument used in an index. It converts image file paths into Document objects that can then be embedded and stored for later retrieval:

components:
  image_file_to_document:
    type: haystack.components.converters.image.file_to_document.ImageFileToDocument
    init_parameters:
      store_full_path: true

  SentenceTransformersImageDocumentEmbedder:
    type: haystack.components.embedders.sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder
    init_parameters:
      model: clip-ViT-B-32
      device:
      token:
      prefix: ''
      suffix: ''
      batch_size: 32
      progress_bar: true
      normalize_embeddings: false
      meta_fields_to_embed:
      embedding_separator: "\\n"

  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: image-index
          max_chunk_bytes: 104857600
          embedding_dim: 512
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE

connections:
  - sender: image_file_to_document.documents
    receiver: SentenceTransformersImageDocumentEmbedder.documents
  - sender: SentenceTransformersImageDocumentEmbedder.documents
    receiver: DocumentWriter.documents

inputs:
  files:
    - image_file_to_document.sources

max_runs_per_component: 100

metadata: {}

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sources | List[Union[str, Path, ByteStream]] | | List of image file paths or ByteStream objects to convert. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the documents. This value can be a list of dictionaries or a single dictionary. If it's a single dictionary, its content is added to the metadata of all produced documents. If it's a list, its length must match the number of sources as they're zipped together. For ByteStream objects, their meta is added to the output documents. |
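
The two accepted shapes of meta can be illustrated with a small standard-library sketch. It is not the component's implementation: expand_meta is a hypothetical helper, and the file names and metadata keys are made up for illustration. It only models the rules described in the table: a single dictionary applies to every source, and a list must have one entry per source because the two are zipped together.

```python
from typing import Any, Dict, List, Optional, Union

MetaInput = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]


def expand_meta(meta: MetaInput, sources_count: int) -> List[Dict[str, Any]]:
    """Return one metadata dictionary per source (hypothetical helper)."""
    if meta is None:
        # No metadata supplied: every source gets an empty dictionary.
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dictionary is copied for every source.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        # A list is zipped with the sources, so the lengths must agree.
        raise ValueError("The length of meta must match the number of sources.")
    return [dict(item) for item in meta]


# Illustrative inputs only; these names do not come from the component.
sources = ["cat.jpg", "dog.png"]
shared = expand_meta({"collection": "pets"}, len(sources))
per_file = expand_meta([{"label": "cat"}, {"label": "dog"}], len(sources))
paired = list(zip(sources, per_file))
```

Here shared holds the same dictionary content for both sources, while paired matches each source with its own entry. Passing a list of the wrong length raises an error in this sketch.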

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of Document objects with empty content and associated metadata. |
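
To make the output shape concrete, here is a minimal standard-library model of it. SketchDocument and wrap_image_paths are hypothetical stand-ins, not Haystack classes, and the file_path metadata key is an assumption used for illustration. The sketch only shows that each document has no content and carries its file reference in metadata, under a documents output.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchDocument:
    """Stand-in for a Document: no content, only metadata."""

    content: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def wrap_image_paths(paths: List[str]) -> Dict[str, List[SketchDocument]]:
    """Hypothetical helper that mirrors the documented output shape."""
    # The "file_path" key is assumed here for illustration.
    documents = [SketchDocument(meta={"file_path": path}) for path in paths]
    return {"documents": documents}


result = wrap_image_paths(["cat.jpg", "dog.png"])
```

A downstream component would read the image location from each document's metadata, since the content itself is empty.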

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| store_full_path | bool | False | If True, stores the full path of the file in the metadata of the document. If False, stores only the file name. |
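
The effect of store_full_path can be sketched with the standard library. stored_file_path is a hypothetical helper and the example path is invented. It illustrates the documented behavior, not the component's actual code.

```python
import os


def stored_file_path(source_path: str, store_full_path: bool = False) -> str:
    """Hypothetical helper: the value recorded in the document metadata."""
    # Keep the whole path when requested, otherwise only the file name.
    return source_path if store_full_path else os.path.basename(source_path)


# Illustrative path only.
full = stored_file_path("/data/images/cat.jpg", store_full_path=True)
name_only = stored_file_path("/data/images/cat.jpg")
```

With the default of False, only cat.jpg would be recorded. Setting it to True keeps the directory part, which can help when files in different folders share a name.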

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sources | List[Union[str, Path, ByteStream]] | | List of image file paths or ByteStream objects to convert. |
| meta | Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] | None | Optional metadata to attach to the documents. This value can be a list of dictionaries or a single dictionary. If it's a single dictionary, its content is added to the metadata of all produced documents. If it's a list, its length must match the number of sources as they're zipped together. For ByteStream objects, their meta is added to the output documents. |