SentenceTransformersDocumentImageEmbedder

Compute image embeddings for a list of documents using Sentence Transformers models.

Key Features

  • Embeds images and PDF pages using multimodal Sentence Transformers models.
  • Stores computed embeddings in the embedding field of each document.
  • Handles image preprocessing automatically, including resizing and format conversion.
  • Supports PDF documents by extracting specific pages as images.
  • Embeds images and text into the same vector space for multimodal search.

Configuration

Authentication

This component uses the Hugging Face Hub to download models. Connect deepset AI Platform to your Hugging Face account to use private models. For details, see Use Hugging Face Models.

  1. Drag the SentenceTransformersDocumentImageEmbedder component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab:
    1. Enter the model name, such as sentence-transformers/clip-ViT-B-32.
  4. Go to the Advanced tab to configure the device, token, batch size, and embedding normalization.

Embedding Models in Query Pipelines and Indexes

The embedding model you use to embed documents in your indexing pipeline must be the same as the embedding model you use to embed the query in your query pipeline.

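In practice, pair each document embedder with the query embedder that loads the same model. For example, if you use CohereDocumentEmbedder to embed your documents, use CohereTextEmbedder with the same model to embed your queries. For documents embedded with SentenceTransformersDocumentImageEmbedder, use a text embedder that loads the same multimodal model, such as SentenceTransformersTextEmbedder.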
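One visible symptom of mismatched models is that query and document vectors cannot be compared at all. The sketch below is a plain-Python illustration, not platform code: `cosine_similarity` is a hypothetical helper, and the 768-dimensional query vector stands in for the output of some other, unspecified model.

```python
import math
from typing import Sequence


def cosine_similarity(query: Sequence[float], document: Sequence[float]) -> float:
    """Cosine similarity between a query embedding and a document embedding."""
    if len(query) != len(document):
        raise ValueError(f"Embedding dimensions differ: {len(query)} vs {len(document)}")
    dot = sum(q * d for q, d in zip(query, document))
    norms = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(d * d for d in document))
    return dot / norms


# Toy vectors: 512 dimensions for the indexed document,
# 768 dimensions for a query embedded with a different (hypothetical) model.
document_embedding = [0.1] * 512
query_embedding = [0.1] * 768

try:
    cosine_similarity(query_embedding, document_embedding)
    mismatch_detected = False
except ValueError as error:
    mismatch_detected = True
    print(error)
```

Matching dimensions alone are not sufficient. Two different models can produce vectors of the same size that still live in unrelated vector spaces, so similarity scores between them carry no meaning.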

Connections

SentenceTransformersDocumentImageEmbedder receives a list of documents as input. Each document must have a valid file path in its metadata pointing to an image or PDF file. It outputs the same documents with their embedding field populated. Connect its output to DocumentWriter to store embedded documents.
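How a document's metadata leads to a file on disk can be sketched in plain Python. This is an illustration only, not the component's code: metadata is modeled as a plain dictionary, `resolve_image_path` is a hypothetical helper, and joining `root_path` with the metadata value is an assumption based on the descriptions of the `file_path_meta_field` and `root_path` parameters.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_image_path(
    meta: dict,
    file_path_meta_field: str = "file_path",
    root_path: Optional[str] = None,
) -> PurePosixPath:
    """Hypothetical helper: locate the image or PDF that a document points to."""
    if file_path_meta_field not in meta:
        raise ValueError(f"Document metadata has no '{file_path_meta_field}' field")
    file_path = PurePosixPath(meta[file_path_meta_field])
    # Assumption: with a root_path, the metadata value is resolved relative to it;
    # without one, the value is used as given.
    return PurePosixPath(root_path) / file_path if root_path else file_path


meta = {"file_path": "invoices/scan_001.png"}  # made-up example path
resolved = resolve_image_path(meta, root_path="/data/files")
print(resolved)
```

A document whose metadata lacks the configured field has nothing to embed, which is why the sketch raises an error instead of skipping it silently.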

Usage Example

This is an index that uses SentenceTransformersDocumentImageEmbedder to embed documents with images:

components:
  document_image_embedder:
    type: haystack.components.embedders.image.SentenceTransformersDocumentImageEmbedder
    init_parameters:
      model: sentence-transformers/clip-ViT-B-32
      file_path_meta_field: file_path
      normalize_embeddings: true

  document_writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      policy: NONE
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          use_ssl: true
          verify_certs: false
          index: my_index
          embedding_dim: 512

connections:
  - sender: document_image_embedder.documents
    receiver: document_writer.documents

inputs:
  documents:
    - document_image_embedder.documents

max_runs_per_component: 100

metadata: {}

Parameters

Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to embed. Each document must have a valid file path in its metadata pointing to an image or PDF file. |

Outputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents with embeddings stored in the embedding field. Each document also includes metadata about the embedding source. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path_meta_field | str | file_path | The metadata field in the Document that contains the file path to the image or PDF. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths. |
| model | str | sentence-transformers/clip-ViT-B-32 | The Sentence Transformers model to use for calculating embeddings. Must be able to embed images and text into the same vector space. Compatible models include clip-ViT-B-32, clip-ViT-L-14, and clip-ViT-B-16. |
| device | Optional[ComponentDevice] | None | The device to use for loading the model. Overrides the default device. |
| token | Optional[Secret] | None | The API token to download private models from Hugging Face. |
| batch_size | int | 32 | Number of documents to embed at once. |
| progress_bar | bool | True | If True, shows a progress bar when embedding documents. |
| normalize_embeddings | bool | False | If True, the embeddings are normalized using L2 normalization, so that each embedding has a norm of 1. |
| trust_remote_code | bool | False | If False, allows only Hugging Face verified model architectures. If True, allows custom models and scripts. |
| local_files_only | bool | False | If True, does not attempt to download the model from Hugging Face Hub and only looks at local files. |
| model_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoModelForSequenceClassification.from_pretrained when loading the model. |
| tokenizer_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoTokenizer.from_pretrained when loading the tokenizer. |
| config_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoConfig.from_pretrained when loading the model configuration. |
| precision | Literal | float32 | The precision to use for the embeddings. All non-float32 precisions are quantized embeddings. |
| encode_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for SentenceTransformer.encode when embedding documents. |
| backend | Literal | torch | The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino". |
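To make the effect of `normalize_embeddings` concrete, here is a minimal plain-Python sketch of L2 normalization. It illustrates the math only and is not the component's implementation; `l2_normalize` is a hypothetical helper and the two-dimensional vector is a toy value.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector so that its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector cannot be normalized; return a copy unchanged.
        return list(vector)
    return [x / norm for x in vector]


embedding = [3.0, 4.0]  # toy embedding with L2 norm 5
normalized = l2_normalize(embedding)
normalized_norm = math.sqrt(sum(x * x for x in normalized))
print(normalized, normalized_norm)
```

With normalized embeddings, the dot product of two vectors equals their cosine similarity, which can matter when the document store ranks results by dot product.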

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to embed. Each document must have a valid file path in its metadata pointing to an image or PDF file. |