SentenceTransformersDocumentImageEmbedder
Compute image embeddings for a list of documents using Sentence Transformers models.
Basic Information
- Type: haystack.components.embedders.image.SentenceTransformersDocumentImageEmbedder
- Components it can connect with:
  - DocumentWriter: SentenceTransformersDocumentImageEmbedder can send embedded documents to DocumentWriter for storage.
  - Retriever: SentenceTransformersDocumentImageEmbedder can receive documents from a Retriever for embedding.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to embed. Each document must have a valid file path in its metadata pointing to an image or PDF file. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents with embeddings stored in the embedding field. Each document also includes metadata about the embedding source. |
Overview
SentenceTransformersDocumentImageEmbedder uses Sentence Transformers models that can embed both text and images. It stores the calculated embeddings in the embedding field of each document.
SentenceTransformersDocumentImageEmbedder supports both direct image files and PDF documents by extracting specific pages as images. It uses pre-trained models that can embed images and text into the same vector space, making it suitable for multimodal applications.
The component automatically handles image preprocessing, including resizing and format conversion, and can process both individual images and PDF pages. Each processed document includes metadata indicating the embedding source type.
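To illustrate why a shared vector space matters for multimodal applications, here is a minimal sketch that compares one image embedding against two text embeddings using cosine similarity. The vectors are made-up four-dimensional stand-ins, not real model output, and the `cosine_similarity` helper is hypothetical and written only for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embeddings from the same model.
image_embedding = [0.9, 0.1, 0.0, 0.4]  # assumed: a photo of a cat
text_cat = [0.8, 0.2, 0.1, 0.5]         # assumed: the text "a cat"
text_invoice = [0.0, 0.9, 0.7, 0.1]     # assumed: the text "an invoice"

score_cat = cosine_similarity(image_embedding, text_cat)
score_invoice = cosine_similarity(image_embedding, text_invoice)

# The image scores higher against the related text than the unrelated one.
print(score_cat > score_invoice)  # True
```

Because image and text vectors live in the same space, a text query can be scored directly against stored image embeddings, which is the basis for text-to-image retrieval.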
Authentication
SentenceTransformersDocumentImageEmbedder downloads models from the Hugging Face Hub. Downloading private models requires an API token. To use private models hosted on Hugging Face, connect deepset to your Hugging Face account:
Connection Instructions
- Click your profile icon in the top right corner and choose Integrations.

- Click Connect next to the provider.
- Enter your API key and submit it.
Usage Example
Initializing the Component
components:
  SentenceTransformersDocumentImageEmbedder:
    type: haystack.components.embedders.image.sentence_transformers_doc_image_embedder.SentenceTransformersDocumentImageEmbedder
    init_parameters:
      file_path_meta_field: file_path
      root_path: "/data/images"
      model: sentence-transformers/clip-ViT-B-32
      device:
      token:
      batch_size: 32
      progress_bar: true
      normalize_embeddings: false
      trust_remote_code: false
      local_files_only: false
      model_kwargs:
      tokenizer_kwargs:
      config_kwargs:
      precision: float32
      encode_kwargs:
      backend: torch
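The configuration above sets both root_path and file_path_meta_field. As a rough sketch of how these two settings could combine into the path that gets opened, consider the following. The `resolve_image_path` helper is hypothetical and only approximates the behavior described for root_path; the component's internal logic may differ.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_image_path(
    meta: dict,
    file_path_meta_field: str = "file_path",
    root_path: Optional[str] = None,
) -> str:
    """Join root_path (if any) with the path stored in the document metadata."""
    file_path = meta.get(file_path_meta_field)
    if file_path is None:
        raise ValueError(f"Document metadata has no '{file_path_meta_field}' field.")
    # Assumption: with no root_path, the stored path is used as-is.
    return str(PurePosixPath(root_path or "", file_path))


# Hypothetical document metadata, matching the settings in the configuration.
meta = {"file_path": "cats/tabby.jpg"}
print(resolve_image_path(meta, root_path="/data/images"))
# /data/images/cats/tabby.jpg
```

In this sketch, a document without the configured metadata field raises an error early instead of failing later when the file is opened.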
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path_meta_field | str | file_path | The metadata field in the Document that contains the file path to the image or PDF. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths. |
| model | str | sentence-transformers/clip-ViT-B-32 | The Sentence Transformers model to use for calculating embeddings. Must be able to embed images and text into the same vector space. Compatible models include clip-ViT-B-32, clip-ViT-L-14, clip-ViT-B-16, and others. |
| device | Optional[ComponentDevice] | None | The device to use for loading the model. Overrides the default device. |
| token | Optional[Secret] | None | The API token to download private models from Hugging Face. |
| batch_size | int | 32 | Number of documents to embed at once. |
| progress_bar | bool | True | If True, shows a progress bar when embedding documents. |
| normalize_embeddings | bool | False | If True, the embeddings are normalized using L2 normalization, so that each embedding has a norm of 1. |
| trust_remote_code | bool | False | If False, allows only Hugging Face verified model architectures. If True, allows custom models and scripts. |
| local_files_only | bool | False | If True, does not attempt to download the model from Hugging Face Hub and only looks at local files. |
| model_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoModelForSequenceClassification.from_pretrained when loading the model. |
| tokenizer_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoTokenizer.from_pretrained when loading the tokenizer. |
| config_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoConfig.from_pretrained when loading the model configuration. |
| precision | Literal | float32 | The precision to use for the embeddings. Choose from "float32", "int8", "uint8", "binary", or "ubinary". All non-float32 precisions produce quantized embeddings. |
| encode_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for SentenceTransformer.encode when embedding documents. |
| backend | Literal | torch | The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino". |
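To make the normalize_embeddings option concrete, the following sketch shows what L2 normalization does to a single vector. The `l2_normalize` helper is hypothetical and written in plain Python for illustration; the component delegates the actual computation to the model library.

```python
import math


def l2_normalize(vector):
    """Scale a vector so that its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction, so return it unchanged.
        return list(vector)
    return [x / norm for x in vector]


normalized = l2_normalize([3.0, 4.0])
print(normalized)  # [0.6, 0.8]
```

With unit-length embeddings, the dot product of two vectors equals their cosine similarity, which can matter when the document store ranks results by dot product.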
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to embed. Each document must have a valid file path in its metadata pointing to an image or PDF file. |
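When run() receives a long list of documents, the batch_size init parameter determines how many are embedded at once. The sketch below shows the general chunking idea with a hypothetical `iter_batches` helper and placeholder file names. It is an illustration of batching, not the component's actual code.

```python
def iter_batches(items, batch_size=32):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder file names standing in for 70 documents.
paths = [f"image_{i}.jpg" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(paths, batch_size=32)]
print(batch_sizes)  # [32, 32, 6]
```

A larger batch size generally means fewer model calls but more memory per call, so the right value depends on the hardware available.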