SentenceTransformersDocumentImageEmbedder
Compute image embeddings for a list of documents using Sentence Transformers models.
We recommend using models available through the DeepsetNvidia components instead of Sentence Transformers models. Add a DeepsetNvidia component to your pipeline and choose an appropriate model from the list.
Key Features
- Uses Sentence Transformers models that can embed text and images into the same vector space.
- Supports both direct image files and PDF documents by extracting specific pages as images.
- Stores the computed embedding in the embedding field of each document.
- Automatically handles image preprocessing, including resizing and format conversion.
- Suitable for multimodal applications.
Configuration
- Drag the SentenceTransformersDocumentImageEmbedder component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- On the General tab:
  - Set the model name. Compatible models include clip-ViT-B-32, clip-ViT-L-14, clip-ViT-B-16, and others.
  - Set the file_path_meta_field to the metadata field that contains the file path to the image or PDF.
  - Connect the platform to your Hugging Face account to use private models. For instructions, see Use Hugging Face Models.
- Go to the Advanced tab to configure batch_size, normalize_embeddings, device, trust_remote_code, and other parameters.
Embedding Models in Query Pipelines and Indexes
The embedding model you use to embed documents in your indexing pipeline must be the same as the embedding model you use to embed the query in your query pipeline.
This means the embedders for your indexing and query pipelines must match. For example, if you use CohereDocumentEmbedder to embed your documents, you should use CohereTextEmbedder with the same model to embed your queries.
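The matching rule above can be checked mechanically. The following is a minimal sketch, not part of the platform: `embedding_models_match` is a hypothetical helper, the dictionaries mirror the shape of the YAML component definitions, and pairing this component with SentenceTransformersTextEmbedder on the query side is an assumption made for illustration.

```python
def embedding_models_match(index_embedder: dict, query_embedder: dict) -> bool:
    """Hypothetical helper: True if both embedder configs name the same model."""
    index_model = index_embedder.get("init_parameters", {}).get("model")
    query_model = query_embedder.get("init_parameters", {}).get("model")
    # A missing model name counts as a mismatch rather than a match.
    return index_model is not None and index_model == query_model


# Indexing side: the image embedder described on this page.
index_embedder = {
    "type": "haystack.components.embedders.image.SentenceTransformersDocumentImageEmbedder",
    "init_parameters": {"model": "sentence-transformers/clip-ViT-B-32"},
}

# Query side (assumption): a text embedder that loads the same CLIP model,
# so queries and images land in the same vector space.
query_embedder = {
    "type": "haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder",
    "init_parameters": {"model": "sentence-transformers/clip-ViT-B-32"},
}

print(embedding_models_match(index_embedder, query_embedder))
```

If the two model names differ, the query vectors and the stored image vectors are not comparable, even when they happen to have the same dimension.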
Connections
SentenceTransformersDocumentImageEmbedder accepts a list of documents as input. Each document must have a valid file path in its metadata pointing to an image or PDF file. In an indexing pipeline, connect it to a retriever or directly provide documents.
It outputs a list of documents with the embedding field populated. Connect its documents output to DocumentWriter to store the embedded documents.
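Because every input document needs a usable file path in its metadata, it can help to check paths before indexing. The sketch below is an illustration only: `resolve_file_path` is a hypothetical helper, plain dictionaries stand in for document metadata, and the handling of root_path follows the parameter description on this page rather than the component's actual code.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_file_path(
    meta: dict,
    file_path_meta_field: str = "file_path",
    root_path: Optional[str] = None,
) -> Optional[Path]:
    """Hypothetical helper: return the resolved path if it is an existing file, else None."""
    raw = meta.get(file_path_meta_field)
    if not raw:
        return None
    # With a root path, resolve relative to it; otherwise use the path as given.
    path = Path(root_path) / raw if root_path else Path(raw)
    return path if path.is_file() else None


# Usage: one existing file, one missing file, one document without the field.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "page.png").write_bytes(b"")  # empty placeholder, not a real image
    metas = [{"file_path": "page.png"}, {"file_path": "missing.png"}, {}]
    results = [resolve_file_path(m, root_path=root) is not None for m in metas]

print(results)
```

This only confirms that a file exists at the resolved location. It does not verify that the file is a readable image or PDF.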
Source Code
To check this component's source code, open sentence_transformers_doc_image_embedder.py in the Haystack repository.
Usage Examples
Basic Configuration
document_image_embedder:
  type: haystack.components.embedders.image.SentenceTransformersDocumentImageEmbedder
  init_parameters:
    model: sentence-transformers/clip-ViT-B-32
    file_path_meta_field: file_path
    normalize_embeddings: true
This is an index that uses SentenceTransformersDocumentImageEmbedder to embed documents with images:
components:
  document_image_embedder:
    type: haystack.components.embedders.image.SentenceTransformersDocumentImageEmbedder
    init_parameters:
      model: sentence-transformers/clip-ViT-B-32
      file_path_meta_field: file_path
      normalize_embeddings: true
  document_writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      policy: NONE
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          use_ssl: true
          verify_certs: false
          index: my_index
          embedding_dim: 512
connections:
  - sender: document_image_embedder.documents
    receiver: document_writer.documents
inputs:
  documents:
    - document_image_embedder.documents
max_runs_per_component: 100
metadata: {}
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to embed. Each document must have a valid file path in its metadata pointing to an image or PDF file. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents with embeddings stored in the embedding field. Each document also includes metadata about the embedding source. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path_meta_field | str | file_path | The metadata field in the Document that contains the file path to the image or PDF. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata are resolved relative to this path. If None, file paths are treated as absolute paths. |
| model | str | sentence-transformers/clip-ViT-B-32 | The Sentence Transformers model to use for calculating embeddings. Must be able to embed images and text into the same vector space. |
| device | Optional[ComponentDevice] | None | The device to use for loading the model. Overrides the default device. |
| token | Optional[Secret] | None | The API token to download private models from Hugging Face. |
| batch_size | int | 32 | Number of documents to embed at once. |
| progress_bar | bool | True | If True, shows a progress bar when embedding documents. |
| normalize_embeddings | bool | False | If True, the embeddings are normalized using L2 normalization, so that each embedding has a norm of 1. |
| trust_remote_code | bool | False | If False, allows only Hugging Face verified model architectures. If True, allows custom models and scripts. |
| local_files_only | bool | False | If True, does not attempt to download the model from Hugging Face Hub and only looks at local files. |
| model_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoModelForSequenceClassification.from_pretrained when loading the model. |
| tokenizer_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoTokenizer.from_pretrained when loading the tokenizer. |
| config_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoConfig.from_pretrained when loading the model configuration. |
| precision | Literal | float32 | The precision to use for the embeddings. All non-float32 precisions are quantized embeddings. |
| encode_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for SentenceTransformer.encode when embedding documents. |
| backend | Literal | torch | The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino". |
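To make the effect of normalize_embeddings concrete, here is a small pure-Python sketch of L2 normalization. It illustrates the math only and is not the library's implementation; `l2_normalize` and `dot` are hypothetical helpers, and the two-dimensional vectors are toy stand-ins for real embeddings.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for an image embedding and a query embedding.
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

print(math.isclose(dot(a, a), 1.0))  # a normalized vector has unit length
print(dot(a, b))                     # for unit vectors, this is the cosine similarity
```

Once embeddings have unit norm, the dot product and cosine similarity give the same ranking, which is one common reason to enable this option.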
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to embed. Each document must have a valid file path in its metadata pointing to an image or PDF file. |