AmazonBedrockDocumentImageEmbedder
Compute document embeddings from images using Amazon Bedrock models.
Basic Information
- Type: haystack_integrations.components.embedders.amazon_bedrock.document_image_embedder.AmazonBedrockDocumentImageEmbedder
- Components it can connect with:
  - DocumentWriter: AmazonBedrockDocumentImageEmbedder sends documents with embeddings to be written to a document store.
  - Converters: AmazonBedrockDocumentImageEmbedder can receive documents to embed from a converter.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents with image file paths in their metadata. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents with embeddings stored in the embedding field. |
Overview
Use AmazonBedrockDocumentImageEmbedder in indexes to create embeddings from images referenced in documents. This component is useful for building multimodal search applications where you want to find documents based on image similarity.
The embedding of each document is stored in the embedding field of the Document object.
Supported Models
- amazon.titan-embed-image-v1
- cohere.embed-english-v3
- cohere.embed-multilingual-v3
Authorization
You need AWS credentials to use Amazon Bedrock. Connect deepset to your AWS account by adding secrets with the following keys:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION
For details on how to create secrets, see Add Secrets.
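When testing outside deepset, for example in a local Haystack script, it can help to confirm that the same three keys are present as environment variables before creating the component. The following is a minimal sketch using only the Python standard library; the helper name `missing_aws_keys` is hypothetical and not part of any library.

```python
import os
from typing import List, Mapping

# The three keys named in the Authorization section above.
REQUIRED_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical helper: list the required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_aws_keys()
if missing:
    print("Missing AWS settings:", ", ".join(missing))
else:
    print("All required AWS settings are present.")
```

In deepset itself, the platform resolves these values from the secrets you added, so this check applies only to local experiments.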
Usage Example
This is an example indexing pipeline with AmazonBedrockDocumentImageEmbedder for image-based document embedding:
components:
  converter:
    type: haystack.components.converters.image.file_to_document.ImageFileToDocument
    init_parameters: {}
  image_embedder:
    type: haystack_integrations.components.embedders.amazon_bedrock.document_image_embedder.AmazonBedrockDocumentImageEmbedder
    init_parameters:
      model: amazon.titan-embed-image-v1
      aws_access_key_id:
        type: env_var
        env_vars:
          - AWS_ACCESS_KEY_ID
        strict: false
      aws_secret_access_key:
        type: env_var
        env_vars:
          - AWS_SECRET_ACCESS_KEY
        strict: false
      aws_session_token:
        type: env_var
        env_vars:
          - AWS_SESSION_TOKEN
        strict: false
      aws_region_name:
        type: env_var
        env_vars:
          - AWS_DEFAULT_REGION
        strict: false
      aws_profile_name:
        type: env_var
        env_vars:
          - AWS_PROFILE
        strict: false
      file_path_meta_field: file_path
      root_path:
      image_size:
      progress_bar: true
      boto3_config:
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'images'
          max_chunk_bytes: 104857600
          embedding_dim: 1024
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE
connections:
  - sender: converter.documents
    receiver: image_embedder.documents
  - sender: image_embedder.documents
    receiver: writer.documents
max_runs_per_component: 100
metadata: {}
inputs:
  files:
    - converter.sources
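Each connection in the pipeline above names a sender and a receiver in the form component_name.socket_name, and both component names must match a key under components. As a rough sanity check when editing such a configuration by hand, the sketch below flags connection endpoints that refer to undefined components. It works on plain dictionaries rather than a real pipeline object, and the helper `unknown_components` is hypothetical.

```python
from typing import Dict, List


def unknown_components(components: Dict[str, dict],
                       connections: List[Dict[str, str]]) -> List[str]:
    """Hypothetical helper: names used in connections but not defined as components."""
    unknown: List[str] = []
    for connection in connections:
        for endpoint in (connection["sender"], connection["receiver"]):
            # "converter.documents" -> component "converter", socket "documents"
            name = endpoint.split(".", 1)[0]
            if name not in components and name not in unknown:
                unknown.append(name)
    return unknown


# Component names and connections copied from the example pipeline.
components = {"converter": {}, "image_embedder": {}, "writer": {}}
connections = [
    {"sender": "converter.documents", "receiver": "image_embedder.documents"},
    {"sender": "image_embedder.documents", "receiver": "writer.documents"},
]

print(unknown_components(components, connections))  # prints []
```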
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | Literal['amazon.titan-embed-image-v1', 'cohere.embed-english-v3', 'cohere.embed-multilingual-v3'] | | The Bedrock model to use for calculating embeddings. |
| aws_access_key_id | Optional[Secret] | Secret.from_env_var('AWS_ACCESS_KEY_ID') | AWS access key ID. |
| aws_secret_access_key | Optional[Secret] | Secret.from_env_var('AWS_SECRET_ACCESS_KEY') | AWS secret access key. |
| aws_session_token | Optional[Secret] | Secret.from_env_var('AWS_SESSION_TOKEN') | AWS session token for temporary credentials. |
| aws_region_name | Optional[Secret] | Secret.from_env_var('AWS_DEFAULT_REGION') | AWS region name. |
| aws_profile_name | Optional[Secret] | Secret.from_env_var('AWS_PROFILE') | AWS profile name. |
| file_path_meta_field | str | "file_path" | The metadata field in the Document that contains the file path to the image. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata are resolved relative to this path. |
| image_size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. |
| progress_bar | bool | True | If True, shows a progress bar when embedding documents. |
| boto3_config | Optional[Dict[str, Any]] | None | Configuration for the boto3 client. |
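To make the image_size behavior concrete, the sketch below works through the arithmetic of fitting an image inside a (width, height) box while keeping its aspect ratio. It is an illustration of the idea only, not the component's actual implementation. The helper `fit_within` is hypothetical, and the sketch assumes that images already smaller than the box are left at their original size.

```python
from typing import Tuple


def fit_within(original: Tuple[int, int], bounds: Tuple[int, int]) -> Tuple[int, int]:
    """Hypothetical helper: scale (width, height) to fit inside bounds, keeping aspect ratio."""
    width, height = original
    max_width, max_height = bounds
    # Use the tighter of the two constraints so both sides fit.
    # Assumption: never enlarge, so the scale is capped at 1.0.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within((2000, 1000), (512, 512)))  # prints (512, 256)
print(fit_within((600, 1200), (300, 300)))   # prints (150, 300)
print(fit_within((100, 50), (512, 512)))     # prints (100, 50)
```

In each case the longer side is the one that hits the limit, and the other side shrinks by the same factor.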
Run Method Parameters
These are the parameters you can configure for the component's run() method. You can pass these parameters at query time through the API, in Playground, or when running a job.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents with image file paths in their metadata. |
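The sketch below illustrates how the file_path_meta_field and root_path init parameters are described as working together to locate each document's image. It uses a plain dictionary in place of a Document's metadata and assumes a simple path join. The helper `resolve_image_path` is hypothetical, and the component's real resolution logic may differ in details such as error handling.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_image_path(meta: dict,
                       file_path_meta_field: str = "file_path",
                       root_path: Optional[str] = None) -> PurePosixPath:
    """Hypothetical helper: return the image path that a document's metadata points to."""
    if file_path_meta_field not in meta:
        raise ValueError(f"Document metadata has no '{file_path_meta_field}' field")
    # Assumption: with a root_path, the metadata path is joined onto it;
    # without one, the metadata path is used as-is.
    return PurePosixPath(root_path or "", meta[file_path_meta_field])


meta = {"file_path": "cats/tabby.jpg"}
print(resolve_image_path(meta))                            # prints cats/tabby.jpg
print(resolve_image_path(meta, root_path="/data/images"))  # prints /data/images/cats/tabby.jpg
```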