
AmazonBedrockDocumentImageEmbedder

Compute document embeddings from images using Amazon Bedrock models.

Basic Information

  • Type: haystack_integrations.components.embedders.amazon_bedrock.document_image_embedder.AmazonBedrockDocumentImageEmbedder
  • Components it can connect with:
    • DocumentWriter: AmazonBedrockDocumentImageEmbedder sends documents with embeddings to be written to a document store.
    • Converters: AmazonBedrockDocumentImageEmbedder can receive documents to embed from a converter.

Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents with image file paths in their metadata. |

Outputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents with embeddings stored in the embedding field. |

Overview

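Use AmazonBedrockDocumentImageEmbedder in indexes to create embeddings from the images that documents reference. For each document, the component reads the image from the file path stored in the document's metadata and sends it to the selected Amazon Bedrock model. The resulting embeddings are useful for multimodal search applications that find documents based on image similarity.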

The embedding of each document is stored in the embedding field of the Document object.
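Once documents carry embeddings, image-similarity search comes down to comparing vectors. In a real deployment the document store (OpenSearch in the usage example) performs this comparison for you. The following is only a minimal, dependency-free sketch of cosine-similarity ranking; the document IDs and the toy three-dimensional vectors are hypothetical stand-ins for real Bedrock embeddings, which are much longer.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: List[float], embeddings: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Rank document IDs by cosine similarity to the query, most similar first."""
    scored = [(doc_id, cosine_similarity(query, emb)) for doc_id, emb in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for the embedding field of three hypothetical documents.
docs = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.3, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], docs)
print([doc_id for doc_id, _ in ranking])  # ['cat.jpg', 'dog.jpg', 'car.jpg']
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common default for embedding search.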

Supported Models

  • amazon.titan-embed-image-v1
  • cohere.embed-english-v3
  • cohere.embed-multilingual-v3
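For orientation, the sketch below shows how an image could be packaged for amazon.titan-embed-image-v1. It reflects our reading of the public Bedrock request format for Titan Multimodal Embeddings G1 (the image as a base64 string in inputImage, with an optional embeddingConfig.outputEmbeddingLength). The component builds its own requests internally, so treat build_titan_image_request as a hypothetical helper and verify the field names against the AWS documentation. The Cohere models use a different request format that is not shown here.

```python
import base64
import json
from typing import Any, Dict, Optional


def build_titan_image_request(image_bytes: bytes, output_length: Optional[int] = None) -> str:
    """Hypothetical helper: build a JSON body for a Titan image-embedding request.

    Field names follow the Titan Multimodal Embeddings G1 request format
    as we understand it; verify them against the AWS documentation.
    """
    body: Dict[str, Any] = {"inputImage": base64.b64encode(image_bytes).decode("ascii")}
    if output_length is not None:
        # Titan Multimodal Embeddings G1 accepts 256, 384, or 1024 (1024 is the default).
        if output_length not in (256, 384, 1024):
            raise ValueError("output_length must be 256, 384, or 1024")
        body["embeddingConfig"] = {"outputEmbeddingLength": output_length}
    return json.dumps(body)


# A few placeholder bytes stand in for real image data.
payload = json.loads(build_titan_image_request(b"\x89PNG", output_length=1024))
print(payload["embeddingConfig"])  # {'outputEmbeddingLength': 1024}
```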

Authorization

You need AWS credentials to use Amazon Bedrock. Connect deepset to your AWS account by adding secrets with the following keys:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_DEFAULT_REGION

For details on how to create secrets, see Add Secrets.
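In the usage example, each credential is serialized as an environment-variable secret with strict: false, and the pipeline also references the optional AWS_SESSION_TOKEN and AWS_PROFILE variables. The sketch below illustrates the lookup behavior we assume that configuration implies: the first listed variable that is set wins, and with strict: false a missing variable resolves to no value instead of raising an error. The function resolve_env_secret is hypothetical and only mirrors this behavior for illustration.

```python
import os
from typing import List, Mapping, Optional


def resolve_env_secret(
    env_vars: List[str], strict: bool = False, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Hypothetical helper: return the value of the first set variable in env_vars.

    With strict=False a missing variable resolves to None; with strict=True it raises.
    """
    source = os.environ if env is None else env
    for name in env_vars:
        if name in source:
            return source[name]
    if strict:
        raise ValueError(f"None of the environment variables {env_vars} are set")
    return None


# Simulated environment: long-term keys and a region, but no session token.
fake_env = {
    "AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_DEFAULT_REGION": "us-east-1",
}
print(resolve_env_secret(["AWS_DEFAULT_REGION"], env=fake_env))  # us-east-1
print(resolve_env_secret(["AWS_SESSION_TOKEN"], env=fake_env))   # None
```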

Usage Example

This example indexing pipeline converts image files into documents, embeds the images with AmazonBedrockDocumentImageEmbedder, and writes the documents to an OpenSearch document store:

```yaml
components:
  converter:
    type: haystack.components.converters.image.file_to_document.ImageFileToDocument
    init_parameters: {}

  image_embedder:
    type: haystack_integrations.components.embedders.amazon_bedrock.document_image_embedder.AmazonBedrockDocumentImageEmbedder
    init_parameters:
      model: amazon.titan-embed-image-v1
      aws_access_key_id:
        type: env_var
        env_vars:
          - AWS_ACCESS_KEY_ID
        strict: false
      aws_secret_access_key:
        type: env_var
        env_vars:
          - AWS_SECRET_ACCESS_KEY
        strict: false
      aws_session_token:
        type: env_var
        env_vars:
          - AWS_SESSION_TOKEN
        strict: false
      aws_region_name:
        type: env_var
        env_vars:
          - AWS_DEFAULT_REGION
        strict: false
      aws_profile_name:
        type: env_var
        env_vars:
          - AWS_PROFILE
        strict: false
      file_path_meta_field: file_path
      root_path:
      image_size:
      progress_bar: true
      boto3_config:

  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'images'
          max_chunk_bytes: 104857600
          embedding_dim: 1024
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE

connections:
  - sender: converter.documents
    receiver: image_embedder.documents
  - sender: image_embedder.documents
    receiver: writer.documents

max_runs_per_component: 100

metadata: {}

inputs:
  files:
    - converter.sources
```
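The embedding_dim: 1024 setting in the OpenSearch document store must match the length of the vectors the embedder produces; 1024 is the default output length of amazon.titan-embed-image-v1. A quick sanity check before writing could look like the sketch below. The function check_embedding_dims is hypothetical and works on plain lists rather than Haystack Document objects.

```python
from typing import Dict, List, Optional


def check_embedding_dims(
    embeddings: Dict[str, Optional[List[float]]], expected_dim: int
) -> List[str]:
    """Hypothetical helper: return IDs whose embedding is missing or has the wrong length."""
    problems = []
    for doc_id, vector in embeddings.items():
        if vector is None or len(vector) != expected_dim:
            problems.append(doc_id)
    return problems


# Stand-ins for embedded documents: one correct, one with a shorter vector, one never embedded.
batch = {
    "a.png": [0.0] * 1024,
    "b.png": [0.0] * 384,
    "c.png": None,
}
print(check_embedding_dims(batch, expected_dim=1024))  # ['b.png', 'c.png']
```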

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | Literal['amazon.titan-embed-image-v1', 'cohere.embed-english-v3', 'cohere.embed-multilingual-v3'] | | The Bedrock model to use for calculating embeddings. |
| aws_access_key_id | Optional[Secret] | Secret.from_env_var('AWS_ACCESS_KEY_ID') | AWS access key ID. |
| aws_secret_access_key | Optional[Secret] | Secret.from_env_var('AWS_SECRET_ACCESS_KEY') | AWS secret access key. |
| aws_session_token | Optional[Secret] | Secret.from_env_var('AWS_SESSION_TOKEN') | AWS session token for temporary credentials. |
| aws_region_name | Optional[Secret] | Secret.from_env_var('AWS_DEFAULT_REGION') | AWS region name. |
| aws_profile_name | Optional[Secret] | Secret.from_env_var('AWS_PROFILE') | AWS profile name. |
| file_path_meta_field | str | "file_path" | The metadata field in the Document that contains the file path to the image. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata are resolved relative to this path. |
| image_size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. |
| progress_bar | bool | True | If True, shows a progress bar while embedding documents. |
| boto3_config | Optional[Dict[str, Any]] | None | Configuration for the boto3 client. |
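The image_size parameter scales an image so it fits inside a (width, height) bounding box without distorting it. The arithmetic is sketched below. fit_within is a hypothetical helper, and we assume (as with Pillow's Image.thumbnail) that images already inside the box keep their original size; the component's actual resizing code may differ.

```python
from typing import Tuple


def fit_within(size: Tuple[int, int], max_size: Tuple[int, int]) -> Tuple[int, int]:
    """Hypothetical helper: shrink (width, height) to fit max_size, keeping aspect ratio.

    Images that already fit are returned unchanged (an assumption, not confirmed behavior).
    """
    width, height = size
    max_width, max_height = max_size
    # The tighter of the two constraints decides the scale; the 1.0 cap prevents upscaling.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within((4000, 3000), (1024, 1024)))  # (1024, 768)
print(fit_within((1000, 2000), (512, 512)))    # (256, 512)
print(fit_within((800, 600), (1024, 1024)))    # (800, 600)
```

Smaller images mean smaller request payloads, which is the usual reason to set image_size for large photos.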

Run Method Parameters

These are the parameters you can configure for the component's run() method. You can pass these parameters at query time through the API, in Playground, or when running a job.

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents with image file paths in their metadata. |
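Each input document must carry the image location in the metadata field named by file_path_meta_field (file_path by default), and that path is resolved relative to root_path when one is set. The sketch below shows this lookup on plain dictionaries. resolve_image_path is a hypothetical helper, and the component's exact error handling may differ.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def resolve_image_path(
    meta: Dict[str, Any], file_path_meta_field: str = "file_path", root_path: Optional[str] = None
) -> Path:
    """Hypothetical helper: locate a document's image from its metadata."""
    if file_path_meta_field not in meta:
        raise ValueError(f"Document metadata has no '{file_path_meta_field}' field")
    path = Path(meta[file_path_meta_field])
    # Join with root_path when configured; pathlib keeps an absolute file path unchanged.
    return Path(root_path) / path if root_path else path


print(resolve_image_path({"file_path": "photos/cat.jpg"}).as_posix())  # photos/cat.jpg
print(resolve_image_path({"file_path": "cat.jpg"}, root_path="/data/images").as_posix())  # /data/images/cat.jpg
```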