
AmazonBedrockDocumentImageEmbedder

Compute document embeddings from images using Amazon Bedrock models. Use this component in indexes to create embeddings from images referenced in documents, enabling multimodal semantic search.

Key Features

  • Computes vector embeddings from images referenced in documents.
  • Useful for building multimodal search applications where you want to find documents based on image similarity.
  • Stores embeddings in the embedding field of each Document object.
  • Supports optional image resizing to fit within specified dimensions while maintaining aspect ratio.
  • Supports the following models: amazon.titan-embed-image-v1, cohere.embed-english-v3, and cohere.embed-multilingual-v3.
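The search itself happens in the document store at query time, not in this component. To show what "finding documents based on image similarity" means, the Python below ranks documents by cosine similarity to a query vector. It is a sketch under stated assumptions. The vectors, file names, and the helpers `cosine_similarity` and `rank_by_similarity` are hypothetical, and your document store may be configured with a different distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine_similarity(query, emb)) for doc_id, emb in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional vectors standing in for real image embeddings.
docs = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.3, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], docs)
print(ranking[0][0])  # cat.jpg
```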

Configuration

To use this component, connect Haystack Platform to your AWS account by adding secrets with the following keys:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_DEFAULT_REGION

For details on how to create secrets, see Add Secrets.
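The component reads these values from environment variables, as the `env_var` entries in the YAML examples on this page show. If you test a pipeline locally, a quick pre-flight check like the one below can confirm that the keys are set. `missing_aws_keys` is a hypothetical helper for illustration and is not part of Haystack.

```python
import os
from typing import Mapping

# The secret keys this section lists as required for Amazon Bedrock access.
REQUIRED_AWS_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are unset or empty in `env`."""
    return [key for key in REQUIRED_AWS_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_aws_keys()
    if missing:
        print("Missing AWS secrets:", ", ".join(missing))
    else:
        print("All required AWS secrets are set.")
```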

For instructions on using Bedrock models, see Use Amazon Bedrock and SageMaker Models.

  1. Drag the AmazonBedrockDocumentImageEmbedder component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab:
    1. Select the embedding model from the list.
  4. Go to the Advanced tab to configure the AWS credentials, file path metadata field, root path, image size, progress bar, and boto3 client settings.

Connections

AmazonBedrockDocumentImageEmbedder takes as input a list of documents whose metadata contains image file paths. It outputs the same documents with their embeddings stored in the embedding field.

Connect a converter like ImageFileToDocument to the documents input to provide image documents for embedding. Connect the documents output to DocumentWriter to store the embedded documents in the document store.
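The input/output contract described above can be sketched in plain Python. Each incoming document carries an image path under the metadata key named by `file_path_meta_field`, and each outgoing document is the same document with `embedding` filled in. This is a sketch, not the component's implementation. `Doc` is a hypothetical stand-in for Haystack's `Document`, `fake_embed` replaces the Bedrock call, and raising an error for a missing path is this sketch's choice; the real component may handle that case differently.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for haystack.Document (only the fields used here)."""
    id: str
    meta: dict[str, Any] = field(default_factory=dict)
    embedding: Optional[list[float]] = None


def embed_documents(
    docs: list[Doc],
    embed_image: Callable[[str], list[float]],
    file_path_meta_field: str = "file_path",
) -> list[Doc]:
    """Return copies of `docs` with `embedding` computed from each document's image path."""
    out = []
    for doc in docs:
        path = doc.meta.get(file_path_meta_field)
        if path is None:
            raise ValueError(f"Document {doc.id} has no '{file_path_meta_field}' in meta")
        # Copy the document and set only the embedding; content and meta stay as they were.
        out.append(replace(doc, embedding=embed_image(path)))
    return out


def fake_embed(path: str) -> list[float]:
    """Hypothetical embedding function standing in for the Bedrock model call."""
    return [float(len(path)), 0.0]


docs = [Doc(id="1", meta={"file_path": "cat.jpg"})]
embedded = embed_documents(docs, fake_embed)
print(embedded[0].embedding)  # [7.0, 0.0]
```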


Source Code

To check this component's source code, open document_image_embedder.py in the Haystack Core Integrations repository.


Usage Examples

Basic Configuration

    image_embedder:
      type: haystack_integrations.components.embedders.amazon_bedrock.document_image_embedder.AmazonBedrockDocumentImageEmbedder
      init_parameters:
        model: amazon.titan-embed-image-v1
        aws_access_key_id:
          type: env_var
          env_vars:
            - AWS_ACCESS_KEY_ID
          strict: false
        aws_secret_access_key:
          type: env_var
          env_vars:
            - AWS_SECRET_ACCESS_KEY
          strict: false
        aws_session_token:
          type: env_var
          env_vars:
            - AWS_SESSION_TOKEN
          strict: false
        aws_region_name:
          type: env_var
          env_vars:
            - AWS_DEFAULT_REGION
          strict: false
        aws_profile_name:
          type: env_var
          env_vars:
            - AWS_PROFILE
          strict: false
        file_path_meta_field: file_path
        progress_bar: true
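Each credential in this configuration is an `env_var` secret with `strict: false`. A minimal sketch of how such a secret resolves, modeled on Haystack's `Secret.from_env_var` behavior as we understand it: the first listed variable that is set wins; if none is set, a non-strict secret yields `None` and a strict one raises an error. `resolve_env_var_secret` is a hypothetical name used only for illustration.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_env_var_secret(
    env_vars: Sequence[str],
    strict: bool = False,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the value of the first variable in `env_vars` that is set in `env`.

    With strict=False a missing value yields None instead of an error
    (assumed to mirror the `strict: false` entries in the YAML).
    """
    for name in env_vars:
        value = env.get(name)
        if value is not None:
            return value
    if strict:
        raise ValueError(f"None of the environment variables {list(env_vars)} are set")
    return None


# An unset, non-strict secret such as AWS_SESSION_TOKEN simply resolves to None.
print(resolve_env_var_secret(["AWS_SESSION_TOKEN"], env={}))  # None
print(resolve_env_var_secret(["AWS_DEFAULT_REGION"], env={"AWS_DEFAULT_REGION": "us-east-1"}))  # us-east-1
```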

This is an example indexing pipeline with AmazonBedrockDocumentImageEmbedder for image-based document embedding:

    components:
      converter:
        type: haystack.components.converters.image.file_to_document.ImageFileToDocument
        init_parameters: {}

      image_embedder:
        type: haystack_integrations.components.embedders.amazon_bedrock.document_image_embedder.AmazonBedrockDocumentImageEmbedder
        init_parameters:
          model: amazon.titan-embed-image-v1
          aws_access_key_id:
            type: env_var
            env_vars:
              - AWS_ACCESS_KEY_ID
            strict: false
          aws_secret_access_key:
            type: env_var
            env_vars:
              - AWS_SECRET_ACCESS_KEY
            strict: false
          aws_session_token:
            type: env_var
            env_vars:
              - AWS_SESSION_TOKEN
            strict: false
          aws_region_name:
            type: env_var
            env_vars:
              - AWS_DEFAULT_REGION
            strict: false
          aws_profile_name:
            type: env_var
            env_vars:
              - AWS_PROFILE
            strict: false
          file_path_meta_field: file_path
          root_path:
          image_size:
          progress_bar: true
          boto3_config:

      writer:
        type: haystack.components.writers.document_writer.DocumentWriter
        init_parameters:
          document_store:
            type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
            init_parameters:
              hosts:
              index: 'images'
              max_chunk_bytes: 104857600
              embedding_dim: 1024
              return_embedding: false
              method:
              mappings:
              settings:
              create_index: true
              http_auth:
              use_ssl:
              verify_certs:
              timeout:
          policy: OVERWRITE

    connections:
      - sender: converter.documents
        receiver: image_embedder.documents
      - sender: image_embedder.documents
        receiver: writer.documents

    max_runs_per_component: 100

    metadata: {}

    inputs:
      files:
        - converter.sources
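The document store's `embedding_dim` (1024 in this pipeline) must match the length of the vectors the embedder produces, or writing the documents will fail. The sketch below checks this before deployment. The default dimensions listed (1024 for all three supported models) reflect the AWS and Cohere model documentation as we understand it and are an assumption here; Titan Multimodal Embeddings can also be configured for smaller outputs, so verify the values against your own model settings. `check_embedding_dim` is a hypothetical helper.

```python
# Assumed default output dimensions per model; verify against current AWS Bedrock docs.
DEFAULT_EMBEDDING_DIMS = {
    "amazon.titan-embed-image-v1": 1024,
    "cohere.embed-english-v3": 1024,
    "cohere.embed-multilingual-v3": 1024,
}


def check_embedding_dim(model: str, embedding_dim: int) -> None:
    """Raise if the document store's embedding_dim does not match the model's output size."""
    expected = DEFAULT_EMBEDDING_DIMS.get(model)
    if expected is None:
        raise KeyError(f"Unknown model: {model}")
    if expected != embedding_dim:
        raise ValueError(
            f"{model} returns {expected}-dimensional vectors, "
            f"but the document store expects {embedding_dim}"
        )


# Values taken from the pipeline above; this call passes silently.
check_embedding_dim("amazon.titan-embed-image-v1", 1024)
```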

Parameters

Inputs

| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | A list of documents with image file paths in their metadata. |

Outputs

| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | Documents with embeddings stored in the embedding field. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | Literal['amazon.titan-embed-image-v1', 'cohere.embed-english-v3', 'cohere.embed-multilingual-v3'] | | The Bedrock model to use for calculating embeddings. |
| aws_access_key_id | Optional[Secret] | Secret.from_env_var('AWS_ACCESS_KEY_ID') | AWS access key ID. |
| aws_secret_access_key | Optional[Secret] | Secret.from_env_var('AWS_SECRET_ACCESS_KEY') | AWS secret access key. |
| aws_session_token | Optional[Secret] | Secret.from_env_var('AWS_SESSION_TOKEN') | AWS session token for temporary credentials. |
| aws_region_name | Optional[Secret] | Secret.from_env_var('AWS_DEFAULT_REGION') | AWS region name. |
| aws_profile_name | Optional[Secret] | Secret.from_env_var('AWS_PROFILE') | AWS profile name. |
| file_path_meta_field | str | "file_path" | The metadata field in the Document that contains the file path to the image. |
| root_path | Optional[str] | None | The root directory path where document files are located. If provided, file paths in document metadata are resolved relative to this path. |
| image_size | Optional[Tuple[int, int]] | None | If provided, resizes the image to fit within the specified dimensions (width, height) while maintaining aspect ratio. |
| progress_bar | bool | True | If True, shows a progress bar when embedding documents. |
| boto3_config | Optional[Dict[str, Any]] | None | Configuration for the boto3 client. |
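To illustrate how `file_path_meta_field`, `root_path`, and `image_size` interact, here is a sketch of the path resolution and "fit within" resizing that the table describes. The helper names `resolve_image_path` and `fit_within` are hypothetical, and the sketch assumes images are only ever scaled down, never up. The component's own implementation (see the Source Code section) may round sizes differently.

```python
from pathlib import Path
from typing import Any, Optional


def resolve_image_path(
    meta: dict[str, Any],
    file_path_meta_field: str = "file_path",
    root_path: Optional[str] = None,
) -> Path:
    """Read the image path from document metadata and resolve it relative to root_path, if given."""
    path = Path(meta[file_path_meta_field])
    return Path(root_path) / path if root_path else path


def fit_within(size: tuple[int, int], image_size: tuple[int, int]) -> tuple[int, int]:
    """Scale (width, height) to fit inside image_size while keeping the aspect ratio.

    Assumption: images already smaller than image_size are left unchanged.
    """
    width, height = size
    max_w, max_h = image_size
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(resolve_image_path({"file_path": "cats/cat.jpg"}, root_path="/data/images").as_posix())
# /data/images/cats/cat.jpg
print(fit_within((4000, 3000), (1024, 1024)))  # (1024, 768)
```

Fitting within the box, rather than stretching to it, keeps the image undistorted, which is what "while maintaining aspect ratio" in the table requires.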

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | A list of documents with image file paths in their metadata. |