AmazonBedrockRanker
Rank documents based on their similarity to the query using models hosted on Amazon Bedrock.
Amazon Bedrock is a fully managed service that makes state-of-the-art language models available for use through a unified API. To learn more, see Amazon Bedrock documentation.
Key Features
- Ranks documents by semantic similarity to the query.
- Returns documents in descending order of relevance score.
- Supports the following ranking models: `cohere.rerank-v3-5:0` and `amazon.rerank-v1:0`.
- Lets you limit the number of returned documents with `top_k`.
- Can embed document metadata alongside document content for richer ranking.
Configuration
To use this component, connect Haystack Platform with Amazon Bedrock first. You'll need:
- The region name
- Access key ID
- Secret access key
For a detailed explanation, see Use Amazon Bedrock and SageMaker Models.
- Drag the `AmazonBedrockRanker` component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Select the ranking model from the list.
- Go to the Advanced tab to configure the AWS credentials, number of top results, maximum chunks per document, metadata fields to embed, and metadata separator.
Connections
AmazonBedrockRanker accepts a query string, a list of documents to rank, and an optional top_k value as inputs. It outputs a list of documents sorted by relevance to the query.
Connect a retriever's documents output to the documents input, and connect the pipeline's query input to the query input. Then connect the documents output to PromptBuilder, to another component that processes ranked documents, or directly to the pipeline output.
- Drag the `AmazonBedrockRanker` component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Select the ranking model.
  - Set `top_k` to limit the number of documents to return.
- Go to the Advanced tab to configure additional settings, such as `max_chunks_per_doc`, `meta_fields_to_embed`, and `meta_data_separator`.
Source Code
To check this component's source code, open ranker.py in the Haystack Core Integrations repository.
Usage Examples
Basic Configuration
AmazonBedrockRanker:
  type: haystack_integrations.components.rankers.amazon_bedrock.ranker.AmazonBedrockRanker
  init_parameters:
    model: cohere.rerank-v3-5:0
    top_k: 10
    aws_access_key_id:
      type: env_var
      env_vars:
        - AWS_ACCESS_KEY_ID
      strict: false
    aws_secret_access_key:
      type: env_var
      env_vars:
        - AWS_SECRET_ACCESS_KEY
      strict: false
    aws_session_token:
      type: env_var
      env_vars:
        - AWS_SESSION_TOKEN
      strict: false
    aws_region_name:
      type: env_var
      env_vars:
        - AWS_DEFAULT_REGION
      strict: false
    aws_profile_name:
      type: env_var
      env_vars:
        - AWS_PROFILE
      strict: false
    meta_data_separator: "\n" # Double quotes make YAML read this as a newline character
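Each credential in the configuration above is an `env_var` secret with `strict: false`. As a rough illustration of what that setting implies, the following plain-Python sketch mimics the lookup: the listed environment variables are tried in order, and an unset variable yields no value instead of an error. `resolve_env_secret` is a hypothetical helper written for this example, not a Haystack function, and the region value is made up.

```python
import os
from typing import List, Optional


def resolve_env_secret(env_vars: List[str], strict: bool = False) -> Optional[str]:
    """Hypothetical stand-in for an env_var secret.

    Returns the value of the first environment variable that is set.
    If none is set, returns None when strict is False and raises otherwise.
    """
    for name in env_vars:
        value = os.environ.get(name)
        if value is not None:
            return value
    if strict:
        raise ValueError(f"None of the environment variables {env_vars} are set")
    return None


# Demo setup only: one variable is set, the other is explicitly unset.
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"
os.environ.pop("AWS_PROFILE", None)

region = resolve_env_secret(["AWS_DEFAULT_REGION"])
profile = resolve_env_secret(["AWS_PROFILE"])  # not set, strict=False, so None
print(region, profile)
```

With `strict: false`, you can leave credentials you don't use, such as the session token or profile name, unset without breaking the component's configuration.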
Using the Component in a Pipeline
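This is an example of a document search pipeline that uses AmazonBedrockRanker with the Cohere ranking model (cohere.rerank-v3-5:0) to rerank the combined results of a BM25 retriever and an embedding retriever: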
components:
  bm25_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: Standard-Index-English
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
      fuzziness: 0
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2
  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: Standard-Index-English
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  AmazonBedrockRanker:
    type: haystack_integrations.components.rankers.amazon_bedrock.ranker.AmazonBedrockRanker
    init_parameters:
      model: cohere.rerank-v3-5:0
      top_k: 10
      aws_access_key_id:
        type: env_var
        env_vars:
          - AWS_ACCESS_KEY_ID
        strict: false
      aws_secret_access_key:
        type: env_var
        env_vars:
          - AWS_SECRET_ACCESS_KEY
        strict: false
      aws_session_token:
        type: env_var
        env_vars:
          - AWS_SESSION_TOKEN
        strict: false
      aws_region_name:
        type: env_var
        env_vars:
          - AWS_DEFAULT_REGION
        strict: false
      aws_profile_name:
        type: env_var
        env_vars:
          - AWS_PROFILE
        strict: false
      max_chunks_per_doc:
      meta_fields_to_embed:
      meta_data_separator: "\n" # Double quotes make YAML read this as a newline character
connections: # Defines how the components are connected
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: AmazonBedrockRanker.documents
inputs: # Defines the inputs for your pipeline
  query: # These components will receive the query as input
    - "bm25_retriever.query"
    - "query_embedder.text"
    - "AmazonBedrockRanker.query"
  filters: # These components will receive a potential query filter as input
    - "bm25_retriever.filters"
    - "embedding_retriever.filters"
outputs: # Defines the output of your pipeline
  documents: "AmazonBedrockRanker.documents" # The output of the pipeline is the ranked documents
max_runs_per_component: 100
metadata: {}
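To make the data flow of this pipeline easier to follow, here is a minimal plain-Python sketch of the step just before ranking: the two retrievers' results are merged by the joiner, and the merged list is what the ranker receives. It assumes that the `concatenate` join mode keeps the highest-scored copy of a document returned by both retrievers. The documents, IDs, and scores are invented for illustration, and `join_concatenate` is a hypothetical helper, not the DocumentJoiner implementation.

```python
from typing import Dict, List

# Hypothetical retriever outputs: each document is a dict with an ID and a retriever score.
bm25_docs = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}]
embedding_docs = [{"id": "b", "score": 0.7}, {"id": "c", "score": 0.6}]


def join_concatenate(*doc_lists: List[Dict]) -> List[Dict]:
    """Merge several result lists, keeping the highest-scored copy of each document ID."""
    best: Dict[str, Dict] = {}
    for docs in doc_lists:
        for doc in docs:
            current = best.get(doc["id"])
            if current is None or doc["score"] > current["score"]:
                best[doc["id"]] = doc
    return list(best.values())


# This merged list plays the role of the input to AmazonBedrockRanker.documents.
candidates = join_concatenate(bm25_docs, embedding_docs)
print([(doc["id"], doc["score"]) for doc in candidates])
```

Each retriever returns up to 20 documents in the pipeline above, so the ranker sees at most 40 candidates and, with `top_k: 10`, returns the 10 it scores as most relevant.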
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | | The query used to rank the documents by similarity. |
| documents | List[Document] | | The documents to be ranked. |
| top_k | Optional[int] | None | The maximum number of documents you want the Ranker to return. |
Outputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | Documents most similar to the query in descending order of similarity. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | cohere.rerank-v3-5:0 | The ranking model to use. |
| top_k | int | 10 | The maximum number of documents to return. |
| aws_access_key_id | Optional[Secret] | Secret.from_env_var(["AWS_ACCESS_KEY_ID"], strict=False) | AWS access key ID. |
| aws_secret_access_key | Optional[Secret] | Secret.from_env_var(["AWS_SECRET_ACCESS_KEY"], strict=False) | AWS secret access key. |
| aws_session_token | Optional[Secret] | Secret.from_env_var(["AWS_SESSION_TOKEN"], strict=False) | AWS session token. |
| aws_region_name | Optional[Secret] | Secret.from_env_var(["AWS_DEFAULT_REGION"], strict=False) | AWS region name. |
| aws_profile_name | Optional[Secret] | Secret.from_env_var(["AWS_PROFILE"], strict=False) | AWS profile name. |
| max_chunks_per_doc | Optional[int] | None | If a document exceeds 512 tokens, this setting determines the maximum number of chunks the document can be split into. If set to None, the default of 10 chunks is used. This parameter is currently not used but is included for future compatibility. |
| meta_fields_to_embed | Optional[List[str]] | None | A list of metadata fields to embed in the document content. |
| meta_data_separator | str | \n | The separator used to concatenate the metadata fields to the document content. |
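The following sketch illustrates how `meta_fields_to_embed` and `meta_data_separator` could work together. It assumes that the selected metadata values are placed before the document content and that missing or empty fields are skipped; both are assumptions of this example, and `build_ranker_input` is a hypothetical helper, not the component's code.

```python
from typing import Any, Dict, List, Optional


def build_ranker_input(
    content: str,
    meta: Dict[str, Any],
    meta_fields_to_embed: Optional[List[str]] = None,
    meta_data_separator: str = "\n",
) -> str:
    """Join the selected, non-empty metadata values and the content into one text."""
    fields = meta_fields_to_embed or []
    # Assumption: fields that are missing or empty are left out.
    meta_values = [str(meta[key]) for key in fields if key in meta and meta[key]]
    # Assumption: metadata comes first, followed by the document content.
    return meta_data_separator.join(meta_values + [content or ""])


text = build_ranker_input(
    content="Bedrock exposes rerank models through one API.",
    meta={"title": "Reranking on Bedrock", "author": "", "year": 2024},
    meta_fields_to_embed=["title", "author", "year"],
)
print(text)
```

Embedding fields such as a title can help when the document content alone doesn't mention the terms a user is likely to search for.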
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | | The user query for ranking the documents. |
| documents | List[Document] | | The documents to rank. |
| top_k | Optional[int] | None | The maximum number of documents you want the Ranker to return. |
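Because `top_k` appears both as an init parameter (default 10) and as an optional run parameter (default None), it can help to see how the two might interact. This sketch assumes that a `top_k` passed at query time takes precedence over the configured one, and that results are sorted by score in descending order before truncation. The scores are invented, and `select_top_k` is a hypothetical helper for illustration only.

```python
from typing import List, Optional, Tuple


def select_top_k(
    scored_docs: List[Tuple[str, float]],
    init_top_k: int = 10,
    run_top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Sort (document, score) pairs by score, highest first, and keep the top_k best."""
    # Assumption: a value passed at query time overrides the configured default.
    top_k = run_top_k if run_top_k is not None else init_top_k
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical relevance scores for four documents.
scored = [("doc_a", 0.12), ("doc_b", 0.87), ("doc_c", 0.45), ("doc_d", 0.66)]

configured = select_top_k(scored, init_top_k=3)
overridden = select_top_k(scored, init_top_k=3, run_top_k=1)
print(configured)
print(overridden)
```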