OpenSearchBM25Retriever

Fetch documents from OpenSearchDocumentStore using the keyword-based BM25 algorithm. The retriever computes a weighted word overlap between the query and stored documents, making it effective for exact keyword matches such as product names, IDs, or error messages.

Key Features

  • Keyword-based retrieval using the BM25 algorithm.
  • Configurable fuzzy matching to handle approximate string variations.
  • Optional filtering to narrow the search space.
  • Custom OpenSearch queries for advanced use cases.
  • Optional score scaling to a range between 0 and 1, useful for comparing documents across indexes.
  • Optional requirement that all query terms be present in retrieved documents.

Configuration

  1. Drag the OpenSearchBM25Retriever component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    • Set top_k to control the maximum number of documents to retrieve.
    • Optionally configure filters to narrow down the search space.
    • Set fuzziness to control approximate string matching (default: AUTO).
    • Optionally enable all_terms_must_match to require all query terms to appear in retrieved documents.
  4. Go to the Advanced tab to configure scale_score, filter_policy, custom_query, and raise_on_failure.
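
As an illustration of the settings listed above, the following init_parameters fragment sketches a stricter configuration for exact lookups such as IDs or error codes. The specific values (fuzziness: 0, top_k: 5) are illustrative assumptions rather than recommendations from this page, and the document_store entry is omitted for brevity.

```yaml
# Hypothetical values for exact-match lookups; adjust them to your data.
init_parameters:
  fuzziness: 0                 # allow no character edits when matching terms
  all_terms_must_match: true   # every query term must appear in a document
  top_k: 5                     # return at most five documents
  filter_policy: replace
  raise_on_failure: true
```

Setting fuzziness back to AUTO is likely the better choice for free-text questions, where typos and spelling variants are more common.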

Connections

OpenSearchBM25Retriever receives a query string from the Input component and, optionally, filters. It outputs a list of documents. Connect the documents output to a PromptBuilder, to a Ranker, or to a DocumentJoiner if you want to combine results from multiple retrievers.
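
A minimal sketch of one such connection, assuming the pipeline contains a ranker component named ranker (a hypothetical name) that accepts a documents input. Check the input name of the ranker you actually use.

```yaml
connections:
- sender: OpenSearchBM25Retriever.documents
  receiver: ranker.documents   # 'ranker' is a hypothetical component name
```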

Source Code

To check this component's source code, open bm25_retriever.py in the Haystack Core Integrations repository.

Usage Examples

Basic Configuration

OpenSearchBM25Retriever:
  type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
  init_parameters:
    document_store:
      type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      init_parameters:
        index: ''
        max_chunk_bytes: 104857600
        embedding_dim: 768
        return_embedding: false
        create_index: true
        similarity: cosine
    fuzziness: AUTO
    top_k: 10
    scale_score: false
    all_terms_must_match: false
    filter_policy: replace
    raise_on_failure: true

Using the Component in a Pipeline

This is an example of a simple keyword search pipeline where OpenSearchBM25Retriever retrieves documents matching the query.

components:
  OpenSearchBM25Retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
          similarity: cosine
      filters:
      fuzziness: AUTO
      top_k: 10
      scale_score: false
      all_terms_must_match: false
      filter_policy: replace
      custom_query:
      raise_on_failure: true

connections: []

max_runs_per_component: 100

metadata: {}

inputs:
  query:
  - OpenSearchBM25Retriever.query
  filters:
  - OpenSearchBM25Retriever.filters

outputs:
  documents: OpenSearchBM25Retriever.documents

Using in a Hybrid Search Pipeline

This example shows a hybrid search pipeline that combines OpenSearchBM25Retriever (keyword search) with OpenSearchEmbeddingRetriever (semantic search). A DocumentJoiner merges the results of both retrievers into a single list of documents.

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
          similarity: cosine
      filters:
      fuzziness: AUTO
      top_k: 20
      scale_score: false
      all_terms_must_match: false
      filter_policy: replace
      custom_query:
      raise_on_failure: true
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
          similarity: cosine
      filters:
      top_k: 20
      filter_policy: replace
      custom_query:
      raise_on_failure: true
      efficient_filtering: true
  text_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: sentence-transformers/all-MiniLM-L6-v2
      device:
      token:
      prefix: ''
      suffix: ''
      batch_size: 32
      progress_bar: true
      normalize_embeddings: false
      trust_remote_code: false
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

connections:
- sender: bm25_retriever.documents
  receiver: document_joiner.documents
- sender: embedding_retriever.documents
  receiver: document_joiner.documents
- sender: text_embedder.embedding
  receiver: embedding_retriever.query_embedding

max_runs_per_component: 100

metadata: {}

inputs:
  query:
  - bm25_retriever.query
  - text_embedder.text
  filters:
  - bm25_retriever.filters
  - embedding_retriever.filters

outputs:
  documents: document_joiner.documents
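
BM25 scores and embedding similarity scores are on different scales, so you may prefer to fuse the two result lists by rank rather than by raw score. The fragment below is a sketch of that alternative, not part of the original example: it swaps the joiner's join_mode to reciprocal_rank_fusion, and the top_k value is an illustrative assumption.

```yaml
components:
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: reciprocal_rank_fusion   # fuse by rank position instead of raw score
      top_k: 10                           # illustrative cap on the joined results
```

The rest of the pipeline (retrievers, connections, inputs, and outputs) can stay as shown above.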

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The query string. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved documents. The way runtime filters are applied depends on the filter_policy specified at initialization. |
| all_terms_must_match | Optional[bool] | None | If True, all terms in the query string must be present in the retrieved documents. |
| top_k | Optional[int] | None | Maximum number of documents to return. |
| fuzziness | Optional[Union[int, str]] | None | Fuzziness parameter for full-text queries to apply approximate string matching. For more information, see OpenSearch fuzzy query. |
| scale_score | Optional[bool] | None | If True, scales the score of retrieved documents to a range between 0 and 1. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query. It must include a $query placeholder and may optionally include a $filters placeholder. |
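
To make the interaction between filters and filter_policy more concrete, here is a hedged sketch of initialization filters written in Haystack's filter syntax and combined with the merge policy. The field name meta.language and its value are hypothetical; use metadata fields that exist in your index.

```yaml
init_parameters:
  filters:                  # initialization filters (hypothetical field and value)
    operator: AND
    conditions:
    - field: meta.language
      operator: "=="
      value: en
  filter_policy: merge      # runtime filters are merged with the filters above
```

With filter_policy set to replace instead, filters passed at query time would take the place of this block rather than being combined with it.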

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | List of retrieved documents. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_store | OpenSearchDocumentStore | | An instance of OpenSearchDocumentStore to use with the retriever. |
| filters | Optional[Dict[str, Any]] | None | Filters to narrow down the search for documents in the document store. |
| fuzziness | Union[int, str] | AUTO | Controls approximate string matching in full-text queries by setting the maximum number of character edits (insertions, deletions, or substitutions) allowed between a query term and a matching term. Use AUTO (the default) for automatic adjustment based on term length. For detailed guidance, refer to the OpenSearch fuzzy query documentation. |
| top_k | int | 10 | Maximum number of documents to return. |
| scale_score | bool | False | If True, scales the score of retrieved documents to a range between 0 and 1. This is useful when comparing documents across different indexes. |
| all_terms_must_match | bool | False | If True, all terms in the query string must be present in the retrieved documents. This is useful when searching for short text where even one term makes a difference. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy to determine how filters are applied. Possible options: replace (runtime filters replace initialization filters; use this to change the filtering scope for specific queries) and merge (runtime filters are merged with initialization filters). |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query containing a mandatory $query placeholder and an optional $filters placeholder. |
| raise_on_failure | bool | True | Whether to raise an exception if the API call fails. If False, logs a warning and returns an empty list. |
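
The custom_query placeholders are easier to picture with an example. The fragment below is a sketch of a bool query that searches two fields; it assumes your documents have a title field in addition to content, which may not be true for your index.

```yaml
custom_query:
  query:
    bool:
      should:
      - multi_match:
          query: "$query"      # mandatory placeholder, replaced with the query string
          type: most_fields
          fields:
          - content
          - title              # assumes documents have a title field
      filter: "$filters"       # optional placeholder, replaced with the active filters
```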

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The query string. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved documents. The way runtime filters are applied depends on the filter_policy specified at initialization. |
| all_terms_must_match | Optional[bool] | None | If True, all terms in the query string must be present in the retrieved documents. |
| top_k | Optional[int] | None | Maximum number of documents to return. |
| fuzziness | Optional[Union[int, str]] | None | Fuzziness parameter for full-text queries to apply approximate string matching. For more information, see OpenSearch fuzzy query. |
| scale_score | Optional[bool] | None | If True, scales the score of retrieved documents to a range between 0 and 1. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query. It must include a $query placeholder and may optionally include a $filters placeholder. |