OpenSearchBM25Retriever

Fetches documents from OpenSearchDocumentStore using the keyword-based BM25 algorithm.

Basic Information

  • Type: haystack_integrations.opensearch.src.haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The query string. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved documents. How runtime filters are applied depends on the filter_policy set when the Retriever is initialized. |
| all_terms_must_match | Optional[bool] | None | If True, all terms in the query string must be present in the retrieved documents. |
| top_k | Optional[int] | None | Maximum number of documents to return. |
| fuzziness | Optional[Union[int, str]] | None | Fuzziness parameter for full-text queries to apply approximate string matching. For more information, see OpenSearch fuzzy query. |
| scale_score | Optional[bool] | None | If True, scales the score of retrieved documents to a range between 0 and 1. This is useful when comparing documents across different indexes. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query. It must include a $query placeholder and may optionally include a $filters placeholder. See the example after this table. |

An example custom_query:

```python
{
    "query": {
        "bool": {
            "should": [{"multi_match": {
                "query": "$query",  # mandatory query placeholder
                "type": "most_fields",
                "fields": ["content", "title"]}}],
            "filter": "$filters"  # optional filter placeholder
        }
    }
}
```

For this custom_query, a sample run() could be:

```python
retriever.run(
    query="Why did the revenue increase?",
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "meta.years", "operator": "==", "value": "2019"},
            {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
        ],
    },
)
```
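To make the role of the $query and $filters placeholders concrete, here is a minimal sketch of how such a template could be filled in. The helper `fill_placeholders` is hypothetical and written only for illustration; it is not the integration's own code, and the filter clause passed in is an assumed, already OpenSearch-style example rather than the output of the real filter conversion.

```python
from typing import Any


def fill_placeholders(node: Any, query: str, filters: Any) -> Any:
    """Return a copy of `node` with "$query" and "$filters" substituted.

    Hypothetical helper for illustration only; the real component performs
    its own substitution and filter normalization.
    """
    if isinstance(node, str):
        if node == "$query":
            return query
        if node == "$filters":
            return filters
        return node
    if isinstance(node, dict):
        return {key: fill_placeholders(value, query, filters) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(item, query, filters) for item in node]
    return node


template = {
    "query": {
        "bool": {
            "should": [{"multi_match": {
                "query": "$query",  # mandatory query placeholder
                "type": "most_fields",
                "fields": ["content", "title"]}}],
            "filter": "$filters",  # optional filter placeholder
        }
    }
}

# Assumed OpenSearch-style filter clause, used only to show the substitution.
example_filter = {"term": {"meta.years": "2019"}}

body = fill_placeholders(template, "Why did the revenue increase?", example_filter)
```

The template itself is left untouched, so the same custom_query can be reused across calls with different query strings and filters.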

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | The retrieved Documents. The component returns a dictionary with a single key, documents, that holds this list. |

Overview

Work in Progress

Bear with us while we add pipeline examples and the most common component connections.

Fetches documents from OpenSearchDocumentStore using the keyword-based BM25 algorithm.

BM25 computes a weighted word overlap between the query string and a document to determine its similarity.
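As a rough illustration of "weighted word overlap", the following sketch scores a few documents with a textbook BM25 formula. It is not what OpenSearch runs internally: the whitespace tokenizer, the function name `bm25_scores`, and the parameter values `k1=1.2` and `b=0.75` are assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    # Naive tokenization: lowercase and split on whitespace (no stemming).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # only overlapping terms contribute
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "revenue increased in the first quarter",
    "the weather was sunny",
    "quarterly revenue report",
]
scores = bm25_scores("revenue increase", docs)
```

In this toy run, the second document shares no term with the query and scores zero, while the third document outranks the first because it contains the same matching term in a shorter text.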

Usage Example

components:
  OpenSearchBM25Retriever:
    type: opensearch.src.haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_store | OpenSearchDocumentStore | | An instance of OpenSearchDocumentStore to use with the Retriever. |
| filters | Optional[Dict[str, Any]] | None | Filters to narrow down the search for documents in the Document Store. |
| fuzziness | Union[int, str] | AUTO | Determines how approximate string matching is applied in full-text queries. This parameter sets the maximum number of character edits (insertions, deletions, or substitutions) allowed when matching one word to another. For example, the "fuzziness" between the words "wined" and "wind" is 1 because only one edit is needed to match them. Use "AUTO" (the default) for automatic adjustment based on term length, which suits most scenarios. For detailed guidance, refer to the OpenSearch fuzzy query documentation. |
| top_k | int | 10 | Maximum number of documents to return. |
| scale_score | bool | False | If True, scales the score of retrieved documents to a range between 0 and 1. This is useful when comparing documents across different indexes. |
| all_terms_must_match | bool | False | If True, all terms in the query string must be present in the retrieved documents. This is useful when searching for short text where even one term can make a difference. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy that determines how filters are applied. Possible options: replace (runtime filters replace initialization filters; use this policy to change the filtering scope for specific queries) and merge (runtime filters are merged with initialization filters). |
| custom_query | Optional[Dict[str, Any]] | None | The query containing a mandatory $query placeholder and an optional $filters placeholder. See the example after this table. |
| raise_on_failure | bool | True | If True, raises an exception when the API call fails. If False, logs a warning and returns an empty list. |

An example custom_query:

```python
{
    "query": {
        "bool": {
            "should": [{"multi_match": {
                "query": "$query",  # mandatory query placeholder
                "type": "most_fields",
                "fields": ["content", "title"]}}],
            "filter": "$filters"  # optional filter placeholder
        }
    }
}
```

An example run() method for this custom_query:

```python
retriever.run(
    query="Why did the revenue increase?",
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "meta.years", "operator": "==", "value": "2019"},
            {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
        ],
    },
)
```
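The edit-distance example in the fuzziness description ("wined" versus "wind") can be checked with a short sketch. This is a plain Levenshtein distance written for illustration; the function name `edit_distance` is an assumption of this example, and OpenSearch's own matching may count edits differently (for instance, it can treat a swap of two adjacent characters as a single edit).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, or substitutions."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # substitute or keep
            ))
        prev = curr
    return prev[-1]


distance = edit_distance("wined", "wind")  # one deletion ("e")
```

With a fuzziness of 1, a query term such as "wind" could therefore still match a document containing "wined".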

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The query string. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved documents. How runtime filters are applied depends on the filter_policy set when the Retriever is initialized. |
| all_terms_must_match | Optional[bool] | None | If True, all terms in the query string must be present in the retrieved documents. |
| top_k | Optional[int] | None | Maximum number of documents to return. |
| fuzziness | Optional[Union[int, str]] | None | Fuzziness parameter for full-text queries to apply approximate string matching. For more information, see OpenSearch fuzzy query. |
| scale_score | Optional[bool] | None | If True, scales the score of retrieved documents to a range between 0 and 1. This is useful when comparing documents across different indexes. |
| custom_query | Optional[Dict[str, Any]] | None | A custom OpenSearch query. It must include a $query placeholder and may optionally include a $filters placeholder. See the example after this table. |

An example custom_query:

```python
{
    "query": {
        "bool": {
            "should": [{"multi_match": {
                "query": "$query",  # mandatory query placeholder
                "type": "most_fields",
                "fields": ["content", "title"]}}],
            "filter": "$filters"  # optional filter placeholder
        }
    }
}
```

For this custom_query, a sample run() could be:

```python
retriever.run(
    query="Why did the revenue increase?",
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "meta.years", "operator": "==", "value": "2019"},
            {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
        ],
    },
)
```
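Because the effect of runtime filters depends on filter_policy, a small sketch may help show the difference between the two policies. The function `resolve_filters` is hypothetical, and combining both filter sets under an "AND" operator for the merge policy is an assumption of this example; the library's actual merge logic may differ in detail.

```python
from typing import Any, Dict, Optional

Filters = Optional[Dict[str, Any]]


def resolve_filters(policy: str, init_filters: Filters, runtime_filters: Filters) -> Filters:
    """Illustrate how init and runtime filters might be reconciled.

    Hypothetical helper, not the library's implementation.
    """
    if policy == "replace":
        # Runtime filters win when given; otherwise fall back to init filters.
        return runtime_filters if runtime_filters is not None else init_filters
    if policy == "merge":
        if init_filters is None or runtime_filters is None:
            return runtime_filters if runtime_filters is not None else init_filters
        # Assumption: both filter sets must hold, so they are joined with AND.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    raise ValueError(f"Unknown filter policy: {policy}")


init_filters = {"field": "meta.years", "operator": "==", "value": "2019"}
runtime_filters = {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]}

replaced = resolve_filters("replace", init_filters, runtime_filters)
merged = resolve_filters("merge", init_filters, runtime_filters)
```

Under replace, only the quarter filter applies to the query. Under merge, as sketched here, a document would need to satisfy both the year and the quarter condition.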