ElasticsearchBM25Retriever

Retrieves documents from the ElasticsearchDocumentStore, using the BM25 algorithm to find documents whose keywords match the user's query.

Basic Information

Type: haystack_integrations.components.retrievers.elasticsearch.bm25_retriever.ElasticsearchBM25Retriever
Components it can connect with:
- Input: The Retriever receives the user query from the Input component and searches for documents based on it.
- Rankers: The Retriever can send the retrieved documents to a ranker.

Inputs

Parameter	Type	Default	Description
query	str		String to search in the document text.
filters	Optional[Dict[str, Any]]	None	Filters applied to the retrieved documents. The way runtime filters are applied depends on the `filter_policy` chosen at retriever initialization. For details, check the Init Parameters section.
top_k	Optional[int]	None	Maximum number of documents to return.

Outputs

Parameter	Type	Default	Description
documents	List[Document]		List of documents that match the query.

Overview

ElasticsearchBM25Retriever is compatible only with ElasticsearchDocumentStore. It's a keyword-based retriever that uses the BM25 algorithm to find the documents most similar to a user's query. It scores each document by the weighted overlap between the words in the query and the words in the document.
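
To make "weighted word overlap" concrete, here is a minimal sketch of a textbook BM25 scoring function in plain Python. It is an illustration only, not the code Elasticsearch runs: the whitespace tokenizer, the function name `bm25_scores`, and the sample documents are assumptions, and `k1=1.2` and `b=0.75` are the commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with a textbook BM25 formula."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # Terms absent from the document add nothing.
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates, and long documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "product code ab-1234 in stock",
    "general shipping information",
    "returns policy for product orders",
]
print(bm25_scores("AB-1234", docs))
```

Only the first document contains the product code, so it is the only one with a nonzero score. This is why keyword retrieval works well for exact identifiers.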

You can use it to find exact matches to names or product codes. It's lightweight and simple and performs well on out-of-domain data.

To combine keyword and embedding-based retrieval, you can use it together with ElasticsearchEmbeddingRetriever and then join the results of the two with a DocumentJoiner.
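
The idea of joining two result lists can be sketched in plain Python. This is a simplified illustration rather than DocumentJoiner's actual code: documents are modeled as dictionaries with hypothetical `id` and `score` keys, and the function name `join_concatenate` is an assumption. When both retrievers return the same document, the sketch keeps the higher-scored copy.

```python
def join_concatenate(*result_lists):
    """Merge retriever outputs, keeping one copy of each document ID."""
    best = {}
    for results in result_lists:
        for doc in results:
            current = best.get(doc["id"])
            # On duplicates, keep the copy with the higher score.
            if current is None or doc["score"] > current["score"]:
                best[doc["id"]] = doc
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)


# Hypothetical results from a keyword retriever and an embedding retriever.
bm25_results = [{"id": "a", "score": 7.1}, {"id": "b", "score": 3.2}]
embedding_results = [{"id": "b", "score": 0.9}, {"id": "c", "score": 0.8}]

joined = join_concatenate(bm25_results, embedding_results)
print([doc["id"] for doc in joined])
```

BM25 scores and embedding similarity scores are on different scales, so the order of a plain concatenation is not meaningful by itself. Passing the joined documents to a ranker addresses this.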

Usage Example

Using the Component in a Pipeline

This is an example of a document search pipeline that combines keyword-based retrieval with embedding-based retrieval. It uses ElasticsearchBM25Retriever and ElasticsearchEmbeddingRetriever to retrieve documents from the document store. It then joins the results of the two with a DocumentJoiner.

components:
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: "intfloat/simlm-msmarco-reranker"
      top_k: 20

  ElasticsearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      num_candidates:
      filter_policy: replace
      document_store:
        type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
        init_parameters:
          hosts:
          custom_mapping:
          index: 'my_index'
          embedding_similarity_function: cosine
  ElasticsearchBM25Retriever:
    type: haystack_integrations.components.retrievers.elasticsearch.bm25_retriever.ElasticsearchBM25Retriever
    init_parameters:
      filters:
      fuzziness: AUTO
      top_k: 10
      scale_score: false
      filter_policy: replace
      document_store:
        type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
        init_parameters:
          hosts:
          custom_mapping:
          index: 'my_index'
          embedding_similarity_function: cosine

connections:  # Defines how the components are connected
- sender: document_joiner.documents
  receiver: ranker.documents
- sender: query_embedder.embedding
  receiver: ElasticsearchEmbeddingRetriever.query_embedding
- sender: ElasticsearchEmbeddingRetriever.documents
  receiver: document_joiner.documents
- sender: ElasticsearchBM25Retriever.documents
  receiver: document_joiner.documents

inputs:  # Defines the inputs for your pipeline
  query:  # These components will receive the query as input
  - "query_embedder.text"
  - "ranker.query"
  - "ElasticsearchBM25Retriever.query"

  filters:  # These components will receive a potential query filter as input
  - "ElasticsearchEmbeddingRetriever.filters"
  - "ElasticsearchBM25Retriever.filters"

outputs:  # Defines the output of your pipeline
  documents: "ranker.documents"  # The output of the pipeline is the retrieved documents

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
document_store	ElasticsearchDocumentStore		An instance of ElasticsearchDocumentStore to retrieve documents from.
filters	Optional[Dict[str, Any]]	None	Filters applied to the retrieved documents.
fuzziness	str	AUTO	Fuzziness parameter passed to Elasticsearch. For details, see the Elasticsearch documentation.
top_k	int	10	Maximum number of documents to return.
scale_score	bool	False	If `True`, scales the documents' scores to a range between 0 and 1.
filter_policy	Union[str, FilterPolicy]	FilterPolicy.REPLACE	Policy that determines how runtime filters are combined with the filters set at initialization. Possible options: `REPLACE` (default): Overrides the initialization filters with the filters specified at runtime. Use this policy to dynamically change filtering for specific queries. `MERGE`: Combines runtime filters with initialization filters to narrow down the search.
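
The difference between the two filter policies can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: `Policy` is a hypothetical stand-in for `FilterPolicy`, `resolve_filters` is a made-up helper, and the merge step simply wraps both filters in an `AND` condition, whereas the real merge logic may handle more cases.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Policy(Enum):
    """Hypothetical stand-in for the retriever's FilterPolicy."""
    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: Policy,
    init_filters: Optional[Dict[str, Any]],
    runtime_filters: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Return the filters a query would use under the given policy."""
    if runtime_filters is None:
        # Nothing passed at query time: fall back to the init filters.
        return init_filters
    if policy is Policy.MERGE and init_filters:
        # Assumption: merging means both filters must match.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE: runtime filters override the init filters entirely.
    return runtime_filters


init = {"field": "meta.language", "operator": "==", "value": "en"}
runtime = {"field": "meta.year", "operator": ">=", "value": 2023}

print(resolve_filters(Policy.REPLACE, init, runtime))
print(resolve_filters(Policy.MERGE, init, runtime))
```

With `REPLACE`, the query uses only the year filter. With `MERGE`, a document must satisfy both the language filter and the year filter, which narrows the search.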

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
query	str		String to search in the document text.
filters	Optional[Dict[str, Any]]	None	Filters applied to the retrieved documents. The way runtime filters are applied depends on the `filter_policy` chosen at retriever initialization.
top_k	Optional[int]	None	Maximum number of documents to return.
