OpenSearchHybridRetriever

Retrieve documents from OpenSearch using a combination of BM25 keyword search and embedding-based semantic search.

Basic Information

Type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
Components it can connect with:
- PromptBuilder: OpenSearchHybridRetriever can send retrieved documents to PromptBuilder.
- Rankers: OpenSearchHybridRetriever can send retrieved documents to a ranker.
- Input: OpenSearchHybridRetriever can receive the search query from the Input component.

Inputs

Parameter	Type	Default	Description
query	str		The query string to search for.
filters_bm25	Optional[Dict[str, Any]]	None	Filters to apply during BM25 retrieval.
filters_embedding	Optional[Dict[str, Any]]	None	Filters to apply during embedding retrieval.
top_k	Optional[int]	None	Maximum number of documents to return from the combined results.

Outputs

Parameter	Type	Default	Description
documents	List[Document]		Documents retrieved and ranked using hybrid search.

Overview

Use OpenSearchHybridRetriever to combine keyword-based (BM25) and semantic (embedding-based) search in a single component. BM25 matches exact terms, such as product names or error codes, while embedding retrieval matches meaning even when the wording differs. Combining the two often gives better retrieval quality than using either method alone.

The component performs both BM25 and embedding retrieval in parallel, then combines the two result lists using the join strategy set in `join_mode`. The default strategy is Reciprocal Rank Fusion (RRF).
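To make the join step more concrete, here is a minimal sketch of reciprocal rank fusion in plain Python. It is an illustration only, not the component's internal implementation: the function name `reciprocal_rank_fusion`, the constant `k=60`, and the document IDs are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def reciprocal_rank_fusion(
    ranked_lists: List[List[str]],
    weights: Optional[List[float]] = None,
    k: int = 60,  # assumed smoothing constant, not taken from the component
    top_k: Optional[int] = None,
) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("Provide one weight per ranked list.")

    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more the closer it is to the top of a list.
            scores[doc_id] += weight / (k + rank)

    # Highest fused score first; ties keep the order in which IDs were first seen.
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused if top_k is None else fused[:top_k]


# Hypothetical results from the two retrieval branches.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
embedding_hits = ["doc_c", "doc_a", "doc_d"]

fused_ids = reciprocal_rank_fusion([bm25_hits, embedding_hits], top_k=3)
print(fused_ids)
```

In this sketch, documents found by both branches (`doc_a`, `doc_c`) rise above documents found by only one, which is the main reason rank fusion tends to work well for hybrid search.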

Usage Example

This is an example RAG pipeline with OpenSearchHybridRetriever combining BM25 and embedding-based retrieval:

components:
  hybrid_retriever:
    type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      embedder:
        type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
        init_parameters:
          normalize_embeddings: true
          model: intfloat/e5-base-v2
      filters_bm25:
      fuzziness: AUTO
      top_k_bm25: 20
      scale_score: false
      all_terms_must_match: false
      filter_policy_bm25: replace
      custom_query_bm25:
      filters_embedding:
      top_k_embedding: 20
      filter_policy_embedding: replace
      custom_query_embedding:
      join_mode: reciprocal_rank_fusion
      weights:
      top_k: 10
      sort_by_score: true

  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

  PromptBuilder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: "          You are a technical expert.\n          You answer questions truthfully based on provided documents.\n          If the answer exists in several documents, summarize them.\n          Ignore documents that don't contain the answer to the question.\n          Only answer based on the documents provided. Don't make things up.\n          If no information related to the question can be found in the document, say so.\n          Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3] .\n          Never name the documents, only enter a number in square brackets as a reference.\n          The reference must only refer to the number that comes in square brackets after the document.\n          Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.\n\n          These are the documents:\n          {%- if documents|length > 0 %}\n          {%- for document in documents %}\n          Document [{{ loop.index }}] :\n          Name of Source File: {{ document.meta.file_name }}\n          {{ document.content }}\n          {% endfor -%}\n          {%- else %}\n          No relevant documents found.\n          Respond with \"Sorry, no matching documents were found, please adjust the filters or try a different question.\"\n          {% endif %}\n\n          Question: {{ question }}\n          Answer:"

      required_variables:
      variables:

  OpenAIGenerator:
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      api_key:
        type: env_var
        env_vars:
        - OPENAI_API_KEY
        strict: false
      model: gpt-5-mini
      streaming_callback:
      api_base_url:
      organization:
      system_prompt:
      generation_kwargs:
      timeout:
      max_retries:
      http_client_kwargs:

connections:
- sender: hybrid_retriever.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: meta_field_grouping_ranker.documents
- sender: meta_field_grouping_ranker.documents
  receiver: answer_builder.documents
- sender: meta_field_grouping_ranker.documents
  receiver: PromptBuilder.documents
- sender: PromptBuilder.prompt
  receiver: OpenAIGenerator.prompt
- sender: OpenAIGenerator.replies
  receiver: answer_builder.replies

inputs:
  query:
  - "hybrid_retriever.query"
  - "ranker.query"
  - "PromptBuilder.question"
  - "answer_builder.query"

outputs:
  documents: "meta_field_grouping_ranker.documents"
  answers: "answer_builder.answers"

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
document_store	OpenSearchDocumentStore		An instance of OpenSearchDocumentStore to use with the retriever.
embedder	TextEmbedder		A TextEmbedder component to embed the query for semantic search.
filters_bm25	Optional[Dict[str, Any]]	None	Default filters for BM25 retrieval.
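fuzziness	Union[int, str]	"AUTO"	The fuzziness for BM25 term matching, that is, the maximum edit distance allowed between a query term and a matching term. With "AUTO", OpenSearch chooses the distance based on term length.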
top_k_bm25	int	10	Number of documents to return from BM25 retrieval.
scale_score	bool	False	Whether to scale the BM25 scores to a range between 0 and 1.
all_terms_must_match	bool	False	Whether all query terms must match in BM25 retrieval.
filter_policy_bm25	Union[str, FilterPolicy]	"replace"	How to apply runtime filters for BM25. Options: "replace", "merge".
custom_query_bm25	Optional[Dict[str, Any]]	None	A custom OpenSearch query for BM25 retrieval.
filters_embedding	Optional[Dict[str, Any]]	None	Default filters for embedding retrieval.
top_k_embedding	int	10	Number of documents to return from embedding retrieval.
filter_policy_embedding	Union[str, FilterPolicy]	"replace"	How to apply runtime filters for embedding retrieval. Options: "replace", "merge".
custom_query_embedding	Optional[Dict[str, Any]]	None	A custom OpenSearch query for embedding retrieval.
join_mode	Union[str, JoinMode]	"reciprocal_rank_fusion"	How to combine results from both retrievers. Options: "concatenate", "merge", "reciprocal_rank_fusion", "distribution_based_rank_fusion".
weights	Optional[List[float]]	None	Weights assigned to each retriever's results when the joiner combines them. If not set, both result lists are weighted equally.
top_k	Optional[int]	None	Final number of documents to return after combining results.
sort_by_score	bool	True	Whether to sort the final results by score.

Run Method Parameters

These are the parameters you can configure for the component's run() method. You can pass these parameters at query time through the API, in Playground, or when running a job.

Parameter	Type	Default	Description
query	str		The query string to search for.
filters_bm25	Optional[Dict[str, Any]]	None	Filters to apply during BM25 retrieval. The way filters are applied depends on the `filter_policy_bm25` setting.
filters_embedding	Optional[Dict[str, Any]]	None	Filters to apply during embedding retrieval. The way filters are applied depends on the `filter_policy_embedding` setting.
top_k	Optional[int]	None	Maximum number of documents to return. Overrides the value set at initialization.
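The difference between the "replace" and "merge" filter policies can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `resolve_filters` is a hypothetical helper, the metadata fields are made up, and "merge" is reduced here to a logical AND of both filter sets, while the real merge logic may handle more cases.

```python
from typing import Any, Dict, Optional

Filters = Dict[str, Any]


def resolve_filters(
    policy: str,
    init_filters: Optional[Filters],
    runtime_filters: Optional[Filters],
) -> Optional[Filters]:
    """Hypothetical helper: pick the filters one retrieval branch would use."""
    if policy not in ("replace", "merge"):
        raise ValueError(f"Unknown filter policy: {policy}")
    if runtime_filters is None:
        # Nothing passed at query time: fall back to the init filters.
        return init_filters
    if policy == "replace" or init_filters is None:
        # Runtime filters take over completely.
        return runtime_filters
    # "merge": keep both sets (simplified here to a logical AND).
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}


# Example filters; the metadata fields are assumptions for illustration.
init_filters = {"field": "meta.language", "operator": "==", "value": "en"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2023}

replaced = resolve_filters("replace", init_filters, runtime_filters)
merged = resolve_filters("merge", init_filters, runtime_filters)
print(replaced)
print(merged)
```

Because `filter_policy_bm25` and `filter_policy_embedding` are set separately, each retrieval branch can resolve its filters differently for the same query.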
