MultiRetriever

Retrieve documents from multiple indexes and document stores in a single pipeline step, eliminating the need to add multiple retrievers.

MultiRetriever lets you configure several knowledge sources — each with its own document store, index, and retriever strategy — and query all of them at once. The results are combined into a single list of documents passed to downstream components.

Key Features

Configure each knowledge source independently with its own index and retriever type.
Query multiple knowledge sources in parallel.
Choose retriever strategies per source: BM25 keyword retrieval, embedding-based retrieval, or hybrid retrieval.
Select a query embedder (deepset NVIDIA, SentenceTransformers, or FastEmbed) for sources that use embedding or hybrid retrieval. For OpenSearch and Elasticsearch document stores, the query embedder's embedding model is automatically synced with the selected index.

How Results Are Merged

MultiRetriever queries all configured retrievers in parallel and merges their results using the join_mode parameter:

reciprocal_rank_fusion (default): Deduplicates documents and assigns scores based on each document's rank across retriever result lists using Reciprocal Rank Fusion. Documents that rank highly in multiple lists receive higher scores. Results are returned in descending score order. Use this mode when combining retrievers with incomparable scores, such as BM25 and embedding retrievers.
concatenate: Combines all results into a single list and deduplicates them without re-scoring.
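
The reciprocal_rank_fusion mode can be illustrated with a minimal, library-independent sketch. It assumes the commonly used formula score(d) = sum of 1 / (k + rank) over all result lists, with k = 60; the constant and any weighting that MultiRetriever actually applies are not stated here, and the function name and document IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of document IDs (k=60 is an assumed constant)."""
    scores = defaultdict(float)
    for results in result_lists:
        # Ranks start at 1; a document found in several lists accumulates score.
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Descending score; ties are broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["doc_a", "doc_b", "doc_c"]
embedding_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, embedding_hits])
print([doc_id for doc_id, _ in fused])
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Only ranks enter the score, which is why this mode suits retrievers whose raw scores are on different scales. Here doc_b and doc_a rise to the top because both lists contain them.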

When you set top_k, MultiRetriever always uses reciprocal rank fusion to merge results — regardless of join_mode — so the combined list has a consistent global ranking before it is truncated.

Use top_k_per_retriever to limit how many documents each retriever returns. Use top_k to limit the final number of documents after merging.
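
The interplay of the two limits and join_mode might look like the following sketch. It is not the component's code: the helper names are hypothetical, list slicing stands in for forwarding top_k_per_retriever to each retriever, and deduplication by document ID is an assumption.

```python
def effective_join_mode(join_mode, top_k):
    """Setting top_k forces reciprocal rank fusion, whatever join_mode says."""
    return "reciprocal_rank_fusion" if top_k is not None else join_mode


def limit_each(results_by_retriever, top_k_per_retriever=None):
    """Cap each retriever's list; slicing stands in for passing top_k down."""
    if top_k_per_retriever is None:
        return dict(results_by_retriever)
    return {
        name: hits[:top_k_per_retriever]
        for name, hits in results_by_retriever.items()
    }


def concatenate_unique(results_by_retriever):
    """Concatenate in retriever order, keeping the first copy of each ID."""
    seen, merged = set(), []
    for hits in results_by_retriever.values():
        for doc_id in hits:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


results = {"bm25": ["a", "b", "c", "d"], "embedding": ["b", "e", "f", "a"]}
limited = limit_each(results, top_k_per_retriever=2)

print(concatenate_unique(limited))                   # → ['a', 'b', 'e']
print(effective_join_mode("concatenate", top_k=5))   # → reciprocal_rank_fusion
print(effective_join_mode("concatenate", top_k=None))  # → concatenate
```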

Configuration

You configure MultiRetriever through a set of knowledge sources. Each knowledge source defines:

Document store: the backend database (for example, OpenSearch, Pinecone, or Qdrant).
Index: the specific index to query.
Retriever type: the retrieval strategy to use for this source.
Query embedder: the embedder to use when the retriever type requires embedding (for example, embedding or hybrid retrieval).

Supported Retriever Types

The available retriever types depend on the selected document store.

Retriever Type	Description
BM25 (keyword)	Retrieves documents using keyword-based BM25 scoring. No embedder required.
Embedding	Retrieves documents by comparing query and document embeddings. Requires a query embedder.
Hybrid	Combines BM25 and embedding retrieval. Requires a query embedder.

Info: SQL and metadata retrievers are not available in MultiRetriever because SQL retrieval requires per-run SQL statements and metadata retrievers return metadata rather than documents.
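
As a rough mental model, a knowledge source can be pictured as a small record whose required fields depend on the retriever type. The KnowledgeSource class, the missing_fields helper, and the type labels "bm25", "embedding", and "hybrid" below are hypothetical and only mirror the table above; they are not platform classes or identifiers.

```python
from dataclasses import dataclass
from typing import Optional

# Retriever types from the table above, mapped to whether they need an embedder.
NEEDS_EMBEDDER = {"bm25": False, "embedding": True, "hybrid": True}


@dataclass
class KnowledgeSource:  # hypothetical container, for illustration only
    document_store: str
    index: str
    retriever_type: str
    query_embedder: Optional[str] = None


def missing_fields(source):
    """List the fields that still have to be filled in for this source."""
    missing = [
        name
        for name in ("document_store", "index", "retriever_type")
        if not getattr(source, name)
    ]
    if NEEDS_EMBEDDER.get(source.retriever_type, False) and not source.query_embedder:
        missing.append("query_embedder")
    return missing


keyword_source = KnowledgeSource("OpenSearch", "index-one", "bm25")
hybrid_source = KnowledgeSource("OpenSearch", "Standard-Index", "hybrid")

print(missing_fields(keyword_source))  # → []
print(missing_fields(hybrid_source))   # → ['query_embedder']
```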

Query Embedder Options

When you select an embedding or hybrid retriever type, you choose a query embedder for that source. The following embedders are available:

Embedder	Description
DeepsetNvidiaTextEmbedder	Uses NVIDIA Triton models optimized on deepset hardware. Recommended for best performance on the platform.
SentenceTransformersTextEmbedder	Uses SentenceTransformers models. Portable — also works in exported pipelines.
FastembedTextEmbedder	Uses FastEmbed lightweight models. Portable — also works in exported pipelines.

When you select an index created in the platform with an OpenSearch or Elasticsearch document store, the component automatically syncs the embedding model from that index.

Adding a Knowledge Source

1. In Pipeline Builder, add the MultiRetriever component to your pipeline.
2. Click the component to open its configuration panel.
3. Under Knowledge Sources, click Add knowledge source.
4. Choose a Document Store from the list.
5. Depending on the document store you selected, either choose an Index from the list or type the index name directly:
   - For OpenSearch and Elasticsearch: all available indexes are listed. If no index is available, create one first. For details, see Create an Index.
   - For external document stores (such as Pinecone): connect to the index by providing the configuration manually. These document stores do not use the platform's managed indexes.
6. For external document stores, fill in the required connection credentials. Each credential field supports workspace and organization secrets: start typing a secret name or select one from the dropdown. When you select a saved secret, the field shows a tag with the secret name instead of the raw value.
7. Choose the retriever to use. If the retriever type requires embedding, choose a Query Embedder.
   - Info: For indexes created in the platform with an OpenSearch or Elasticsearch document store, the embedding model is automatically synced from the index you chose, so the query and the documents are embedded with the same model.
8. Configure any extra retriever parameters shown under the retriever type selector.
9. Click Done to save your settings.

Repeat these steps to add more sources. The component queries all configured sources when the pipeline runs.

Info: If a knowledge source has required fields that are not yet filled in, the entry card shows a yellow warning indicator. The configuration drawer also opens expanded for incomplete sources so you can see what needs to be filled in. Complete all required fields before deploying the pipeline.

Editing a Knowledge Source

1. Under Knowledge Sources, click the knowledge source card you want to edit.
2. Update the relevant fields.
3. Click Done to save your changes.

Advanced Settings

Go to the Advanced tab to configure:

join_mode: how to merge results from multiple retrievers (reciprocal_rank_fusion or concatenate).
max_workers: the maximum number of threads for parallel retrieval (default: 4).
top_k: the maximum number of documents to return after merging.
top_k_per_retriever: the maximum number of documents each retriever returns.
filters: default metadata filters to apply to all retrievers.
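
To make max_workers more concrete, here is a minimal sketch of thread-based fan-out using Python's standard library. It assumes each retriever can be called like a function that takes the query; the run_in_parallel helper and the stand-in retrievers are hypothetical and do not reproduce the component's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(retrievers, query, max_workers=4):
    """Call every retriever with the query on a thread pool, keyed by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all calls first so they can overlap, then collect the results.
        futures = {name: pool.submit(fn, query) for name, fn in retrievers.items()}
        return {name: future.result() for name, future in futures.items()}


# Stand-in retrievers: plain functions that return document IDs.
retrievers = {
    "opensearchbm25": lambda q: [f"bm25:{q}:1", f"bm25:{q}:2"],
    "opensearchembedding_1": lambda q: [f"emb:{q}:1"],
}

results = run_in_parallel(retrievers, "haystack")
print(results)
```

With more sources than max_workers, the extra calls wait for a free thread, so raising the value mainly helps when many sources are configured and retrieval is dominated by network latency.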

Connections

Input: MultiRetriever receives the query from Input.
Output: MultiRetriever sends the combined list of retrieved documents to downstream components such as a Ranker, LLM, or Agent.

Source Code

To check this component's source code, open multi_retriever.py in the Haystack repository.

Usage Examples

Basic Configuration

  MultiRetriever:
    type: haystack.components.retrievers.multi_retriever.MultiRetriever
    init_parameters:
      retrievers:
        opensearchhybrid:
          type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
          init_parameters:
            document_store:
              type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
              init_parameters:
                index: Standard-Index
            embedder:
              type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
              init_parameters:
                normalize_embeddings: true
                model: intfloat/e5-base-v2
      join_mode: reciprocal_rank_fusion
      top_k: 10

Using the Component in a Pipeline

This example shows a RAG pipeline that queries two separate OpenSearch indexes using BM25 and embedding retrieval, then generates an answer:

# haystack-pipeline
components:
  MultiRetriever:
    type: haystack.components.retrievers.multi_retriever.MultiRetriever
    init_parameters:
      retrievers:
        opensearchbm25:
          type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
          init_parameters:
            document_store:
              type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
              init_parameters:
                index: index-one
        opensearchembedding_1:
          type: haystack.components.retrievers.text_embedding_retriever.TextEmbeddingRetriever
          init_parameters:
            retriever:
              type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
              init_parameters:
                document_store:
                  type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
                  init_parameters:
                    index: Standard-Index-English-aragats-15-05
            text_embedder:
              type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
              init_parameters:
                model: intfloat/e5-base-v2
      join_mode: reciprocal_rank_fusion
      top_k: 10
  LLM:
    type: haystack.components.generators.chat.llm.LLM
    init_parameters:
      chat_generator:
        type: haystack_integrations.components.generators.amazon_bedrock.chat.chat_generator.AmazonBedrockChatGenerator
        init_parameters:
          model: global.anthropic.claude-haiku-4-5-20251001-v1:0
      user_prompt: >-
        {% message role="user" %}

        You are a technical expert. You answer questions truthfully based on
        provided documents. {% for doc in documents %} Document {{ loop.index
        }}: {{ doc.content }} {% endfor %} Question: {{ question }}

        {% endmessage %}
      required_variables: "*"
      system_prompt:

connections:
- sender: MultiRetriever.documents
  receiver: LLM.documents

max_runs_per_component: 100

inputs:
  query:
  - MultiRetriever.query
  - LLM.question

outputs:
  messages: LLM.messages

metadata: {}

Selecting Knowledge Sources at Runtime

Use active_retrievers to run only a subset of configured sources. Names must match the keys in the retrievers dictionary:

inputs:
  query: MultiRetriever.query

params:
  MultiRetriever:
    active_retrievers:
    - opensearchbm25
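
Conceptually, active_retrievers narrows the retrievers dictionary to the named keys before anything runs. The sketch below shows that selection step in plain Python; select_active is a hypothetical helper, and raising ValueError for an unknown name is an assumption, since the behavior for mismatched names is not described here.

```python
def select_active(retrievers, active_retrievers=None):
    """Return only the named retrievers; None means all configured sources."""
    if active_retrievers is None:
        return dict(retrievers)
    unknown = [name for name in active_retrievers if name not in retrievers]
    if unknown:
        # Assumed behavior: fail loudly instead of silently skipping a source.
        raise ValueError(f"Unknown retriever names: {unknown}")
    return {name: retrievers[name] for name in active_retrievers}


# Keys mirror the pipeline example; the values are placeholders.
retrievers = {
    "opensearchbm25": "bm25 retriever",
    "opensearchembedding_1": "embedding retriever",
}

selected = select_active(retrievers, ["opensearchbm25"])
print(list(selected))  # → ['opensearchbm25']
```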

Parameters

Inputs

Parameter	Type	Description
query	str	The query string to search for across all configured knowledge sources.
filters	Optional[Dict[str, Any]]	Optional metadata filters to apply to all retrievers.

Outputs

Parameter	Type	Description
documents	List[Document]	Combined list of documents retrieved from all configured knowledge sources.

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
retrievers	Dict[str, TextRetriever]		A named dictionary of text retrievers to run in parallel. Each key is a source name; each value is a retriever component. Configured through the knowledge sources UI in Pipeline Builder.
filters	Optional[Dict[str, Any]]	None	Default metadata filters to apply to all retrievers.
top_k_per_retriever	Optional[int]	None	The maximum number of documents each retriever returns. When set, this value is forwarded to each retriever as its `top_k`. If not set, each retriever uses its own configured `top_k`.
top_k	Optional[int]	None	The maximum number of documents to return after merging. When set, results are merged using reciprocal rank fusion before truncation. If not set, all merged results are returned.
max_workers	int	4	The maximum number of threads to use for parallel retrieval in the synchronous `run()` method.
join_mode	Literal["concatenate", "reciprocal_rank_fusion"]	reciprocal_rank_fusion	How to merge results from multiple retrievers. See How Results Are Merged.

Run Method Parameters

These are the parameters you can configure for the component's run() method. You can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
query	str		The query to run against all configured knowledge sources.
filters	Optional[Dict[str, Any]]	None	Metadata filters to apply at query time. Defaults to the value set at initialization.
top_k_per_retriever	Optional[int]	None	The maximum number of documents to return per retriever. When set, overrides the `top_k` configured on each individual retriever. Defaults to the value set at initialization.
top_k	Optional[int]	None	The maximum number of documents to return overall from the combined results of all retrievers. When set, results are merged using reciprocal rank fusion before truncation. Defaults to the value set at initialization.
active_retrievers	Optional[List[str]]	None	Names of knowledge sources to query. Defaults to all configured sources. Must match keys in the `retrievers` dictionary.
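
The "defaults to the value set at initialization" behavior in the table above amounts to a simple fallback. The sketch below assumes a run-time value replaces the init-time value whenever it is not None; the resolve helper is hypothetical, and the filter dictionary (including the meta.lang field) is only an illustrative value.

```python
def resolve(run_value, init_value):
    """Prefer the run-time value; fall back to the init-time one when None."""
    return init_value if run_value is None else run_value


init_params = {"top_k": 10, "top_k_per_retriever": None, "filters": None}
run_params = {
    "top_k": 3,
    "filters": {"field": "meta.lang", "operator": "==", "value": "en"},
}

effective = {
    name: resolve(run_params.get(name), default)
    for name, default in init_params.items()
}
print(effective["top_k"])                # → 3
print(effective["top_k_per_retriever"])  # → None
```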
