MultiRetriever

Retrieve documents from multiple indexes and document stores in a single pipeline step.

MultiRetriever lets you configure several knowledge sources — each with its own document store, index, and retriever strategy — and query all of them at once. The results are combined into a single list of documents passed to downstream components.

Key Features

  • Query multiple indexes and document stores simultaneously.
  • Mix retriever strategies per source: BM25 keyword retrieval, embedding-based retrieval, or hybrid retrieval.
  • Select a query embedder (deepset NVIDIA, SentenceTransformers, or FastEmbed) for sources that use embedding or hybrid retrieval.
  • Configure each knowledge source independently with its own index and retriever type.
  • Automatically syncs the embedding model from the selected index when compatible.

Configuration

You configure MultiRetriever through a set of knowledge sources. Each knowledge source defines:

  • Document store: the backend database (for example, OpenSearch).
  • Index: the specific index to query.
  • Retriever type: the retrieval strategy to use for this source.
  • Query embedder: the embedder to use when the retriever type requires embedding (for example, embedding or hybrid retrieval).

Supported Retriever Types

The available retriever types depend on the selected document store.

| Retriever Type | Description |
| --- | --- |
| BM25 (keyword) | Retrieves documents using keyword-based BM25 scoring. No embedder required. |
| Embedding | Retrieves documents by comparing query and document embeddings. Requires a query embedder. |
| Hybrid | Combines BM25 and embedding retrieval. Requires a query embedder. |
Info: SQL and metadata retrievers are not available in MultiRetriever, because SQL retrieval requires per-run SQL statements and metadata retrievers return metadata rather than documents.
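
The rule that embedding and hybrid retrieval need a query embedder, while BM25 does not, can be sketched as a small validation step. This is a minimal illustration, not the component's implementation: the `KnowledgeSource` class, its field names, and the lowercase retriever type keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Whether each retriever type needs a query embedder.
# The lowercase keys are illustrative, not the platform's identifiers.
NEEDS_EMBEDDER = {"bm25": False, "embedding": True, "hybrid": True}


@dataclass
class KnowledgeSource:
    """Hypothetical stand-in for one knowledge source entry."""
    document_store: str
    index: str
    retriever_type: str
    query_embedder: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the retriever type is unknown or lacks its embedder."""
        if self.retriever_type not in NEEDS_EMBEDDER:
            raise ValueError(f"Unsupported retriever type: {self.retriever_type}")
        if NEEDS_EMBEDDER[self.retriever_type] and self.query_embedder is None:
            raise ValueError(f"{self.retriever_type} retrieval requires a query embedder")


sources = [
    KnowledgeSource("OpenSearch", "index-one", "bm25"),
    KnowledgeSource("OpenSearch", "Standard-Index", "hybrid", "DeepsetNvidiaTextEmbedder"),
]
for source in sources:
    source.validate()  # both entries are valid, so nothing is raised
```

A source such as `KnowledgeSource("OpenSearch", "index-one", "embedding")` would fail validation in this sketch, because no query embedder is set.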

Query Embedder Options

When you select an embedding or hybrid retriever type, you choose a query embedder for that source. The following embedders are available:

| Embedder | Description |
| --- | --- |
| DeepsetNvidiaTextEmbedder | Uses NVIDIA Triton models optimized on deepset hardware. Recommended for best performance on the platform. |
| SentenceTransformersTextEmbedder | Uses SentenceTransformers models. Portable: also works in exported pipelines. |
| FastembedTextEmbedder | Uses FastEmbed lightweight models. Portable: also works in exported pipelines. |

When you select an index, the component automatically syncs the embedding model from that index if the selected embedder is compatible.
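
One way to picture the sync behavior is shown below. This is a sketch under stated assumptions: the page does not define what "compatible" means, so the compatibility rule is passed in as a function. The `IndexInfo` and `QueryEmbedder` classes, the `sync_model` helper, and the `SUPPORTED` table are all hypothetical and do not describe the platform's actual logic.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IndexInfo:
    """Hypothetical view of an index and the model that embedded its documents."""
    name: str
    document_embedding_model: Optional[str] = None  # None for keyword-only indexes


@dataclass
class QueryEmbedder:
    """Hypothetical query embedder setting: embedder name plus the model it loads."""
    kind: str
    model: Optional[str] = None


def sync_model(embedder: QueryEmbedder, index: IndexInfo,
               is_compatible: Callable[[str, str], bool]) -> bool:
    """Copy the index's model onto the embedder if compatible. Returns True when synced."""
    model = index.document_embedding_model
    if model is None or not is_compatible(embedder.kind, model):
        return False  # leave the embedder's current model untouched
    embedder.model = model
    return True


# Illustrative rule only: pretend each embedder supports a fixed set of models.
SUPPORTED = {
    "SentenceTransformersTextEmbedder": {"intfloat/e5-base-v2"},
    "FastembedTextEmbedder": {"BAAI/bge-small-en-v1.5"},
}


def supports(kind: str, model: str) -> bool:
    return model in SUPPORTED.get(kind, set())


index = IndexInfo("Standard-Index", "intfloat/e5-base-v2")
st = QueryEmbedder("SentenceTransformersTextEmbedder")
fe = QueryEmbedder("FastembedTextEmbedder", model="BAAI/bge-small-en-v1.5")
synced_st = sync_model(st, index, supports)  # compatible: model is copied from the index
synced_fe = sync_model(fe, index, supports)  # not compatible: model stays as it was
```

The point of the sketch is the outcome described in the text: when the sync applies, the query is embedded with the same model that embedded the documents in the index.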

Adding a Knowledge Source

  1. In Builder, add the MultiRetriever component to your pipeline.
  2. Click the component to open its configuration panel.
  3. Under Knowledge Sources, click Add knowledge source.
  4. Choose a Document Store from the list.
  5. Choose an Index for this document store. All indexes for the selected document store are listed. If there's no available index, create one first. For details, see Create an Index.
  6. Choose the retriever type. If the retriever type requires embedding, choose a Query Embedder. Info: The embedding model is automatically synced from the index you chose, so that the query and the documents are embedded with the same model.
  7. Click Done to save your settings.

Repeat these steps to add more sources. The component queries all configured sources when the pipeline runs.

Connections

  • Input: MultiRetriever receives the query from Input.
  • Output: MultiRetriever sends the combined list of retrieved documents to downstream components such as a Ranker, LLM, or Agent.

Source Code

To check this component's source code, open multi_retriever.py in the Haystack Core Integrations repository.

Usage Example

Basic Component Configuration

  MultiRetriever:
    type: haystack.components.retrievers.multi_retriever.MultiRetriever
    init_parameters:
      retrievers:
        opensearchhybrid:
          type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
          init_parameters:
            document_store:
              type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
              init_parameters:
                index: Standard-Index
            embedder:
              type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
              init_parameters:
                normalize_embeddings: true
                model: intfloat/e5-base-v2

Using the Component in a Pipeline

This example shows a RAG pipeline that queries two OpenSearch indexes, one with BM25 retrieval and one with embedding retrieval, and passes the combined documents to an LLM that generates an answer:

# haystack-pipeline
components:
  MultiRetriever:
    type: haystack.components.retrievers.multi_retriever.MultiRetriever
    init_parameters:
      retrievers:
        opensearchbm25:
          type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
          init_parameters:
            document_store:
              type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
              init_parameters:
                index: index-one
        opensearchembedding_1:
          type: haystack.components.retrievers.text_embedding_retriever.TextEmbeddingRetriever
          init_parameters:
            retriever:
              type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
              init_parameters:
                document_store:
                  type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
                  init_parameters:
                    index: Standard-Index-English-aragats-15-05
            text_embedder:
              type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
              init_parameters:
                model: intfloat/e5-base-v2
  LLM:
    type: haystack.components.generators.chat.llm.LLM
    init_parameters:
      chat_generator:
        type: haystack_integrations.components.generators.amazon_bedrock.chat.chat_generator.AmazonBedrockChatGenerator
        init_parameters:
          model: global.anthropic.claude-haiku-4-5-20251001-v1:0
      user_prompt: >-
        {% message role="user" %}

        You are a technical expert. You answer questions truthfully based on
        provided documents. {% for doc in documents %} Document {{ loop.index
        }}: {{ doc.content }} {% endfor %} Question: {{ question }}

        {% endmessage %}
      required_variables: "*"
      system_prompt:

connections:
  - sender: MultiRetriever.documents
    receiver: LLM.documents

max_runs_per_component: 100

inputs:
  query:
    - MultiRetriever.query
    - LLM.question

outputs:
  messages: LLM.messages

metadata: {}

Parameters

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | The query string to search for across all configured knowledge sources. |
| filters | Optional[Dict[str, Any]] | Optional metadata filters to apply to all retrievers. |

Outputs

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[Document] | Combined list of documents retrieved from all configured knowledge sources. |

Init Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| retrievers | List[Dict] |  | List of retriever configurations. Each entry defines a retriever type and its init parameters, including the document store and index to query. Configured through the knowledge sources UI in Pipeline Builder. |

Run Method Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str |  | The query to run against all configured knowledge sources. |
| filters | Optional[Dict[str, Any]] | None | Optional metadata filters to apply at query time. |
| top_k | Optional[int] | None | Overrides the number of documents to return per retriever at query time. |
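
To illustrate how `query`, `filters`, and `top_k` might flow through the component, here is a toy sketch of the fan-out. It does not use Haystack. The `Document` and `KeywordRetriever` classes and the `run_all` function are hypothetical stand-ins, filters are simplified to exact-match key/value pairs, and the results are assumed to be concatenated in source order with no deduplication or re-ranking.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Minimal stand-in for a retrieved document."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: returns documents whose content contains the query string."""

    def __init__(self, docs: List[Document], top_k: int = 10):
        self.docs = docs
        self.top_k = top_k  # default limit, used when no run-time override is given

    def run(self, query: str, filters: Optional[Dict[str, Any]] = None,
            top_k: Optional[int] = None) -> Dict[str, List[Document]]:
        hits = [d for d in self.docs if query.lower() in d.content.lower()]
        if filters:  # simplified filters: every key must match exactly
            hits = [d for d in hits if all(d.meta.get(k) == v for k, v in filters.items())]
        limit = self.top_k if top_k is None else top_k
        return {"documents": hits[:limit]}


def run_all(retrievers: Dict[str, KeywordRetriever], query: str,
            filters: Optional[Dict[str, Any]] = None,
            top_k: Optional[int] = None) -> Dict[str, List[Document]]:
    """Send the same query, filters, and top_k to every retriever and concatenate results."""
    combined: List[Document] = []
    for retriever in retrievers.values():
        combined.extend(retriever.run(query, filters=filters, top_k=top_k)["documents"])
    return {"documents": combined}


retrievers = {
    "manuals": KeywordRetriever([
        Document("Reset the router", {"lang": "en"}),
        Document("Router firmware notes", {"lang": "en"}),
        Document("Router zurücksetzen", {"lang": "de"}),
    ]),
    "tickets": KeywordRetriever([
        Document("Customer asked about router resets", {"lang": "en"}),
    ]),
}

all_docs = run_all(retrievers, "router")["documents"]
one_per_source = run_all(retrievers, "router", top_k=1)["documents"]
german_only = run_all(retrievers, "router", filters={"lang": "de"})["documents"]
```

In this sketch `top_k` limits each retriever separately, matching the "per retriever" wording in the table, so the combined list can hold up to `top_k` documents for every configured source.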