MultiRetriever
Retrieve documents from multiple indexes and document stores in a single pipeline step.
MultiRetriever lets you configure several knowledge sources — each with its own document store, index, and retriever strategy — and query all of them at once. The results are combined into a single list of documents passed to downstream components.
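The fan-out-and-combine behavior described above can be pictured with a small sketch. It is not the component's implementation: the `Document` class, the `retrieve_from_all` helper, and the stub sources are hypothetical names used only for illustration. It also assumes results are simply concatenated in source order, while the actual component may order or deduplicate them differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    """Minimal stand-in for a retrieved document (hypothetical, not the Haystack class)."""
    content: str


# A knowledge source is modeled as any callable that maps a query to documents.
Retriever = Callable[[str], List[Document]]


def retrieve_from_all(query: str, retrievers: Dict[str, Retriever]) -> List[Document]:
    """Query every configured source and return one combined list."""
    combined: List[Document] = []
    for retriever in retrievers.values():
        # Assumption: results are concatenated in the order the sources are listed.
        combined.extend(retriever(query))
    return combined


# Two stub sources standing in for separate indexes.
sources: Dict[str, Retriever] = {
    "index_one": lambda q: [Document(f"index-one hit for '{q}'")],
    "index_two": lambda q: [Document(f"index-two hit for '{q}'")],
}

documents = retrieve_from_all("what is hybrid retrieval?", sources)
```

Downstream components see only the combined `documents` list, so they do not need to know how many sources were queried.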
Key Features
- Query multiple indexes and document stores simultaneously.
- Mix retriever strategies per source: BM25 keyword retrieval, embedding-based retrieval, or hybrid retrieval.
- Select a query embedder (deepset NVIDIA, SentenceTransformers, or FastEmbed) for sources that use embedding or hybrid retrieval.
- Configure each knowledge source independently with its own index and retriever type.
- Automatically sync the embedding model from the selected index when the selected embedder is compatible.
Configuration
You configure MultiRetriever through a set of knowledge sources. Each knowledge source defines:
- Document store: the backend database (for example, OpenSearch).
- Index: the specific index to query.
- Retriever type: the retrieval strategy to use for this source.
- Query embedder: the embedder to use when the retriever type requires embedding (for example, embedding or hybrid retrieval).
Supported Retriever Types
The available retriever types depend on the selected document store.
| Retriever Type | Description |
|---|---|
| BM25 (keyword) | Retrieves documents using keyword-based BM25 scoring. No embedder required. |
| Embedding | Retrieves documents by comparing query and document embeddings. Requires a query embedder. |
| Hybrid | Combines BM25 and embedding retrieval. Requires a query embedder. |
SQL and metadata retrievers are not available in MultiRetriever because SQL retrieval requires per-run SQL statements and metadata retrievers return metadata rather than documents.
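The embedder requirement in the table above can be expressed as a simple consistency check. This is a minimal sketch under stated assumptions: the type keys (`bm25`, `embedding`, `hybrid`) and the `validate_source` helper are hypothetical and are not part of the component's API.

```python
from typing import Optional

# Whether each retriever type from the table needs a query embedder.
# The string keys are illustrative labels, not identifiers used by the platform.
EMBEDDER_REQUIRED = {"bm25": False, "embedding": True, "hybrid": True}


def validate_source(retriever_type: str, query_embedder: Optional[str] = None) -> None:
    """Raise ValueError if a knowledge source configuration is inconsistent."""
    if retriever_type not in EMBEDDER_REQUIRED:
        # Covers types such as SQL or metadata retrievers, which are not offered here.
        raise ValueError(f"Unsupported retriever type: {retriever_type!r}")
    if EMBEDDER_REQUIRED[retriever_type] and not query_embedder:
        raise ValueError(f"Retriever type {retriever_type!r} requires a query embedder")


# A BM25 source needs no embedder; a hybrid source must name one.
validate_source("bm25")
validate_source("hybrid", query_embedder="SentenceTransformersTextEmbedder")
```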
Query Embedder Options
When you select an embedding or hybrid retriever type, you choose a query embedder for that source. The following embedders are available:
| Embedder | Description |
|---|---|
| DeepsetNvidiaTextEmbedder | Uses NVIDIA Triton models optimized on deepset hardware. Recommended for best performance on the platform. |
| SentenceTransformersTextEmbedder | Uses SentenceTransformers models. Portable — also works in exported pipelines. |
| FastembedTextEmbedder | Uses FastEmbed lightweight models. Portable — also works in exported pipelines. |
When you select an index, the component automatically syncs the embedding model from that index if the selected embedder is compatible.
Adding a Knowledge Source
- In Builder, add the MultiRetriever component to your pipeline.
- Click the component to open its configuration panel.
- Under Knowledge Sources, click Add knowledge source.
- Choose a Document Store from the list.
- Choose an Index for this document store. All indexes for the selected document store are listed. If there's no available index, create one first. For details, see Create an Index.
- Choose the retriever type to use. If the retriever type requires embedding, choose a Query Embedder. Info: The embedding model is automatically synced from the index you chose, so that the same model embeds both the query and the documents.
- Click Done to save your settings.
Repeat these steps to add more sources. The component queries all configured sources when the pipeline runs.
Connections
- Input: MultiRetriever receives the query from Input.
- Output: MultiRetriever sends the combined list of retrieved documents to downstream components such as a Ranker, LLM, or Agent.
Source Code
To check this component's source code, open multi_retriever.py in the Haystack Core Integrations repository.
Usage Example
Basic Component Configuration
MultiRetriever:
  type: haystack.components.retrievers.multi_retriever.MultiRetriever
  init_parameters:
    retrievers:
      opensearchhybrid:
        type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
        init_parameters:
          document_store:
            type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
            init_parameters:
              index: Standard-Index
          embedder:
            type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
            init_parameters:
              normalize_embeddings: true
              model: intfloat/e5-base-v2
Using the Component in a Pipeline
This example shows a RAG pipeline that queries two separate OpenSearch indexes, one with BM25 retrieval and one with embedding retrieval, then passes the combined documents to an LLM to generate an answer:
# haystack-pipeline
components:
  MultiRetriever:
    type: haystack.components.retrievers.multi_retriever.MultiRetriever
    init_parameters:
      retrievers:
        opensearchbm25:
          type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
          init_parameters:
            document_store:
              type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
              init_parameters:
                index: index-one
        opensearchembedding_1:
          type: haystack.components.retrievers.text_embedding_retriever.TextEmbeddingRetriever
          init_parameters:
            retriever:
              type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
              init_parameters:
                document_store:
                  type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
                  init_parameters:
                    index: Standard-Index-English-aragats-15-05
            text_embedder:
              type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
              init_parameters:
                model: intfloat/e5-base-v2
  LLM:
    type: haystack.components.generators.chat.llm.LLM
    init_parameters:
      chat_generator:
        type: haystack_integrations.components.generators.amazon_bedrock.chat.chat_generator.AmazonBedrockChatGenerator
        init_parameters:
          model: global.anthropic.claude-haiku-4-5-20251001-v1:0
      user_prompt: >-
        {% message role="user" %}
        You are a technical expert. You answer questions truthfully based on
        provided documents. {% for doc in documents %} Document {{ loop.index
        }}: {{ doc.content }} {% endfor %} Question: {{ question }}
        {% endmessage %}
      required_variables: "*"
      system_prompt:
connections:
  - sender: MultiRetriever.documents
    receiver: LLM.documents
max_runs_per_component: 100
inputs:
  query:
    - MultiRetriever.query
    - LLM.question
outputs:
  messages: LLM.messages
metadata: {}
Parameters
Inputs
| Parameter | Type | Description |
|---|---|---|
| query | str | The query string to search for across all configured knowledge sources. |
| filters | Optional[Dict[str, Any]] | Optional metadata filters to apply to all retrievers. |
Outputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | Combined list of documents retrieved from all configured knowledge sources. |
Init Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| retrievers | List[Dict] | | List of retriever configurations. Each entry defines a retriever type and its init parameters, including the document store and index to query. Configured through the knowledge sources UI in Pipeline Builder. |
Run Method Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | | The query to run against all configured knowledge sources. |
| filters | Optional[Dict[str, Any]] | None | Optional metadata filters to apply at query time. |
| top_k | Optional[int] | None | Overrides the number of documents to return per retriever at query time. |
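Because `top_k` applies per retriever, the combined list can contain up to `top_k` documents from each source. The sketch below illustrates this with stub retrievers. `StubRetriever` and `run_all` are hypothetical helpers, and forwarding `filters` and `top_k` unchanged to every retriever is an assumption based on the parameter descriptions above, not a confirmed implementation detail.

```python
from typing import Any, Dict, List, Optional


class StubRetriever:
    """Hypothetical stand-in for a real retriever; serves a fixed list of texts."""

    def __init__(self, texts: List[str], top_k: int = 10) -> None:
        self.texts = texts
        self.top_k = top_k
        self.last_filters: Optional[Dict[str, Any]] = None

    def run(
        self,
        query: str,
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
    ) -> Dict[str, List[str]]:
        # Record the filters so the example can show they were forwarded.
        self.last_filters = filters
        # A run-time top_k overrides the retriever's own default.
        limit = self.top_k if top_k is None else top_k
        return {"documents": self.texts[:limit]}


def run_all(
    retrievers: Dict[str, StubRetriever],
    query: str,
    filters: Optional[Dict[str, Any]] = None,
    top_k: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Forward the same query, filters, and top_k to every retriever."""
    documents: List[str] = []
    for retriever in retrievers.values():
        documents.extend(retriever.run(query, filters=filters, top_k=top_k)["documents"])
    return {"documents": documents}


first = StubRetriever(["a1", "a2", "a3"])
second = StubRetriever(["b1", "b2", "b3"])
example_filters = {"field": "meta.language", "operator": "==", "value": "en"}

result = run_all({"first": first, "second": second}, "example query",
                 filters=example_filters, top_k=2)
```

With two sources and `top_k=2`, `result["documents"]` holds four entries, two from each stub.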