MultiQueryTextRetriever

Retrieve documents using multiple text queries in parallel with a text-based retriever. The component retrieves documents for each query using a thread pool, then combines and deduplicates the results. This improves recall by finding documents relevant to multiple query variations.

Key Features

  • Processes multiple queries in parallel using a text-based retriever (such as BM25).
  • Deduplicates results based on document content across all queries.
  • Sorts the combined results by relevance score.
  • Configurable parallel processing with a thread pool.
  • Designed to work with QueryExpander for query expansion in keyword-based search.
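
The retrieve, deduplicate, and sort behavior listed above can be sketched in plain Python. This is a minimal illustration, not the component's source code: Doc, CORPUS, toy_retrieve, and multi_query_retrieve are hypothetical names, scoring is simple word overlap rather than BM25, and keeping the first copy of a duplicate is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document: content plus a relevance score."""
    content: str
    score: float


# Hypothetical toy corpus; a real setup would query a document store instead.
CORPUS = [
    "solar panels convert sunlight into electricity",
    "wind turbines generate renewable energy",
    "coal plants burn fossil fuels",
]


def toy_retrieve(query: str) -> list[Doc]:
    """Score each corpus entry by the number of words it shares with the query."""
    terms = set(query.lower().split())
    hits = []
    for text in CORPUS:
        overlap = len(terms & set(text.split()))
        if overlap:
            hits.append(Doc(content=text, score=float(overlap)))
    return hits


def multi_query_retrieve(queries: list[str], max_workers: int = 3) -> list[Doc]:
    """Run every query on a thread pool, then deduplicate by content and sort by score."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input queries.
        per_query_results = list(pool.map(toy_retrieve, queries))

    merged: list[Doc] = []
    seen: set[str] = set()
    for docs in per_query_results:
        for doc in docs:
            if doc.content not in seen:  # assumption: keep the first copy seen
                seen.add(doc.content)
                merged.append(doc)

    # Highest relevance score first.
    return sorted(merged, key=lambda d: d.score, reverse=True)


results = multi_query_retrieve(
    ["renewable energy", "solar panels electricity", "wind energy"]
)
for doc in results:
    print(doc.score, doc.content)
```

The third query returns a document that the first query already found, so that document appears only once in the combined list.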

Configuration

  1. Drag the MultiQueryTextRetriever component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab, configure the underlying retriever (a text-based retriever such as OpenSearchBM25Retriever).
  4. Go to the Advanced tab to set max_workers for controlling parallel thread execution.

Connections

MultiQueryTextRetriever receives a list of queries through its queries input, typically from QueryExpander. It outputs a deduplicated documents list sorted by relevance score. Connect the documents output to a Ranker, DocumentJoiner, or directly to an LLM component.

Source Code

To check this component's source code, open multi_query_text_retriever.py in the Haystack repository.

Usage Examples

Basic Configuration

  MultiQueryTextRetriever:
    type: haystack.components.retrievers.multi_query_text_retriever.MultiQueryTextRetriever
    init_parameters:
      retriever:
        type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
        init_parameters:
          document_store:
            type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
            init_parameters:
              hosts:
              index: default

Connections

This example shows how to perform retrieval with QueryExpander and MultiQueryTextRetriever. You can then send the retrieved documents to a Ranker or DocumentJoiner to combine the results:

  components:
    query_expander:
      type: haystack.components.query.query_expander.QueryExpander
      init_parameters:
        n_expansions: 3
        include_original_query: true
        chat_generator:
          type: haystack_integrations.components.generators.anthropic.chat.chat_generator.AnthropicChatGenerator
          init_parameters: {}
    multi_query_retriever:
      type: haystack.components.retrievers.multi_query_text_retriever.MultiQueryTextRetriever
      init_parameters:
        retriever:
          type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
          init_parameters:
            document_store:
              type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
            top_k: 5
        max_workers: 3

  connections:
    - sender: query_expander.queries
      receiver: multi_query_retriever.queries

  max_runs_per_component: 100

  metadata: {}

  inputs:
    query:
      - query_expander.query

Parameters

Inputs

| Parameter | Type | Description |
|---|---|---|
| queries | List[str] | List of text queries to process. |
| retriever_kwargs | Optional[Dict[str, Any]] | Optional dictionary of arguments to pass to the retriever's run method. |

Outputs

| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | List of retrieved documents sorted by relevance score, deduplicated by content. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| retriever | TextRetriever | | The text-based retriever to use for document retrieval. Must implement the TextRetriever protocol. |
| max_workers | int | 3 | Maximum number of worker threads for parallel processing. |
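
To illustrate what a worker limit such as max_workers means in practice, the following sketch uses Python's standard ThreadPoolExecutor and records how many simulated retriever calls overlap in time. The slow_retrieve function and the 0.05 second delay are hypothetical; the snippet only shows that the pool never runs more calls at once than its limit allows.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()
active = 0  # calls currently in flight
peak = 0    # highest number of overlapping calls observed


def slow_retrieve(query: str) -> str:
    """Hypothetical retriever call that records how many calls overlap in time."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # simulate the I/O latency of a search request
    with lock:
        active -= 1
    return f"results for {query}"


queries = [f"query {i}" for i in range(6)]

# Six queries, but at most two of them are processed at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = list(pool.map(slow_retrieve, queries))

print(f"processed {len(outputs)} queries, peak concurrency: {peak}")
```

A higher limit can shorten total latency when many queries are expanded, at the cost of more simultaneous requests to the underlying search backend.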

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
|---|---|---|---|
| queries | List[str] | | List of text queries to process. |
| retriever_kwargs | Optional[Dict[str, Any]] | None | Optional dictionary of arguments to pass to the retriever's run method (for example, filters, top_k). |
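
Because retriever_kwargs is passed to the retriever's run method, an argument such as top_k applies to each query's retrieval rather than to the combined list. The sketch below illustrates that forwarding pattern with a hypothetical ToyRetriever and run_queries helper. It runs sequentially for brevity and is not the component's actual implementation.

```python
from typing import Any, Optional


class ToyRetriever:
    """Hypothetical retriever whose run() accepts a query plus an optional top_k."""

    def __init__(self, corpus: list[str]):
        self.corpus = corpus

    def run(self, query: str, top_k: int = 10) -> dict[str, list[str]]:
        # Match any corpus entry that shares at least one word with the query.
        terms = set(query.lower().split())
        hits = [text for text in self.corpus if terms & set(text.lower().split())]
        return {"documents": hits[:top_k]}


def run_queries(
    retriever: ToyRetriever,
    queries: list[str],
    retriever_kwargs: Optional[dict[str, Any]] = None,
) -> list[str]:
    """Call the retriever once per query, forwarding the same keyword arguments each time."""
    kwargs = retriever_kwargs or {}
    merged: list[str] = []
    for query in queries:
        for doc in retriever.run(query=query, **kwargs)["documents"]:
            if doc not in merged:  # drop duplicates across queries
                merged.append(doc)
    return merged


retriever = ToyRetriever(["green energy policy", "green tea benefits", "energy drinks"])

# top_k=1 limits each individual query, so fewer unique documents survive the merge.
limited = run_queries(retriever, ["green", "energy"], retriever_kwargs={"top_k": 1})
unlimited = run_queries(retriever, ["green", "energy"])
print(limited)
print(unlimited)
```

In this sketch, both queries return the same top document when top_k is 1, so the merged list contains a single entry, while the unrestricted call returns all three documents.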