AutoMergingRetriever

Improve search results by returning complete parent documents instead of fragmented chunks when multiple related pieces match a query.

Basic Information

Type: haystack.components.retrievers.auto_merging_retriever.AutoMergingRetriever
Components it can connect with:
- Retrievers: AutoMergingRetriever can receive documents from any retriever that returns hierarchical documents.
- PromptBuilder, ChatPromptBuilder, AnswerBuilder, or Ranker: AutoMergingRetriever can send documents to these components to be used in the prompt, answer, or ranking process.

Inputs

Parameter	Type	Default	Description
documents	List[Document]		List of leaf documents that were matched by a retriever

Outputs

Parameter	Type	Default	Description
documents	List[Document]		List of documents (could be a mix of different hierarchy levels)

Overview

AutoMergingRetriever works with a hierarchical document structure. It returns a parent document instead of its individual chunks when the proportion of that parent's leaf documents matched by the query exceeds a threshold. This is particularly useful when working with paragraphs split into multiple chunks: when several chunks from the same paragraph match your query, the complete paragraph often provides more context and value than the individual pieces alone.

Here's how this Retriever works:

It requires documents to be organized in a tree structure. For information on how to create this structure, see the HierarchicalDocumentSplitter documentation.
When searching, it counts how many of the matched chunks belong to the same parent document.
If the matched chunks make up a larger share of the parent's children than your defined threshold, it returns the parent document instead of the individual chunks.
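The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the component's actual implementation: documents are plain dictionaries, the `parent_id` and `children_ids` field names, the `parents` lookup table, and the `merge_one_level` function are all hypothetical, and only a single level of the hierarchy is merged.

```python
from collections import defaultdict


def merge_one_level(matched_leaves, parents, threshold=0.5):
    """Replace groups of matched leaves with their parent when enough siblings matched.

    matched_leaves: list of dicts with "id" and "parent_id" (hypothetical field names).
    parents: dict mapping a parent id to a dict with "id" and "children_ids".
    """
    # Group the matched leaves by the parent they belong to.
    by_parent = defaultdict(list)
    for leaf in matched_leaves:
        by_parent[leaf["parent_id"]].append(leaf)

    results = []
    for parent_id, leaves in by_parent.items():
        parent = parents[parent_id]
        # Fraction of this parent's children that matched the query.
        score = len(leaves) / len(parent["children_ids"])
        if score > threshold:
            # Enough siblings matched: return the parent once.
            results.append(parent)
        else:
            # Too few siblings matched: keep the individual leaves.
            results.extend(leaves)
    return results


parents = {
    "p1": {"id": "p1", "children_ids": ["a", "b", "c"]},
    "p2": {"id": "p2", "children_ids": ["d", "e", "f", "g"]},
}
matched = [
    {"id": "a", "parent_id": "p1"},
    {"id": "b", "parent_id": "p1"},
    {"id": "d", "parent_id": "p2"},
]
merged = merge_one_level(matched, parents, threshold=0.5)
print([doc["id"] for doc in merged])  # ['p1', 'd']
```

In this sketch, two of the three chunks under `p1` matched, so `p1` replaces them, while the single match under `p2` (one of four chunks) is returned unchanged. This is why the output can mix hierarchy levels.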

For example, if a parent document has three child chunks and you set threshold=0.5, the retriever returns the parent document when at least two of the three chunks are retrieved (2/3 ≈ 0.67, which is > 0.5).
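To make the arithmetic concrete, the snippet below evaluates the comparison for a parent with three children. It assumes the comparison is strictly greater-than, as the example above implies. `should_merge` is a hypothetical helper written for illustration, not part of the Haystack API.

```python
def should_merge(matched_children: int, total_children: int, threshold: float) -> bool:
    """Return True when the matched fraction is strictly above the threshold."""
    return matched_children / total_children > threshold


for matched in range(1, 4):
    ratio = matched / 3
    print(f"{matched}/3 = {ratio:.2f} -> merge: {should_merge(matched, 3, 0.5)}")
# 1/3 = 0.33 -> merge: False
# 2/3 = 0.67 -> merge: True
# 3/3 = 1.00 -> merge: True
```

Under the same assumption, a parent with two children and one matched chunk (1/2 = 0.5) would not be merged at threshold=0.5, because the fraction equals the threshold rather than exceeding it.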

You can use AutoMergingRetriever with the following Document Stores:

Usage Example

This example shows a RAG pipeline that first retrieves leaf-level document chunks using BM25, merges them into higher-level parent documents with AutoMergingRetriever, constructs a prompt, and generates an answer:

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          use_ssl: true
          verify_certs: false
          index: leaf_documents
      top_k: 10

  auto_merging_retriever:
    type: haystack.components.retrievers.auto_merging_retriever.AutoMergingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          use_ssl: true
          verify_certs: false
          index: parent_documents
      threshold: 0.6

  chat_prompt_builder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template:
        - _content:
            - text: "You are a helpful assistant."
          _role: system
        - _content:
            - text: "Given these documents, answer the question.\nDocuments:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\nQuestion: {{question}}\nAnswer:"
          _role: user

  llm:
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      model: gpt-5-mini
      generation_kwargs:
        temperature: 0.7

  answer_builder:
    type: haystack.components.builders.answer_builder.AnswerBuilder
    init_parameters: {}

connections:
  - sender: bm25_retriever.documents
    receiver: auto_merging_retriever.documents
  - sender: auto_merging_retriever.documents
    receiver: chat_prompt_builder.documents
  - sender: chat_prompt_builder.prompt
    receiver: llm.messages
  - sender: llm.replies
    receiver: answer_builder.replies
  - sender: auto_merging_retriever.documents
    receiver: answer_builder.documents

max_runs_per_component: 100

inputs:
  query:
    - bm25_retriever.query
    - chat_prompt_builder.question
    - answer_builder.query

outputs:
  answers: answer_builder.answers

metadata: {}

Info: Before using this pipeline, index your documents using HierarchicalDocumentSplitter to create the hierarchical structure. Leaf documents should be indexed in one document store (for example, leaf_documents), and parent documents in another (for example, parent_documents).
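As a rough sketch of that separation, the snippet below partitions an already split hierarchy into leaves and parents before indexing. It uses plain dictionaries rather than Haystack objects. The `children_ids` field name and the `partition_by_level` helper are assumptions made for illustration, so check the metadata your splitter actually produces before relying on them.

```python
def partition_by_level(docs):
    """Split hierarchical documents into (leaves, parents).

    A document is treated as a leaf when its "children_ids" list is empty
    (hypothetical field name used for illustration).
    """
    leaves = [doc for doc in docs if not doc["children_ids"]]
    parents = [doc for doc in docs if doc["children_ids"]]
    return leaves, parents


docs = [
    {"id": "root", "children_ids": ["a", "b"]},
    {"id": "a", "children_ids": []},
    {"id": "b", "children_ids": []},
]
leaves, parents = partition_by_level(docs)
# Each list would then be written to its own index,
# for example leaf_documents and parent_documents.
print([doc["id"] for doc in leaves])   # ['a', 'b']
print([doc["id"] for doc in parents])  # ['root']
```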

Parameters

Init parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
document_store	DocumentStore		DocumentStore from which to retrieve the parent documents
threshold	float	0.5	Fraction of a parent's child documents that the retrieved matches must exceed for the parent to be returned instead of the individual documents

Run method parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
documents	List[Document]		List of leaf documents that were matched by a retriever
