SimilarDocumentsRetriever

Retrieves similar documents for each document you provide. For each input document, the component runs the configured retriever to find related documents in the document store.

Key Features

  • Runs a retriever for each input document to find similar ones.
  • Configurable retriever — works with any retriever type, including BM25 and embedding-based retrievers.
  • Supports both text-based and embedding-based retrieval via the with_embedding flag.
  • Returns a list of document lists, one per input document.
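
The per-document behavior listed above can be pictured with a minimal sketch. This is not the component's actual implementation: `Doc`, `KeywordOverlapRetriever`, and `find_similar` are hypothetical stand-ins, and the sketch assumes that a text retriever is called with `query` set to the document's content while an embedding retriever is called with `query_embedding` set to the document's embedding.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document."""
    content: str
    embedding: Optional[List[float]] = None


class KeywordOverlapRetriever:
    """Toy text retriever (hypothetical): ranks stored docs by shared words."""

    def __init__(self, store: List[Doc], top_k: int = 1):
        self.store = store
        self.top_k = top_k

    def run(self, query: str) -> Dict[str, List[Doc]]:
        words = set(query.lower().split())
        scored = [(len(words & set(d.content.lower().split())), d) for d in self.store]
        # Keep only docs that share at least one word, highest overlap first.
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return {"documents": [d for _, d in ranked[: self.top_k]]}


def find_similar(documents: List[Doc], retriever: Any,
                 with_embedding: bool = False) -> Dict[str, List[List[Doc]]]:
    """Run the retriever once per input document and collect one list per document."""
    document_lists = []
    for doc in documents:
        if with_embedding:
            # Assumption: embedding retrievers accept `query_embedding`.
            result = retriever.run(query_embedding=doc.embedding)
        else:
            # Assumption: text retrievers accept `query`.
            result = retriever.run(query=doc.content)
        document_lists.append(result["documents"])
    return {"document_lists": document_lists}


store = [Doc("red apple pie"), Doc("green apple tart"), Doc("blue car engine")]
retriever = KeywordOverlapRetriever(store, top_k=1)
out = find_similar([Doc("apple pie recipe"), Doc("car engine repair")], retriever)
print([[d.content for d in docs] for docs in out["document_lists"]])
# → [['red apple pie'], ['blue car engine']]
```

The output has exactly one inner list per input document, in input order, which is the shape the component reports as document_lists.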

Configuration

  1. Drag the SimilarDocumentsRetriever component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab:
    1. Select the retriever to use for finding similar documents.
  4. Go to the Advanced tab to configure with_embedding.

Connections

SimilarDocumentsRetriever accepts a list of documents as input. It outputs document_lists — a list of lists, where each inner list contains documents similar to the corresponding input document.

Typically, you connect this component after a ranker or retriever that produces a set of documents, and use its output for recommendation or related-document features.
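
For a related-document feature, a downstream step usually needs to pair each input document with its inner list. The following is a minimal sketch of that pairing, assuming the inner lists arrive in the same order as the inputs (as described above). `Doc` and `related_ids` are hypothetical names used for illustration; whether the retriever returns the input document itself depends on your document store, so the sketch filters it out defensively.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a Haystack Document (only id and content)."""
    id: str
    content: str


def related_ids(documents: List[Doc], document_lists: List[List[Doc]]) -> Dict[str, List[str]]:
    """Map each input document id to the ids of its similar documents."""
    if len(documents) != len(document_lists):
        raise ValueError("expected one inner list per input document")
    related: Dict[str, List[str]] = {}
    for source, similar in zip(documents, document_lists):
        seen = {source.id}  # skip the input document if it was retrieved
        ids: List[str] = []
        for doc in similar:
            if doc.id not in seen:  # also drops duplicate hits
                seen.add(doc.id)
                ids.append(doc.id)
        related[source.id] = ids
    return related


a, b = Doc("a", "apple pie"), Doc("b", "car engine")
c, d = Doc("c", "apple tart"), Doc("d", "engine repair")
print(related_ids([a, b], [[a, c, c], [d]]))
# → {'a': ['c'], 'b': ['d']}
```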

Usage Example

This example initializes the component with OpenSearchBM25Retriever:

sim_docs_retriever:
  type: deepset_cloud_custom_nodes.retrievers.similar_documents_retriever.SimilarDocumentsRetriever
  init_parameters:
    with_embedding: False
    retriever:
      type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
      init_parameters:
        document_store:
          type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
          init_parameters:
            use_ssl: True
            verify_certs: False
            hosts:
              - ${OPENSEARCH_HOST}
            http_auth:
              - "${OPENSEARCH_USER}"
              - "${OPENSEARCH_PASSWORD}"
        top_k: 1

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | List of documents to find similar documents for. The retriever runs for every document provided. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_lists | List[List[Document]] | | Retrieved documents as a list of lists. Each inner list contains the documents similar to the corresponding input document. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| retriever | Retriever | | Retriever to be used to retrieve similar documents. |
| with_embedding | bool | False | If True, assumes the retriever is queried with embeddings rather than text. |

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | List of documents to find similar documents for. The retriever runs for every document provided. |