SimilarDocumentsRetriever

Retrieve similar documents for each document provided. The component runs a retrieval query for each input document using a preset retriever, making it useful for finding related content based on existing documents.

Key Features

  • Retrieves similar documents for each document in an input list.
  • Runs a separate retrieval query for every document provided.
  • Compatible with any retriever (BM25 or embedding-based).
  • Supports both text-based and embedding-based retrieval modes.
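The behavior listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the component's actual implementation: `Doc`, `find_similar`, and `keyword_retrieve` are hypothetical stand-ins, and the toy retriever simply matches on shared words.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text content plus an optional embedding."""
    content: str
    embedding: Optional[List[float]] = None


def find_similar(documents: List[Doc], retrieve: Callable, with_embedding: bool = False) -> List[List[Doc]]:
    """Run one retrieval query per input document and collect the results.

    `retrieve` is a hypothetical callable wrapping the configured retriever:
    it takes a text query or an embedding and returns a list of Doc.
    """
    document_lists = []
    for doc in documents:
        # Text mode queries with the document's content; embedding mode with its vector.
        query = doc.embedding if with_embedding else doc.content
        document_lists.append(retrieve(query))
    return document_lists


# Toy corpus and a toy keyword "retriever" that returns documents sharing a word with the query.
corpus = [Doc("solar panel efficiency"), Doc("wind turbine noise"), Doc("solar farm costs")]


def keyword_retrieve(query: str) -> List[Doc]:
    words = set(query.split())
    return [d for d in corpus if words & set(d.content.split())]


results = find_similar([Doc("solar energy"), Doc("turbine blades")], keyword_retrieve)
for similar in results:
    print([d.content for d in similar])
```

The result always has one inner list per input document, in the same order as the input.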

Configuration

  1. Drag the SimilarDocumentsRetriever component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    • Configure the underlying retriever to use for finding similar documents.
    • Set with_embedding to True if your retriever is queried with embeddings rather than text.

Connections

SimilarDocumentsRetriever receives a list of documents through its documents input, typically from a ranker or another retriever. It outputs document_lists — a list of lists, where each inner list contains documents similar to the corresponding input document.
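Because each inner list corresponds to one input document, a consumer can pair the two by position. The following is a minimal sketch, assuming the output order matches the input order as described above; `Doc` and `pair_with_similar` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a document with text content only."""
    content: str


def pair_with_similar(documents: List[Doc], document_lists: List[List[Doc]]) -> List[Tuple[Doc, List[Doc]]]:
    """Pair each input document with the list of documents retrieved for it."""
    if len(documents) != len(document_lists):
        raise ValueError("Expected one list of similar documents per input document")
    return list(zip(documents, document_lists))


inputs = [Doc("solar energy"), Doc("turbine blades")]
document_lists = [
    [Doc("solar panel efficiency"), Doc("solar farm costs")],
    [Doc("wind turbine noise")],
]

pairs = pair_with_similar(inputs, document_lists)
for source, similar in pairs:
    print(source.content, "->", [d.content for d in similar])
```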

Usage Examples

Basic Configuration

SimilarDocumentsRetriever:
  type: deepset_cloud_custom_nodes.retrievers.similar_documents_retriever.SimilarDocumentsRetriever
  init_parameters:
    retriever:
      type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
      init_parameters:
        document_store:
          type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
          init_parameters:
            hosts:
            index: default

Configuration with OpenSearchBM25Retriever

This example initializes the component with OpenSearchBM25Retriever:

sim_docs_retriever:
  type: deepset_cloud_custom_nodes.retrievers.similar_documents_retriever.SimilarDocumentsRetriever
  init_parameters:
    with_embedding: False
    retriever:
      type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
      init_parameters:
        document_store:
          type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
          init_parameters:
            use_ssl: True
            verify_certs: False
            hosts:
              - ${OPENSEARCH_HOST}
            http_auth:
              - "${OPENSEARCH_USER}"
              - "${OPENSEARCH_PASSWORD}"
        top_k: 1

Parameters

Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to find similar documents for. The retriever runs for every document provided. |

Outputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| document_lists | List[List[Document]] | | Retrieved documents, similar to the original document list provided to the component. |
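If a later step expects a single flat list rather than a list of lists, the output can be flattened and deduplicated. The sketch below is one possible approach, not part of the component: `Doc` and `flatten_unique` are hypothetical, and it assumes each document carries a unique `id`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a document with a unique id."""
    id: str
    content: str


def flatten_unique(document_lists: List[List[Doc]]) -> List[Doc]:
    """Flatten a list of lists into one list, keeping the first occurrence of each id."""
    seen = set()
    flat = []
    for similar in document_lists:
        for doc in similar:
            if doc.id not in seen:
                seen.add(doc.id)
                flat.append(doc)
    return flat


document_lists = [
    [Doc("a", "solar panel efficiency"), Doc("b", "solar farm costs")],
    [Doc("b", "solar farm costs"), Doc("c", "wind turbine noise")],
]

unique_docs = flatten_unique(document_lists)
print([d.id for d in unique_docs])
```

Keeping the first occurrence preserves the order in which documents were retrieved across the input documents.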

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| retriever | Retriever | | Retriever to use to retrieve similar documents. |
| with_embedding | bool | False | If True, assumes the retriever is queried with embeddings rather than text. |

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to find similar documents for. The retriever runs for every document provided. |