SimilarDocumentsRetriever
Retrieve similar documents for each document provided. For each input document, the component runs a retriever to find related documents in the document store.
Key Features
- Runs a retriever for each input document to find similar ones.
- Configurable retriever — works with any retriever type, including BM25 and embedding-based retrievers.
- Supports both text-based and embedding-based retrieval via the with_embedding flag.
- Returns a list of document lists, one per input document.
Configuration
- Drag the SimilarDocumentsRetriever component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Select the retriever to use for finding similar documents.
- Go to the Advanced tab to configure with_embedding.
Connections
SimilarDocumentsRetriever accepts a list of documents as input. It outputs document_lists — a list of lists, where each inner list contains documents similar to the corresponding input document.
Typically, you connect this component after a ranker or retriever that produces a set of documents, and use its output for recommendation or related-document features.
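As a rough sketch of the wiring described above, the connections section of a pipeline YAML could look like the fragment below. The upstream component name ranker is hypothetical, sim_docs_retriever is the component name from the usage example, and the documents input name comes from the Inputs table. Treat this as an illustration under those assumptions, not a complete pipeline.

```yaml
connections:
  # "ranker" is a hypothetical upstream component that outputs a list of documents.
  - sender: ranker.documents
    receiver: sim_docs_retriever.documents
```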
Usage Example
This example initializes the component with OpenSearchBM25Retriever:
    sim_docs_retriever:
      type: deepset_cloud_custom_nodes.retrievers.similar_documents_retriever.SimilarDocumentsRetriever
      init_parameters:
        with_embedding: False
        retriever:
          type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
          init_parameters:
            document_store:
              type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
              init_parameters:
                use_ssl: True
                verify_certs: False
                hosts:
                  - ${OPENSEARCH_HOST}
                http_auth:
                  - "${OPENSEARCH_USER}"
                  - "${OPENSEARCH_PASSWORD}"
            top_k: 1
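For embedding-based retrieval, a variant of this configuration might set with_embedding to True and swap in an embedding retriever. The sketch below assumes OpenSearchEmbeddingRetriever from the same OpenSearch integration, and assumes the input documents already carry embeddings compatible with those stored in the index. This page confirms neither assumption, so check both against your own setup.

```yaml
sim_docs_retriever:
  type: deepset_cloud_custom_nodes.retrievers.similar_documents_retriever.SimilarDocumentsRetriever
  init_parameters:
    # Tells the component that the retriever is queried with embeddings rather than text.
    with_embedding: True
    retriever:
      # Assumption: an embedding retriever from the OpenSearch integration.
      type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
      init_parameters:
        document_store:
          type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
          init_parameters:
            use_ssl: True
            verify_certs: False
            hosts:
              - ${OPENSEARCH_HOST}
            http_auth:
              - "${OPENSEARCH_USER}"
              - "${OPENSEARCH_PASSWORD}"
        # Number of similar documents returned per input document.
        top_k: 1
```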
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to find similar documents for. The retriever runs for every document provided. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_lists | List[List[Document]] | | A list of lists. Each inner list contains the retrieved documents similar to the corresponding input document. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| retriever | Retriever | | Retriever to be used to retrieve similar documents. |
| with_embedding | bool | False | If True, assumes the retriever is queried with embeddings rather than text. |
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to find similar documents for. The retriever runs for every document provided. |