SimilarDocumentsRetriever
Retrieve similar documents for each document provided. The component runs a retrieval query for each input document using a preset retriever, making it useful for finding related content based on existing documents.
Key Features
- Retrieves similar documents for each document in an input list.
- Runs a separate retrieval query for every document provided.
- Compatible with any retriever (BM25 or embedding-based).
- Supports both text-based and embedding-based retrieval modes.
Configuration
- Drag the SimilarDocumentsRetriever component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- On the General tab:
  - Configure the underlying retriever to use for finding similar documents.
  - Set with_embedding to True if your retriever is queried with embeddings rather than text.
Connections
SimilarDocumentsRetriever receives a list of documents through its documents input, typically from a ranker or another retriever. It outputs document_lists — a list of lists, where each inner list contains documents similar to the corresponding input document.
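The relationship between the input list and document_lists can be illustrated with a small, self-contained sketch. It does not use the actual deepset or Haystack classes: `Doc`, `KeywordRetriever`, and `retrieve_similar` are hypothetical stand-ins, and the sketch assumes that each document's text is used as the query.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document."""
    content: str


class KeywordRetriever:
    """Toy retriever: returns stored docs that share at least one word with the query."""

    def __init__(self, corpus: List[Doc], top_k: int = 2):
        self.corpus = corpus
        self.top_k = top_k

    def run(self, query: str) -> dict:
        query_words = set(query.lower().split())
        hits = [d for d in self.corpus if query_words & set(d.content.lower().split())]
        return {"documents": hits[: self.top_k]}


def retrieve_similar(retriever: KeywordRetriever, documents: List[Doc]) -> dict:
    """Run one retrieval query per input document, keeping the input order."""
    return {
        "document_lists": [retriever.run(query=d.content)["documents"] for d in documents]
    }


corpus = [Doc("solar panels on roofs"), Doc("wind turbines at sea"), Doc("roofs made of slate")]
inputs = [Doc("slate roofs"), Doc("offshore wind")]
result = retrieve_similar(KeywordRetriever(corpus), inputs)

# One inner list per input document: result["document_lists"][i] belongs to inputs[i].
for doc, similar in zip(inputs, result["document_lists"]):
    print(doc.content, "->", [s.content for s in similar])
```

Because the output is nested, a downstream component that expects a flat list of documents cannot consume document_lists directly.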
Usage Examples
Basic Configuration
SimilarDocumentsRetriever:
  type: deepset_cloud_custom_nodes.retrievers.similar_documents_retriever.SimilarDocumentsRetriever
  init_parameters:
    # The retriever below is queried with embeddings, so with_embedding must be True.
    with_embedding: True
    retriever:
      type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
      init_parameters:
        document_store:
          type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
          init_parameters:
            hosts:
            index: default
Configuration with a BM25 Retriever
This example initializes the component with OpenSearchBM25Retriever:
sim_docs_retriever:
  type: deepset_cloud_custom_nodes.retrievers.similar_documents_retriever.SimilarDocumentsRetriever
  init_parameters:
    with_embedding: False
    retriever:
      type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
      init_parameters:
        document_store:
          type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
          init_parameters:
            use_ssl: True
            verify_certs: False
            hosts:
              - ${OPENSEARCH_HOST}
            http_auth:
              - "${OPENSEARCH_USER}"
              - "${OPENSEARCH_PASSWORD}"
        top_k: 1
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to find similar documents for. The retriever runs for every document provided. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_lists | List[List[Document]] | | A list of lists. Each inner list contains the documents retrieved for the corresponding input document. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| retriever | Retriever | | Retriever to use to retrieve similar documents. |
| with_embedding | bool | False | If True, assumes the retriever is queried with embeddings rather than text. |
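The effect of with_embedding can be sketched as a simple switch between two query styles. This is an illustration under stated assumptions, not the component's actual source: `Doc`, `RecordingRetriever`, and `query_per_document` are hypothetical names, the keyword arguments `query` and `query_embedding` mirror how Haystack text and embedding retrievers are usually called, and the error raised for a missing embedding is an assumption added for clarity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document."""
    content: str
    embedding: Optional[List[float]] = None


class RecordingRetriever:
    """Hypothetical retriever stub that records the keyword arguments of each call."""

    def __init__(self):
        self.calls = []

    def run(self, **kwargs) -> dict:
        self.calls.append(kwargs)
        return {"documents": []}


def query_per_document(retriever, documents: List[Doc], with_embedding: bool = False) -> dict:
    """Query the retriever once per document, by text or by embedding."""
    results = []
    for doc in documents:
        if with_embedding:
            # Assumption: embedding mode needs documents that already carry an embedding.
            if doc.embedding is None:
                raise ValueError("with_embedding=True requires documents with embeddings")
            out = retriever.run(query_embedding=doc.embedding)
        else:
            out = retriever.run(query=doc.content)
        results.append(out["documents"])
    return {"document_lists": results}


docs = [Doc("first", [0.1, 0.2]), Doc("second", [0.3, 0.4])]

text_retriever = RecordingRetriever()
query_per_document(text_retriever, docs, with_embedding=False)

embedding_retriever = RecordingRetriever()
query_per_document(embedding_retriever, docs, with_embedding=True)
```

In this sketch, the text retriever is called with the document content and the embedding retriever with the document's vector, which is why the flag has to match the kind of retriever you configure.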
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to find similar documents for. The retriever runs for every document provided. |