SimilarDocumentsRetriever

Retrieve similar documents for each document provided.

Basic Information

  • Type: deepset_cloud_custom_nodes.retrievers.similar_documents_retriever.SimilarDocumentsRetriever
  • Components it can connect with:
    • Any component that outputs a list of documents, such as Rankers or Retrievers.
    • Any component that accepts a list of documents as input.

Inputs

Parameter | Type | Default | Description
documents | List[Document] | | List of documents to find similar documents for. The retriever runs for every document provided.

Outputs

Parameter | Type | Default | Description
document_lists | List[List[Document]] | | Lists of retrieved documents, one list of similar documents for each document provided to the component.
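
Because the output is a list of lists, custom code that consumes it may want to pair each inner list with its source document or flatten everything into a single list. The following is a minimal sketch in plain Python, not part of the component. It assumes that the i-th inner list corresponds to the i-th input document, which this page does not confirm, and it uses strings as hypothetical stand-ins for Document objects.

```python
from itertools import chain

# Hypothetical stand-ins for Document objects.
documents = ["doc A", "doc B"]
document_lists = [["A1", "A2"], ["B1"]]

# Pair each input document with its result list
# (assumption: the output order matches the input order).
pairs = dict(zip(documents, document_lists))

# Flatten the nested output into a single list of documents.
flat = list(chain.from_iterable(document_lists))
```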

Overview

Retrieves similar documents for each given document using a preset retriever. You can configure which retriever the component uses.
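
The per-document behavior described above can be pictured with a short Python sketch. This is not the component's actual implementation: `Doc`, `KeywordRetriever`, and `retrieve_similar` are hypothetical stand-ins, and using each document's content (or its embedding, when `with_embedding` is True) as the query is an assumption based on the parameter descriptions on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document."""
    content: str
    embedding: Optional[List[float]] = None


class KeywordRetriever:
    """Toy retriever: ranks stored docs by the number of words shared with the query."""

    def __init__(self, store: List[Doc], top_k: int = 1):
        self.store = store
        self.top_k = top_k

    def run(self, query: str) -> dict:
        query_words = set(query.lower().split())
        ranked = sorted(
            self.store,
            key=lambda d: len(query_words & set(d.content.lower().split())),
            reverse=True,
        )
        return {"documents": ranked[: self.top_k]}


def retrieve_similar(retriever, documents: List[Doc], with_embedding: bool = False) -> List[List[Doc]]:
    """Run the retriever once per input document and collect one result list each."""
    document_lists = []
    for doc in documents:
        if with_embedding:
            # Assumption: an embedding retriever is queried with the document's embedding.
            result = retriever.run(query_embedding=doc.embedding)
        else:
            # Assumption: a text retriever is queried with the document's content.
            result = retriever.run(query=doc.content)
        document_lists.append(result["documents"])
    return document_lists


store = [Doc("cats purr and sleep"), Doc("dogs bark loudly"), Doc("stock prices fell")]
retriever = KeywordRetriever(store, top_k=1)
inputs = [Doc("why do cats purr"), Doc("dogs bark at night")]
document_lists = retrieve_similar(retriever, inputs)
```

In this sketch, `document_lists` holds one inner list per input document, which mirrors the `List[List[Document]]` output type listed for the component.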

Usage example

This example initializes the component with OpenSearchBM25Retriever:

sim_docs_retriever:
  type: deepset_cloud_custom_nodes.retrievers.similar_documents_retriever.SimilarDocumentsRetriever
  init_parameters:
    with_embedding: False
    retriever:
      type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
      init_parameters:
        document_store:
          type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
          init_parameters:
            use_ssl: True
            verify_certs: False
            hosts:
              - ${OPENSEARCH_HOST}
            http_auth:
              - "${OPENSEARCH_USER}"
              - "${OPENSEARCH_PASSWORD}"
        top_k: 1

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter | Type | Default | Description
retriever | Retriever | | Retriever to be used to retrieve similar documents.
with_embedding | bool | False | If True, assumes the retriever is queried with embeddings rather than text.

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter | Type | Default | Description
documents | List[Document] | | List of documents to find similar documents for. The retriever runs for every document provided.