
DocumentJoiner

Joins multiple lists of documents into a single list.

Basic Information

  • Type: haystack.components.joiners.document_joiner.DocumentJoiner

Inputs

Parameter | Type | Default | Description
documents | Variadic[List[Document]] | | List of lists of documents to be merged.
top_k | Optional[int] | None | The maximum number of documents to return. Overrides the instance's top_k if provided.

Outputs

Parameter | Type | Default | Description
documents | List[Document] | | The merged list of documents, returned under the documents key of the output dictionary.

Overview

Work in Progress

Bear with us while we add pipeline examples and the most common component connections.

DocumentJoiner supports the following join modes:

  • concatenate: Keeps the highest-scored document in case of duplicates.
  • merge: Calculates a weighted sum of scores for duplicates and merges them.
  • reciprocal_rank_fusion: Merges and assigns scores based on reciprocal rank fusion.
  • distribution_based_rank_fusion: Merges and assigns scores based on the score distribution in each Retriever.
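To make two of the modes above more concrete, here is a minimal pure-Python sketch of concatenate and reciprocal_rank_fusion. It is an illustration under stated assumptions, not the component's actual implementation: the Doc class, both function names, and the smoothing constant k=60 (the value commonly used in descriptions of reciprocal rank fusion) are hypothetical, and the real component may use different constants, weighting, and normalization.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document; not the library's Document class."""
    id: str
    score: Optional[float] = None


def _score(doc: Doc) -> float:
    # Treat a missing score as -infinity so any scored copy wins.
    return doc.score if doc.score is not None else float("-inf")


def concatenate(doc_lists: List[List[Doc]]) -> List[Doc]:
    """Combine all lists; for duplicate ids, keep the highest-scored copy."""
    best: Dict[str, Doc] = {}
    for docs in doc_lists:
        for doc in docs:
            if doc.id not in best or _score(doc) > _score(best[doc.id]):
                best[doc.id] = doc
    return list(best.values())


def reciprocal_rank_fusion(doc_lists: List[List[Doc]], k: int = 60) -> List[Doc]:
    """Score each id as the sum of 1 / (k + rank) over the lists it appears in.

    k=60 is an assumed constant, not necessarily the one the component uses.
    """
    fused: Dict[str, float] = defaultdict(float)
    for docs in doc_lists:
        for rank, doc in enumerate(docs, start=1):  # ranks start at 1
            fused[doc.id] += 1.0 / (k + rank)
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return [Doc(id=doc_id, score=score) for doc_id, score in ranked]


# Two result lists, each already ordered best-first by its own retriever.
keyword_hits = [Doc("a", 0.9), Doc("b", 0.5)]
embedding_hits = [Doc("b", 0.8), Doc("c", 0.3)]

concat_result = concatenate([keyword_hits, embedding_hits])
rrf_result = reciprocal_rank_fusion([keyword_hits, embedding_hits])

print([(d.id, d.score) for d in concat_result])
print([d.id for d in rrf_result])
```

In this sketch, document "b" appears in both lists. With concatenate it keeps its higher score (0.8), while reciprocal rank fusion ignores the original scores and ranks "b" first because it is the only document found by both lists.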

Usage Example

components:
  DocumentJoiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter | Type | Default | Description
join_mode | Union[str, JoinMode] | JoinMode.CONCATENATE | Specifies the join mode to use. Available modes: concatenate (keeps the highest-scored document in case of duplicates); merge (calculates a weighted sum of scores for duplicates and merges them); reciprocal_rank_fusion (merges and assigns scores based on reciprocal rank fusion); distribution_based_rank_fusion (merges and assigns scores based on the score distribution in each Retriever).
weights | Optional[List[float]] | None | Assigns an importance to each list of documents to influence how they're joined. The number of weights must match the number of input lists. This parameter is ignored for the concatenate and distribution_based_rank_fusion join modes.
top_k | Optional[int] | None | The maximum number of documents to return.
sort_by_score | bool | True | If True, sorts the documents by score in descending order. A document without a score is treated as if its score were -infinity.
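The interaction between weights and sort_by_score can be sketched in plain Python as follows. This is an illustration only, not the component's code: the function names and the (id, score) tuple representation are hypothetical, equal weights are assumed when none are given, and a missing score is assumed to count as 0.0 inside the weighted sum.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical lightweight representation: (document id, score or None).
ScoredDoc = Tuple[str, Optional[float]]


def merge(doc_lists: List[List[ScoredDoc]],
          weights: Optional[List[float]] = None) -> List[ScoredDoc]:
    """Weighted sum of scores per document id across all input lists."""
    if weights is None:
        # Assumption: equal importance for every list when no weights are set.
        weights = [1.0 / len(doc_lists)] * len(doc_lists)
    if len(weights) != len(doc_lists):
        raise ValueError("The number of weights must match the number of input lists.")
    totals: Dict[str, float] = defaultdict(float)
    for docs, weight in zip(doc_lists, weights):
        for doc_id, score in docs:
            # Assumption: a missing score contributes 0.0 to the sum.
            totals[doc_id] += weight * (score if score is not None else 0.0)
    return list(totals.items())


def sort_by_score(docs: List[ScoredDoc]) -> List[ScoredDoc]:
    """Sort descending; documents without a score sort as -infinity (last)."""
    return sorted(
        docs,
        key=lambda doc: doc[1] if doc[1] is not None else float("-inf"),
        reverse=True,
    )


keyword_hits = [("a", 1.0), ("b", 0.5)]
embedding_hits = [("b", 1.0), ("c", None)]

# Give the second list three times the importance of the first.
merged = sort_by_score(merge([keyword_hits, embedding_hits], weights=[0.25, 0.75]))
print(merged)

# Unscored documents end up at the end of the sorted list.
print(sort_by_score([("x", None), ("y", 0.2), ("z", -1.0)]))
```

Here "b" receives 0.25 × 0.5 + 0.75 × 1.0 = 0.875 and moves ahead of "a" (0.25), which shows how weights shift the ranking toward the list you trust more.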

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter | Type | Default | Description
documents | Variadic[List[Document]] | | List of lists of documents to be merged.
top_k | Optional[int] | None | The maximum number of documents to return. Overrides the instance's top_k if provided.
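The override rule for top_k can be illustrated with a small stand-in class. TopKJoiner is hypothetical and only mirrors the behavior described above (a run-time top_k takes precedence over the one set at initialization); it simply flattens its inputs and is not the real component.

```python
from typing import Dict, List, Optional


class TopKJoiner:
    """Hypothetical minimal joiner showing init-time vs. run-time top_k."""

    def __init__(self, top_k: Optional[int] = None) -> None:
        self.top_k = top_k

    def run(self, documents: List[List[str]],
            top_k: Optional[int] = None) -> Dict[str, List[str]]:
        # Flatten the input lists in order (no deduplication in this sketch).
        merged = [doc for docs in documents for doc in docs]
        # A run-time top_k, if provided, overrides the instance's top_k.
        limit = top_k if top_k is not None else self.top_k
        if limit is not None:
            merged = merged[:limit]
        return {"documents": merged}


joiner = TopKJoiner(top_k=3)
default_result = joiner.run([["a", "b"], ["c", "d"]])
override_result = joiner.run([["a", "b"], ["c", "d"]], top_k=1)

print(default_result)
print(override_result)
```

The first call falls back to the instance's limit of 3, while the second call returns a single document because the value passed to run() wins.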