DocumentJoiner

Joins multiple lists of documents into a single list.

Basic Information

Type: haystack.components.joiners.document_joiner.DocumentJoiner

Inputs

Parameter	Type	Default	Description
documents	Variadic[List[Document]]		The lists of documents to merge.
top_k	Optional[int]	None	The maximum number of documents to return. Overrides the instance's `top_k` if provided.

Outputs

Parameter	Type	Default	Description
documents	List[Document]		The merged list of documents.

Overview

Work in Progress

Bear with us while we add pipeline examples and the most common component connections.

Joins multiple lists of documents into a single list.

It supports different join modes:

concatenate: Keeps the highest-scored document in case of duplicates.
merge: Calculates a weighted sum of scores for duplicates and merges them.
reciprocal_rank_fusion: Merges and assigns scores based on reciprocal rank fusion.
distribution_based_rank_fusion: Merges and assigns scores based on scores distribution in each Retriever.
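To make the difference between the modes above concrete, here is a minimal sketch of `concatenate` and `reciprocal_rank_fusion` in plain Python. It is an illustration under stated assumptions, not the component's actual implementation: documents are represented as hypothetical `(doc_id, score)` tuples, the function names are made up for this example, and the fusion uses the textbook formula with a constant `k=60`, which may differ from what the component uses internally.

```python
from collections import defaultdict

# Assumption: a document is a (doc_id, score) tuple, and each input list
# is already ordered by relevance, best first.

def concatenate(ranked_lists):
    """Keep one copy of each document, preferring the highest score."""
    best = {}
    for docs in ranked_lists:
        for doc_id, score in docs:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Textbook RRF: each list contributes 1 / (k + rank) per document."""
    fused = defaultdict(float)
    for docs in ranked_lists:
        for rank, (doc_id, _score) in enumerate(docs, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical results from two retrievers; document "b" appears in both.
keyword_results = [("a", 0.9), ("b", 0.5)]
embedding_results = [("b", 0.8), ("c", 0.4)]

concatenated = concatenate([keyword_results, embedding_results])
fused = reciprocal_rank_fusion([keyword_results, embedding_results])
```

In this sketch, `concatenate` keeps document "b" once with its higher score, while rank fusion ignores the raw scores and moves "b" to the top because it appears in both lists.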

Usage Example

components:
  DocumentJoiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
join_mode	Union[str, JoinMode]	JoinMode.CONCATENATE	Specifies the join mode to use. Available modes: - `concatenate`: Keeps the highest-scored document in case of duplicates. - `merge`: Calculates a weighted sum of scores for duplicates and merges them. - `reciprocal_rank_fusion`: Merges and assigns scores based on reciprocal rank fusion. - `distribution_based_rank_fusion`: Merges and assigns scores based on scores distribution in each Retriever.
weights	Optional[List[float]]	None	Assigns importance to each list of documents to influence how they're joined. This parameter is ignored for the `concatenate` and `distribution_based_rank_fusion` join modes. The number of weights must match the number of input lists.
top_k	Optional[int]	None	The maximum number of documents to return.
sort_by_score	bool	True	If `True`, sorts the documents by score in descending order. If a document has no score, it is handled as if its score is -infinity.
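The interaction between `weights` and `sort_by_score` described in the table can be sketched as follows. This is a simplified illustration, not the component's code: the `(doc_id, score)` tuples and both function names are hypothetical, and defaulting to equal weights when none are given is an assumption of this sketch.

```python
import math

def merge(ranked_lists, weights=None):
    """Weighted sum of scores per document across all input lists."""
    if weights is None:
        # Assumption: without explicit weights, every list counts equally.
        weights = [1.0 / len(ranked_lists)] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("Provide exactly one weight per input list.")
    merged = {}
    for docs, weight in zip(ranked_lists, weights):
        for doc_id, score in docs:
            merged[doc_id] = merged.get(doc_id, 0.0) + weight * score
    return merged

def sort_by_score(scored):
    """Sort descending; a missing score (None) ranks as -infinity."""
    return sorted(
        scored,
        key=lambda item: -math.inf if item[1] is None else item[1],
        reverse=True,
    )

# Hypothetical inputs: the second list is weighted three times as heavily.
keyword_results = [("a", 1.0), ("b", 0.5)]
embedding_results = [("b", 1.0)]

merged = merge([keyword_results, embedding_results], weights=[0.25, 0.75])
ranked = sort_by_score(list(merged.items()))
unscored_last = sort_by_score([("x", None), ("y", 0.1)])
```

Here document "b" ends up with 0.25 * 0.5 + 0.75 * 1.0 = 0.875 and outranks "a" at 0.25, and the document without a score sorts to the end.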

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
documents	Variadic[List[Document]]		The lists of documents to merge.
top_k	Optional[int]	None	The maximum number of documents to return. Overrides the instance's `top_k` if provided.
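The override behavior of the run-time `top_k` can be pictured with a small stand-in class. `JoinerSketch` is hypothetical and only mimics the precedence rule described in the table; it joins plain strings by simple concatenation and is not the real component.

```python
from typing import Dict, List, Optional

class JoinerSketch:
    """Hypothetical stand-in: a run-time top_k takes precedence over the init value."""

    def __init__(self, top_k: Optional[int] = None):
        self.top_k = top_k

    def run(
        self, documents: List[List[str]], top_k: Optional[int] = None
    ) -> Dict[str, List[str]]:
        # Flatten all input lists into one list, preserving order.
        joined = [doc for docs in documents for doc in docs]
        # Use the run-time value if given, otherwise fall back to the instance value.
        limit = top_k if top_k is not None else self.top_k
        if limit is not None:
            joined = joined[:limit]
        return {"documents": joined}

joiner = JoinerSketch(top_k=3)
default_result = joiner.run([["a", "b"], ["c", "d"]])
overridden_result = joiner.run([["a", "b"], ["c", "d"]], top_k=1)
```

With the instance configured for `top_k=3`, the first call returns three documents and the second call, which passes `top_k=1`, returns only one.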
