DocumentJoiner
Joins multiple lists of documents into a single list.
Basic Information
- Type: haystack.components.joiners.document_joiner.DocumentJoiner
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | Variadic[List[Document]] | | Lists of documents to be merged. |
| top_k | Optional[int] | None | The maximum number of documents to return. Overrides the instance's top_k if provided. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | The merged list of documents, returned in a dictionary under the documents key. |
Overview
Work in Progress
Bear with us while we work on adding pipeline examples and the most common component connections.
Joins multiple lists of documents into a single list.
It supports different join modes:
- concatenate: Combines all lists and, for duplicates, keeps only the highest-scored copy.
- merge: Merges duplicates and scores each with a weighted sum of its scores across the lists.
- reciprocal_rank_fusion: Merges the lists and assigns scores based on reciprocal rank fusion.
- distribution_based_rank_fusion: Merges the lists and assigns scores based on the score distribution of each Retriever's results.
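To make two of these modes concrete, here is a minimal pure-Python sketch of concatenate-style deduplication and textbook reciprocal rank fusion. It is an illustration under stated assumptions, not the component's implementation: documents are plain dictionaries with hypothetical `id` and `score` keys, the smoothing constant `k=60` is the conventional value from the RRF literature, and the component's own constants and normalization may differ.

```python
from collections import defaultdict


def _score(doc):
    """Treat a missing score as -infinity so scored documents always win."""
    value = doc.get("score")
    return float("-inf") if value is None else value


def concatenate(doc_lists):
    """Keep one copy of each document id, preferring the highest score."""
    best = {}
    for docs in doc_lists:
        for doc in docs:
            current = best.get(doc["id"])
            if current is None or _score(doc) > _score(current):
                best[doc["id"]] = doc
    return list(best.values())


def reciprocal_rank_fusion(doc_lists, k=60):
    """Score each id by the sum of 1 / (k + rank) over every list it appears in.

    Assumption: k=60 and 1-based ranks, as in the textbook formulation.
    """
    fused = defaultdict(float)
    for docs in doc_lists:
        for rank, doc in enumerate(docs, start=1):
            fused[doc["id"]] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from two Retrievers.
keyword_hits = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}]
embedding_hits = [{"id": "b", "score": 0.7}, {"id": "c", "score": None}]

print(concatenate([keyword_hits, embedding_hits]))
print(reciprocal_rank_fusion([keyword_hits, embedding_hits]))
```

Rank fusion ignores the raw scores entirely and only looks at positions, which is why it is often a reasonable choice when the Retrievers produce scores on different scales.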
Usage Example
components:
  DocumentJoiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| join_mode | Union[str, JoinMode] | JoinMode.CONCATENATE | Specifies the join mode to use. Available modes: concatenate (keeps the highest-scored document in case of duplicates); merge (calculates a weighted sum of scores for duplicates and merges them); reciprocal_rank_fusion (merges and assigns scores based on reciprocal rank fusion); distribution_based_rank_fusion (merges and assigns scores based on the score distribution of each Retriever). |
| weights | Optional[List[float]] | None | Assigns importance to each list of documents to influence how they're joined. The number of weights must match the number of input lists. This parameter is ignored for the concatenate and distribution_based_rank_fusion join modes. |
| top_k | Optional[int] | None | The maximum number of documents to return. |
| sort_by_score | bool | True | If True, sorts the documents by score in descending order. If a document has no score, it is handled as if its score is -infinity. |
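The sketch below illustrates how weights, sort_by_score, and top_k could interact in a merge-style join. It is a simplified model rather than the component's code: documents are plain dictionaries with hypothetical `id` and `score` keys, and two behaviors are assumptions made for the example, namely that weights are normalized to sum to 1 and that a missing score contributes 0 to the weighted sum.

```python
import math
from collections import defaultdict


def merge_scores(doc_lists, weights=None):
    """Combine duplicate ids by a weighted sum of their scores."""
    if weights is None:
        weights = [1.0] * len(doc_lists)
    if len(weights) != len(doc_lists):
        raise ValueError("Provide exactly one weight per input list.")
    total = sum(weights)
    weights = [w / total for w in weights]  # assumption: normalize to sum to 1

    merged = defaultdict(float)
    for docs, weight in zip(doc_lists, weights):
        for doc in docs:
            score = doc.get("score")
            # Assumption: a missing score counts as 0 in the sum.
            merged[doc["id"]] += weight * (0.0 if score is None else score)
    return [{"id": doc_id, "score": score} for doc_id, score in merged.items()]


def finalize(docs, top_k=None, sort_by_score=True):
    """Optionally sort by score (missing score = -infinity), then truncate."""
    if sort_by_score:
        docs = sorted(
            docs,
            key=lambda d: -math.inf if d.get("score") is None else d["score"],
            reverse=True,
        )
    return list(docs) if top_k is None else list(docs[:top_k])


# Hypothetical inputs: the first list is trusted three times as much.
first = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}]
second = [{"id": "b", "score": 0.8}, {"id": "c", "score": 0.6}]

merged = merge_scores([first, second], weights=[3, 1])
print(finalize(merged, top_k=2))
```

Sorting before truncating matters: with sort_by_score disabled, top_k keeps the first documents in join order, not the best-scored ones.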
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | Variadic[List[Document]] | | Lists of documents to be merged. |
| top_k | Optional[int] | None | The maximum number of documents to return. Overrides the instance's top_k if provided. |
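As a rough illustration of the override rule for top_k, the following sketch models a joiner whose run-time top_k takes precedence over the value set at initialization. `JoinerSketch` is a hypothetical stand-in written for this example, not the real component: it takes its input lists as positional arguments and simply chains them together without deduplication or scoring.

```python
class JoinerSketch:
    """Hypothetical stand-in that only models the top_k precedence rule."""

    def __init__(self, top_k=None):
        self.top_k = top_k  # instance-level default

    def run(self, *documents, top_k=None):
        # Flatten every input list into one list, preserving order.
        merged = [doc for docs in documents for doc in docs]
        # A run-time value, when given, overrides the instance value.
        limit = top_k if top_k is not None else self.top_k
        if limit is not None:
            merged = merged[:limit]
        # Mirror the documented output shape: a dictionary with a documents key.
        return {"documents": merged}


joiner = JoinerSketch(top_k=2)
print(joiner.run(["doc-1", "doc-2"], ["doc-3"]))           # uses the instance top_k
print(joiner.run(["doc-1", "doc-2"], ["doc-3"], top_k=1))  # run-time override
```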