DeepsetMetaFieldDocumentGrouper

Group document lists based on their metadata.

Basic Information

  • Pipeline type: Query
  • Type: deepset_cloud_custom_nodes.augmenters.grouper.DeepsetMetaFieldDocumentGrouper
  • Components it can connect with:
    • Retrievers: It can receive documents from a retriever and group them based on a metadata key.
    • Any component that accepts or outputs lists of documents.

Inputs

| Name | Type | Description |
|---|---|---|
| document_lists | List of document lists or list of documents | The documents to be grouped. |

Outputs

| Name | Type | Description |
|---|---|---|
| document_lists | Lists of documents | Lists of documents grouped by the specified metadata key. |

Overview

DeepsetMetaFieldDocumentGrouper groups nested lists of documents by the metadata key you specify. You can use it in scenarios where you must categorize or cluster documents according to a common attribute. For example, you can use it to group documents the Retriever returns to show only one document per file, or to organize documents by topic or category.

You can sort the documents within each group by their preset scores or by calculated reciprocal rank fusion scores. You can also limit the number of document lists returned and the number of documents within each list.

The component first groups documents by the metadata key and then sorts the documents within each group according to their scores. After that, it limits the number of documents per group and the number of groups according to the top_k values you set. Finally, it orders the groups by the sum of their documents' scores.
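The steps above can be sketched in plain Python. This is a minimal illustration, not the component's actual implementation: the Doc class and the group_documents function are hypothetical names, and documents that lack the metadata key are assumed to share one group. The description does not say which groups are kept when top_k_groups applies, so the sketch assumes the highest-scoring ones and ranks the groups before truncating them.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document (not the deepset Document class)."""
    content: str
    score: float
    meta: dict = field(default_factory=dict)


def group_documents(document_lists, group_by, top_k_groups=None, top_k_docs=None):
    """Group nested document lists by a metadata key (illustrative only)."""
    # 1. Flatten the nested lists and bucket documents by the metadata value.
    #    Assumption: documents without the key share one bucket keyed by None.
    buckets = defaultdict(list)
    for doc_list in document_lists:
        for doc in doc_list:
            buckets[doc.meta.get(group_by)].append(doc)

    # 2. Sort each group by score, highest first (the preset_score behavior).
    groups = [sorted(docs, key=lambda d: d.score, reverse=True)
              for docs in buckets.values()]

    # 3. Keep at most top_k_docs documents per group.
    if top_k_docs is not None:
        groups = [docs[:top_k_docs] for docs in groups]

    # 4. Order groups by the sum of their remaining documents' scores.
    groups.sort(key=lambda docs: sum(d.score for d in docs), reverse=True)

    # 5. Keep at most top_k_groups groups (assumed: the highest-scoring ones).
    if top_k_groups is not None:
        groups = groups[:top_k_groups]
    return groups


lists = [
    [Doc("a1", 0.9, {"file": "a"}), Doc("b1", 0.8, {"file": "b"})],
    [Doc("a2", 0.5, {"file": "a"}), Doc("c1", 0.95, {"file": "c"})],
]
result = group_documents(lists, "file", top_k_groups=2, top_k_docs=1)
print([[d.content for d in group] for group in result])  # [['c1'], ['a1']]
```

With top_k_docs set to 1, each file contributes only its best document, which matches the "one document per file" use case mentioned in the overview.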

Usage Example

This is an example of DeepsetMetaFieldDocumentGrouper used together with SimilarDocumentsRetriever:

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:


| Parameter | Type | Possible Values | Description |
|---|---|---|---|
| group_by | String | Default: None | The metadata key to group the documents by. Required. |
| top_k_groups | Integer | Default: None | The maximum number of document groups to return. Optional. |
| top_k_docs | Integer | Default: None | The maximum number of documents to return within each group. Optional. |
| sort_docs_by | Literal | preset_score, rrf_score. Default: None | Specifies how documents within each aggregation group are sorted. Required. |

Possible options for sort_docs_by:

- None: No sorting is applied; this is the default option.
- preset_score: Sorts documents by the preset score attribute, from highest to lowest score. For this option, documents must have their score computed before being passed to this component.
- rrf_score: Computes the reciprocal rank fusion score for each document and then sorts documents based on this score from highest to lowest.
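To make the rrf_score option more concrete, here is a minimal sketch of reciprocal rank fusion over several ranked lists. It uses the commonly cited formula, where each list contributes 1 / (k + rank) to a document's score. The page does not state the exact formula or constant the component uses, so the function names, the use of plain document IDs, and k = 60 are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_scores(ranked_lists, k=60):
    """Return {doc_id: RRF score} for several ranked lists of document IDs.

    Assumption: k=60 is a commonly used constant, not a documented default
    of DeepsetMetaFieldDocumentGrouper.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks start at 1, so the top document in a list adds 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


def sort_by_rrf(ranked_lists, k=60):
    """Return document IDs ordered from highest to lowest RRF score."""
    scores = rrf_scores(ranked_lists, k)
    return sorted(scores, key=scores.get, reverse=True)


rankings = [["a", "b", "c"], ["b", "c", "a"]]
print(sort_by_rrf(rankings))  # ['b', 'a', 'c']
```

A document that ranks reasonably well in several lists can outscore one that ranks first in a single list, which is why this kind of fusion is useful when the input lists come from different retrievers.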

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.


| Parameter | Type | Description |
|---|---|---|
| document_lists | Lists of Document objects | The lists of documents to group. Required. |