DeepsetMetaFieldDocumentGrouper
Group document lists based on their metadata.
Basic Information
- Pipeline type: Query
- Type: deepset_cloud_custom_nodes.augmenters.grouper.DeepsetMetaFieldDocumentGrouper
- Components it can connect with:
  - Retrievers: It can receive documents from a retriever and group them based on a metadata key.
  - Any component that accepts or outputs lists of documents.
Inputs
Name | Type | Description |
---|---|---|
document_lists | List of document lists or list of documents | The documents to be grouped. |
Outputs
Name | Type | Description |
---|---|---|
document_lists | Lists of documents | Lists of documents grouped by the specified metadata key. |
Overview
DeepsetMetaFieldDocumentGrouper groups nested lists of documents by the metadata key you specify. You can use it in scenarios where you must categorize or cluster documents according to a common attribute. For example, you can use it to group the documents a Retriever returns so that only one document per file is shown, or to organize documents by topic or category.
You can specify how to sort documents within each group: by their preset scores or by calculated reciprocal rank fusion scores. You can also control the number of document lists returned and the number of documents within each list.
The component first groups documents by the metadata key and then sorts the documents within each group according to their scores. After that, it limits the number of documents per group and the number of groups according to the top_k_docs and top_k_groups values. Finally, it orders the groups by the sum of their documents' scores.
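The steps described above can be sketched in plain Python. This is a minimal illustration and not the component's actual implementation: the `group_documents` function and the dictionary-based document shape are hypothetical, only the `preset_score` sorting option is modeled, and keeping groups in first-seen order when `top_k_groups` applies is an assumption, because this page does not say which groups are kept.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for a document: {"content": str, "meta": dict, "score": float}
Doc = Dict[str, Any]


def group_documents(
    document_lists: List[List[Doc]],
    group_by: str,
    top_k_groups: Optional[int] = None,
    top_k_docs: Optional[int] = None,
    sort_docs_by: Optional[str] = None,
) -> List[List[Doc]]:
    # 1. Group documents from all input lists by the metadata key.
    groups: Dict[Any, List[Doc]] = defaultdict(list)
    for docs in document_lists:
        for doc in docs:
            groups[doc["meta"].get(group_by)].append(doc)
    grouped = list(groups.values())  # dicts keep first-seen key order

    # 2. Sort documents within each group, highest score first.
    if sort_docs_by == "preset_score":
        grouped = [sorted(g, key=lambda d: d["score"], reverse=True) for g in grouped]

    # 3. Limit documents per group, then the number of groups
    #    (assumption: groups are kept in first-seen order).
    if top_k_docs is not None:
        grouped = [g[:top_k_docs] for g in grouped]
    if top_k_groups is not None:
        grouped = grouped[:top_k_groups]

    # 4. Order the remaining groups by the sum of their documents' scores.
    grouped.sort(key=lambda g: sum(d["score"] for d in g), reverse=True)
    return grouped


# Example: keep the best-scoring document per file, for at most two files.
lists = [
    [
        {"content": "a", "meta": {"file": "f1"}, "score": 0.9},
        {"content": "b", "meta": {"file": "f2"}, "score": 0.8},
        {"content": "c", "meta": {"file": "f1"}, "score": 0.5},
    ],
    [
        {"content": "d", "meta": {"file": "f2"}, "score": 0.95},
        {"content": "e", "meta": {"file": "f3"}, "score": 0.1},
    ],
]
result = group_documents(
    lists, group_by="file", top_k_groups=2, top_k_docs=1, sort_docs_by="preset_score"
)
print([[d["content"] for d in g] for g in result])  # [['d'], ['a']]
```

Setting `top_k_docs` to 1 in this sketch corresponds to the "one document per file" scenario mentioned above.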
Usage Example
This is an example of DeepsetMetaFieldDocumentGrouper used together with SimilarDocumentsRetriever:
[Image: DeepsetMetaFieldDocumentGrouper used with SimilarDocumentsRetriever]
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
Parameter | Type | Possible Values | Description |
---|---|---|---|
group_by | String | Default: None | The metadata key to group the documents by. Required. |
top_k_groups | Integer | Default: None | The maximum number of document groups to return. Optional. |
top_k_docs | Integer | Default: None | The maximum number of documents to return within each group. Optional. |
sort_docs_by | Literal | preset_score, rrf_score. Default: None | Specifies how documents within each group are sorted. Possible options: None: no sorting is applied (this is the default option). preset_score: sorts documents by the preset score attribute, from highest to lowest; for this option, documents must have their score computed before being passed to this component. rrf_score: computes the reciprocal rank fusion score for each document and then sorts documents by this score, from highest to lowest. Required. |
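For readers unfamiliar with the rrf_score option, here is a minimal sketch of reciprocal rank fusion in its commonly cited form, where each document receives the sum of 1 / (k + rank) over every ranked list it appears in. The `rrf_scores` function, the constant k = 60, and the 1-based ranks are assumptions for illustration; this page does not state how the component computes ranks or which constant it uses.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_scores(ranked_id_lists: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Sum 1 / (k + rank) for each document ID across all ranked lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids in ranked_id_lists:
        # Ranks are 1-based here (assumption): the top document has rank 1.
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


# Document "b" appears in both lists, so it collects two contributions.
scores = rrf_scores([["a", "b"], ["b", "c"]])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['b', 'a', 'c']
```

A document that appears in several lists accumulates a higher score than one that appears only once, even if it never ranks first.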
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
Parameter | Type | Description |
---|---|---|
document_lists | Lists of Document objects | The lists of documents to group. Required. |