TransformersZeroShotDocumentClassifier
Classify documents based on the labels you provide and add the predicted label to the document's metadata.
Basic Information
- Type: haystack_integrations.classifiers.zero_shot_document_classifier.TransformersZeroShotDocumentClassifier
- Components it can connect with:
  - TextFileToDocument: TransformersZeroShotDocumentClassifier receives documents from TextFileToDocument.
  - MetadataRouter: TransformersZeroShotDocumentClassifier sends classified documents to MetadataRouter, which routes them further down the pipeline based on their classification.
  - Any component that outputs documents or accepts documents as input.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to process. |
| batch_size | int [Optional] | 1 | Batch size used for processing the content in each document. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents with an added metadata field called classification. |
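To make the output concrete, here is a minimal sketch of how downstream code might read the classification metadata field. It uses plain dictionaries in place of Document objects. The nested keys (label, score, details) follow the open-source Haystack implementation and are an assumption here, since this page only states that a field called classification is added. The sample data and the group_by_label helper are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical classified documents, reduced to plain dictionaries.
# Assumption: "classification" holds the top label, its score, and all label scores.
classified: List[Dict[str, Any]] = [
    {
        "content": "The pasta was excellent.",
        "meta": {"classification": {"label": "food", "score": 0.91,
                                    "details": {"food": 0.91, "animals": 0.09}}},
    },
    {
        "content": "Otters hold hands while sleeping.",
        "meta": {"classification": {"label": "animals", "score": 0.88,
                                    "details": {"animals": 0.88, "food": 0.12}}},
    },
]


def group_by_label(documents: List[Dict[str, Any]], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group document contents by predicted label, skipping low-confidence predictions."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in documents:
        prediction = doc["meta"]["classification"]
        if prediction["score"] >= min_score:
            groups[prediction["label"]].append(doc["content"])
    return dict(groups)


groups = group_by_label(classified)
```

This is roughly the kind of decision a MetadataRouter makes when it routes documents by their classification.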
Overview
Performs zero-shot classification of documents based on given labels and adds the predicted label to their metadata.
TransformersZeroShotDocumentClassifier uses a Hugging Face pipeline for zero-shot classification.
In the pipeline configuration, provide the model and the set of labels you want to use for categorization. To allow more than one label to be true for a document, set multi_label=True.
By default, TransformersZeroShotDocumentClassifier runs the classification on the document's content field. To classify on another field, set classification_field to the name of one of the document's metadata fields.
You can use the following models for zero-shot classification:
- valhalla/distilbart-mnli-12-3
- cross-encoder/nli-distilroberta-base
- cross-encoder/nli-deberta-v3-xsmall
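For readers who want to try the component outside Pipeline Builder, the following is a minimal sketch in Python. It assumes the open-source Haystack package is installed and that the component is importable from haystack.components.classifiers with a warm_up() method, as in open-source Haystack; the platform may wire these up differently. The classify_texts helper is hypothetical, and the Haystack imports sit inside the function so the module loads even where Haystack is not installed.

```python
from typing import Any, Dict, List


def classify_texts(texts: List[str], labels: List[str], multi_label: bool = False) -> List[Dict[str, Any]]:
    """Classify plain strings and return each document's `classification` metadata entry."""
    # Assumption: import paths as in open-source Haystack 2.x.
    from haystack import Document
    from haystack.components.classifiers import TransformersZeroShotDocumentClassifier

    classifier = TransformersZeroShotDocumentClassifier(
        model="cross-encoder/nli-deberta-v3-xsmall",  # one of the models listed above
        labels=labels,
        multi_label=multi_label,
    )
    classifier.warm_up()  # loads the Hugging Face zero-shot pipeline

    documents = [Document(content=text) for text in texts]
    result = classifier.run(documents=documents, batch_size=1)
    return [doc.meta["classification"] for doc in result["documents"]]
```

Calling classify_texts(["Otters hold hands while sleeping."], labels=["animals", "food"]) downloads the model on first use and returns one prediction per input text.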
Usage Example
Initializing the Component
components:
  TransformersZeroShotDocumentClassifier:
    type: components.classifiers.zero_shot_document_classifier.TransformersZeroShotDocumentClassifier
    init_parameters:
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | | The name or path of a Hugging Face model for zero-shot document classification. |
| labels | List[str] | | The set of possible class labels to classify each document into, for example, ["positive", "negative"]. The labels depend on the selected model. |
| multi_label | bool | False | Whether or not multiple candidate labels can be true. If False, the scores are normalized such that the sum of the label likelihoods for each sequence is 1. If True, the labels are considered independent and probabilities are normalized for each candidate by doing a softmax of the entailment score vs. the contradiction score. |
| classification_field | Optional[str] | None | Name of document's meta field to be used for classification. If not set, Document.content is used by default. |
| device | Optional[ComponentDevice] | None | The device on which the model is loaded. If None, the default device is automatically selected. If a device/device map is specified in huggingface_pipeline_kwargs, it overrides this parameter. |
| token | Optional[Secret] | Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False) | The Hugging Face token to use as HTTP bearer authorization. Check your HF token in your account settings. |
| huggingface_pipeline_kwargs | Optional[Dict[str, Any]] | None | Dictionary containing keyword arguments used to initialize the Hugging Face pipeline for text classification. |
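The two normalization modes described for multi_label can be sketched in a few lines of plain Python. This is an illustration of the arithmetic only, not the component's code: the score_labels function and the example logits are hypothetical, and each label is assumed to come with a (contradiction, entailment) logit pair from the underlying NLI model.

```python
import math
from typing import Dict, List, Tuple


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_labels(logits: Dict[str, Tuple[float, float]], multi_label: bool) -> Dict[str, float]:
    """Turn per-label (contradiction, entailment) logits into label scores."""
    if multi_label:
        # Labels are independent: softmax of entailment vs. contradiction per label.
        return {
            label: softmax([contradiction, entailment])[1]
            for label, (contradiction, entailment) in logits.items()
        }
    # Single label: softmax over the entailment logits of all labels, so scores sum to 1.
    labels = list(logits)
    probabilities = softmax([logits[label][1] for label in labels])
    return dict(zip(labels, probabilities))


# Hypothetical logits: both labels are strongly entailed by the text.
example = {"sports": (-2.0, 2.0), "politics": (-2.0, 2.0)}
single = score_labels(example, multi_label=False)
multi = score_labels(example, multi_label=True)
```

With multi_label=False the two labels compete and split the probability mass evenly, while with multi_label=True each label is scored on its own, so both can receive a high score at the same time.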
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to process. |
| batch_size | int | 1 | Batch size used for processing the content in each document. |