
TransformersZeroShotDocumentClassifier

Classify documents based on the labels you provide and add the predicted label to the document's metadata.

Basic Information

  • Type: haystack_integrations.classifiers.zero_shot_document_classifier.TransformersZeroShotDocumentClassifier
  • Components it can connect with:
    • TextFileToDocument: TransformersZeroShotDocumentClassifier receives documents from TextFileToDocument.
    • MetadataRouter: TransformersZeroShotDocumentClassifier sends classified documents to MetadataRouter, which routes them further down the pipeline based on their classification (see the connection sketch after this list).
    • Any component that outputs documents or accepts documents as input.
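
In Pipeline Builder YAML, the connections described above look like this (a minimal sketch copied from the example index further down this page):

```yaml
connections:
  - sender: TextFileToDocument.documents
    receiver: TransformersZeroShotDocumentClassifier.documents
  - sender: TransformersZeroShotDocumentClassifier.documents
    receiver: MetadataRouter.documents
```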

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | Documents to process. |
| batch_size | Optional[int] | 1 | Batch size used for processing the content in each document. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents with an added metadata field called classification. |

Overview

Performs zero-shot classification of documents based on given labels and adds the predicted label to their metadata.

TransformersZeroShotDocumentClassifier uses a Hugging Face pipeline for zero-shot classification. In pipeline configuration, provide the model and the set of labels you want to use for categorization. You can configure the component to allow multiple labels to be true by setting multi_label=True.

TransformersZeroShotDocumentClassifier runs the classification on the document's content field by default. If you want it to run on another field, set the classification_field to one of the document's metadata fields.
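
For example, the relevant init_parameters in Pipeline Builder YAML could look like the following sketch. The labels and the title metadata field are illustrative assumptions, not defaults:

```yaml
TransformersZeroShotDocumentClassifier:
  type: haystack_integrations.classifiers.zero_shot_document_classifier.TransformersZeroShotDocumentClassifier
  init_parameters:
    model: cross-encoder/nli-deberta-v3-xsmall
    labels:                       # example labels, choose ones that suit your data
      - sports
      - politics
      - technology
    multi_label: true             # more than one label can apply to the same document
    classification_field: title   # classify the "title" meta field instead of Document.content (illustrative)
```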

You can use the following models for zero-shot classification:

  • valhalla/distilbart-mnli-12-3
  • cross-encoder/nli-distilroberta-base
  • cross-encoder/nli-deberta-v3-xsmall

Usage Example

Using the Component in an Index

In this index, TransformersZeroShotDocumentClassifier classifies documents by sentiment (positive or negative) and sends classified documents to MetadataRouter. MetadataRouter then routes positive documents to one document store and negative documents to another.

components:
  TextFileToDocument:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8
      store_full_path: false
  TransformersZeroShotDocumentClassifier:
    type: haystack_integrations.classifiers.zero_shot_document_classifier.TransformersZeroShotDocumentClassifier
    init_parameters:
      model: cross-encoder/nli-deberta-v3-xsmall
      labels:
        - positive
        - negative
      multi_label: false
      classification_field:
      device:
      token:
        type: env_var
        env_vars:
          - HF_API_TOKEN
          - HF_TOKEN
        strict: false
      huggingface_pipeline_kwargs:
  MetadataRouter:
    type: haystack.components.routers.metadata_router.MetadataRouter
    init_parameters:
      rules:
        positive:
          operator: OR
          conditions:
            - field: classification.label
              operator: ==
              value: positive
        negative:
          operator: OR
          conditions:
            - field: classification.label
              operator: ==
              value: negative
  DocumentWriter_Positive:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      policy: NONE
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'positive-sentiment-index'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
  DocumentWriter_Negative:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      policy: NONE
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'negative-sentiment-index'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:

connections: # Defines how the components are connected
  - sender: TextFileToDocument.documents
    receiver: TransformersZeroShotDocumentClassifier.documents
  - sender: TransformersZeroShotDocumentClassifier.documents
    receiver: MetadataRouter.documents
  - sender: MetadataRouter.positive
    receiver: DocumentWriter_Positive.documents
  - sender: MetadataRouter.negative
    receiver: DocumentWriter_Negative.documents

inputs: # Define the inputs for your pipeline
  files:
    - TextFileToDocument.sources

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | | The name or path of a Hugging Face model for zero-shot document classification. |
| labels | List[str] | | The set of possible class labels to classify each document into, for example, ["positive", "negative"]. The labels depend on the selected model. |
| multi_label | bool | False | Whether or not multiple candidate labels can be true. If False, the scores are normalized such that the sum of the label likelihoods for each sequence is 1. If True, the labels are considered independent and probabilities are normalized for each candidate by doing a softmax of the entailment score vs. the contradiction score. |
| classification_field | Optional[str] | None | Name of the document's meta field to use for classification. If not set, Document.content is used by default. |
| device | Optional[ComponentDevice] | None | The device on which the model is loaded. If None, the default device is automatically selected. If a device or device map is specified in huggingface_pipeline_kwargs, it overrides this parameter. |
| token | Optional[Secret] | Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False) | The Hugging Face token to use as HTTP bearer authorization. Check your HF token in your account settings. |
| huggingface_pipeline_kwargs | Optional[Dict[str, Any]] | None | Dictionary containing keyword arguments used to initialize the Hugging Face pipeline for text classification. |
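
For instance, here is a sketch of passing huggingface_pipeline_kwargs in Pipeline Builder YAML. The labels are illustrative, and device_map: auto is an assumption about your hardware setup, not a default; per the device parameter description above, a device map set here overrides device:

```yaml
TransformersZeroShotDocumentClassifier:
  type: haystack_integrations.classifiers.zero_shot_document_classifier.TransformersZeroShotDocumentClassifier
  init_parameters:
    model: valhalla/distilbart-mnli-12-3
    labels:
      - urgent
      - not urgent
    huggingface_pipeline_kwargs:
      device_map: auto    # forwarded to the underlying Hugging Face pipeline
```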

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | Documents to process. |
| batch_size | int | 1 | Batch size used for processing the content in each document. |