NamedEntityExtractor

Annotates named entities in a collection of documents.

Basic Information

  • Type: haystack.components.extractors.named_entity_extractor.NamedEntityExtractor

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | Documents to process. |
| batch_size | int | 1 | Batch size used for processing the documents. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | Processed documents. |

Overview

Work in Progress

Bear with us while we add pipeline examples and the most common component connections.

The component supports two backends: Hugging Face and spaCy. The Hugging Face backend works with any token classification (NER) model from the Hugging Face model hub, and the spaCy backend works with any spaCy model that contains an NER component. Annotations are stored as metadata in the documents.

Usage example:

from haystack import Document
from haystack.components.extractors.named_entity_extractor import NamedEntityExtractor

documents = [
    Document(content="I'm Merlin, the happy pig!"),
    Document(content="My name is Clara and I live in Berkeley, California."),
]

# Use the Hugging Face backend with an NER model from the model hub.
extractor = NamedEntityExtractor(backend="hugging_face", model="dslim/bert-base-NER")
# Load the model before the first call to run().
extractor.warm_up()
results = extractor.run(documents=documents)["documents"]
# Read back the annotations stored in each document's metadata.
annotations = [NamedEntityExtractor.get_stored_annotations(doc) for doc in results]
print(annotations)
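
The stored annotations mark where each entity sits in the document text. The sketch below shows one way to turn them into readable (label, text) pairs. It assumes each annotation exposes `entity`, `start`, and `end` attributes holding the label and character offsets. The `Annotation` class and the `entity_spans` helper are hypothetical stand-ins so the snippet runs without loading a model, and the sample offsets are written by hand rather than produced by the extractor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    """Hypothetical stand-in for a stored annotation (assumed fields)."""
    entity: str                    # entity label, for example "PER" or "LOC"
    start: int                     # offset of the first character of the span
    end: int                       # offset one past the last character
    score: Optional[float] = None  # model confidence, if the backend reports one


def entity_spans(text: str, annotations: List[Annotation]) -> List[Tuple[str, str]]:
    """Pair each entity label with the text it covers."""
    return [(ann.entity, text[ann.start:ann.end]) for ann in annotations]


text = "My name is Clara and I live in Berkeley, California."
# Hand-written sample annotations, not real model output.
sample = [
    Annotation("PER", 11, 16),
    Annotation("LOC", 31, 39),
    Annotation("LOC", 41, 51),
]
print(entity_spans(text, sample))
```

With real results, the same helper could be called as `entity_spans(doc.content, NamedEntityExtractor.get_stored_annotations(doc))`, provided the annotation fields match the assumption above.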

Usage Example

components:
  NamedEntityExtractor:
    type: haystack.components.extractors.named_entity_extractor.NamedEntityExtractor
    init_parameters:

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| backend | Union[str, NamedEntityExtractorBackend] | | Backend to use for NER. |
| model | str | | Name of the model or a path to the model on the local disk. Dependent on the backend. |
| pipeline_kwargs | Optional[Dict[str, Any]] | None | Keyword arguments passed to the pipeline. The pipeline can override these arguments. Dependent on the backend. |
| device | Optional[ComponentDevice] | None | The device on which the model is loaded. If None, the default device is automatically selected. If a device/device map is specified in pipeline_kwargs, it overrides this parameter (only applicable to the Hugging Face backend). |
| token | Optional[Secret] | Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False) | The API token to download private models from Hugging Face. |
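
As a rough illustration of how these init parameters fit together in Python, the sketch below wraps the constructor in a small helper. `build_extractor` is a hypothetical name, not part of the library. The Haystack imports are placed inside the function so the snippet can be loaded even where Haystack is not installed. It assumes `ComponentDevice` and `Secret` are importable from `haystack.utils`.

```python
from typing import Any, Dict, Optional


def build_extractor(
    backend: str,
    model: str,
    device: Optional[str] = None,
    pipeline_kwargs: Optional[Dict[str, Any]] = None,
):
    """Hypothetical helper: create a NamedEntityExtractor from plain values."""
    # Imported lazily; assumes Haystack is installed when the helper is called.
    from haystack.components.extractors.named_entity_extractor import NamedEntityExtractor
    from haystack.utils import ComponentDevice, Secret

    return NamedEntityExtractor(
        backend=backend,  # "hugging_face" or "spacy"
        model=model,      # model name or local path, depending on the backend
        pipeline_kwargs=pipeline_kwargs,
        # None lets the component select the default device.
        device=ComponentDevice.from_str(device) if device else None,
        # Mirrors the documented default: read the token from the environment if set.
        token=Secret.from_env_var(["HF_API_TOKEN", "HF_TOKEN"], strict=False),
    )


# Example call (downloads the model on warm_up, so it is left commented out):
# extractor = build_extractor("hugging_face", "dslim/bert-base-NER", device="cpu")
# extractor.warm_up()
```

Passing the device as a string and converting it inside the helper is only a convenience of this sketch; the component itself expects a `ComponentDevice` or `None`.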

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | Documents to process. |
| batch_size | int | 1 | Batch size used for processing the documents. |
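
To make the effect of `batch_size` concrete, here is a minimal sketch of how a list of inputs can be split into batches of a given size. It illustrates the general idea only and is not the component's internal implementation. `iter_batches` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 1) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = ["doc 1", "doc 2", "doc 3", "doc 4", "doc 5"]
# Five inputs with batch_size=2 give two full batches and one partial batch.
print([len(batch) for batch in iter_batches(texts, batch_size=2)])
```

With the component itself, the batch size is passed at run time, for example `extractor.run(documents=documents, batch_size=8)`. Larger batches usually trade more memory for fewer model calls.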