NamedEntityExtractor

Annotates named entities in a collection of documents.

Basic Information

Type: haystack_integrations.extractors.named_entity_extractor.NamedEntityExtractor

Inputs

Parameter	Type	Default	Description
documents	List[Document]		Documents to process.
batch_size	int	1	Batch size used for processing the documents.

Outputs

Parameter	Type	Default	Description
documents	List[Document]		Processed documents.

Overview

Work in Progress

Bear with us while we add pipeline examples and the most common component connections.

The component supports two backends: Hugging Face and spaCy. The Hugging Face backend works with any token classification (NER) model from the Hugging Face model hub, and the spaCy backend works with any spaCy model that includes an NER component. The component stores the annotations in each document's metadata.
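Annotations point into the document text through character offsets instead of copying the entity text. The sketch below assumes each annotation carries an entity label, `start` and `end` offsets, and an optional `score`, as `NamedEntityAnnotation` does in recent Haystack releases (check your version). `Annotation` and `entity_spans` are hypothetical names, and the offsets are written by hand for illustration, not produced by a model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical stand-in for one stored annotation (assumed fields:
# entity label, character offsets into the content, optional score).
@dataclass
class Annotation:
    entity: str
    start: int
    end: int
    score: Optional[float] = None


def entity_spans(content: str, annotations: List[Annotation]) -> List[Tuple[str, str]]:
    """Return (label, surface text) pairs by slicing the document content."""
    return [(a.entity, content[a.start:a.end]) for a in annotations]


content = "My name is Clara and I live in Berkeley, California."
# Hand-written offsets for illustration only.
annotations = [
    Annotation(entity="PER", start=11, end=16),
    Annotation(entity="LOC", start=31, end=39),
    Annotation(entity="LOC", start=41, end=51),
]
print(entity_spans(content, annotations))
# [('PER', 'Clara'), ('LOC', 'Berkeley'), ('LOC', 'California')]
```

Storing offsets keeps the metadata small and lets you highlight entities in the original text. The exact labels (`PER`, `LOC`, and so on) depend on the model you choose.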

Usage example:

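from haystack import Document
from haystack.components.extractors.named_entity_extractor import NamedEntityExtractor

documents = [
    Document(content="I'm Merlin, the happy pig!"),
    Document(content="My name is Clara and I live in Berkeley, California."),
]
# Use the Hugging Face backend with an NER model from the Hugging Face model hub.
extractor = NamedEntityExtractor(backend="hugging_face", model="dslim/bert-base-NER")
# Load the model before calling run().
extractor.warm_up()
# run() returns the documents with the annotations stored in their metadata.
results = extractor.run(documents=documents)["documents"]
# Read the stored annotations back from each processed document.
annotations = [NamedEntityExtractor.get_stored_annotations(doc) for doc in results]
print(annotations)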
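A common next step is to collect entities by label across all documents and drop low-confidence hits. The sketch below is self-contained: `Annotation` and `group_entities` are hypothetical names, we assume each stored annotation exposes `entity`, `start`, `end`, and an optional `score`, and we assume `get_stored_annotations` may return `None` for a document without annotations. The annotations and scores are hand-written for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotation:  # hypothetical stand-in for one stored annotation
    entity: str
    start: int
    end: int
    score: Optional[float] = None


def group_entities(
    contents: List[str],
    annotations_per_doc: List[Optional[List[Annotation]]],
    min_score: float = 0.0,
) -> Dict[str, List[str]]:
    """Collect entity surface texts by label across documents.

    Documents without annotations (None) are skipped. Annotations scored
    below min_score are dropped; annotations without a score are kept.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for content, annotations in zip(contents, annotations_per_doc):
        for a in annotations or []:
            if a.score is not None and a.score < min_score:
                continue
            grouped[a.entity].append(content[a.start:a.end])
    return dict(grouped)


contents = [
    "I'm Merlin, the happy pig!",
    "My name is Clara and I live in Berkeley, California.",
]
# Hand-written annotations and scores, for illustration only.
annotations_per_doc = [
    [Annotation("PER", 4, 10, score=0.4)],
    [Annotation("PER", 11, 16, score=0.9), Annotation("LOC", 31, 39, score=0.9)],
]
print(group_entities(contents, annotations_per_doc, min_score=0.5))
# {'PER': ['Clara'], 'LOC': ['Berkeley']}
```

If the assumed fields match your Haystack version, you could pass `[doc.content for doc in results]` and the `annotations` list from the usage example directly.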

Usage Example

components:
  NamedEntityExtractor:
    type: components.extractors.named_entity_extractor.NamedEntityExtractor
    init_parameters:
      backend: hugging_face
      model: dslim/bert-base-NER

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
backend	Union[str, NamedEntityExtractorBackend]		Backend to use for NER.
model	str		Name of the model or a path to the model on the local disk. Dependent on the backend.
pipeline_kwargs	Optional[Dict[str, Any]]	None	Keyword arguments passed to the pipeline. The pipeline can override these arguments. Dependent on the backend.
device	Optional[ComponentDevice]	None	The device on which the model is loaded. If `None`, the default device is selected automatically. If a device or device map is specified in `pipeline_kwargs`, it overrides this parameter (Hugging Face backend only).
token	Optional[Secret]	Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False)	The API token to download private models from Hugging Face.
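The `device` and `token` rows describe fallback rules. The sketch below models them under stated assumptions: `effective_pipeline_kwargs` and `resolve_token` are hypothetical helpers, not the component's code; the device is modeled as a plain string instead of `ComponentDevice`; and `resolve_token` imitates how we understand `Secret.from_env_var` with `strict=False` to behave (the first environment variable that is set wins, and a missing token is not an error).

```python
import os
from typing import Any, Dict, List, Optional


def effective_pipeline_kwargs(
    device: Optional[str], pipeline_kwargs: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical model of the rule: a device or device map in
    pipeline_kwargs takes precedence over the device argument."""
    kwargs = dict(pipeline_kwargs or {})  # copy so the caller's dict is untouched
    if "device" in kwargs or "device_map" in kwargs:
        return kwargs  # pipeline_kwargs wins; the device argument is ignored
    if device is not None:
        kwargs["device"] = device
    # If neither is set, nothing is added and a default device is chosen later.
    return kwargs


def resolve_token(env_vars: List[str], strict: bool = False) -> Optional[str]:
    """Hypothetical model of env-var secret lookup: first variable that is set."""
    for name in env_vars:
        value = os.environ.get(name)
        if value is not None:
            return value
    if strict:
        raise ValueError(f"None of the environment variables {env_vars} is set")
    return None  # strict=False: no token, so private models cannot be downloaded


# A device map in pipeline_kwargs overrides the separate device argument.
print(effective_pipeline_kwargs("cuda:0", {"device_map": "auto"}))  # {'device_map': 'auto'}
print(resolve_token(["HF_API_TOKEN", "HF_TOKEN"]))
```

The order of the default variable list matters: if both `HF_API_TOKEN` and `HF_TOKEN` are set, the first one is used.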

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
documents	List[Document]		Documents to process.
batch_size	int	1	Batch size used for processing the documents.
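`batch_size` sets how many documents the model processes in one step. With the default of 1, each document goes through the model on its own; larger values typically raise throughput on a GPU at the cost of more memory. The sketch below only illustrates the chunking idea with a hypothetical `batches` helper; the component itself passes `batch_size` on to its backend rather than using this code.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: List[T], batch_size: int = 1) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


texts = ["doc one", "doc two", "doc three"]
print([len(b) for b in batches(texts, batch_size=2)])  # [2, 1]
```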
