NamedEntityExtractor
Annotates named entities in a collection of documents.
Basic Information
- Type: haystack.components.extractors.named_entity_extractor.NamedEntityExtractor
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to process. |
| batch_size | int | 1 | Batch size used for processing the documents. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Processed documents. |
Overview
Bear with us while we work on adding pipeline examples and the most common component connections.
The component supports two backends: Hugging Face and spaCy. The Hugging Face backend works with any token classification model from the Hugging Face model hub, and the spaCy backend works with any spaCy model that contains an NER component. Annotations are stored as metadata in the documents.
Usage example:
from haystack import Document
from haystack.components.extractors.named_entity_extractor import NamedEntityExtractor

documents = [
    Document(content="I'm Merlin, the happy pig!"),
    Document(content="My name is Clara and I live in Berkeley, California."),
]

# Load the model before the first run.
extractor = NamedEntityExtractor(backend="hugging_face", model="dslim/bert-base-NER")
extractor.warm_up()

# Annotate the documents, then read the annotations back from each document's metadata.
results = extractor.run(documents=documents)["documents"]
annotations = [NamedEntityExtractor.get_stored_annotations(doc) for doc in results]
print(annotations)
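Each stored annotation points at a span of the document's content, so a common next step is to turn the annotations into readable (text, label) pairs. The following is a minimal sketch that runs without downloading a model. The `Annotation` dataclass is a hypothetical stand-in whose fields (`entity`, `start`, `end`, `score`) are assumed to mirror the annotation objects returned by `get_stored_annotations`, and the sample offsets are hand-written for illustration rather than real model output.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    """Hypothetical stand-in; fields are assumed to mirror the stored annotations."""
    entity: str
    start: int
    end: int
    score: Optional[float] = None


def entity_spans(content: str, annotations: List[Annotation]) -> List[Tuple[str, str]]:
    """Pair each annotated substring of `content` with its entity label."""
    return [(content[a.start:a.end], a.entity) for a in annotations]


content = "My name is Clara and I live in Berkeley, California."

# Hand-written character offsets for illustration only;
# in a real run these would come from the extractor.
sample = [
    Annotation(entity="PER", start=11, end=16),
    Annotation(entity="LOC", start=31, end=39),
    Annotation(entity="LOC", start=41, end=51),
]

spans = entity_spans(content, sample)
print(spans)
```

With real results, the same helper could be applied to `doc.content` and the list returned for each document, after checking that the list is not empty or missing.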
Usage Example
components:
  NamedEntityExtractor:
    type: haystack.components.extractors.named_entity_extractor.NamedEntityExtractor
    init_parameters:
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| backend | Union[str, NamedEntityExtractorBackend] | | Backend to use for NER. |
| model | str | | Name of the model or a path to the model on the local disk. Dependent on the backend. |
| pipeline_kwargs | Optional[Dict[str, Any]] | None | Keyword arguments passed to the pipeline. The pipeline can override these arguments. Dependent on the backend. |
| device | Optional[ComponentDevice] | None | The device on which the model is loaded. If None, the default device is automatically selected. If a device or device map is specified in pipeline_kwargs, it overrides this parameter (only applicable to the Hugging Face backend). |
| token | Optional[Secret] | Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False) | The API token to download private models from Hugging Face. |
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Documents to process. |
| batch_size | int | 1 | Batch size used for processing the documents. |
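To make the effect of batch_size concrete, here is a minimal sketch of how a list of inputs can be split into batches before being handed to a model. It illustrates the general idea only and is not the component's internal implementation; `iter_batches` is a hypothetical helper introduced for this example.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 1) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = ["doc one", "doc two", "doc three"]

# With batch_size=2, three inputs form one full batch and one partial batch.
batches = list(iter_batches(texts, batch_size=2))
print(batches)
```

With the default of 1, every document forms its own batch. Larger values usually trade memory for throughput, so the right setting depends on the model and the device.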