ExtractiveReader Parameters

Learn how to customize ExtractiveReader.

YAML Init Parameters

These are the parameters you can pass to this component in the pipeline YAML configuration:

| Parameter | Type | Possible values | Description |
| --- | --- | --- | --- |
| model | Union[Path, str] | Default: "deepset/roberta-base-squad2-distilled" | A Hugging Face transformers question answering model. Can be either a path to a folder containing the model files or an identifier of a model from Hugging Face. Required. |
| device | ComponentDevice | Default: None | The device on which the model is loaded. If None, the default device is automatically selected. Optional. |
| token | Secret | Default: Secret.from_env_var("HF_API_TOKEN", strict=False) | The API token used to download private models from Hugging Face. Optional. |
| top_k | Integer | Default: 20 | Number of answers to return per query. This parameter is required even if score_threshold is set. An additional answer with no text is returned if no_answer is set to True (default). Required. |
| score_threshold | Float | Default: None | Returns only answers with a probability score above this threshold. ExtractiveReader assigns each answer a probability score between 0 and 1 that indicates how well the answer matches the query. A score close to 1 means the model has high confidence in the answer's relevance. Answers with the highest probability are listed first. Optional. |
| max_seq_length | Integer | Default: 384 | Maximum number of tokens in one input text for the model. If a sequence exceeds it, it is split. Required. |
| stride | Integer | Default: 128 | Number of tokens that overlap when a sequence is split because it exceeds max_seq_length. Required. |
| max_batch_size | Integer | Default: None | Maximum number of samples fed through the model at the same time. Optional. |
| answers_per_seq | Integer | Default: None | Number of answer candidates to consider per sequence. This is relevant when a document was split into multiple sequences because of max_seq_length. Optional. |
| no_answer | Boolean | True, False. Default: True | Whether to return an additional no-answer result with empty text and a score representing the probability that the other top_k answers are incorrect. For example, if top_k: 4, the system returns four answers and an additional empty answer. Each returned answer has a probability score assigned. If the empty answer has a probability of 0.5, that is the probability that none of the returned answers is correct. Required. |
| calibration_factor | Float | Default: 0.1 | A factor used for calibrating probabilities. Required. |
| overlap_threshold | Float | Default: 0.01 | If set, removes duplicate answers whose overlap exceeds this threshold. If None, all answers are kept. Optional. |
| model_kwargs | Dictionary of string to any | Default: None | Additional keyword arguments passed to AutoModelForQuestionAnswering.from_pretrained when loading the model specified in model. For details on what kwargs you can pass, see the model's documentation. Optional. |
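For illustration, here is a minimal sketch of how some of these init parameters could appear in a pipeline YAML. The component names (reader, retriever), the connection, and the non-default values are assumptions made for this example only; the exact type path and schema can vary between Haystack and deepset Cloud versions, so check your platform's pipeline reference before copying it.

```yaml
# Illustrative sketch only: component names, the retriever connection,
# and the non-default values below are example assumptions.
components:
  reader:
    type: haystack.components.readers.extractive.ExtractiveReader
    init_parameters:
      model: "deepset/roberta-base-squad2-distilled"  # default model; a local folder path also works
      top_k: 10              # return up to 10 answers per query
      score_threshold: 0.5   # keep only answers with a probability score above 0.5
      max_seq_length: 384
      stride: 128
      no_answer: true        # also return an empty "no answer" candidate

connections:
  - sender: retriever.documents  # assumes a retriever component defined elsewhere in the YAML
    receiver: reader.documents
```

With these example settings, at most ten answers are returned per query and any answer scoring 0.5 or below is filtered out.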


REST API Runtime Parameters

There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.