ExtractiveReader Parameters

Learn how to customize ExtractiveReader.

YAML Init Parameters

These are the parameters you can pass to this component in the pipeline YAML configuration:

| Parameter | Type | Possible values | Description |
|---|---|---|---|
| model | Union[Path, String] | Default: "deepset/roberta-base-squad2-distilled" | A Hugging Face transformers question answering model. Can be either a path to a folder containing the model files or the identifier of a model on Hugging Face. Required. |
| device | ComponentDevice | Default: None | The device on which the model is loaded. If None, the default device is selected automatically. Optional. |
| token | Secret | Default: Secret.from_env_var("HF_API_TOKEN", strict=False) | The API token used to download private models from Hugging Face. Optional. |
| top_k | Integer | Default: 20 | Number of answers to return per query. It is required even if score_threshold is set. An additional answer with no text is returned if no_answer is set to True (default). Required. |
| score_threshold | Float | Default: None | Returns only answers with a probability score above this threshold. ExtractiveReader assigns each answer a probability score from 0 to 1 that indicates how well the answer matches the query. A score close to 1 means the model has high confidence in the answer's relevance. Answers with the highest probability are listed first. Optional. |
| max_seq_length | Integer | Default: 384 | Maximum number of tokens in one input text for the model. If a sequence exceeds this limit, it is split. Required. |
| stride | Integer | Default: 128 | Number of tokens that overlap when a sequence is split because it exceeds max_seq_length. Required. |
| max_batch_size | Integer | Default: None | Maximum number of samples that are fed through the model at the same time. Optional. |
| answers_per_seq | Integer | Default: None | Number of answer candidates to consider per sequence. This is relevant when a document was split into multiple sequences because of max_seq_length. Optional. |
| no_answer | Boolean | True, False. Default: True | Whether to return an additional "no answer" with an empty text and a score representing the probability that the other top_k answers are incorrect. For example, if top_k: 4, the system returns four answers and an additional empty answer. Each returned answer has a probability score assigned. If the empty answer has a probability of 0.5, there is a 0.5 probability that none of the returned answers is correct. Required. |
| calibration_factor | Float | Default: 0.1 | A factor used for calibrating probabilities. Required. |
| overlap_threshold | Float | Default: 0.01 | If set, removes duplicate answers whose overlap is larger than the supplied threshold. If None, all answers are kept. Optional. |
| model_kwargs | Dictionary of string and any | Default: None | Additional keyword arguments passed to AutoModelForQuestionAnswering.from_pretrained when loading the model specified in model. For details on the kwargs you can pass, see the model's documentation. Optional. |
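To make the interaction between max_seq_length, stride, and no_answer more concrete, here is a minimal Python sketch. It is not the component's actual implementation. The function names split_into_windows and no_answer_score are hypothetical and exist only for this illustration. The sketch assumes that a long input is cut into windows of at most max_seq_length tokens, with consecutive windows sharing stride tokens, and it ignores the query tokens and special tokens a real model also needs room for. The no-answer score is computed under the assumption that the returned answers are independent, which is one plausible reading of the description above.

```python
import math
from typing import List, Sequence


def split_into_windows(
    token_ids: Sequence[int], max_seq_length: int = 384, stride: int = 128
) -> List[List[int]]:
    """Split token_ids into windows of at most max_seq_length tokens.

    Consecutive windows share `stride` tokens (hypothetical helper,
    assumption: query and special tokens are ignored).
    """
    if not 0 <= stride < max_seq_length:
        raise ValueError("stride must be >= 0 and smaller than max_seq_length")
    if len(token_ids) == 0:
        return []

    step = max_seq_length - stride  # how far each new window advances
    windows: List[List[int]] = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + max_seq_length]))
        if start + max_seq_length >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return windows


def no_answer_score(answer_scores: Sequence[float]) -> float:
    """Probability that none of the answers is correct.

    Assumption: answers are treated as independent, so the result is
    the product of (1 - score) over all returned answers.
    """
    return math.prod(1.0 - score for score in answer_scores)


if __name__ == "__main__":
    windows = split_into_windows(list(range(1000)))
    print(len(windows), [len(w) for w in windows])
    print(no_answer_score([0.5, 0.0]))
```

With the default values, each new window starts 256 tokens after the previous one (384 minus 128), so an answer that straddles a window boundary is still fully contained in at least one window as long as it is shorter than stride tokens.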

REST API Runtime Parameters

There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.