Learn how to customize ExtractiveReader.
YAML Init Parameters
These are the parameters you can pass to this component in the pipeline YAML configuration:
Parameter | Type | Possible values | Description |
---|---|---|---|
model | Union[Path, String] | Default: "deepset/roberta-base-squad2-distilled" | A Hugging Face transformers question answering model. Can either be a path to a folder containing the model files or an identifier of a model from Hugging Face. Required. |
device | ComponentDevice | Default: None | The device on which the model is loaded. If None, the default device is automatically selected. Optional. |
token | Secret | Default: Secret.from_env_var("HF_API_TOKEN", strict=False) | The API token used to download private models from Hugging Face. Optional. |
top_k | Integer | Default: 20 | Number of answers to return per query. It is required even if score_threshold is set. An additional answer with no text is returned if no_answer is set to True (default). Required. |
score_threshold | Float | Default: None | Returns only answers with a probability score above this threshold. ExtractiveReader assigns a probability score to answers. This score ranges from 0 to 1. It indicates how well the answers match the query. A probability score close to 1 means the model has high confidence in the answer's relevance. Answers with the highest probability are listed first. Optional. |
max_seq_length | Integer | Default: 384 | Maximum number of tokens of one input text for the model. If a sequence exceeds this limit, it is split. Required. |
stride | Integer | Default: 128 | Number of tokens that overlap when a sequence is split because it exceeds max_seq_length. Required. |
max_batch_size | Integer | Default: None | Maximum number of samples that are fed through the model at the same time. Optional. |
answers_per_seq | Integer | Default: None | Number of answer candidates to consider per sequence. This is relevant when a document was split into multiple sequences because of max_seq_length. Optional. |
no_answer | Boolean | True, False; Default: True | Whether to return an additional no answer with an empty text and a score representing the probability that the other top_k answers are incorrect. For example, if top_k: 4, the system returns four answers and an additional empty answer. Each returned answer has a probability score assigned. If the empty answer has a probability of 0.5, it means that's the probability that none of the returned answers is correct. Required. |
calibration_factor | Float | Default: 0.1 | A factor used for calibrating probabilities. Required. |
overlap_threshold | Float | Default: 0.01 | If set, removes duplicate answers if they have an overlap larger than the supplied threshold. If None, then all answers are kept. Optional. |
model_kwargs | Dictionary of string and any | Default: None | Additional keyword arguments passed to AutoModelForQuestionAnswering.from_pretrained when loading the model specified in model. For details on what kwargs you can pass, see the model's documentation. Optional. |
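
To make the interaction between max_seq_length, stride, and no_answer more concrete, here is a minimal Python sketch. It is an illustration only, not the component's actual implementation. The helper names split_into_windows and no_answer_score are hypothetical. The windowing ignores the query and special tokens that a real tokenizer also counts toward max_seq_length, and the no-answer score assumes the returned answers are independent of each other.

```python
from math import prod


def split_into_windows(token_ids, max_seq_length=384, stride=128):
    """Split a token sequence into overlapping windows (hypothetical helper).

    Each window holds at most max_seq_length tokens, and consecutive
    windows share `stride` tokens. Assumption: query and special tokens
    are ignored here for simplicity.
    """
    if not 0 <= stride < max_seq_length:
        raise ValueError("stride must be >= 0 and smaller than max_seq_length")
    if len(token_ids) <= max_seq_length:
        return [list(token_ids)]

    step = max_seq_length - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + max_seq_length]))
        if start + max_seq_length >= len(token_ids):
            break  # this window already reaches the end of the sequence
        start += step
    return windows


def no_answer_score(answer_scores):
    """Probability that none of the answers is correct (hypothetical helper).

    Assumption: answers are treated as independent, so the result is the
    product of (1 - score) over all returned answers.
    """
    return prod(1.0 - score for score in answer_scores)


if __name__ == "__main__":
    tokens = list(range(1000))  # stand-in for 1000 token IDs
    windows = split_into_windows(tokens)
    print(len(windows), [len(w) for w in windows])
    print(no_answer_score([0.5, 0.5]))
```

With the defaults, each window starts 256 tokens after the previous one (384 minus 128), so a larger stride produces more windows and more model calls for the same document.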
REST API Runtime Parameters
There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.