Reader

The reader is the core component that fetches the right answers to your queries. There are several types of readers that you can use in your search system.

deepset Cloud uses readers that are:

  • Built on the latest transformer-based language models
  • Strong in their grasp of semantics
  • Sensitive to syntactic structure
  • State-of-the-art in question-answering (QA) tasks like SQuAD and Natural Questions

Our readers contain all the components of end-to-end, open-domain QA systems, including:

  • Loading of model weights
  • Tokenization
  • Embedding computation
  • Span prediction
  • Candidate aggregation

If you use a reader in your pipeline, it highlights phrases and sentences as answers to your query.

Usage

Readers are usually combined with retrievers in pipelines. To define a reader:

# Import the reader:
from haystack.nodes import FARMReader

# Specify the model that you want to use with your reader:
model = "deepset/roberta-base-squad2"

# Specify the reader:
reader = FARMReader(model, use_gpu=True)
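
You can also run the reader on its own to try it out. Here's a minimal sketch, assuming the FARMReader defined above; the query and document text are made up for illustration:

# Wrap your text in Document objects:
from haystack.schema import Document

docs = [Document(content="Python is a programming language created by Guido van Rossum.")]

# The reader returns a dictionary with the extracted answers:
result = reader.predict(query="Who created Python?", documents=docs, top_k=3)
for answer in result["answers"]:
    print(answer.answer, answer.score)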

Or in YAML:

components:
  - name: MyReader
    type: FARMReader
    params:
      model: "deepset/roberta-base-squad2"
      use_gpu: True
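
In Python, wiring the reader together with a retriever typically looks like the following sketch. The in-memory document store, the BM25 retriever, and the sample document are assumptions made for illustration; substitute the store and retriever that your system uses:

from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# An in-memory store with BM25 enabled, for illustration only:
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([{"content": "A reader extracts answer spans from documents."}])

# The retriever narrows down candidate documents; the reader extracts answer spans:
retriever = BM25Retriever(document_store=document_store)
reader = FARMReader("deepset/roberta-base-squad2", use_gpu=True)

pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
prediction = pipeline.run(
    query="What does a reader do?",
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}},
)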

Models

A reader always takes a model as an argument. deepset Cloud readers handle loading model weights, so to use a pre-trained QA model with your reader, simply provide its Hugging Face model hub name.

There are plenty of models out there, and it can be difficult to select one. Here are some tips to help you get started; a short sketch of switching between models follows the list:

  • RoBERTa (base): An optimized variant of BERT and a great starting point. Can be handled by any machine with a single NVIDIA V100 GPU.
    • PRO: Strong all-round model
    • CON: There are faster and more accurate models
    • HUB NAME: deepset/roberta-base-squad2
  • MiniLM: A cleverly distilled model that sacrifices some accuracy for speed. Recommended if you prioritize speed and GPU memory over accuracy. Outperforms BERT base on SQuAD.
    • PRO: Inference speed up to 50% faster than BERT base
    • CON: Doesn't match the best base-sized models in accuracy.
    • HUB NAME: deepset/minilm-uncased-squad2
  • ALBERT (XXL): A large and powerful SotA model. If you want the best performance and you have the computational resources, this is the model for you.
    • PRO: Better accuracy than any other open-source model in QA.
    • CON: Needs a lot of computational power which makes it impractical in most use cases.
    • HUB NAME: ahotrod/albert_xxlargev1_squad2_512
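
Switching between these models is just a matter of passing a different hub name to the reader. For example, a minimal sketch using the distilled MiniLM model mentioned above:

from haystack.nodes import FARMReader

# Trade a little accuracy for speed by choosing the distilled model:
fast_reader = FARMReader("deepset/minilm-uncased-squad2", use_gpu=True)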

📘 Models for TableReader

The nq-reader models used with TableReader can provide confidence scores but cannot handle questions that need aggregation over multiple cells. The answers are sorted by a general table score first, and then by answer span scores.

If you want to learn more about models, see Language Models.

Reader Types

TableReader

This reader retrieves answers to your questions even if they are buried in a table. It is designed to use the TAPAS model by Google (google/tapas-base-finetuned-wtq). This model can return a single cell as an answer or can pick a set of cells and then aggregate them to get the final answer. It uses the Hugging Face transformers framework.
For a full list of models available for this reader, see Hugging Face Models.
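
As a rough sketch of how this looks in Python, assuming Haystack's TableReader and the TAPAS model named above; the table contents and query are made up for illustration:

import pandas as pd
from haystack.nodes import TableReader
from haystack.schema import Document

table_reader = TableReader(model_name_or_path="google/tapas-base-finetuned-wtq")

# Tables are passed to the reader as pandas DataFrames wrapped in Documents:
table = pd.DataFrame({"Actor": ["Brad Pitt", "Leonardo DiCaprio"], "Age": ["59", "48"]})
documents = [Document(content=table, content_type="table")]

prediction = table_reader.predict(query="How old is Brad Pitt?", documents=documents)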

Usage

These are the arguments that you can specify for TableReader:

  • model_name_or_path (String): Mandatory. Path to a saved model or the name of a public model. Specifies the model that the reader uses. For a list of available models, see Hugging Face Models.
  • model_version (String): Tag name, branch name, or commit hash. Specifies the version of the model from the Hugging Face model hub.
  • tokenizer (String): Specifies the name of the tokenizer. Usually the same as the model.
  • use_gpu (Boolean): Uses GPU. Falls back on CPU if GPU is unavailable.
  • top_k (Integer): Specifies the number of answers to return.
  • top_k_per_candidate (Integer): Specifies the number of answers to extract for each candidate table coming from the retriever.
  • return_no_answer (Boolean): Includes the no_answer prediction in the results. Only applicable with nq-reader models.
  • max_seq_len (Integer): Specifies the maximum sequence length of one input table for the model. If the number of tokens of the query and the table exceeds max_seq_len, the table is truncated by removing rows until the input fits the model.
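
For instance, here's a hedged sketch that sets a few of these arguments; the values are illustrative, not recommendations:

from haystack.nodes import TableReader

table_reader = TableReader(
    model_name_or_path="google/tapas-base-finetuned-wtq",
    use_gpu=True,
    top_k=5,          # return up to five answers
    max_seq_len=256,  # truncate tables that exceed this token budget
)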

FARMReader

A transformer-based reader for extractive QA that uses the FARM framework. You can only use encoder models with FARMReader, such as BERT, ELECTRA, RoBERTa, ALBERT, XLM, DistilBERT, or DeBERTa.

Main Features

  • Removes duplicates
  • Uses the tokenizers from the Hugging Face transformers library
  • Start and end logits are summed and not normalized
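
On the last point: whether you get raw logit sums or normalized scores is controlled by the use_confidence_scores argument described below. A minimal sketch of the two modes:

from haystack.nodes import FARMReader

# Raw scores: the unnormalized sum of the start and end logits, in [-inf, +inf]:
raw_reader = FARMReader("deepset/roberta-base-squad2", use_confidence_scores=False)

# Scaled confidence scores in [0, 1] instead:
confident_reader = FARMReader("deepset/roberta-base-squad2", use_confidence_scores=True)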

Usage

These are the arguments that you can specify for FARMReader:

  • model_name_or_path (String): Mandatory. Path to a saved model or the name of a public model, for example deepset/bert-base-cased-squad2. Specifies the model that the reader uses. For a list of available models, see Hugging Face Models.
  • model_version (String): Tag name, branch name, or commit hash. Specifies the version of the model from the Hugging Face model hub.
  • context_window_size (Integer): A number of characters. Specifies the size of the window that defines how many of the surrounding characters are considered as the context of an answer text. Used when displaying the context around the answer.
  • batch_size (Integer): Specifies the number of samples that the model receives in one batch for inference. Memory consumption is lower in inference mode, so we recommend that you use a single batch.
  • use_gpu (Boolean): Uses GPU if available.
  • no_ans_boost (Float): 0 (default), a negative number, or a positive number. Specifies how much the no_answer logit is increased. If set to 0, it is unchanged. If set to a negative number, there's a lower chance of no_answer being predicted. If set to a positive number, there's a higher chance of no_answer.
  • return_no_answer (Boolean): Includes no_answer predictions in the results.
  • top_k (Integer): Specifies the maximum number of answers to return.
  • top_k_per_candidate (Integer): Specifies the number of answers to extract for each candidate document coming from the retriever. This is not the number of final answers that you receive (see top_k). FARM includes no_answer in the sorted list of predictions.
  • top_k_per_sample (Integer): Specifies the number of answers to extract from each small text passage that the model can process at once. You usually want a small value here, as larger values slow down inference.
  • num_processes (Integer): 0 or None. Specifies the number of processes for multiprocessing.Pool. When set to 0, multiprocessing is disabled. When set to None, the inferencer determines the optimum number of processes. To debug the language model, you may need to disable multiprocessing.
  • max_seq_len (Integer): Specifies the maximum sequence length of one input text for the model.
  • doc_stride (Integer): Specifies the length of the striding window for splitting long texts (used if len(text) > max_seq_len).
  • progress_bar (Boolean): Shows a tqdm progress bar. You may want to disable it in production deployments to keep the logs clean.
  • duplicate_filtering (Integer): Specifies how to handle duplicates. Answers are filtered based on their position; both start and end positions are considered. The higher the value, the farther apart two answers can be and still be filtered out as duplicates. 0 corresponds to exact duplicates. -1 turns off duplicate removal.
  • use_confidence_scores (Boolean): Sets the type of score that is returned with every predicted answer. If set to True, a scaled confidence score between 0 and 1 is returned. If set to False, an unscaled, raw score in [-inf, +inf] is returned; this is the sum of the start and end logits from the model for the predicted span.
  • proxies (Dictionary): Specifies a dictionary of proxy servers to use for downloading external models. Example: {'http': 'some.proxy:1234', 'http://hostname': 'my.proxy:3111'}
  • local_files_only (Boolean): Forces checking for local files only and forbids downloads.
  • force_download (Boolean): Forces a download even if the model exists locally in the cache.
  • use_auth_token (Boolean): Specifies the API token used to download private models from Hugging Face. If set to True, the local token is used. You must create it using transformers-cli login. For more information, see Hugging Face.
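
Putting a few of these together, here's a hedged construction sketch; the values are illustrative choices, not tuning advice:

from haystack.nodes import FARMReader

reader = FARMReader(
    model_name_or_path="deepset/roberta-base-squad2",
    use_gpu=True,
    top_k=5,                # return up to five final answers
    return_no_answer=True,  # allow a no_answer prediction in the results
    max_seq_len=384,
    doc_stride=128,
    duplicate_filtering=0,  # filter exact duplicates only
)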

TransformersReader

An alternative to the FARMReader that uses the Transformers library directly. It has fewer features than the FARMReader, so we only recommend it if you want to bypass FARM. For a comparison of the two readers, see FARM vs Transformers.

Usage

These are the arguments that you can specify for TransformersReader:

  • model_name_or_path (String): Mandatory. Path to a saved model or the name of a public model, for example deepset/bert-base-cased-squad2. Specifies the model that the reader uses. For a list of available models, see Hugging Face Models.
  • model_version (String): Tag name, branch name, or commit hash. Specifies the version of the model from the Hugging Face model hub.
  • tokenizer (String): Specifies the name of the tokenizer. Usually the same as the model.
  • context_window_size (Integer): A number of characters. Specifies the size of the window that defines how many of the surrounding characters are considered as the context of an answer text. Used when displaying the context around the answer.
  • batch_size (Integer): Specifies the number of samples that the model receives in one batch for inference. Memory consumption is lower in inference mode, so we recommend that you use a single batch.
  • use_gpu (Boolean): Uses GPU if available.
  • return_no_answer (Boolean): Includes no_answer predictions in the results.
  • top_k (Integer): Specifies the maximum number of answers to return.
  • top_k_per_candidate (Integer): Specifies the number of answers to extract for each candidate document coming from the retriever. This is not the number of final answers that you receive (see top_k); no_answer can be included in the sorted list of predictions.
  • max_seq_len (Integer): Specifies the maximum sequence length of one input text for the model.
  • doc_stride (Integer): Specifies the length of the striding window for splitting long texts (used if len(text) > max_seq_len).
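
To close with a hedged construction sketch; the model, tokenizer, and top_k values are illustrative:

from haystack.nodes import TransformersReader

reader = TransformersReader(
    model_name_or_path="deepset/bert-base-cased-squad2",
    tokenizer="deepset/bert-base-cased-squad2",
    use_gpu=True,
    top_k=5,  # return up to five answers
)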