ExtractiveReader

ExtractiveReader locates and extracts exact answer spans from a collection of documents in response to a query. Unlike implementations that normalize scores per document, it scores each answer span independently, making scores directly comparable across all documents.
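
To make the difference concrete, here is a minimal sketch with made-up logits. It assumes that per-document normalization means a softmax over each document's candidate spans and that independent scoring means passing each span's logit through a sigmoid on its own. The helper functions and the numbers are illustrative assumptions, not the component's actual code.

```python
import math

def softmax(logits):
    """Normalize logits so they sum to 1 (per-document normalization)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logit):
    """Map one logit to (0, 1) without looking at any other span."""
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical span logits: doc_a has a strong candidate, doc_b only weak ones.
doc_a = [4.0, 1.0]
doc_b = [-2.0, -5.0]

# Per-document normalization: the best span of each document gets the same
# score, because only the gap to the other spans in that document matters.
normalized = {"doc_a": max(softmax(doc_a)), "doc_b": max(softmax(doc_b))}

# Independent scoring: each span is judged on its own logit, so the weak
# span in doc_b stays clearly below the strong span in doc_a.
independent = {
    "doc_a": max(sigmoid(x) for x in doc_a),
    "doc_b": max(sigmoid(x) for x in doc_b),
}

print(normalized)
print(independent)
```

With normalized scores, ranking answers across documents would treat both best spans as equally good. With independent scores, the ranking reflects how confident the model is in each span.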

Key Features

  • Performs extractive question answering using Hugging Face transformer models.
  • Scores answer spans independently across all documents for direct comparison.
  • Returns a configurable number of top answers with confidence scores.
  • Filters answers using a score threshold to return only high-confidence results.
  • Handles long documents by splitting them into overlapping sequences and deduplicating answers.
  • Optionally returns a "no answer" score to indicate when top answers may be incorrect.
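
The long-document handling in the list above can be pictured with a small sliding-window sketch. It assumes that stride is the number of tokens shared by two consecutive sequences, as the parameter description states, and it ignores the query and special tokens that a real tokenizer adds. The function name split_into_windows is hypothetical.

```python
def split_into_windows(tokens, max_seq_length, stride):
    """Split tokens into windows of at most max_seq_length tokens.

    Consecutive windows share `stride` tokens, so each new window
    starts max_seq_length - stride tokens after the previous one.
    """
    if stride >= max_seq_length:
        raise ValueError("stride must be smaller than max_seq_length")
    step = max_seq_length - stride
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_seq_length])
        # Stop once the current window reaches the end of the document.
        if start + max_seq_length >= len(tokens):
            break
        start += step
    return windows

tokens = list(range(10))  # stand-in for token IDs
windows = split_into_windows(tokens, max_seq_length=4, stride=2)
print(windows)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Because tokens in the overlap appear in two windows, the same answer span can be found twice, which is why answers are deduplicated afterwards.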

Configuration

  1. Drag the ExtractiveReader component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab:
    1. Select the model: enter a Hugging Face model identifier or a local path. The default is deepset/roberta-base-squad2-distilled.
  4. Go to the Advanced tab to configure the device, API token, top_k, score threshold, maximum sequence length, stride, batch size, answers per sequence, no-answer scoring, calibration factor, overlap threshold, and model keyword arguments.

Connections

ExtractiveReader accepts a query string and a list of documents as inputs. It outputs a list of answers sorted by descending confidence score.

Typically, you connect a document retriever (such as InMemoryBM25Retriever or OpenSearchBM25Retriever) to the documents input and pass the query from your pipeline's input to the query input.

Usage Example

Work in Progress

We're working on adding pipeline examples and the most common component connections.

components:
  ExtractiveReader:
    type: components.readers.extractive.ExtractiveReader
    init_parameters:

Example usage in Python:

from haystack import Document
from haystack.components.readers import ExtractiveReader

# Two documents that state the same fact in English and in German.
docs = [
    Document(content="Python is a popular programming language"),
    Document(content="python ist eine beliebte Programmiersprache"),
]

# Create the reader with the default model, then load the model.
reader = ExtractiveReader()
reader.warm_up()

question = "What is a popular programming language?"
result = reader.run(query=question, documents=docs)

# Answers are sorted by descending score, so the best answer comes first.
assert "Python" in result["answers"][0].data

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | Query string. |
| documents | List[Document] | | List of Documents in which to search for an answer to the query. |
| top_k | Optional[int] | None | The maximum number of answers to return. An additional answer is returned if no_answer is set to True (default). |
| score_threshold | Optional[float] | None | Returns only answers with a score above this threshold. |
| max_seq_length | Optional[int] | None | Maximum number of tokens. If a sequence exceeds it, the sequence is split. |
| stride | Optional[int] | None | Number of tokens that overlap when a sequence is split because it exceeds max_seq_length. |
| max_batch_size | Optional[int] | None | Maximum number of samples that are fed through the model at the same time. |
| answers_per_seq | Optional[int] | None | Number of answer candidates to consider per sequence. This is relevant when a Document was split into multiple sequences because of max_seq_length. |
| no_answer | Optional[bool] | None | Whether to return the no-answer score. |
| overlap_threshold | Optional[float] | None | If set, removes duplicate answers whose overlap is larger than the supplied threshold. For example, for the answers "in the river in Maine" and "the river", one is removed because the second answer has a 100% (1.0) overlap with the first. For the answers "the river in" and "in Maine", the maximum overlap is only 25%, so both are kept if this value is set to 0.25 or higher. If None is provided, all answers are kept. |
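
The overlap figures quoted for overlap_threshold can be reproduced with a short sketch. It assumes that answers are compared by character offsets within the same document and that the overlap is measured as the larger share of either answer covered by the other. The helpers overlap_fraction, span_of, and is_duplicate are hypothetical names used only for illustration.

```python
def overlap_fraction(span_a, span_b):
    """Return the largest share of either span that is covered by the other.

    Spans are (start, end) character offsets with an exclusive end.
    """
    (a_start, a_end), (b_start, b_end) = span_a, span_b
    shared = max(0, min(a_end, b_end) - max(a_start, b_start))
    return max(shared / (a_end - a_start), shared / (b_end - b_start))

def span_of(text, phrase):
    """Character offsets of the first occurrence of phrase in text."""
    start = text.index(phrase)
    return (start, start + len(phrase))

def is_duplicate(fraction, overlap_threshold):
    """An answer counts as a duplicate when its overlap exceeds the threshold."""
    return overlap_threshold is not None and fraction > overlap_threshold

text = "in the river in Maine"

# "the river" lies completely inside "in the river in Maine".
nested = overlap_fraction(span_of(text, text), span_of(text, "the river"))

# "the river in" and "in Maine" share only the two characters of "in".
partial = overlap_fraction(span_of(text, "the river in"), span_of(text, "in Maine"))

print(nested)   # 1.0
print(partial)  # 0.25
```

Under the "larger than the threshold" rule, is_duplicate(partial, 0.24) is True while is_duplicate(partial, 0.25) is False, and a threshold of None never removes anything.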

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| answers | List[ExtractedAnswer] | | List of answers sorted by descending answer score. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | Union[Path, str] | deepset/roberta-base-squad2-distilled | A Hugging Face transformers question answering model. Can be either a path to a folder containing the model files or an identifier for the Hugging Face hub. |
| device | Optional[ComponentDevice] | None | The device on which the model is loaded. If None, the default device is automatically selected. |
| token | Optional[Secret] | Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False) | The API token used to download private models from Hugging Face. |
| top_k | int | 20 | Number of answers to return per query. It is required even if score_threshold is set. An additional answer with no text is returned if no_answer is set to True (default). |
| score_threshold | Optional[float] | None | Returns only answers with a probability score above this threshold. |
| max_seq_length | int | 384 | Maximum number of tokens. If a sequence exceeds it, the sequence is split. |
| stride | int | 128 | Number of tokens that overlap when a sequence is split because it exceeds max_seq_length. |
| max_batch_size | Optional[int] | None | Maximum number of samples that are fed through the model at the same time. |
| answers_per_seq | Optional[int] | None | Number of answer candidates to consider per sequence. This is relevant when a Document was split into multiple sequences because of max_seq_length. |
| no_answer | bool | True | Whether to return an additional "no answer" with empty text and a score representing the probability that the other top_k answers are incorrect. |
| calibration_factor | float | 0.1 | Factor used for calibrating probabilities. |
| overlap_threshold | Optional[float] | 0.01 | If set, removes duplicate answers whose overlap is larger than the supplied threshold. For example, for the answers "in the river in Maine" and "the river", one is removed because the second answer has a 100% (1.0) overlap with the first. For the answers "the river in" and "in Maine", the maximum overlap is only 25%, so both are kept if this value is set to 0.25 or higher. If None is provided, all answers are kept. |
| model_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments passed to AutoModelForQuestionAnswering.from_pretrained when loading the model specified in model. For details on which kwargs you can pass, see the model's documentation. |
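
The way top_k, no_answer, and score_threshold interact can be sketched as follows. This is an illustration under two assumptions that the descriptions above do not spell out: the no-answer score treats the top_k answers as independent and multiplies their probabilities of being wrong, and the threshold is applied after the no-answer entry is added. The function select_answers and the example scores are hypothetical.

```python
import math

def select_answers(scores, top_k, score_threshold=None, no_answer=True):
    """Return (text, score) pairs, best first.

    scores maps answer text to a probability-like score in [0, 1].
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranked = ranked[:top_k]
    if no_answer:
        # Assumption: answers are independent, so the chance that all
        # top_k answers are wrong is the product of (1 - score).
        no_answer_score = math.prod(1.0 - score for _, score in ranked)
        ranked.append((None, no_answer_score))
        ranked.sort(key=lambda item: item[1], reverse=True)
    if score_threshold is not None:
        ranked = [item for item in ranked if item[1] > score_threshold]
    return ranked

candidates = {
    "Python": 0.5,
    "a popular programming language": 0.5,
    "language": 0.2,
}

# top_k=2 keeps two answers and appends one entry with no text.
all_answers = select_answers(candidates, top_k=2)

# A threshold of 0.3 additionally drops the low-scoring no-answer entry.
confident = select_answers(candidates, top_k=2, score_threshold=0.3)

print(all_answers)
print(confident)
```

In this sketch the entry with no text is represented by None, and the result holds top_k + 1 entries whenever no_answer is enabled and no threshold filters them out.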

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | Query string. |
| documents | List[Document] | | List of Documents in which to search for an answer to the query. |
| top_k | Optional[int] | None | The maximum number of answers to return. An additional answer is returned if no_answer is set to True (default). |
| score_threshold | Optional[float] | None | Returns only answers with a score above this threshold. |
| max_seq_length | Optional[int] | None | Maximum number of tokens. If a sequence exceeds it, the sequence is split. |
| stride | Optional[int] | None | Number of tokens that overlap when a sequence is split because it exceeds max_seq_length. |
| max_batch_size | Optional[int] | None | Maximum number of samples that are fed through the model at the same time. |
| answers_per_seq | Optional[int] | None | Number of answer candidates to consider per sequence. This is relevant when a Document was split into multiple sequences because of max_seq_length. |
| no_answer | Optional[bool] | None | Whether to return the no-answer score. |
| overlap_threshold | Optional[float] | None | If set, removes duplicate answers whose overlap is larger than the supplied threshold. For example, for the answers "in the river in Maine" and "the river", one is removed because the second answer has a 100% (1.0) overlap with the first. For the answers "the river in" and "in Maine", the maximum overlap is only 25%, so both are kept if this value is set to 0.25 or higher. If None is provided, all answers are kept. |
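
Every optional run() parameter defaults to None, while the matching init parameter carries a concrete default. A plausible reading, sketched below, is that a None at query time falls back to the value configured at initialization. This fallback rule is an assumption, and resolve_run_parameters is a hypothetical helper, not part of the component. The init values are copied from the Init Parameters table.

```python
# Init-time defaults, taken from the Init Parameters table.
INIT_DEFAULTS = {
    "top_k": 20,
    "score_threshold": None,
    "max_seq_length": 384,
    "stride": 128,
    "no_answer": True,
    "overlap_threshold": 0.01,
}

def resolve_run_parameters(init_parameters, **run_parameters):
    """Use a run() value when it is given, otherwise keep the init value."""
    resolved = dict(init_parameters)
    for name, value in run_parameters.items():
        if name not in init_parameters:
            raise KeyError(f"unknown parameter: {name}")
        # Assumption: None means "not set at query time".
        if value is not None:
            resolved[name] = value
    return resolved

# Override top_k and no_answer for one query; stride=None keeps the init value.
effective = resolve_run_parameters(
    INIT_DEFAULTS, top_k=3, no_answer=False, stride=None
)
print(effective)
```

Under this assumption, passing None at query time cannot switch off a value that was set at initialization; it only leaves that value unchanged.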