
AnswerBuilder

Convert a query and Generator's replies into a GeneratedAnswer object. AnswerBuilder is used as the last component in query pipelines.

Basic Information

  • Type: haystack.components.builders.answer_builder.AnswerBuilder
  • Components it can connect with:
    • Generators: AnswerBuilder accepts Generator's replies and converts them into GeneratedAnswer objects.
    • Input: AnswerBuilder receives the user query to add it to the GeneratedAnswer.

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The user query. |
| replies | Union[List[str], List[ChatMessage]] | | The output of the Generator. Can be a list of strings or a list of ChatMessage objects. |
| meta | Optional[List[Dict[str, Any]]] | None | The metadata returned by the Generator. If not specified, the generated answer contains no metadata. |
| documents | Optional[List[Document]] | None | The documents used as input for the Generator. If specified, they are added to the GeneratedAnswer objects. If both documents and reference_pattern are specified, the documents referenced in the Generator's output are extracted from the input documents and added to the GeneratedAnswer objects. |
| pattern | Optional[str] | None | The regular expression pattern to extract the answer text from the Generator. If not specified, the entire response is used as the answer. The regular expression can have one capture group at most. If present, the capture group text is used as the answer. If no capture group is present, the whole match is used as the answer. Examples: [^\n]+$ finds "this is an answer" in a string "this is an argument.\nthis is an answer". Answer: (.*) finds "this is an answer" in a string "this is an argument. Answer: this is an answer". |
| reference_pattern | Optional[str] | None | The regular expression pattern used for parsing the document references. If not specified, no parsing is done, and all documents are referenced. References need to be specified as indices of the input documents and start at [1]. Example: \[(\d+)\] finds "1" in a string "this is an answer[1]". |
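
The rules for pattern can be illustrated with a minimal sketch that uses only Python's standard re module. This is not the component's actual implementation: extract_answer is a hypothetical helper, and returning an empty string when the pattern does not match is an assumption.

```python
import re
from typing import Optional


def extract_answer(reply: str, pattern: Optional[str] = None) -> str:
    """Hypothetical helper that mirrors the documented rules for `pattern`."""
    if pattern is None:
        # No pattern: the entire reply is used as the answer.
        return reply

    regex = re.compile(pattern)
    if regex.groups > 1:
        # The pattern can have one capture group at most.
        raise ValueError("pattern can have at most one capture group")

    match = regex.search(reply)
    if match is None:
        # Assumption: an unmatched pattern yields an empty answer.
        return ""
    if regex.groups == 1:
        # A capture group is present: its text is the answer.
        return match.group(1) or ""
    # No capture group: the whole match is the answer.
    return match.group(0)


# The two examples from the parameter description.
print(extract_answer("this is an argument.\nthis is an answer", r"[^\n]+$"))
print(extract_answer("this is an argument. Answer: this is an answer", r"Answer: (.*)"))
```

Both calls print "this is an answer". The first pattern has no capture group, so the whole match is used. The second has one, so only the group text is used.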

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| answers | List[GeneratedAnswer] | | The answers built from the Generator's output. They may include documents. |

Overview

Use AnswerBuilder to parse Generator's replies using custom regular expressions. It can also take documents and metadata from the Generator and add them to the GeneratedAnswer objects. AnswerBuilder works with both Generators and Chat Generators.

To include references in answers, use DeepsetAnswerBuilder. For details on which builder to choose, see Enable references for generated answers.

Usage Example

Initializing the Component

components:
  AnswerBuilder:
    type: haystack.components.builders.answer_builder.AnswerBuilder
    init_parameters:

Using the Component in a Pipeline

This is a RAG pipeline with AnswerBuilder. Note that the answers this pipeline generates won't include references.

components:
  retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
      fuzziness: 0
      embedder:
        type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
        init_parameters:
          normalize_embeddings: true
          model: intfloat/e5-base-v2

  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id

  prompt_builder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template: "You are a technical expert.\nYou answer questions truthfully based on provided documents.\nIf the answer exists in several documents, summarize them.\nIgnore documents that don't contain the answer to the question.\nOnly answer based on the documents provided. Don't make things up.\nIf no information related to the question can be found in the document, say so.\nAlways use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3] .\nNever name the documents, only enter a number in square brackets as a reference.\nThe reference must only refer to the number that comes in square brackets after the document.\nOtherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.\n\nThese are the documents:\n{%- if documents|length > 0 %}\n{%- for document in documents %}\nDocument [{{ loop.index }}] :\nName of Source File: {{ document.meta.file_name }}\n{{ document.content }}\n{% endfor -%}\n{%- else %}\nNo relevant documents found.\nRespond with \"Sorry, no matching documents were found, please adjust the filters or try a different question.\"\n{% endif %}\n\nQuestion: {{ question }}\nAnswer:"
      required_variables:
      variables:

  llm:
    type: haystack_integrations.components.generators.amazon_bedrock.AmazonBedrockChatGenerator
    init_parameters:
      model: us.anthropic.claude-sonnet-4-20250514-v1:0
      aws_region_name: us-west-2

      # Enable extended thinking mode:
      # Note that temperature is not supported for extended thinking mode.
      thinking:
        type: enabled
        budget_tokens: 1024 # min budget for Claude 4.0 Sonnet, increase to allow more thinking
      max_length: 1674 # includes thinking.budget_tokens
      # include_thinking: False # control whether to include thinking output in the reply, defaults to True if unset
      # thinking_tag: claudeThinking # set tag to identify thinking output, defaults to "thinking" if unset. If set to null, no tags will be added.

  attachments_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      weights:
      top_k:
      sort_by_score: true

  multi_file_converter:
    type: haystack.core.super_component.super_component.SuperComponent
    init_parameters:
      input_mapping:
        sources:
          - file_classifier.sources
      is_pipeline_async: false
      output_mapping:
        score_adder.output: documents
      pipeline:
        components:
          file_classifier:
            type: haystack.components.routers.file_type_router.FileTypeRouter
            init_parameters:
              mime_types:
                - text/plain
                - application/pdf
                - text/markdown
                - text/html
                - application/vnd.openxmlformats-officedocument.wordprocessingml.document
                - application/vnd.openxmlformats-officedocument.presentationml.presentation
                - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
                - text/csv

          text_converter:
            type: haystack.components.converters.txt.TextFileToDocument
            init_parameters:
              encoding: utf-8

          pdf_converter:
            type: haystack.components.converters.pdfminer.PDFMinerToDocument
            init_parameters:
              line_overlap: 0.5
              char_margin: 2
              line_margin: 0.5
              word_margin: 0.1
              boxes_flow: 0.5
              detect_vertical: true
              all_texts: false
              store_full_path: false

          markdown_converter:
            type: haystack.components.converters.txt.TextFileToDocument
            init_parameters:
              encoding: utf-8

          html_converter:
            type: haystack.components.converters.html.HTMLToDocument
            init_parameters:
              # A dictionary of keyword arguments to customize how you want to extract content from your HTML files.
              # For the full list of available arguments, see
              # the [Trafilatura documentation](https://trafilatura.readthedocs.io/en/latest/corefunctions.html#extract).
              extraction_kwargs:
                output_format: markdown # Extract text from HTML. You can also choose "txt"
                target_language: # You can define a language (using the ISO 639-1 format) to discard documents that don't match that language.
                include_tables: true # If true, includes tables in the output
                include_links: true # If true, keeps links along with their targets

          docx_converter:
            type: haystack.components.converters.docx.DOCXToDocument
            init_parameters:
              link_format: markdown

          pptx_converter:
            type: haystack.components.converters.pptx.PPTXToDocument
            init_parameters: {}

          xlsx_converter:
            type: haystack.components.converters.xlsx.XLSXToDocument
            init_parameters: {}

          csv_converter:
            type: haystack.components.converters.csv.CSVToDocument
            init_parameters:
              encoding: utf-8

          splitter:
            type: haystack.components.preprocessors.document_splitter.DocumentSplitter
            init_parameters:
              split_by: word
              split_length: 250
              split_overlap: 30
              respect_sentence_boundary: true
              language: en

          score_adder:
            type: haystack.components.converters.output_adapter.OutputAdapter
            init_parameters:
              template: |
                {%- set scored_documents = [] -%}
                {%- for document in documents -%}
                {%- set doc_dict = document.to_dict() -%}
                {%- set _ = doc_dict.update({'score': 100.0}) -%}
                {%- set scored_doc = document.from_dict(doc_dict) -%}
                {%- set _ = scored_documents.append(scored_doc) -%}
                {%- endfor -%}
                {{ scored_documents }}
              output_type: List[haystack.Document]
              custom_filters:
              unsafe: true

          text_joiner:
            type: haystack.components.joiners.document_joiner.DocumentJoiner
            init_parameters:
              join_mode: concatenate
              sort_by_score: false

          tabular_joiner:
            type: haystack.components.joiners.document_joiner.DocumentJoiner
            init_parameters:
              join_mode: concatenate
              sort_by_score: false
        connections:
          - sender: file_classifier.text/plain
            receiver: text_converter.sources
          - sender: file_classifier.application/pdf
            receiver: pdf_converter.sources
          - sender: file_classifier.text/markdown
            receiver: markdown_converter.sources
          - sender: file_classifier.text/html
            receiver: html_converter.sources
          - sender: file_classifier.application/vnd.openxmlformats-officedocument.wordprocessingml.document
            receiver: docx_converter.sources
          - sender: file_classifier.application/vnd.openxmlformats-officedocument.presentationml.presentation
            receiver: pptx_converter.sources
          - sender: file_classifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
            receiver: xlsx_converter.sources
          - sender: file_classifier.text/csv
            receiver: csv_converter.sources
          - sender: text_joiner.documents
            receiver: splitter.documents
          - sender: text_converter.documents
            receiver: text_joiner.documents
          - sender: pdf_converter.documents
            receiver: text_joiner.documents
          - sender: markdown_converter.documents
            receiver: text_joiner.documents
          - sender: html_converter.documents
            receiver: text_joiner.documents
          - sender: pptx_converter.documents
            receiver: text_joiner.documents
          - sender: docx_converter.documents
            receiver: text_joiner.documents
          - sender: xlsx_converter.documents
            receiver: tabular_joiner.documents
          - sender: csv_converter.documents
            receiver: tabular_joiner.documents
          - sender: splitter.documents
            receiver: tabular_joiner.documents
          - sender: tabular_joiner.documents
            receiver: score_adder.documents

  AnswerBuilder:
    type: haystack.components.builders.answer_builder.AnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:
      last_message_only: false
      return_only_referenced_documents: true

connections: # Defines how the components are connected
  - sender: retriever.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: meta_field_grouping_ranker.documents
  - sender: prompt_builder.prompt
    receiver: llm.messages
  - sender: multi_file_converter.documents
    receiver: attachments_joiner.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: attachments_joiner.documents

  - sender: attachments_joiner.documents
    receiver: prompt_builder.documents
  - sender: llm.replies
    receiver: AnswerBuilder.replies

inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
    - "retriever.query"
    - "ranker.query"
    - "prompt_builder.question"
    - "AnswerBuilder.query"
  filters: # These components will receive a potential query filter as input
    - "retriever.filters_bm25"
    - "retriever.filters_embedding"

  files:
    - multi_file_converter.sources

outputs: # Defines the output of your pipeline
  documents: "attachments_joiner.documents" # The output of the pipeline is the retrieved documents
  answers: "AnswerBuilder.answers" # The output of the pipeline is the generated answers

max_runs_per_component: 100

metadata: {}


Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pattern | Optional[str] | None | The regular expression pattern to extract the answer text from the Generator. If not specified, the entire response is used as the answer. The regular expression can have one capture group at most. If present, the capture group text is used as the answer. If no capture group is present, the whole match is used as the answer. Examples: [^\n]+$ finds "this is an answer" in a string "this is an argument.\nthis is an answer". Answer: (.*) finds "this is an answer" in a string "this is an argument. Answer: this is an answer". |
| reference_pattern | Optional[str] | None | The regular expression pattern used for parsing the document references. If not specified, no parsing is done, and all documents are referenced. References need to be specified as indices of the input documents and start at [1]. Example: \[(\d+)\] finds "1" in a string "this is an answer[1]". |
| last_message_only | bool | False | If False (default value), all messages are used as the answer. If True, only the last message is used as the answer. |
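
As a rough illustration of last_message_only, the sketch below selects which replies would be turned into answers. It is written under stated assumptions rather than taken from the component: ChatMessageStub is a hypothetical stand-in for a ChatMessage object that carries only a text field, and select_replies is a made-up helper name.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ChatMessageStub:
    """Hypothetical stand-in for a ChatMessage; only the text matters here."""
    text: str


def select_replies(
    replies: List[Union[str, ChatMessageStub]],
    last_message_only: bool = False,
) -> List[str]:
    """Return the reply texts that would be used as answers."""
    # Replies can be plain strings or chat messages; normalize them to text.
    texts = [r if isinstance(r, str) else r.text for r in replies]
    # With last_message_only=True, keep only the final reply.
    return texts[-1:] if last_message_only else texts


replies = ["first draft", ChatMessageStub("final answer")]
print(select_replies(replies))
print(select_replies(replies, last_message_only=True))
```

With the default setting, both replies are kept. With last_message_only=True, only "final answer" remains.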

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The user query. |
| replies | Union[List[str], List[ChatMessage]] | | The output of the Generator. Can be a list of strings or a list of ChatMessage objects. |
| meta | Optional[List[Dict[str, Any]]] | None | The metadata returned by the Generator. If not specified, the generated answer will contain no metadata. |
| documents | Optional[List[Document]] | None | The documents used as the Generator inputs. If specified, they are added to the GeneratedAnswer objects. If both documents and reference_pattern are specified, the documents referenced in the Generator output are extracted from the input documents and added to the GeneratedAnswer objects. |
| pattern | Optional[str] | None | The regular expression pattern to extract the answer text from the Generator. If not specified, the entire response is used as the answer. The regular expression can have one capture group at most. If present, the capture group text is used as the answer. If no capture group is present, the whole match is used as the answer. Examples: [^\n]+$ finds "this is an answer" in a string "this is an argument.\nthis is an answer". Answer: (.*) finds "this is an answer" in a string "this is an argument. Answer: this is an answer". |
| reference_pattern | Optional[str] | None | The regular expression pattern used for parsing the document references. If not specified, no parsing is done, and all documents are referenced. References need to be specified as indices of the input documents and start at [1]. Example: \[(\d+)\] finds "1" in a string "this is an answer[1]". |
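
How documents and reference_pattern interact can be sketched with the standard re module. The helper below is hypothetical and uses plain strings in place of Document objects. Silently ignoring references that point outside the document list is an assumption, not documented behavior.

```python
import re
from typing import List, Optional


def referenced_documents(
    answer: str,
    documents: List[str],
    reference_pattern: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: pick the input documents that an answer references."""
    if reference_pattern is None:
        # No parsing is done, so all documents count as referenced.
        return list(documents)

    # References are 1-based indices into the input documents.
    indices = {int(ref) - 1 for ref in re.findall(reference_pattern, answer)}
    # Assumption: indices outside the document list are ignored.
    return [doc for i, doc in enumerate(documents) if i in indices]


docs = ["doc A", "doc B", "doc C"]
answer = "Paris is the capital [1]. It lies on the Seine [3]."

print(referenced_documents(answer, docs, r"\[(\d+)\]"))
print(referenced_documents(answer, docs))
```

The first call keeps only "doc A" and "doc C", because [1] and [3] point to the first and third input documents. The second call returns all three documents, because no reference_pattern is set.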