Enable References for Generated Answers

Enhance your AI-generated answers with source references to make your app more trustworthy and verifiable.


About This Task

When using an LLM to generate answers, making it cite its sources can significantly enhance the credibility and traceability of the information it provides. There are two ways to accomplish this:

  1. Using the ReferencePredictor component.
  2. Instructing the LLM to generate references.

The ReferencePredictor component excels at avoiding incorrect or hallucinated references, making it a reliable choice for maintaining accuracy. On the other hand, LLM-generated references perform better when sentences draw information from multiple sources, offering more flexibility in complex scenarios.

You can experiment with both approaches to find the one that's best for your use case.

Pipeline Templates

Reference functionality is already included in RAG pipeline templates in deepset AI Platform. Depending on the model, it's added either through ReferencePredictor or LLM-generated references.

Adding References with ReferencePredictor

ReferencePredictor

ReferencePredictor shows references to the documents on which the LLM's answer is based. Pipelines that contain ReferencePredictor return answers with references next to them. You can open each reference to check whether the answer is grounded in it and make sure the model didn't hallucinate.

This is what the references look like in the interface:

(Screenshot: an answer on the Search page with references next to each sentence.)

To learn more about this component, see ReferencePredictor.

ReferencePredictor is typically used in query pipelines after DeepsetAnswerBuilder. It works with the answers DeepsetAnswerBuilder produces, which must include documents for references to be generated.

To ensure this, make sure that DeepsetAnswerBuilder receives documents from an earlier component in the pipeline, such as a Ranker or a Retriever. DeepsetAnswerBuilder then attaches these documents to the answers it passes to ReferencePredictor.

To add ReferencePredictor to your pipeline:

  1. Add ReferencePredictor to the components section of the query pipeline:
    components:
      ...
      reference_predictor: # Finds and displays references to the generated answers from the retrieved documents
        type: deepset_cloud_custom_nodes.augmenters.reference_predictor.ReferencePredictor
        init_parameters:
          model: "cross-encoder/ms-marco-MiniLM-L-6-v2" # The default model used to compare the similarity of sentences between the answer and the source documents
          verifiability_model: "tstadel/answer-classification-setfit-v2-binary" # A default model that checks if the generated answers need verification; it rejects noise and out-of-context answers. It works for English only. Set to `null` for other languages.
          use_split_rules: True # Keep this setting for better sentence splitting
          extend_abbreviations: True # Extends the abbreviations handled with a curated list
          language: en # The language of the data for which you want to generate references, defaults to en
          model_kwargs: # Specifies additional model settings
            torch_dtype: torch.float16
  • The default model used to compare the similarity of sentences between the answer and the source documents is cross-encoder/ms-marco-MiniLM-L-6-v2. You can change it to another model by providing a path to a local model or a model identifier from Hugging Face.
  • The verifiability model checks whether the generated answers need verification. By default, it uses a model we trained to reject noise, out-of-context answers, and "no answer found" responses. Our model works on English data only, so if your data is in another language, set verifiability_model: null (see the sketch after these steps).
  • Setting use_split_rules to True ensures answers are split into sentences correctly.
  • Setting extend_abbreviations to True extends the list of abbreviations the Punkt tokenizer detects, which improves sentence splitting.
  • By default, the language of the data is set to English (en).
    For detailed explanations of the parameters, see ReferencePredictor Parameters.
  2. In the connections section, configure the connections:
    1. Make sure DeepsetAnswerBuilder receives documents and answers from the preceding components.
    2. Connect DeepsetAnswerBuilder's answers to ReferencePredictor.
      This is what it should look like:
      ...
      connections:
        ...
        - sender: ranker.documents # This can be any component that can send documents to DeepsetAnswerBuilder
          receiver: answer_builder.documents
        - sender: llm.replies
          receiver: answer_builder.replies # The Answer Builder takes the generated replies and turns them into answers
        - sender: answer_builder.answers
          receiver: reference_predictor.answers
  3. Make sure you specify ReferencePredictor's output as the final output of the pipeline:
    outputs:
      answers: reference_predictor.answers
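
For data in a language other than English, you can swap the similarity model and disable the verifiability model. The following is a minimal sketch only: the multilingual cross-encoder identifier and the language code are examples, not recommendations, so replace them with values that fit your data.

components:
  ...
  reference_predictor:
    type: deepset_cloud_custom_nodes.augmenters.reference_predictor.ReferencePredictor
    init_parameters:
      model: "cross-encoder/mmarco-mMiniLMv2-L12-H384-v1" # Illustrative multilingual cross-encoder; use any similarity model that covers your language, or a local path
      verifiability_model: null # The default verifiability model supports English only, so disable it for other languages
      use_split_rules: True
      extend_abbreviations: True
      language: de # Example language code; see ReferencePredictor Parameters for supported values
      model_kwargs:
        torch_dtype: torch.float16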

Here's an example of a query pipeline that uses ReferencePredictor:

components:
  bm25_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        init_parameters:
          use_ssl: True
          verify_certs: False
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - "${OPENSEARCH_USER}"
            - "${OPENSEARCH_PASSWORD}"
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"
      device: null

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        init_parameters:
          use_ssl: True
          verify_certs: False
          http_auth:
            - "${OPENSEARCH_USER}"
            - "${OPENSEARCH_PASSWORD}"
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "intfloat/simlm-msmarco-reranker"
      top_k: 8
      device: null
      model_kwargs:
        torch_dtype: "torch.float16"

  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        These are the documents:
        {% for document in documents %}
        Document[{{ loop.index }}]:
        {{ document.content }}
        {% endfor %}
        Question: {{question}}
        Answer:

  llm:
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      api_key: {"type": "env_var", "env_vars": ["OPENAI_API_KEY"], "strict": False}
      model: "gpt-3.5-turbo"
      generation_kwargs:
        max_tokens: 400
        temperature: 0.0
        seed: 0

  answer_builder:
    init_parameters: {}
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder

  reference_predictor: # Finds and displays references to the generated answers from the retrieved documents
    type: deepset_cloud_custom_nodes.augmenters.reference_predictor.ReferencePredictor
    init_parameters:
      model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
      verifiability_model: "tstadel/answer-classification-setfit-v2-binary"
      use_split_rules: True # Uses additional rules for better splitting
      extend_abbreviations: True # Extends abbreviations handled with a curated list
      language: en
      model_kwargs: # Specifies additional model settings
        torch_dtype: torch.float16

connections: # Defines how the components are connected
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: prompt_builder.documents
  - sender: ranker.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: llm.prompt
  - sender: llm.replies
    receiver: answer_builder.replies
  - sender: answer_builder.answers
    receiver: reference_predictor.answers

max_loops_allowed: 100

inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
    - "bm25_retriever.query"
    - "query_embedder.text"
    - "ranker.query"
    - "prompt_builder.question"
    - "answer_builder.query"

  filters: # These components will receive a potential query filter as input
    - "bm25_retriever.filters"
    - "embedding_retriever.filters"

outputs: # Defines the output of your pipeline
  documents: "ranker.documents" # The output of the pipeline is the retrieved documents
  answers: "reference_predictor.answers" # The output of the pipeline is the generated answers

Adding References with an LLM

To add references using an LLM, include specific instructions in your prompt. The model then marks each statement in its answer with the number of the document in the prompt that supports it.

Here is a prompt we've tested and recommend for you to use:

You are a technical expert.
You answer the questions truthfully on the basis of the documents provided.
For each document, check whether it is related to the question.
To answer the question, only use documents that are related to the question.
Ignore documents that do not relate to the question.
If the answer is contained in several documents, summarize them.
Always use references in the form [NUMBER OF DOCUMENT] if you use information from a document, e.g. [3] for document [3].
Never name the documents, only enter a number in square brackets as a reference.
The reference may only refer to the number in square brackets after the passage.
Otherwise, do not use brackets in your answer and give ONLY the number of the document without mentioning the word document.
Give a precise, accurate and structured answer without repeating the question.
Answer only on the basis of the documents provided. Do not make up facts.
If the documents cannot answer the question or you are not sure, say so.
These are the documents:
{% for document in documents %}
Document[{{ loop.index }}]:
{{ document.content }}
{% endfor %}
Question: {{question}}
Answer:
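
With these instructions, the model's replies contain bracketed document numbers that DeepsetAnswerBuilder (with reference_pattern set to acm) resolves against the documents it received. A reply could look like this (illustrative only):

    Retrieval combines keyword-based and vector-based search [1].
    Reranking the joined results before prompting typically improves answer quality [2][4].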

The Generator that uses this prompt must be connected to DeepsetAnswerBuilder, and DeepsetAnswerBuilder's reference_pattern parameter must be set to acm for the references to be displayed properly.

Here are the detailed steps to add references with an LLM:

  1. Add the prompt with instructions to generate references to PromptBuilder:
    components:
      ...
      prompt_builder:
        type: haystack.components.builders.prompt_builder.PromptBuilder
        init_parameters:
          template: |-
            <s>[INST] You are a technical expert.
            You answer the questions truthfully on the basis of the documents provided.
            For each document, check whether it is related to the question.
            To answer the question, only use documents that are related to the question.
            Ignore documents that do not relate to the question.
            If the answer is contained in several documents, summarize them.
            Always use references in the form [NUMBER OF DOCUMENT] if you use information from a document, e.g. [3] for document [3].
            Never name the documents, only enter a number in square brackets as a reference.
            The reference may only refer to the number in square brackets after the passage.
            Otherwise, do not use brackets in your answer and give ONLY the number of the document without mentioning the word document.
            Give a precise, accurate and structured answer without repeating the question.
            Answer only on the basis of the documents provided. Do not make up facts.
            If the documents cannot answer the question or you are not sure, say so.
            These are the documents:
            {% for document in documents %}
            Document[{{ loop.index }}]:
            {{ document.content }}
            {% endfor %}
            Question: {{question}}
            Answer:
            [/INST]
      ...
      # Here you would also configure other components
  2. Add DeepsetAnswerBuilder and configure its reference_pattern parameter:
    components:
      ...
      prompt_builder:
        type: haystack.components.builders.prompt_builder.PromptBuilder
        init_parameters:
          template: |-
            <s>[INST] You are a technical expert.
            You answer the questions truthfully on the basis of the documents provided.
            For each document, check whether it is related to the question.
            To answer the question, only use documents that are related to the question.
            Ignore documents that do not relate to the question.
            If the answer is contained in several documents, summarize them.
            Always use references in the form [NUMBER OF DOCUMENT] if you use information from a document, e.g. [3] for document [3].
            Never name the documents, only enter a number in square brackets as a reference.
            The reference may only refer to the number in square brackets after the passage.
            Otherwise, do not use brackets in your answer and give ONLY the number of the document without mentioning the word document.
            Give a precise, accurate and structured answer without repeating the question.
            Answer only on the basis of the documents provided. Do not make up facts.
            If the documents cannot answer the question or you are not sure, say so.
            These are the documents:
            {% for document in documents %}
            Document[{{ loop.index }}]:
            {{ document.content }}
            {% endfor %}
            Question: {{question}}
            Answer:
            [/INST]

      answer_builder:
        type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
        init_parameters:
          reference_pattern: acm
      ...
      # Here you would also configure other components, like the Generator
  3. Connect the components:
    1. Connect PromptBuilder's prompt to the Generator.
    2. Connect the Generator's replies to DeepsetAnswerBuilder.
      Here's what it should look like:
      connections:
        ...
        - sender: prompt_builder.prompt
          receiver: generator.prompt
        - sender: generator.replies
          receiver: answer_builder.replies
  4. Indicate DeepsetAnswerBuilder's output as the pipeline output:
    outputs:
      answers: answer_builder.answers

Here's an example of a query pipeline that uses this approach:

components:
  bm25_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        init_parameters:
          use_ssl: True
          verify_certs: False
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - "${OPENSEARCH_USER}"
            - "${OPENSEARCH_PASSWORD}"
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"
      device: null

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        init_parameters:
          use_ssl: True
          verify_certs: False
          http_auth:
            - "${OPENSEARCH_USER}"
            - "${OPENSEARCH_PASSWORD}"
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "intfloat/simlm-msmarco-reranker"
      top_k: 8
      device: null
      model_kwargs:
        torch_dtype: "torch.float16"

  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        <s>[INST] You are a technical expert.
        You answer the questions truthfully on the basis of the documents provided.
        For each document, check whether it is related to the question.
        To answer the question, only use documents that are related to the question.
        Ignore documents that do not relate to the question.
        If the answer is contained in several documents, summarize them.
        Always use references in the form [NUMBER OF DOCUMENT] if you use information from a document, e.g. [3] for document [3].
        Never name the documents, only enter a number in square brackets as a reference.
        The reference may only refer to the number in square brackets after the passage.
        Otherwise, do not use brackets in your answer and give ONLY the number of the document without mentioning the word document.
        Give a precise, accurate and structured answer without repeating the question.
        Answer only on the basis of the documents provided. Do not make up facts.
        If the documents cannot answer the question or you are not sure, say so.
        These are the documents:
        {% for document in documents %}
        Document[{{ loop.index }}]:
        {{ document.content }}
        {% endfor %}
        Question: {{question}}
        Answer:
        [/INST]

  llm:
    type: deepset_cloud_custom_nodes.generators.deepset_amazon_bedrock_generator.DeepsetAmazonBedrockGenerator
    init_parameters:
      model: "mistral.mistral-large-2402-v1:0"
      aws_region_name: us-east-1
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_max_length: 32000 # The maximum number of tokens the prompt and the generated answer can use
      temperature: 0 # Lower temperature works best for fact-based QA

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

connections: # Defines how the components are connected
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: prompt_builder.documents
  - sender: ranker.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: llm.prompt
  - sender: llm.replies
    receiver: answer_builder.replies

max_loops_allowed: 100

inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
    - "bm25_retriever.query"
    - "query_embedder.text"
    - "ranker.query"
    - "prompt_builder.question"
    - "answer_builder.query"

  filters: # These components will receive a potential query filter as input
    - "bm25_retriever.filters"
    - "embedding_retriever.filters"

outputs: # Defines the output of your pipeline
  documents: "ranker.documents" # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers"