Enable References for Generated Answers

Enhance your AI-generated answers with source references to make your app more trustworthy and verifiable.

About This Task

When using an LLM to generate answers, making it cite its sources can significantly enhance the credibility and traceability of the information provided. There are two ways to accomplish this:

  1. Using the ReferencePredictor component.
  2. Instructing the LLM to generate references.

The ReferencePredictor component excels at avoiding incorrect or hallucinated references, making it a reliable choice for maintaining accuracy. On the other hand, LLM-generated references perform better when sentences draw information from multiple sources, offering more flexibility in complex scenarios.

You can experiment with both approaches to find the one that's best for your use case.

📘

Pipeline Templates

Reference functionality is already included in RAG pipeline templates in deepset Cloud. Depending on the model, it's added either through ReferencePredictor or LLM-generated references.

Adding References with ReferencePredictor

ReferencePredictor shows references to the documents on which the LLM's answer is based. Pipelines that contain ReferencePredictor return answers with references next to them. You can open each reference to check whether the answer is actually based on it and to make sure the model didn't hallucinate.

This is what the references look like in the interface:

An answer on the Search page with references next to each sentence.

To learn more about this component, see ReferencePredictor.

ReferencePredictor is used in query pipelines after AnswerBuilder, from which it receives answers. The answers must include documents to construct references, so you must make sure AnswerBuilder receives documents from a preceding component, like a Ranker or a Retriever. AnswerBuilder then adds these documents to the answers it sends to ReferencePredictor.
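
The following is a rough, illustrative sketch of the idea behind sentence-level references, not ReferencePredictor's actual implementation. It assumes a naive regex sentence splitter and uses word overlap as a stand-in for the similarity model. The function names (`split_sentences`, `overlap_score`, `predict_references`) and the `threshold` parameter are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (assumption: the real component uses a more robust tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_score(sentence, document):
    """Stand-in similarity: the share of the sentence's words that also occur in the document."""
    sentence_words = set(re.findall(r"\w+", sentence.lower()))
    document_words = set(re.findall(r"\w+", document.lower()))
    if not sentence_words:
        return 0.0
    return len(sentence_words & document_words) / len(sentence_words)


def predict_references(answer, documents, threshold=0.5):
    """Pair each answer sentence with the index of its best-matching document, or None."""
    references = []
    for sentence in split_sentences(answer):
        scores = [overlap_score(sentence, doc) for doc in documents]
        best = max(range(len(documents)), key=scores.__getitem__) if documents else None
        if best is not None and scores[best] < threshold:
            best = None  # No document supports this sentence well enough
        references.append((sentence, best))
    return references


documents = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France and lies on the Seine.",
]
answer = "Paris is the capital of France. Berlin is the capital of Germany."
for sentence, doc_index in predict_references(answer, documents):
    print(doc_index, sentence)
```

A sentence that no document supports above the threshold gets no reference, which mirrors why this approach tends to avoid made-up references.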

To add ReferencePredictor to your pipeline:

  1. Add ReferencePredictor to the components section of the query pipeline:
    components:
    ...
      reference_predictor:  # Finds and displays references to the generated answers from the retrieved documents
        type: deepset_cloud_custom_nodes.augmenters.reference_predictor.ReferencePredictor
        init_parameters:
          model: "cross-encoder/ms-marco-MiniLM-L-6-v2" # The default model used to compare similarity of sentences between the answer and source documents
          verifiability_model: "tstadel/answer-classification-setfit-v2-binary" # The default model that checks whether the generated answers need verification. It rejects noise and out-of-context answers. It works for English only. Set to `null` for other languages.
          use_split_rules: True # Keep this parameter setting for better splitting
          extend_abbreviations: True # Extends abbreviations handled with a curated list
          language: en # The language of the data for which you want to generate the references, defaults to en
          model_kwargs: # Specifies additional model settings
            torch_dtype: torch.float16
    
  • The default model used to compare the similarity of sentences between the answer and source documents is cross-encoder/ms-marco-MiniLM-L-6-v2. You can change it to another model by providing a path to a local model or the model identifier from Hugging Face.
  • The verifiability model checks if the generated answers need verification. By default, it uses a model we trained to ignore noise, out-of-context answers, and cases where no answer was found. Our model works on English data only, so if your data is in other languages, set verifiability_model: null.
  • Setting use_split_rules to True ensures answers are split correctly.
  • Setting extend_abbreviations to True extends the list of abbreviations the Punkt tokenizer detects to ensure better sentence splitting.
  • By default, the language of the data is set to English.
    For detailed explanations of parameters, see ReferencePredictor.
  2. In the connections section, configure the connections:
    1. Make sure AnswerBuilder receives documents and replies from the preceding components.
    2. Connect AnswerBuilder answers to ReferencePredictor.
      This is what it should look like:
      ...
      connections:
      ...
      - sender: ranker.documents # This can be any component that can send documents to AnswerBuilder
        receiver: answer_builder.documents
      - sender: llm.replies
        receiver: answer_builder.replies # AnswerBuilder takes the generated replies and turns them into answers
      - sender: answer_builder.answers
        receiver: reference_predictor.answers

  3. Make sure you specify ReferencePredictor's output as the final output of the pipeline:
    outputs:
      answers: reference_predictor.answers
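
As a quick sanity check of the wiring described in these steps, the sketch below inspects a pipeline configuration that has already been loaded into a Python dictionary (for example, parsed from the YAML). The `check_reference_predictor_wiring` helper is hypothetical and not part of deepset Cloud. It only assumes the component names `answer_builder` and `reference_predictor` used above.

```python
def check_reference_predictor_wiring(config):
    """Return a list of wiring problems; an empty list means the checks passed."""
    # Map each receiver to its sender. A receiver with several senders keeps only
    # the last one, which is enough for the single-sender inputs checked here.
    receivers = {c["receiver"]: c["sender"] for c in config.get("connections", [])}
    problems = []
    if "answer_builder.documents" not in receivers:
        problems.append("answer_builder does not receive documents")
    if "answer_builder.replies" not in receivers:
        problems.append("answer_builder does not receive replies")
    if receivers.get("reference_predictor.answers") != "answer_builder.answers":
        problems.append("reference_predictor does not receive answers from answer_builder")
    if config.get("outputs", {}).get("answers") != "reference_predictor.answers":
        problems.append("pipeline output is not reference_predictor.answers")
    return problems


config = {
    "connections": [
        {"sender": "ranker.documents", "receiver": "answer_builder.documents"},
        {"sender": "llm.replies", "receiver": "answer_builder.replies"},
        {"sender": "answer_builder.answers", "receiver": "reference_predictor.answers"},
    ],
    "outputs": {"answers": "reference_predictor.answers"},
}
print(check_reference_predictor_wiring(config))
```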
    

Here's an example of a query pipeline that uses ReferencePredictor:

components:
  bm25_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        init_parameters:
          use_ssl: True
          verify_certs: False
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - "${OPENSEARCH_USER}"
            - "${OPENSEARCH_PASSWORD}"
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"
      device: null

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        init_parameters:
          use_ssl: True
          verify_certs: False
          http_auth:
            - "${OPENSEARCH_USER}"
            - "${OPENSEARCH_PASSWORD}"
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "intfloat/simlm-msmarco-reranker"
      top_k: 8
      device: null
      model_kwargs:
        torch_dtype: "torch.float16"

  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        These are the documents:
        {% for document in documents %}
        Document[{{ loop.index }}]:
        {{ document.content }}
        {% endfor %}
        Question: {{question}}
        Answer:

  llm:
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      api_key: {"type": "env_var", "env_vars": ["OPENAI_API_KEY"], "strict": False}
      model: "gpt-3.5-turbo"
      generation_kwargs:
        max_tokens: 400
        temperature: 0.0
        seed: 0

  answer_builder:
    init_parameters: {}
    type: haystack.components.builders.answer_builder.AnswerBuilder

  reference_predictor:  # Finds and displays references to the generated answers from the retrieved documents
    type: deepset_cloud_custom_nodes.augmenters.reference_predictor.ReferencePredictor
    init_parameters:
      model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
      verifiability_model: "tstadel/answer-classification-setfit-v2-binary"
      use_split_rules: True # Uses additional rules for better splitting
      extend_abbreviations: True # Extends abbreviations handled with a curated list
      language: en
      model_kwargs: # Specifies additional model settings
        torch_dtype: torch.float16

connections:  # Defines how the components are connected
- sender: bm25_retriever.documents
  receiver: document_joiner.documents
- sender: query_embedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: embedding_retriever.documents
  receiver: document_joiner.documents
- sender: document_joiner.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: prompt_builder.documents
- sender: ranker.documents
  receiver: answer_builder.documents
- sender: prompt_builder.prompt
  receiver: llm.prompt
- sender: llm.replies
  receiver: answer_builder.replies
- sender: answer_builder.answers
  receiver: reference_predictor.answers

max_loops_allowed: 100

inputs:  # Define the inputs for your pipeline
  query:  # These components will receive the query as input
  - "bm25_retriever.query"
  - "query_embedder.text"
  - "ranker.query"
  - "prompt_builder.question"
  - "answer_builder.query"

  filters:  # These components will receive a potential query filter as input
  - "bm25_retriever.filters"
  - "embedding_retriever.filters"

outputs:  # Defines the output of your pipeline
  documents: "ranker.documents"  # The output of the pipeline is the retrieved documents
  answers: "reference_predictor.answers"  # The output of the pipeline is the generated answers

Adding References with an LLM

To add references using an LLM, include instructions for generating them in your prompt. The model then cites the documents it receives in the prompt by their number.

Here is a prompt we've tested and recommend for you to use:

 You are a technical expert.
 You answer the questions truthfully on the basis of the documents provided.
 For each document, check whether it is related to the question.
 To answer the question, only use documents that are related to the question.
 Ignore documents that do not relate to the question.
 If the answer is contained in several documents, summarize them.
 Always use references in the form [NUMBER OF DOCUMENT] if you use information from a document, e.g. [3] for document [3].
 Never name the documents, only enter a number in square brackets as a reference.
 The reference may only refer to the number in square brackets after the passage.
 Otherwise, do not use brackets in your answer and give ONLY the number of the document without mentioning the word document.
 Give a precise, accurate and structured answer without repeating the question.
 Answer only on the basis of the documents provided. Do not make up facts.
 If the documents cannot answer the question or you are not sure, say so.
 These are the documents:
 {% for document in documents %}
 Document[{{ loop.index }}]:
 {{ document.content }}
 {% endfor %}
 Question: {{question}}
 Answer:
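
To make the numbering in this prompt concrete: Jinja's `loop.index` starts at 1, so the first document is rendered as `Document[1]` and the model is expected to cite it as `[1]`. The plain-Python sketch below mimics only the document loop of the template, without Jinja. `render_documents` is a hypothetical helper used for illustration.

```python
def render_documents(contents):
    """Mimic the template's document loop: number documents from 1, like Jinja's loop.index."""
    lines = []
    for index, content in enumerate(contents, start=1):
        lines.append(f"Document[{index}]:")
        lines.append(content)
    return "\n".join(lines)


rendered = render_documents(
    ["Berlin is the capital of Germany.", "Paris is the capital of France."]
)
print(rendered)
```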

The Generator that uses this prompt must be connected to DeepsetAnswerBuilder, and DeepsetAnswerBuilder's reference_pattern must be set to acm for the references to be displayed properly.
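
To illustrate what handling bracketed-number references such as `[3]` involves, here is a minimal sketch that extracts the reference numbers from a generated answer and maps them back to the documents passed in the prompt. This is not DeepsetAnswerBuilder's implementation. The regular expression and the `extract_references` helper are assumptions made for illustration.

```python
import re

# Assumption: references are plain numbers in square brackets, for example [3]
REFERENCE_PATTERN = re.compile(r"\[(\d+)\]")


def extract_references(answer, documents):
    """Return (position_in_answer, document) pairs for every valid [N] reference."""
    references = []
    for match in REFERENCE_PATTERN.finditer(answer):
        number = int(match.group(1))
        # Prompt numbering is 1-based; skip numbers that match no document
        if 1 <= number <= len(documents):
            references.append((match.start(), documents[number - 1]))
    return references


documents = ["Berlin is the capital of Germany.", "Paris is the capital of France."]
answer = (
    "Paris is the capital of France [2]. "
    "Berlin is the capital of Germany [1]. "
    "Rome is in Italy [7]."
)
for position, document in extract_references(answer, documents):
    print(position, document)
```

The out-of-range reference `[7]` is dropped in this sketch, which shows one way a hallucinated document number could be filtered out.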

Here are the detailed steps to add references with an LLM:

  1. Add the prompt with instructions to generate references to PromptBuilder:
    components:
    ...
      prompt_builder:
        type: haystack.components.builders.prompt_builder.PromptBuilder
        init_parameters:
          template: |-
            <s>[INST] You are a technical expert.
            You answer the questions truthfully on the basis of the documents provided.
            For each document, check whether it is related to the question.
            To answer the question, only use documents that are related to the question.
            Ignore documents that do not relate to the question.
            If the answer is contained in several documents, summarize them.
            Always use references in the form [NUMBER OF DOCUMENT] if you use information from a document, e.g. [3] for document [3].
            Never name the documents, only enter a number in square brackets as a reference.
            The reference may only refer to the number in square brackets after the passage.
            Otherwise, do not use brackets in your answer and give ONLY the number of the document without mentioning the word document.
            Give a precise, accurate and structured answer without repeating the question.
            Answer only on the basis of the documents provided. Do not make up facts.
            If the documents cannot answer the question or you are not sure, say so.
            These are the documents:
            {% for document in documents %}
            Document[{{ loop.index }}]:
            {{ document.content }}
            {% endfor %}
            Question: {{question}}
            Answer:
            [/INST]
      ...
      # Here you would also configure other components
    
  2. Add DeepsetAnswerBuilder and configure its reference_pattern parameter:
    components:
    ...
      prompt_builder:
        type: haystack.components.builders.prompt_builder.PromptBuilder
        init_parameters:
          template: |-
            <s>[INST] You are a technical expert.
            You answer the questions truthfully on the basis of the documents provided.
            For each document, check whether it is related to the question.
            To answer the question, only use documents that are related to the question.
            Ignore documents that do not relate to the question.
            If the answer is contained in several documents, summarize them.
            Always use references in the form [NUMBER OF DOCUMENT] if you use information from a document, e.g. [3] for document [3].
            Never name the documents, only enter a number in square brackets as a reference.
            The reference may only refer to the number in square brackets after the passage.
            Otherwise, do not use brackets in your answer and give ONLY the number of the document without mentioning the word document.
            Give a precise, accurate and structured answer without repeating the question.
            Answer only on the basis of the documents provided. Do not make up facts.
            If the documents cannot answer the question or you are not sure, say so.
            These are the documents:
            {% for document in documents %}
            Document[{{ loop.index }}]:
            {{ document.content }}
            {% endfor %}
            Question: {{question}}
            Answer:
            [/INST]
        
      answer_builder:
        type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
        init_parameters:
          reference_pattern: acm
      ...
      # Here you would also configure other components, like the Generator
    
  3. Connect the components:
    1. Connect PromptBuilder's prompt to the Generator.
    2. Connect the Generator's replies to DeepsetAnswerBuilder.
      Here's what it should look like:
      connections:
      ...
      - sender: prompt_builder.prompt
        receiver: generator.prompt
      - sender: generator.replies
        receiver: answer_builder.replies
      
  4. Indicate DeepsetAnswerBuilder's output as the pipeline output:
    outputs:
      answers: answer_builder.answers
    

Here's an example of a query pipeline that uses this approach:

components:
  bm25_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        init_parameters:
          use_ssl: True
          verify_certs: False
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - "${OPENSEARCH_USER}"
            - "${OPENSEARCH_PASSWORD}"
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"
      device: null

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        init_parameters:
          use_ssl: True
          verify_certs: False
          http_auth:
            - "${OPENSEARCH_USER}"
            - "${OPENSEARCH_PASSWORD}"
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "intfloat/simlm-msmarco-reranker"
      top_k: 8
      device: null
      model_kwargs:
        torch_dtype: "torch.float16"

  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        <s>[INST] You are a technical expert.
        You answer the questions truthfully on the basis of the documents provided.
        For each document, check whether it is related to the question.
        To answer the question, only use documents that are related to the question.
        Ignore documents that do not relate to the question.
        If the answer is contained in several documents, summarize them.
        Always use references in the form [NUMBER OF DOCUMENT] if you use information from a document, e.g. [3] for document [3].
        Never name the documents, only enter a number in square brackets as a reference.
        The reference may only refer to the number in square brackets after the passage.
        Otherwise, do not use brackets in your answer and give ONLY the number of the document without mentioning the word document.
        Give a precise, accurate and structured answer without repeating the question.
        Answer only on the basis of the documents provided. Do not make up facts.
        If the documents cannot answer the question or you are not sure, say so.
        These are the documents:
        {% for document in documents %}
        Document[{{ loop.index }}]:
        {{ document.content }}
        {% endfor %}
        Question: {{question}}
        Answer:
        [/INST]
  llm:
    type: deepset_cloud_custom_nodes.generators.deepset_amazon_bedrock_generator.DeepsetAmazonBedrockGenerator
    init_parameters:
      model: "mistral.mistral-large-2402-v1:0"
      aws_region_name: us-east-1
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_max_length: 32000 # The maximum number of tokens the prompt and the generated answer can use
      temperature: 0 # A lower temperature works best for fact-based QA

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

connections:  # Defines how the components are connected
- sender: bm25_retriever.documents
  receiver: document_joiner.documents
- sender: query_embedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: embedding_retriever.documents
  receiver: document_joiner.documents
- sender: document_joiner.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: prompt_builder.documents
- sender: ranker.documents
  receiver: answer_builder.documents
- sender: prompt_builder.prompt
  receiver: llm.prompt
- sender: llm.replies
  receiver: answer_builder.replies

max_loops_allowed: 100

inputs:  # Define the inputs for your pipeline
  query:  # These components will receive the query as input
  - "bm25_retriever.query"
  - "query_embedder.text"
  - "ranker.query"
  - "prompt_builder.question"
  - "answer_builder.query"

  filters:  # These components will receive a potential query filter as input
  - "bm25_retriever.filters"
  - "embedding_retriever.filters"

outputs:  # Defines the output of your pipeline
  documents: "ranker.documents"  # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers"