Tutorial: Building a Robust Generative Question Answering System

Build a generative QA system that runs on your own data, generates answers in a friendly, conversational tone, and detects hallucinations. Learn how to test different prompts and save them for future use.

  • Level: Intermediate
  • Time to complete: 15 minutes
  • Prerequisites:
    • You must be an Admin to complete this tutorial.
    • You must have an API key from an active OpenAI account because this pipeline uses the gpt-3.5-turbo model by OpenAI.
  • Goal: After completing this tutorial, you will have built a generative system that can answer questions about treating various diseases based on the documents from Mayo Clinic. This system will run on the data you provide to it and will be able to detect hallucinations.
  • Keywords: PromptNode, large language models, hallucination detection, retrieval augmentation, gpt-3.5-turbo, Prompt Explorer

Create a Workspace

We need a deepset Cloud workspace to store our files and the generative pipeline.

  1. Log in to deepset Cloud.
  2. Click your name in the upper right corner and choose Workspaces.
  3. Click Add workspace and type generative_qa as the name.

Result: You have created a workspace called generative_qa, where you'll upload the Mayo Clinic files.
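If you prefer scripting the setup, you can create the workspace through the deepset Cloud REST API instead of the UI. The sketch below only prepares the request rather than sending it; the base URL, endpoint path, and payload shape are assumptions to verify against the API reference, and the API key is a placeholder.

```python
API_BASE = "https://api.cloud.deepset.ai/api/v1"  # assumed base URL
API_KEY = "<YOUR_DEEPSET_CLOUD_API_KEY>"  # placeholder, generate one in deepset Cloud

def build_create_workspace_request(name: str) -> dict:
    """Prepare a create-workspace call (sketch; endpoint path is an assumption)."""
    return {
        "url": f"{API_BASE}/workspaces",
        "headers": {"Authorization": f"Bearer {API_KEY}"},
        "json": {"name": name},
    }

req = build_create_workspace_request("generative_qa")
# Send it with any HTTP client, for example:
# requests.post(**req)
```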

Upload Files to Your Workspace

  1. First, download the mayoclinic.zip file and unpack it on your computer. (You can also use your own files.)
  2. Log in to deepset Cloud, switch to the generative_qa workspace, and go to Data>Files.
Navigation bar in deepset Cloud with the workspace switch marked as number one and the files option marked as number two.
  3. Click Upload Files.
  4. Drop the files you unpacked in step 1 into the Upload Files window and click Upload.
  5. Wait until the upload finishes. You should have 1096 files in your workspace. You can check this on the Dashboard.

Result: Your files are in the generative_qa workspace and you can see them on the Files page.

The Files page with the uploaded files showing in a list
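Uploading over a thousand files by hand is tedious, so you may want to script this step. The sketch below only gathers the unpacked text files and builds the upload endpoint URL; the endpoint path and multipart shape are assumptions to check against the deepset Cloud API reference before use.

```python
from pathlib import Path

API_BASE = "https://api.cloud.deepset.ai/api/v1"  # assumed base URL

def build_upload_url(workspace: str) -> str:
    """Endpoint for uploading a file to a workspace (path is an assumption)."""
    return f"{API_BASE}/workspaces/{workspace}/files"

def collect_txt_files(folder: str) -> list:
    """Gather the unpacked Mayo Clinic text files in a stable order."""
    return sorted(Path(folder).glob("*.txt"))

url = build_upload_url("generative_qa")
# For each file you would POST multipart form data, for example with requests:
# requests.post(url, headers={"Authorization": f"Bearer {API_KEY}"},
#               files={"file": open(path, "rb")})
```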

Connect Your OpenAI Account

Once your account is connected, you can use OpenAI models without passing API keys in the pipeline YAML.

  1. Click your name in the top right corner and choose Connections.
The personal menu expanded with the Connections option underlined.
  2. Next to OpenAI, click Connect, paste your OpenAI API key, and click Submit.

Result: You're connected to your OpenAI account and can use OpenAI models in your pipelines.

The integrations section with the OpenAI option showing as connected.

Create a Draft Pipeline

Let's create a pipeline that will be a starting point for the generative question answering app:

  1. In the navigation, click Pipelines>New Pipeline.
  2. On the New Pipeline page, in YAML Editor, click Create Pipeline>From Template.
The YAML Editor section with the create pipeline option expanded and the from template option underlined
  3. Choose the Generative Question Answering GPT-3 template.

  4. In line 7, change the pipeline name to "Generative_QA".

  5. Let's add HallucinationDetector to our pipeline:

    1. Add a new line under components in line 12 and type:
      components:
        - name: HallucinationDetector
          type: TransformersHallucinationDetector
    
    2. Scroll down to the pipelines section in line 89 and add HallucinationDetector in line 102 like this:
      - name: HallucinationDetector
        inputs: [PromptNode]
      

    Here's what your pipeline should look like now:

    version: '1.21.0'
    name: 'Generative_QA'
    
    # This section defines nodes that you want to use in your pipelines. Each node must have a name and a type. You can also set the node's parameters here.
    # The name is up to you; you can give your component a friendly name. You then use components' names when specifying their order in the pipeline.
    # Type is the class name of the component. 
    components:
      - name: HallucinationDetector
        type: TransformersHallucinationDetector
      - name: DocumentStore
        type: DeepsetCloudDocumentStore
      - name: BM25Retriever # The keyword-based retriever
        type: BM25Retriever
        params:
          document_store: DocumentStore
          top_k: 10 # The number of results to return
      - name: EmbeddingRetriever # Selects the most relevant documents from the document store
        type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
        params:
          document_store: DocumentStore
          embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
          model_format: sentence_transformers
          top_k: 10 # The number of results to return
      - name: JoinResults # Joins the results from both retrievers
        type: JoinDocuments
        params:
          join_mode: concatenate # Combines documents from multiple retrievers
      - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the two retrievers
        type: SentenceTransformersRanker
        params:
          model_name_or_path: cross-encoder/ms-marco-MiniLM-L-6-v2 # Fast model optimized for reranking
          top_k: 4 # The number of results to return
          batch_size: 20  # Keep this number equal to or larger than the sum of the two retrievers' top_k values so all docs are processed at once
      - name: qa_template
        type: PromptTemplate
        params:
          output_parser:
            type: AnswerParser
          prompt: "You are a technical expert. \
            You answer questions truthfully based on provided documents. \
            For each document check whether it is related to the question. \
            Only use documents that are related to the question to answer it. \
            Ignore documents that are not related to the question. \
            If the answer exists in several documents, summarize them. \
            Only answer based on the documents provided. Don't make things up. \
            Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3]. \
            The reference must only refer to the number that comes in square brackets after passage. \
            Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage. \
            If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'. \
            {new_line}\
            These are the documents:\
            {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}\
            {new_line}\
            Question: {query}\
            {new_line}\
            Answer:\
            {new_line}"
      - name: PromptNode
        type: PromptNode
        params:
          default_prompt_template: qa_template
          max_length: 400 # The maximum number of tokens the generated answer can have
          model_kwargs: # Specifies additional model settings
            temperature: 0 # Lower temperature works best for fact-based qa
          model_name_or_path: gpt-3.5-turbo
      - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
        type: FileTypeClassifier
      - name: TextConverter # Converts files into documents
        type: TextConverter
      - name: PDFConverter # Converts PDFs into documents
        type: PDFToTextConverter
      - name: Preprocessor # Splits documents into smaller ones and cleans them up
        type: PreProcessor
        params:
          # With a vector-based retriever, it's good to split your documents into smaller ones
          split_by: word # The unit by which you want to split the documents
          split_length: 250 # The max number of words in a document
          split_overlap: 20 # Enables the sliding window approach
          language: en
          split_respect_sentence_boundary: True # Retains complete sentences in split documents
    
    # Here you define how the nodes are organized in the pipelines
    # For each node, specify its input
    pipelines:
      - name: query
        nodes:
          - name: BM25Retriever
            inputs: [Query]
          - name: EmbeddingRetriever
            inputs: [Query]
          - name: JoinResults
            inputs: [BM25Retriever, EmbeddingRetriever]
          - name: Reranker
            inputs: [JoinResults]
          - name: PromptNode
            inputs: [Reranker]
          - name: HallucinationDetector
            inputs: [PromptNode]
      - name: indexing
        nodes:
        # Depending on the file type, we use a Text or PDF converter
          - name: FileTypeClassifier
            inputs: [File]
          - name: TextConverter
            inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
          - name: PDFConverter
            inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
          - name: Preprocessor
            inputs: [TextConverter, PDFConverter]
          - name: EmbeddingRetriever
            inputs: [Preprocessor]
          - name: DocumentStore
            inputs: [EmbeddingRetriever]
    
  6. Save your pipeline.

  7. When the pipeline is saved, click Deploy.

Pipeline designer page with the pipeline name highlighted and the deploy button indicated
  8. Wait until the pipeline deploys. When it finishes, it should have the status Indexed. You can check this on the Pipelines page.

Result: You now have an indexed generative question answering pipeline that can detect hallucinations.

The pipelines page with the generative QA pipeline showing as indexed
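To get a feel for what the Preprocessor's parameters do during indexing, here is a simplified Python sketch of word-based splitting with a sliding-window overlap. It deliberately ignores the sentence-boundary handling that split_respect_sentence_boundary adds, so the real chunk borders will differ slightly.

```python
def split_with_overlap(text: str, split_length: int = 250, split_overlap: int = 20) -> list:
    """Split text into chunks of up to split_length words, where each chunk
    repeats the last split_overlap words of the previous one (sliding window).
    Simplified: the real PreProcessor also keeps sentences intact."""
    words = text.split()
    step = split_length - split_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + split_length])
        if start + split_length >= len(words):
            break  # the last window already covered the end of the text
    return chunks

# A 500-word toy document of numbered tokens, so overlaps are visible:
docs = split_with_overlap(" ".join(str(i) for i in range(500)))
```

With 500 words, this produces three chunks of 250, 250, and 40 words, and the first 20 words of each chunk after the first repeat the last 20 words of the chunk before it.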

Test Your Prompt

📘

Changes on their way

We're still actively working on this feature to make it better. This tutorial describes its current, first implementation. We'll be updating it soon to make it smoother.

The default prompt makes the model act as a matter-of-fact technical expert, while we want our system to be friendly and empathetic. Let's experiment with different prompts to achieve this effect.

  1. In the navigation, click Prompt Explorer.
  2. Choose the Generative_QA pipeline. Your current prompt is showing in the Prompt Editor pane.
  3. In the query field (it shows the Type your query here placeholder), try asking some questions related to treating medical conditions, for example: "I had my wisdom tooth removed but my gum hurts and is swollen. What should I do?"
The prompt explorer window with the Generative QA pipeline selected and marked with a red number 1. Below pipeline selection, there's a welcome page. At the bottom of the page, there's prompt editor with the prompt text displayed. And below prompt editor there's the question about wisdom tooth marked with step 2.

The model generates an answer, and you should also be able to check the documents it's based on.

  4. Now, let's try a different prompt. In Prompt Editor, click Templates and choose deepset. You can see all prompts curated by deepset.
Prompt Editor with the templates button highlighted
  5. Choose deepset/question-answering and click Use Prompt.
    The prompt is showing in the Prompt Editor.
  6. Submit the same query. You can now compare the two answers to check which prompt performs better.
  7. Reload the whole page in your browser to return to the original prompt. Let's change it to adjust the tone:
    You are a friendly nurse.\
    You answer questions truthfully based on provided documents. \
    Your answers are friendly, clear, and conversational. \
    For each document check whether it is related to the question. \
    Only use documents that are related to the question to answer it. \
    Ignore documents that are not related to the question. \
    If the answer exists in several documents, summarize them. \
    Only answer based on the documents provided. Don't make things up. \
    Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3]. \
    The reference must only refer to the number that comes in square brackets after passage. \
    Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage. \
    If the documents can't answer the question or you are unsure say: 'I'm sorry I don't know that'. \
    {new_line}\
    These are the documents:\
    {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}\
    {new_line}\
    Question: {query}\
    {new_line}\
    Answer:{new_line}
    
  8. Try the same query or experiment with other queries related to treating medical conditions. The answers should now be in a more empathetic and friendly tone. Here are some example questions you can ask:
    "I have been diagnosed with a wheat allergy, what do I do now?"
    "How do you treat swollen wrists?"
    "What is meningitis?"
  9. Copy the modified prompt from Prompt Editor to a notepad and click Templates.
  10. Paste the copied prompt in the text field, type friendly_tone as the prompt name, and save your prompt. You'll be able to reuse it in the future.
Prompt templates window with the custom prompt template filled in with the text copied from prompt editor.

Result: You have tweaked your prompt to generate more friendly and conversational answers. You added this prompt to your custom prompts and can reuse it in other pipelines.
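The join expression in the prompt is what interleaves the retrieved documents with numbered headers like Document[1]. Here is a rough Python illustration of how that pattern expands; it is an approximation for intuition, not the actual PromptTemplate implementation.

```python
def render_documents(documents: list) -> str:
    """Mimic join(documents, delimiter=new_line,
    pattern=new_line+'Document[$idx]:'+new_line+'$content'):
    each document becomes a numbered block, blocks are joined by newlines."""
    new_line = "\n"
    parts = [
        f"{new_line}Document[{idx}]:{new_line}{content}"
        for idx, content in enumerate(documents, start=1)
    ]
    return new_line.join(parts)

rendered = render_documents(["Rest and hydrate.", "Apply a cold compress."])
```

The numbered headers are what let the model cite sources as [1], [2], and so on, which in turn makes the answers verifiable against the retrieved documents.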

Update the Prompt in Your Pipeline

Prompt Explorer is a sandbox where you can test prompts. It doesn't change the actual prompts your pipeline is using. Let's now update the Generative_QA pipeline with the prompt you just tested in Prompt Explorer. Make sure you have the prompt you copied in step 9 above at hand.

  1. In the navigation, go to Pipelines.
  2. Click the three dots next to the Generative_QA pipeline and choose Undeploy.
  3. When the pipeline is undeployed, click the three dots again and choose Edit.
  4. Scroll down to the prompt parameter in line 44 and replace the current prompt with the one you just tested in Prompt Explorer and copied to a notepad.
The updated prompt in pipeline YAML
  5. Save and deploy your pipeline.

Result: You now have a generative question answering pipeline that generates answers in a friendly and conversational tone.

Test the Pipeline

Time to see your pipeline in action!

  1. In the navigation, click Search and make sure the Generative_QA pipeline is selected.
  2. Try asking something, like "my eyes hurt, what should I do?".
  3. Once the answer is generated, use the Show hallucinations toggle switch to see if the answers are actually in the documents.
The answer to the query "my eyes hurt, what should I do?" with each sentence underlined in either green or red.

Sentences that aren't based on your documents are underlined in red, and sentences that are grounded in them are underlined in green.
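You can also query the deployed pipeline programmatically instead of through the Search page. The sketch below only builds the request; the endpoint path and payload shape (a queries list) are assumptions based on the deepset Cloud REST API, so double-check them in the API reference and replace the placeholder API key before sending anything.

```python
API_BASE = "https://api.cloud.deepset.ai/api/v1"  # assumed base URL
WORKSPACE = "generative_qa"
PIPELINE = "Generative_QA"

def build_search_request(query: str) -> dict:
    """Prepare a search call against the deployed pipeline
    (endpoint path and payload shape are assumptions)."""
    return {
        "url": f"{API_BASE}/workspaces/{WORKSPACE}/pipelines/{PIPELINE}/search",
        "headers": {"Authorization": "Bearer <YOUR_DEEPSET_CLOUD_API_KEY>"},
        "json": {"queries": [query]},
    }

req = build_search_request("my eyes hurt, what should I do?")
# Send it with any HTTP client, for example:
# requests.post(**req)
```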

Congratulations! You have built a generative question answering system that can answer questions related to treating various diseases in a friendly and conversational tone. Your system also shows you which parts of the answer are hallucinations.