Tutorial: Building a Robust Generative Question Answering System
Build a generative QA system that runs on your own data, generates answers in a friendly and conversational tone, and detects hallucinations. Learn how to test different prompts and save them for future use.
- Level: Intermediate
- Time to complete: 15 minutes
- Prerequisites:
- You must be an Admin to complete this tutorial.
- You must have an API key from an active OpenAI account, as this pipeline uses the gpt-3.5-turbo model by OpenAI.
- Goal: After completing this tutorial, you will have built a generative system that can answer questions about treating various diseases based on the documents from Mayo Clinic. This system will run on the data you provide to it and will be able to detect hallucinations.
- Keywords: PromptNode, large language models, hallucination detection, retrieval augmentation, gpt-3.5-turbo, Prompt Explorer
Create a Workspace
We need a deepset Cloud workspace to store our files and the generative pipeline.
- Log in to deepset Cloud.
- Click your name in the upper right corner and choose Workspaces.
- Click Add workspace and type `generative_qa` as the name.
Result: You have created a workspace called `generative_qa`, where you'll upload the Mayo Clinic files.
Upload Files to Your Workspace
- First, download the mayoclinic.zip file and unpack it on your computer. (You can also use your own files.)
- Log in to deepset Cloud, switch to the generative_qa workspace, and go to Data>Files.

- Click Upload Files.
- Drop the files you unpacked in step 1 into the Upload Files window and click Upload.
- Wait until the upload finishes. You should have 1096 files in your workspace. You can check that on the Dashboard.
Result: Your files are in the `generative_qa` workspace and you can see them on the Files page.
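If you prefer to script the upload instead of using the UI, deepset Cloud also exposes a REST API for files. The sketch below only builds the request pieces; the endpoint path and header shape are assumptions based on the public API docs, and the API key is read from an environment variable rather than hard-coded.

```python
import os

# Base URL of the deepset Cloud REST API (assumed from the public docs).
API_BASE = "https://api.cloud.deepset.ai/api/v1"

def upload_url(workspace: str) -> str:
    """Build the (assumed) file-upload endpoint for a workspace."""
    return f"{API_BASE}/workspaces/{workspace}/files"

def auth_headers() -> dict:
    """Read the API key from an environment variable; never hard-code it."""
    return {"Authorization": f"Bearer {os.environ.get('DEEPSET_CLOUD_API_KEY', '')}"}

# To actually upload, you would POST each unpacked file, for example with `requests`:
#   requests.post(upload_url("generative_qa"), headers=auth_headers(),
#                 files={"file": open("acne.txt", "rb")})
```

Looping this over the unpacked Mayo Clinic files would give the same result as dropping them into the Upload Files window.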

Connect Your OpenAI Account
You'll be able to use OpenAI models without having to pass the API keys in the pipeline YAML.
- Click your name in the top right corner and choose Connections.

- Next to OpenAI, click Connect, paste your OpenAI API key, and click Submit.
Result: You're connected to your OpenAI account and can use OpenAI models in your pipelines.

Create a Draft Pipeline
Let's create a pipeline that will be a starting point for the generative question answering app:
- In the navigation, click Pipelines>New Pipeline.
- On the New Pipeline page, in YAML Editor, click Create Pipeline>From Template.

- Choose the Generative Question Answering GPT-3 template.
- In line 7, change the pipeline name to "Generative_QA".
- Let's add HallucinationDetector to our pipeline:
  - Add a new line under `components` in line 12 and type:
    ```yaml
    components:
      - name: HallucinationDetector
        type: TransformersHallucinationDetector
    ```
  - Scroll down to the `pipelines` section in line 89 and add HallucinationDetector in line 102 like this:
    ```yaml
    - name: HallucinationDetector
      inputs: [PromptNode]
    ```
Here's what your pipeline should look like now:
```yaml
version: '1.21.0'
name: 'Generative_QA'

# This section defines nodes that you want to use in your pipelines. Each node must
# have a name and a type. You can also set the node's parameters here.
# The name is up to you, you can give your component a friendly name. You then use
# components' names when specifying their order in the pipeline.
# Type is the class name of the component.
components:
  - name: HallucinationDetector
    type: TransformersHallucinationDetector
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: BM25Retriever # The keyword-based retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10 # The number of results to return
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
      model_format: sentence_transformers
      top_k: 10 # The number of results to return
  - name: JoinResults # Joins the results from both retrievers
    type: JoinDocuments
    params:
      join_mode: concatenate # Combines documents from multiple retrievers
  - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the two retrievers
    type: SentenceTransformersRanker
    params:
      model_name_or_path: cross-encoder/ms-marco-MiniLM-L-6-v2 # Fast model optimized for reranking
      top_k: 4 # The number of results to return
      batch_size: 20 # Try to keep this number equal to or larger than the sum of the top_k of the two retrievers so all docs are processed at once
  - name: qa_template
    type: PromptTemplate
    params:
      output_parser:
        type: AnswerParser
      prompt: "You are a technical expert. \
        You answer questions truthfully based on provided documents. \
        For each document check whether it is related to the question. \
        Only use documents that are related to the question to answer it. \
        Ignore documents that are not related to the question. \
        If the answer exists in several documents, summarize them. \
        Only answer based on the documents provided. Don't make things up. \
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3]. \
        The reference must only refer to the number that comes in square brackets after passage. \
        Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage. \
        If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'. \
        {new_line}\
        These are the documents:\
        {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}\
        {new_line}\
        Question: {query}\
        {new_line}\
        Answer:\
        {new_line}"
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: gpt-3.5-turbo
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: BM25Retriever
        inputs: [Query]
      - name: EmbeddingRetriever
        inputs: [Query]
      - name: JoinResults
        inputs: [BM25Retriever, EmbeddingRetriever]
      - name: Reranker
        inputs: [JoinResults]
      - name: PromptNode
        inputs: [Reranker]
      - name: HallucinationDetector
        inputs: [PromptNode]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
```
- Save your pipeline.
- When the pipeline is saved, click Deploy.

- Wait until the pipeline deploys. When it finishes, it should have the status Indexed. You can check it on the Pipelines page.
Result: You now have an indexed generative question answering pipeline that can detect hallucinations.
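Once the pipeline is deployed, you can also query it programmatically. The sketch below only constructs the request URL and JSON body; the `/search` endpoint path and the payload shape are assumptions based on the deepset Cloud REST API docs, so double-check them against the current API reference before relying on this.

```python
# Base URL of the deepset Cloud REST API (assumed from the public docs).
API_BASE = "https://api.cloud.deepset.ai/api/v1"

def search_request(workspace: str, pipeline: str, query: str):
    """Build the (assumed) URL and JSON body for querying a deployed pipeline."""
    url = f"{API_BASE}/workspaces/{workspace}/pipelines/{pipeline}/search"
    payload = {"queries": [query]}
    return url, payload

url, payload = search_request("generative_qa", "Generative_QA",
                              "How do you treat swollen wrists?")
# POST `payload` as JSON to `url` with an "Authorization: Bearer <API key>" header;
# the response contains the generated answer and the documents it was based on.
```

This mirrors what the Search page does in the UI, which can be handy for integrating the pipeline into your own application.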

Test Your Prompt
Changes on their way
We're still actively working on this feature to make it better. This tutorial describes its current, first implementation. We'll be updating it soon to make it smoother.
The default prompt makes the model act as a matter-of-fact technical expert, while we want our system to be friendly and empathetic. Let's experiment with different prompts to achieve this effect.
- In the navigation, click Prompt Explorer.
- Choose the Generative_QA pipeline. Your current prompt is showing in the Prompt Editor pane.
- In the Type your query here field, try asking some questions related to treating medical conditions, for example: "I had my wisdom tooth removed but my gum hurts and is swollen. What should I do?"

The model generates an answer, and you should also be able to check the documents it's based on.
- Now, let's try a different prompt. In Prompt Editor, click Templates and choose deepset. You can see all prompts curated by deepset.

- Choose deepset/question-answering and click Use Prompt.
The prompt is showing in the Prompt Editor.
- Submit the same query. You can now compare the two answers to check which prompt performs better.
- Reload the whole page in your browser to return to the original prompt. Let's change it to adjust the tone:
```
You are a friendly nurse. \
You answer questions truthfully based on provided documents. \
Your answers are friendly, clear, and conversational. \
For each document check whether it is related to the question. \
Only use documents that are related to the question to answer it. \
Ignore documents that are not related to the question. \
If the answer exists in several documents, summarize them. \
Only answer based on the documents provided. Don't make things up. \
Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3]. \
The reference must only refer to the number that comes in square brackets after passage. \
Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage. \
If the documents can't answer the question or you are unsure say: 'I'm sorry I don't know that'. \
{new_line}\
These are the documents:\
{join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}\
{new_line}\
Question: {query}\
{new_line}\
Answer:{new_line}
```
- Try the same query or experiment with other queries related to treating medical conditions. The answers should now be in a more empathetic and friendly tone. Here are some example questions you can ask:
"I have been diagnosed with a wheat allergy, what do I do now?"
"How do you treat swollen wrists?"
"What is meningitis?" - Copy the modified prompt from Prompt Editor to a notepad and click Templates.
- Paste the copied prompt in the text field, type `friendly_tone` as the prompt name, and save your prompt. You'll be able to reuse it in the future.

Result: You have tweaked your prompt to generate more friendly and conversational answers. You added this prompt to your custom prompts and can reuse it in other pipelines.
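To build intuition for how PromptNode fills in the template placeholders, here is a plain-Python imitation of the `{join(documents, ...)}` expression from the prompt. This is an illustration only, not Haystack's actual template engine: it numbers each retrieved document with the `Document[$idx]:` pattern and concatenates them, which is what produces the numbered documents the model cites as [1], [2], and so on.

```python
def join_documents(documents, delimiter="\n", pattern="\nDocument[$idx]:\n$content"):
    """Imitate the template's join(): number each document and concatenate them."""
    rendered = [
        pattern.replace("$idx", str(i)).replace("$content", doc)
        for i, doc in enumerate(documents, start=1)
    ]
    return delimiter.join(rendered)

# Two toy retrieved documents stand in for real Mayo Clinic passages.
docs = ["Swelling after an extraction is common.", "A cold compress reduces swelling."]
context = join_documents(docs)
# `context` now contains "Document[1]:" and "Document[2]:" headers, which the
# prompt places between "These are the documents:" and "Question: {query}".
```

Seeing the rendered context makes it clearer why the prompt instructs the model to reference answers as "[NUMBER OF DOCUMENT]".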

Update the Prompt in Your Pipeline
Prompt Explorer is a sandbox where you can test prompts. It doesn't change the actual prompts your pipeline's using. Let's now update the Generative_QA pipeline with the prompt you just tested in Prompt Explorer. Make sure you have the copied prompt from step 9 above at hand.
- In the navigation, go to Pipelines.
- Click the three dots next to the Generative_QA pipeline and choose Undeploy.
- When the pipeline is undeployed, click the three dots again and choose Edit.
- Scroll down to the `prompt` parameter in line 44 and replace the current prompt with the one you just tested in Prompt Explorer and copied to a notepad.

- Save and deploy your pipeline.
Result: You now have a generative question answering pipeline that generates answers in a friendly and conversational tone.
Test the Pipeline
Time to see your pipeline in action!
- In the navigation, click Search and make sure the Generative_QA pipeline is selected.
- Try asking something, like "my eyes hurt, what should I do?".
- Once the answer is generated, use the Show hallucinations toggle switch to see if the answers are actually in the documents.

Sentences that aren't based on your documents are underlined in red, and sentences that are based on them are underlined in green.
Congratulations! You have built a generative question answering system that can answer questions related to treating various diseases in a friendly and conversational tone. Your system also shows you which parts of the answer are hallucinations.