Tutorial: Creating an Experiment for a QA Pipeline
Use this tutorial to learn how to create an experiment to evaluate your question answering pipeline. It walks you through the steps to create your first experiment and provides all the data you need for it.
- Level: Beginner
- Time to complete: 15 minutes
- Prerequisites:
  - This tutorial assumes a basic knowledge of NLP and the concept of pipeline evaluation. If you need more information, see About Experiments.
  - You must be an Admin to complete this tutorial.
- Goal: After completing this tutorial, you will have created and run an experiment to evaluate a question answering pipeline.
This tutorial contains all the necessary files and an evaluation dataset, but you can replace them with your own. It also guides you through creating a QA pipeline from a ready-made template, but you can also use a previously created pipeline.
Upload Files
Your pipeline will run the search on these files.
1. Download the .zip file with sample files and unpack it on your computer.
2. Log in to deepset Cloud, make sure you're in the right workspace, and go to Data>Files.
3. Click Upload Files.
4. Select all the files you extracted and drop them into the Upload Files window. There should be 344 files in total.
5. Click Upload and wait until the files are uploaded.
Result: Your files are in your workspace, and you can see them on the Files page.
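If you'd rather script this step than click through the UI, the sketch below shows one way to upload the files with Python. The endpoint path and authorization scheme are assumptions based on the deepset Cloud REST API reference, and `DEEPSET_CLOUD_API_KEY` and `sample_files` are placeholder names; verify the exact contract in the API documentation before relying on this.

```python
import os
from pathlib import Path

import requests

# Assumed base URL and endpoint -- verify against the deepset Cloud API reference.
API_BASE = "https://api.cloud.deepset.ai/api/v1"
WORKSPACE = "default"  # replace with your workspace name
API_KEY = os.environ["DEEPSET_CLOUD_API_KEY"]  # placeholder env variable for your API key

def upload_file(path: Path) -> None:
    """Upload a single file so it appears on the workspace's Files page."""
    url = f"{API_BASE}/workspaces/{WORKSPACE}/files"
    with path.open("rb") as f:
        response = requests.post(
            url,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": (path.name, f)},
        )
    response.raise_for_status()

# Upload every file extracted from the sample .zip
for file_path in sorted(Path("sample_files").glob("*")):
    upload_file(file_path)
```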
Upload an Evaluation Set
You need a set of annotated data your pipeline will be evaluated against.
1. Download the CSV file and save it on your computer.
2. In deepset Cloud, go to Data>Evaluation Sets and click Upload Evaluation Sets.
3. Drop the CSV file you downloaded in step 1 in the Upload Evaluation Sets window and click Upload. Wait for the confirmation that it uploaded without problems.
Result: The evaluation dataset is uploaded, and you can see it on the Evaluation Sets page.
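It can help to inspect the evaluation set locally before uploading it, so you know what the pipeline is evaluated against. The sketch below assumes SQuAD-style annotation columns such as `question`, `text`, and `answer_start`; the header of the downloaded CSV is the authoritative reference.

```python
import pandas as pd

# Load the downloaded evaluation set. The column names mentioned below are
# illustrative assumptions for a SQuAD-style annotation file -- check the
# actual CSV header after loading.
eval_set = pd.read_csv("annotations_jazz.csv")
print(eval_set.columns.tolist())

# Each row pairs a question with its gold answer and source document,
# e.g. columns like: question, text (the answer span), answer_start, file_name
print(eval_set.head())
print(f"{len(eval_set)} annotated rows")
```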
Create a Pipeline to Run the Experiment On
1. In deepset Cloud, go to Pipelines>New Pipeline.
2. In YAML Editor, click Create Pipeline and select From Template.
3. Choose the Extractive Question Answering template.
4. In the YAML editor, in line 7, find `name` and change it to `Test_experiment`.
5. Change the `top_k` parameter value in line 21 to `10`.
   This is what your pipeline should look like now:

```yaml
version: '1.21.0'
name: 'Test_experiment'

# This section defines nodes that you want to use in your pipelines. Each node must have a name and a type. You can also set the node's parameters here.
# The name is up to you, you can give your component a friendly name. You then use components' names when specifying their order in the pipeline.
# Type is the class name of the component.
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore # The only supported document store in deepset Cloud
  - name: Retriever # Selects the most relevant documents from the document store and passes them on to the Reader
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 # Model optimized for semantic search
      model_format: sentence_transformers
      top_k: 10 # The number of results to return
  - name: Reader # The component that actually fetches answers from among the 10 documents returned by the Retriever
    type: FARMReader # Transformer-based reader, specializes in extractive QA
    params:
      model_name_or_path: deepset/deberta-v3-large-squad2 # An optimized variant of BERT, a strong all-round model
      context_window_size: 700 # The size of the window around the answer span
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 30 # Enables the sliding window approach
      split_respect_sentence_boundary: True # Retains complete sentences in split documents
      language: en # Used by NLTK to best detect the sentence boundaries for that language

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader
        inputs: [Retriever]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures this converter receives TXT files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
```
6. Save your pipeline.
Result: You have created a question answering pipeline that you'll evaluate next. Your pipeline is displayed on the Pipelines page.
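The YAML above is declarative, but if you want to prototype the equivalent query pipeline locally in Python, a minimal sketch with the open-source Haystack 1.x API could look like the following. It swaps DeepsetCloudDocumentStore for an InMemoryDocumentStore so it runs without deepset Cloud credentials, and the sample query is hypothetical.

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, FARMReader
from haystack.pipelines import Pipeline

# Local stand-in for DeepsetCloudDocumentStore; embedding_dim and similarity
# match the multi-qa-mpnet-base-dot-v1 model used in the YAML.
document_store = InMemoryDocumentStore(embedding_dim=768, similarity="dot_product")

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
    model_format="sentence_transformers",
    top_k=10,  # the same value you set in the YAML
)
reader = FARMReader(
    model_name_or_path="deepset/deberta-v3-large-squad2",
    context_window_size=700,
)

# Mirrors the 'query' pipeline section of the YAML.
query_pipeline = Pipeline()
query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
query_pipeline.add_node(component=reader, name="Reader", inputs=["Retriever"])

# After writing and embedding documents, you could ask, for example:
# result = query_pipeline.run(query="Who influenced bebop?")
```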

Create an Experiment
Now it's time to evaluate your pipeline.
1. Go to Experiments>New Experiment.
2. Choose Test_experiment as the pipeline.
3. Choose annotations_jazz as the evaluation set.
4. Type jazz as the experiment name and add test as a tag.
5. Click Start Experiment. You can see that jazz is running. Wait until it completes. It may take a couple of minutes.
6. When the experiment status changes to `Completed`, click its name to view its details, such as the data and the pipeline, the metrics, and predictions.
Result: Congratulations! You just created an experiment and ran it to evaluate your pipeline.
You can now review the results on the experiment details page.
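Among the metrics typically reported for extractive QA are exact match (EM) and F1, which compare the predicted answer spans against the annotated ones. The sketch below is a simplified illustration of how such scores are computed for a single prediction; real implementations also normalize punctuation and articles and take the best score over multiple gold answers. The example strings are made up for illustration.

```python
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the gold answer, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())

def f1(prediction: str, gold: str) -> float:
    """Token-level F1 overlap between predicted and gold answer spans."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Count tokens shared between prediction and gold (multiset intersection)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Miles Davis", "miles davis"))                 # 1.0
print(round(f1("the trumpeter Miles Davis", "Miles Davis"), 2))  # 0.67
```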