Create an Evaluation Set

Learn how to create question-answer pairs for your model using the Annotation Tool.

Annotation Tool

deepset created a free Annotation Tool that you can use to prepare question-answer pairs. You can upload your text and CSV documents, create questions, and mark the answers. When you're done, you can easily export your annotated files to a CSV file, ready to use in deepset Cloud.

What's Annotated Data?

Annotated data, also called an evaluation set, is a set of data held back from your model. The annotations, or labels, can be question-answer pairs (for a question-answering system) or question-passage pairs (for an information retrieval system). They indicate the gold answers, which are the answers that you expect your search system to return. For more information, see Evaluation Datasets.

Prerequisites

  • Get familiar with the guidelines for annotating data and comply with them when preparing your annotated data.
  • Use the Chrome browser. The tool has not been tested on other browsers.
  • Set up an account in the Annotation Tool.

Annotating Data in the Annotation Tool

Create a Project

  1. Log in to the Annotation Tool and click Create project.
  2. Give your project a name and select the annotation mode:
    • Default: Leaves the Answer Category column empty in the file with exported question-answer pairs.
    • Answer category: Fills in the Answer Category column in the exported file.

You can create a separate project for each user working on data annotation.

What's the Answer Category column?

After you create your question-answer pairs, you may want to export them. The Annotation Tool exports them to a file with the following columns:

  • answer_id
  • document_id
  • question_id
  • text
  • answer_start
  • answer_end
  • answer_category
  • question
  • file_name
  • context

The answer_category field is populated only if you choose Answer category as the annotation mode. If you choose the Default mode, it's left empty. The field describes the answer type, such as Short, concise answer or Yes answer.
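As a rough illustration of how these columns can be checked after an export, here is a minimal Python sketch. It assumes the exported file is comma-separated and has a header row with exactly the column names listed above. The function name summarize_export and the sample rows are made up for this example and don't come from the Annotation Tool.

```python
import csv
import io

# Column names as listed above for the exported file.
EXPECTED_COLUMNS = [
    "answer_id", "document_id", "question_id", "text", "answer_start",
    "answer_end", "answer_category", "question", "file_name", "context",
]


def summarize_export(csv_file) -> dict:
    """Report missing columns and rows with an empty answer_category.

    Assumption: comma-separated file with a header row.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    rows = list(reader)
    empty = sum(
        1 for row in rows if not (row.get("answer_category") or "").strip()
    )
    return {
        "missing_columns": missing,
        "rows": len(rows),
        "empty_answer_category": empty,
    }


# Made-up sample data, only to show the shape of the result.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,10,100,Paris,0,5,OTHER,What is the capital?,doc.txt,Paris is the capital.\n"
    "2,10,101,1889,0,4,,When was it built?,doc.txt,1889 was the year.\n"
)
print(summarize_export(sample))
```

To run the same check on a real export, open the file with open(path, newline="", encoding="utf-8") and pass the file object to summarize_export. A project created in the Default mode should report every row as having an empty answer_category.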

Upload Documents

There's no length limit on the documents you upload to the tool. They can be as long as you need them to be.

  1. In the Annotation tool, click the arrow icon under the Actions column for your project.
     [Screenshot: the Annotation Tool with the arrow icon highlighted in the Action column]

  2. Go to Import>Import Documents.

  3. Drag the files that you want to annotate and drop them in the tool.

Create Question-Answer Pairs

Creating New Questions

Use this method to create questions from scratch.

  1. On the menu, select Documents.

  2. In the Action column, click the arrow icon next to the document for which you want to add questions. You're redirected to the Questions view.

  3. To add a question:

    1. Click ADD CUSTOM QUESTION.
    2. Select a text passage. A question window pops up.
    3. Enter the question text and select the answer category.
      • Short answers are numbers or a few words.
      • Long answers span multiple sentences and may include lists.
    4. Submit your question.
    5. Continue until you've added all the questions that you need.
    6. When you've added all the questions for this document, move to the next document using the arrows at the top.

Importing Questions

Use this method if you have questions ready. Questions must be in a CSV file.

  1. On the menu, go to Import>Questions.
  2. Drop your CSV file with questions in the tool. If you go to Questions on the menu, you'll see all your imported questions.
  3. Match your questions with answers:
    1. On the menu, click Documents.
    2. Find the document you want to work with and click the right arrow in the Actions column.
    3. Click a question and highlight the answer in the document.
    4. Select the Answer Category on the window that displays and save it.
    5. To indicate that a question has no answer in the document, click the Edit icon next to the question and select Answer is not given.

Creating Unanswerable Questions

Having questions without answers teaches your model to predict "no answer". This prevents the model from fetching incorrect answers if the dataset doesn't contain the correct one.

  • If you imported your questions:
    1. On the menu, open Questions.
    2. Find the question with no answer, click the edit icon next to it, and select Answer not given.
  • If you created questions in the tool, you need to use a workaround:
    1. Add a token at the end of each document you upload. The token can be NOANSWER.
    2. For questions with no answer in the document, select the token as the answer.
    3. Export your data to a SQuAD file and find all questions for which you selected the token as the answer.
    4. For each such question, delete all data in the answers list to make it an empty list and set is_impossible to true.
      Have a look at this example, which shows the same question before and after the change. The token used here is NOANSWER.

      Before:
       {
            "paragraphs": [
              {
                "qas": [
                  {
                    "question": "Who is Michael Jackson?",
                    "id": 452907,
                    "answers": [
                      {
                        "answer_id": 559434,
                        "document_id": 976448,
                        "question_id": 452907,
                        "text": "NOANSWER",
                        "answer_start": 36724,
                        "answer_end": 36726,
                        "answer_category": "OTHER"
                      }
                    ],
                    "is_impossible": false
                  }
                ]
              }
            ]
       }

      After:
       {
            "paragraphs": [
              {
                "qas": [
                  {
                    "question": "Who is Michael Jackson?",
                    "id": 452907,
                    "answers": [ ],
                    "is_impossible": true
                  }
                ]
              }
            ]
       }
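
If the export contains many such questions, editing the file by hand gets tedious. The following Python sketch shows one way to automate steps 3 and 4 of the workaround. It assumes the exported file follows the usual SQuAD layout, with a top-level data list whose entries contain paragraphs and qas as in the example. The function names mark_unanswerable and convert_file are hypothetical helpers written for this example. They are not part of the Annotation Tool.

```python
import json

NO_ANSWER_TOKEN = "NOANSWER"  # assumption: the token you appended to each document


def mark_unanswerable(squad: dict, token: str = NO_ANSWER_TOKEN) -> int:
    """Empty the answers list and set is_impossible for token-only questions.

    Returns the number of questions that were changed.
    """
    changed = 0
    # Assumption: standard SQuAD nesting, data -> paragraphs -> qas.
    for entry in squad.get("data", []):
        for paragraph in entry.get("paragraphs", []):
            for qa in paragraph.get("qas", []):
                answers = qa.get("answers", [])
                # Only touch questions whose every answer is the token.
                if answers and all(a.get("text") == token for a in answers):
                    qa["answers"] = []
                    qa["is_impossible"] = True
                    changed += 1
    return changed


def convert_file(src_path: str, dst_path: str, token: str = NO_ANSWER_TOKEN) -> int:
    """Read a SQuAD export, mark token-only questions, and write a new file."""
    with open(src_path, encoding="utf-8") as f:
        squad = json.load(f)
    changed = mark_unanswerable(squad, token)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(squad, f, ensure_ascii=False, indent=2)
    return changed
```

Writing to a separate output file keeps the original export untouched, so you can compare the two before using the result. Questions that have a real answer alongside the token are deliberately left alone so you can review them manually.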
      

Export Your Data

After you've annotated all your documents, you can export your question-answer pairs. The exported CSV file meets all the deepset Cloud requirements, so you can save it and upload it to deepset Cloud as is.

  1. Click Export Labels.

  2. Click Export answers>Export table in CSV. The .csv file is downloaded to your computer.

Import Data to deepset Cloud

You can now upload the .csv file with question-answer pairs to deepset Cloud. To find out how to do this, see Upload Files.