Evaluation Datasets
An evaluation dataset is a file with gold answers for your search system. Learn about the format of the dataset and how to prepare it.
What's an Evaluation Dataset?
An evaluation dataset is an annotated set of data held back from your model. The annotations, or labels, can be question-answer pairs (for a question answering system) or question-passage pairs (for an information retrieval system). They indicate the gold answers, which are the answers that you would expect your search system to return.
The evaluation dataset in deepset Cloud is based on the files you uploaded to Data > Files. After you add your evaluation set, deepset Cloud automatically matches the labels in your dataset with the files in your workspace using file names. If there are labels for which there is no match, deepset Cloud lets you know. The evaluation dataset only works for the files that existed in deepset Cloud when you uploaded the evaluation set.
Why Do You Need an Evaluation Dataset?
During evaluation, deepset Cloud takes the questions from your evaluation dataset and runs them through the evaluated pipeline, letting the system find the answers in all your files. The more files you provide, the more complex the task is for the system.
deepset Cloud then compares the answers returned by your search system to the gold answers from your evaluation dataset and, based on the results, calculates the metrics you can use to tweak your pipeline and boost its performance.
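deepset Cloud computes these metrics for you. As an illustration of the underlying comparison, here is a minimal exact-match sketch; the questions, answers, and the `exact_match` helper are hypothetical and only show the idea, not deepset Cloud's actual metric implementation:

```python
# Toy illustration of comparing returned answers to gold answers.
# The data and the exact_match helper are made up for this sketch;
# deepset Cloud calculates its evaluation metrics internally.
def exact_match(predicted: str, gold: str) -> bool:
    # Normalize casing and surrounding whitespace before comparing.
    return predicted.strip().lower() == gold.strip().lower()

gold_answers = {"Who founded Hogwarts?": "the four founders"}
predictions = {"Who founded Hogwarts?": "The four founders"}

# Fraction of questions where the prediction exactly matches the gold answer.
score = sum(
    exact_match(predictions[q], gold) for q, gold in gold_answers.items()
) / len(gold_answers)
print(score)  # → 1.0
```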
Evaluation Datasets in deepset Cloud
In deepset Cloud, you can upload a pre-labeled evaluation dataset or label a dataset with other collaborators on the platform.
Dataset Format
The format differs for question answering and document retrieval. In both cases, the dataset must be a CSV file, but the required columns are different.
Question Answering Dataset
The evaluation dataset must be a CSV file with the following columns:
question: the question text.
text: the answer to the question.
context: the text surrounding the answer.
file_name: the name of the file that contains the answer. This column is optional.
answer_start: the position of the character that starts the answer text in the context.
answer_end: the position of the character that ends the answer text in the context.
filters: any filters that should be used for search. This column is optional.
If the file contains any other columns, deepset Cloud ignores them.
Here's an evaluation dataset for Harry Potter. The additional columns answer_category, question_id, and document_id are ignored in deepset Cloud. This example is meant to show you the format your dataset should follow.
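To make the format concrete, the following sketch writes a one-row question answering dataset with Python's standard `csv` module. The column names come from the list above; the sample question, answer, context, and file name are made up for illustration:

```python
import csv

# Hypothetical sample row; only the column names are prescribed by the format.
context = "At Hogwarts, Harry's best friend Ron Weasley was always at his side."
answer = "Ron Weasley"
start = context.index(answer)          # character position where the answer starts
end = start + len(answer)              # character position where the answer ends

row = {
    "question": "Who is Harry Potter's best friend?",
    "text": answer,
    "context": context,
    "file_name": "harry_potter.txt",   # optional column
    "answer_start": start,
    "answer_end": end,
}

fieldnames = ["question", "text", "context", "file_name", "answer_start", "answer_end"]
with open("qa_eval_dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow(row)
```

Note that `answer_start` and `answer_end` are character offsets into the `context` column, so computing them with `str.index` keeps them consistent with the answer text.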
Document Retrieval Dataset
The evaluation dataset must be a CSV file with the following columns:
question: the query text.
text: leave this column empty.
context: the document to be retrieved as an answer.
file_name: the name of the file where the document comes from. This column is optional.
filters: any filters that should be used for search. This column is optional. If you don't want to use any filters, delete this column. For more information, see Filtering Logic.
Here's an example document retrieval dataset: ms-marco.
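As a sketch of the document retrieval format, the snippet below writes a one-row dataset with Python's standard `csv` module. The column names follow the list above; the query, document text, and file name are invented for illustration:

```python
import csv

# Hypothetical sample row; note that "text" stays empty for document retrieval.
fieldnames = ["question", "text", "context", "file_name", "filters"]
row = {
    "question": "what is the capital of France",
    "text": "",  # left empty in a document retrieval dataset
    "context": "Paris is the capital and most populous city of France.",
    "file_name": "france.txt",  # optional column
    "filters": "",              # optional column; delete it if unused
}

with open("retrieval_eval_dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow(row)
```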