Guidelines for Labeling Data

This is a set of best practices and tips on how to annotate data to create evaluation datasets for document search systems using the Labeling feature.


To evaluate your pipeline, you need an evaluation set. This set contains gold answers, which are high-quality, accurate answers or passages of text that your pipeline aims to approximate. During experiments, your pipeline is tested against this evaluation set to check its alignment with the gold answers. For more information, see Evaluation Datasets.

For question answering systems, you need question-answer pairs, while for document search systems, you need question-passage pairs.
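
To make the idea of question-passage pairs more concrete, here is a minimal Python sketch of what one entry of such an evaluation set could look like and how a document search result could be scored against it. The names `EvalItem` and `recall_at_k`, as well as the document IDs, are hypothetical and not part of deepset Cloud. Recall at k is used only as one common retrieval metric, not necessarily the one your experiments report.

```python
from dataclasses import dataclass, field


@dataclass
class EvalItem:
    """One entry of a hypothetical question-passage evaluation set."""
    question: str
    # IDs of the passages labeled as relevant (the gold passages).
    gold_doc_ids: set[str] = field(default_factory=set)


def recall_at_k(item: EvalItem, retrieved_ids: list[str], k: int) -> float:
    """Return the fraction of gold passages found in the top-k results."""
    if not item.gold_doc_ids:
        return 0.0  # nothing to find, so nothing is counted as recalled
    top_k = set(retrieved_ids[:k])
    return len(item.gold_doc_ids & top_k) / len(item.gold_doc_ids)


# Illustrative data only: two gold passages, four retrieved documents.
item = EvalItem("What is the notice period?", {"doc-3", "doc-7"})
retrieved = ["doc-7", "doc-1", "doc-3", "doc-9"]

score_top2 = recall_at_k(item, retrieved, k=2)
score_top3 = recall_at_k(item, retrieved, k=3)
print(score_top2, score_top3)  # prints: 0.5 1.0
```

With only the top two results considered, one of the two gold passages is found. Extending to the top three finds both.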

deepset Cloud offers Labeling Projects that you can use to create annotated datasets for document search pipelines. Once a project is created, you can invite labelers to it. To label the documents, labelers use the search to ask questions the way users would, and then indicate whether each document in the results is relevant.

How to Write Questions

  • Ensure that the answers to your questions are in the data corpus you're annotating. Don't formulate questions that can only be answered with outside knowledge or your own interpretation.
  • Write questions that users might typically ask. Use any available search history from tools they have used before as a source of inspiration.
  • The more questions you create, the better. For model evaluation, aim for 200 to 500 questions. For training, there is no upper limit.
  • Ensure that your questions are understandable without the need to read the corresponding text passages.
  • Typos and grammar mistakes in your questions are OK. When your system is live, users will likely make spelling errors when typing their questions. However, don't deliberately introduce mistakes when annotating.
  • Ask questions that are within the technical capabilities of your pipeline. For example, don't ask a document search pipeline to summarize documents.

How to Mark Answers

  • Indicate the relevance of each document you see in the results with a thumbs down (irrelevant), thumbs up (relevant), or flag (not sure) icon.
  • When evaluating documents, treat every document that is relevant to your question as correct. Relevant means you would want to see this document in the search results of your app.
  • Label each document you see in the results.
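
The marking rules above can be pictured as a small data transformation. The following is a minimal Python sketch, assuming labels are available as (question, document ID, label) triples with the hypothetical values "up", "down", and "flag" standing in for the three icons. The actual format used by Labeling Projects may differ, and `build_eval_set` is an illustrative helper, not a deepset Cloud function. Setting flagged documents aside for a second look is also an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical label values standing in for the three icons.
RELEVANT, IRRELEVANT, NOT_SURE = "up", "down", "flag"


def build_eval_set(labels):
    """Group (question, doc_id, label) triples into gold and to-review sets."""
    gold = defaultdict(set)    # question -> documents marked relevant
    review = defaultdict(set)  # question -> documents marked "not sure"
    for question, doc_id, label in labels:
        if label == RELEVANT:
            gold[question].add(doc_id)
        elif label == NOT_SURE:
            review[question].add(doc_id)
        elif label != IRRELEVANT:
            raise ValueError(f"Unknown label: {label!r}")
    return dict(gold), dict(review)


# Illustrative data only: one question with three labeled results.
labels = [
    ("What is the notice period?", "doc-7", "up"),
    ("What is the notice period?", "doc-1", "down"),
    ("What is the notice period?", "doc-4", "flag"),
]
gold, review = build_eval_set(labels)
print(gold)    # prints: {'What is the notice period?': {'doc-7'}}
print(review)  # prints: {'What is the notice period?': {'doc-4'}}
```

Only documents marked as relevant end up as gold passages. Irrelevant documents are dropped, which is why every displayed document needs a label: an unlabeled document cannot be told apart from one that was never shown.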