Use Case: Optimizing Knowledge Extraction with Batch QA

Sometimes you need to extract the same set of information from every file in your dataset. Other times, you simply have a large number of queries to run against your data. Batch question answering handles both cases efficiently.

Description

Batch question answering systems let you submit a large number of queries at once instead of asking them one by one. They’re especially useful when you have many queries to process but don’t need an immediate response to each one.

In deepset Cloud, you can run your query set against each file individually. You then get one answer per query for every file. For example, 10 queries run on 5 files return 50 answers. This is useful when you want to extract the same information from each file.

The other option is to run your query set across all files at once. The result is one answer per query, drawn from the whole set of files. This is useful when you have many queries to ask and want each one answered from your entire dataset.
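The practical difference between the two options is how many answers you get back. The following Python sketch illustrates it under stated assumptions: `ask` is a hypothetical stand-in for your pipeline, and `run_per_file` and `run_across_files` are illustrative names, not part of the deepset Cloud API.

```python
from typing import Callable

# Hypothetical stand-in for a pipeline: takes a query and the texts to search,
# returns an answer. This is NOT the deepset Cloud API; it only models the idea.
AskFn = Callable[[str, list[str]], str]


def run_per_file(queries: list[str], files: dict[str, str], ask: AskFn) -> dict[tuple[str, str], str]:
    """Ask every query of each file separately: len(queries) * len(files) answers."""
    results: dict[tuple[str, str], str] = {}
    for file_name, text in files.items():
        for query in queries:
            # Only this one file is searched, so the answer is file-specific.
            results[(file_name, query)] = ask(query, [text])
    return results


def run_across_files(queries: list[str], files: dict[str, str], ask: AskFn) -> dict[str, str]:
    """Ask every query once over all files together: one answer per query."""
    all_texts = list(files.values())
    return {query: ask(query, all_texts) for query in queries}


if __name__ == "__main__":
    # Toy pipeline that only reports how many documents it searched.
    def toy_ask(query: str, docs: list[str]) -> str:
        return f"answer to {query!r} from {len(docs)} document(s)"

    queries = ["What is the asking price?", "When was the property built?"]
    files = {
        "property_a.txt": "Asking price: 450,000. Built in 1998.",
        "property_b.txt": "Asking price: 610,000. Built in 2005.",
        "property_c.txt": "Asking price: 380,000. Built in 1975.",
    }

    print(len(run_per_file(queries, files, toy_ask)))      # 6 (2 queries x 3 files)
    print(len(run_across_files(queries, files, toy_ask)))  # 2 (one per query)
```

Keying per-file results by (file, query) keeps track of which file each answer came from, while the across-files results only need the query as a key.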

Applications

Batch question answering systems shine in scenarios such as due diligence, where you look for the same specific information across all your data. They're especially useful in:

  • Investment research and real estate due diligence: When you're evaluating potential investments, these systems can search your documents for the information your evaluation criteria require, streamlining the analysis process.
  • Insurance: Batch question answering can automate pulling the same claim details from various documents, such as receipts and invoices, or reviewing policies for specific conditions.
  • Legal analysis: Perfect for filling in a checklist of required information.
  • Audit, compliance, and risk management: Ideal for running the same queries across multiple files, which makes compiling audit reports more efficient.
  • Metadata creation: You can create key:value pairs by running a set of queries on each of your files. The query is the key and the answer is the value. You can then use the results as file metadata.
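As a minimal sketch of the metadata idea, assuming your per-file results are available as a mapping from (file name, query) to answer, the following Python groups them into one key:value dictionary per file, with the query as the key and the answer as the value. The function name `results_to_metadata` and the sample claim data are hypothetical.

```python
def results_to_metadata(results: dict[tuple[str, str], str]) -> dict[str, dict[str, str]]:
    """Group per-file batch QA results into one metadata dict per file.

    `results` maps (file_name, query) to the answer found in that file.
    For each file, the query becomes the key and the answer the value.
    """
    metadata: dict[str, dict[str, str]] = {}
    for (file_name, query), answer in results.items():
        # Create the file's dict on first sight, then add this key:value pair.
        metadata.setdefault(file_name, {})[query] = answer
    return metadata


if __name__ == "__main__":
    # Hypothetical per-file results for two insurance claim documents.
    results = {
        ("claim_001.pdf", "What is the claimed amount?"): "1,250 EUR",
        ("claim_001.pdf", "When did the incident happen?"): "2023-03-14",
        ("claim_002.pdf", "What is the claimed amount?"): "480 EUR",
        ("claim_002.pdf", "When did the incident happen?"): "2023-04-02",
    }
    print(results_to_metadata(results)["claim_001.pdf"])
    # {'What is the claimed amount?': '1,250 EUR', 'When did the incident happen?': '2023-03-14'}
```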

Data

  • You can run the query batch on any textual data, such as reports or property descriptions. To use filters, make sure your files contain metadata you can filter by.
  • The query set must be a CSV file containing the queries, filters, and configuration for presenting the results, such as labels for the queries and how you want to group them. For details, see Prepare a Query Set.

Pipelines

You can use any pipeline, but the typical choice is a question answering pipeline, either RAG or extractive. You can start from one of the ready-made pipeline templates in deepset Cloud.

You can also use batch query processing to check how your pipeline performs and whether it returns correct answers. Batch QA jobs are easy to create and run quickly, so if your pipeline doesn’t meet your expectations, you can update it and create a new job to test the new version.