A document retrieval system, also called document search, is a system that returns whole documents relevant to the query.
A document retrieval system is the best choice if:
- Answers can't be short spans of text and need to be longer, more complex passages.
- You need a fast system. A document retrieval system doesn't use a reader, which speeds it up significantly.
- You need a system that can handle millions of requests and has very low latency.
- You need a system that can handle natural language questions.
- Keyword-based approaches, such as BM25 search in Elasticsearch, are not enough for your use case.
Compared to a question answering system, document retrieval is faster and cheaper. It can even run on a CPU in production. Also, the available document retrieval models are already very powerful, so domain adaptation is easier than it is for question answering.
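To make the idea of returning whole documents without a reader concrete, here is a minimal sketch in plain Python. It is not how deepset Cloud implements retrieval. It assumes a toy bag-of-words cosine similarity as a stand-in for a real retrieval model, and the names `retrieve`, `DOCUMENTS`, and `top_k` are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; in practice this would be your own text data.
DOCUMENTS = [
    "To reset your password, open the account settings page and choose "
    "the reset password option. A confirmation email follows.",
    "Invoices are issued on the first day of each month and can be "
    "downloaded from the billing page.",
    "Our support team is available on weekdays from nine to five.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Represent a text as a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Return up to top_k whole documents, ranked by similarity to the query.

    Assumption: a production system would score documents with a trained
    retrieval model instead of this word-count similarity.
    """
    query_vector = vectorize(query)
    scored = [(cosine(query_vector, vectorize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # No reader step: the matching documents are returned in full.
    return [doc for score, doc in scored[:top_k] if score > 0]


if __name__ == "__main__":
    for doc in retrieve("How do I reset my password?", DOCUMENTS, top_k=1):
        print(doc)
```

The point of the sketch is the shape of the result: the query goes in, and complete documents come out, with no extra step that extracts an answer span. That missing step is what makes retrieval cheaper to run than question answering.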
An example of document search
Here's what a document retrieval system looks like:
You can use any text data. For a fast prototype, your data should be restricted to one domain.
You can divide your data into the underlying text data that the system searches and an annotated set that you use to evaluate your pipelines.
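As a rough illustration of how an annotated set can be used for evaluation, the sketch below computes recall at k: the share of annotated queries for which the expected document appears among the top k results. Everything in it is an assumption made for the example, including the annotation format (query paired with one relevant document ID), the names `recall_at_k` and `keyword_retriever`, and the toy token-overlap retriever that stands in for a real pipeline.

```python
import re

# Hypothetical underlying text data, keyed by document ID.
CORPUS = {
    "billing": "Invoices are issued monthly and can be downloaded from the billing page.",
    "password": "To reset your password, open the account settings and choose reset password.",
    "support": "The support team answers tickets on weekdays.",
}

# Hypothetical annotated set: each query is labeled with its relevant document ID.
ANNOTATIONS = [
    ("Where can I download my invoices?", "billing"),
    ("How do I reset my password?", "password"),
    ("When does support answer tickets?", "support"),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retriever(query, corpus, top_k):
    """Toy stand-in for a pipeline: rank document IDs by shared tokens."""
    query_tokens = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc_id: len(query_tokens & tokenize(corpus[doc_id])),
        reverse=True,
    )
    return ranked[:top_k]


def recall_at_k(retriever, corpus, annotations, top_k):
    """Fraction of annotated queries whose relevant document is in the top_k results."""
    if not annotations:
        raise ValueError("annotations must not be empty")
    hits = sum(
        1
        for query, relevant_id in annotations
        if relevant_id in retriever(query, corpus, top_k)
    )
    return hits / len(annotations)


if __name__ == "__main__":
    score = recall_at_k(keyword_retriever, CORPUS, ANNOTATIONS, top_k=1)
    print(f"recall@1 = {score:.2f}")
```

Because `recall_at_k` takes the retriever as a parameter, the same annotated set can be reused to compare different pipelines on equal terms.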
Two groups of users work with the system:
- Data scientists: Design the system, create the pipelines, and supervise domain experts.
- Domain experts: Use the system and provide feedback in the deepset Cloud interface.
For examples of pipelines, see Document Retrieval Pipelines.