Query Types

There are three query types:

Keyword queries
Questions
Statements

Keyword queries consist only of keywords. They have no sentence structure, and the order of the words doesn't matter. For example:

last year results
results 2022
USA president

Questions, on the other hand, are complete, grammatical sentences, such as:

What were the results last year?
What were the results in 2022?
Who is the president of the USA

(Pipelines in deepset Cloud don't need a question mark to process a query.)

Statements are declarative sentences, such as:

Last year's results were good.
Results in 2022 were not satisfying.
The president of the USA is Joe Biden.
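
To make the distinction between the three types concrete, here is a minimal rule-based sketch in Python. It is an illustration only: the word lists and the `classify_query` function are hypothetical, and this is not how QueryClassifier works internally (the configuration shown later on this page uses a trained model).

```python
import re

# Hypothetical word lists, chosen only to cover the examples on this page.
QUESTION_STARTERS = {
    "what", "who", "whom", "whose", "when", "where", "why", "which", "how",
}
# Function words hint that the query has sentence structure.
FUNCTION_WORDS = {"the", "a", "an", "is", "are", "was", "were", "of", "in", "not"}


def classify_query(query: str) -> str:
    """Return 'question', 'statement', or 'keyword' using simple rules."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return "keyword"
    # A question mark is optional: a leading question word is enough.
    if query.strip().endswith("?") or words[0] in QUESTION_STARTERS:
        return "question"
    # Any function word suggests a full declarative sentence.
    if any(word in FUNCTION_WORDS for word in words):
        return "statement"
    return "keyword"


print(classify_query("results 2022"))                      # keyword
print(classify_query("Who is the president of the USA"))   # question
print(classify_query("Results in 2022 were not satisfying."))  # statement
```

Rules like these break down quickly on real queries, which is one reason to use a model-based classifier instead.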

Optimizing the Pipeline To Handle All Query Types

You can configure your pipeline so that each query type is routed to the node that handles it best. This also saves GPU resources, because keyword queries skip the dense Retriever. For example, you can route questions and statements to a dense Retriever, such as DensePassageRetriever, and keywords to a sparse Retriever, such as BM25Retriever. deepset Cloud offers a node called QueryClassifier that's designed to do just that.

Here's what an example pipeline with this setup would look like:

A diagram showing the query pipeline: the query goes to the QueryClassifier, which routes output 1 to the EmbeddingRetriever and output 2 to the BM25Retriever. The EmbeddingRetriever output is then passed on to the FARMReader.

And here's the pipeline code:


components:
  # Here's how you specify the QueryClassifier:
  - name: QueryClassifier
    type: TransformersQueryClassifier
    params:
      model_name_or_path: shahrukhx01/bert-mini-finetune-question-detection
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: DenseRetriever
    type: EmbeddingRetriever 
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 
      model_format: sentence_transformers
      top_k: 20 
  - name: SparseRetriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
  - name: Reader
    type: FARMReader
    params:
      model: deepset/deberta-v3-base-squad2
      use_gpu: True

pipelines:
  - name: query
    nodes:
      - name: QueryClassifier
        inputs: [Query]
      - name: DenseRetriever
        inputs: [QueryClassifier.output_1] # questions and statements
      - name: SparseRetriever
        inputs: [QueryClassifier.output_2] # keyword queries
      - name: Reader
        inputs: [DenseRetriever]
  # ... here you'd need to specify the indexing pipeline
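
To see which nodes each kind of query passes through, you can follow the `inputs` wiring of the query pipeline by hand. The Python sketch below does this with a plain dictionary copied from the configuration above. The `PIPELINE_INPUTS` name and the `trace` helper are hypothetical and exist only for this illustration; they are not part of deepset Cloud or Haystack.

```python
from typing import List

# Node -> inputs, copied from the `query` pipeline in the YAML above.
PIPELINE_INPUTS = {
    "QueryClassifier": ["Query"],
    "DenseRetriever": ["QueryClassifier.output_1"],
    "SparseRetriever": ["QueryClassifier.output_2"],
    "Reader": ["DenseRetriever"],
}


def trace(start_edge: str) -> List[str]:
    """Return the nodes reached by following one QueryClassifier output."""
    path: List[str] = []
    frontier = [start_edge]
    while frontier:
        source = frontier.pop(0)
        for node, inputs in PIPELINE_INPUTS.items():
            # A node is downstream if it lists the current source as an input.
            if source in inputs and node not in path:
                path.append(node)
                frontier.append(node)
    return path


# Questions and statements leave the classifier through output_1.
print(trace("QueryClassifier.output_1"))  # ['DenseRetriever', 'Reader']
# Keyword queries leave through output_2.
print(trace("QueryClassifier.output_2"))  # ['SparseRetriever']
```

With the wiring as written, the keyword branch ends at SparseRetriever, so only the dense branch continues to the Reader.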