Handling Different Query Types

Queries come in different shapes—keywords, questions, and statements. You can optimize your search by routing each query type to a node that handles it best.

Query Types

There are three query types:

  • Keyword queries
  • Questions
  • Statements

Keyword queries are just keywords. They have no sentence structure, and the order of the words doesn't matter. For example:

  • last year results
  • results 2022
  • USA president

Questions, on the other hand, are complete, grammatical sentences, such as:

  • What were the results last year?
  • What were the results in 2022?
  • Who is the president of the USA

(Pipelines in deepset Cloud don't need a question mark to process a query.)

Statements are declarative sentences, such as:

  • Last year's results were good.
  • Results in 2022 were not satisfying.
  • The president of the USA is Joe Biden.
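
To make the distinction between the three query types concrete, here is a toy heuristic in Python. It is only a sketch: the `classify_query` function and both word lists are hypothetical, they look at surface cues alone, and they are not how deepset Cloud classifies queries.

```python
import re

# Hypothetical word lists for illustration only; a trained classifier
# learns such cues from data instead of relying on fixed lists.
QUESTION_WORDS = {"what", "who", "whom", "whose", "when", "where", "why", "which", "how"}
AUX_VERBS = {"is", "are", "was", "were", "do", "does", "did", "can", "could",
             "will", "would", "should", "has", "have", "had"}


def classify_query(query: str) -> str:
    """Return 'question', 'statement', or 'keyword' based on crude surface cues."""
    text = query.strip()
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return "keyword"
    # A question mark is optional: a leading question word or auxiliary verb is enough.
    if text.endswith("?") or tokens[0] in QUESTION_WORDS or tokens[0] in AUX_VERBS:
        return "question"
    # A verb after the first word suggests a declarative sentence.
    if any(token in AUX_VERBS for token in tokens[1:]):
        return "statement"
    # No sentence structure detected: treat the query as a bag of keywords.
    return "keyword"


for example in ("results 2022", "Who is the president of the USA", "Results in 2022 were not satisfying."):
    print(example, "->", classify_query(example))
```

A heuristic like this breaks down quickly on real queries, which is why a trained classification model is the better choice in practice.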

Optimizing the Pipeline to Handle All Query Types

You can configure your pipeline so that each query type is routed to the node that handles it best. This also saves GPU resources, because keyword queries skip the dense Retriever. For example, you can route questions and statements to a dense Retriever, such as DensePassageRetriever, and keywords to a sparse Retriever, such as BM25Retriever. deepset Cloud offers a node called QueryClassifier that's designed to do just that.

Here's what an example pipeline with this setup would look like:

A diagram showing the query pipeline that starts with a query which is then routed to the query classifier. The query classifier then routes output 1 to embedding retriever and output 2 to bm25 retriever. Then the embedding retriever output is routed further to a farm reader.

And here's the pipeline code:

# Here's how you specify QueryClassifier:
components:
  - name: QueryClassifier
    type: TransformersQueryClassifier
    params:
      model_name_or_path: shahrukhx01/bert-mini-finetune-question-detection
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: DenseRetriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1
      model_format: sentence_transformers
      top_k: 20
  - name: SparseRetriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
  - name: Reader
    type: FARMReader
    params:
      model_name_or_path: deepset/deberta-v3-base-squad2
      use_gpu: True

pipelines:
  - name: query
    nodes:
      - name: QueryClassifier
        inputs: [Query]
      # output_1 carries questions and statements
      - name: DenseRetriever
        inputs: [QueryClassifier.output_1]
      # output_2 carries keyword queries
      - name: SparseRetriever
        inputs: [QueryClassifier.output_2]
      - name: Reader
        inputs: [DenseRetriever]
  # ... here you'd need to specify the indexing pipeline
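
The routing that the query pipeline above configures can be sketched in plain Python. This is a minimal illustration, not deepset Cloud or Haystack code: `ROUTES`, `route`, and `toy_is_keyword` are hypothetical names, and the word-count rule merely stands in for the classification model.

```python
from typing import Callable, Dict, List

# Edge names mirror the pipeline definition: output_1 feeds DenseRetriever
# (and then Reader), output_2 feeds SparseRetriever.
ROUTES: Dict[str, List[str]] = {
    "output_1": ["DenseRetriever", "Reader"],
    "output_2": ["SparseRetriever"],
}


def route(query: str, is_keyword_query: Callable[[str], bool]) -> List[str]:
    """Return the node names a query passes through after QueryClassifier."""
    edge = "output_2" if is_keyword_query(query) else "output_1"
    return ROUTES[edge]


def toy_is_keyword(query: str) -> bool:
    """Hypothetical stand-in for the classifier: treat very short queries as keywords."""
    return len(query.split()) <= 3


print(route("USA president", toy_is_keyword))                    # ['SparseRetriever']
print(route("Who is the president of the USA", toy_is_keyword))  # ['DenseRetriever', 'Reader']
```

Only one branch runs for any given query, so keyword queries never reach the dense Retriever or the Reader in this setup.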

