There are three query types: keyword queries, questions, and statements.
Keyword queries consist only of keywords. They have no sentence structure, and the order of the words doesn't matter. For example:
- last year results
- results 2022
- USA president
Questions, on the other hand, are complete, grammatical sentences, such as:
- What were the results last year?
- What were the results in 2022?
- Who is the president of the USA
(Pipelines in deepset Cloud don't need a question mark to process a query.)
Statements are declarative sentences, such as:
- Last year results were good.
- Results in 2022 were not satisfying.
- The president of the USA is Joe Biden.
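To make the distinction concrete, here is a minimal Python sketch that sorts the examples above into the three types. It is only a naive heuristic for illustration, not how QueryClassifier works (the example pipeline further down uses a trained model). The function name `classify_query` and both word lists are assumptions made for this sketch.

```python
# Illustrative heuristic only; a real classifier would use a trained model.
QUESTION_WORDS = {"what", "who", "whom", "whose", "when", "where", "why", "which", "how"}
# A few common verbs used as a crude signal of sentence structure (hypothetical list).
COMMON_VERBS = {"is", "are", "was", "were", "be", "has", "have", "had", "do", "does", "did"}


def classify_query(query: str) -> str:
    """Return "question", "statement", or "keyword" for a query string."""
    stripped = query.strip()
    words = stripped.lower().rstrip("?.!").split()
    if not words:
        return "keyword"
    # A trailing question mark is optional, so also check the first word.
    if stripped.endswith("?") or words[0] in QUESTION_WORDS:
        return "question"
    # Any common verb suggests a declarative sentence.
    if any(word in COMMON_VERBS for word in words):
        return "statement"
    # No sentence structure detected: treat the query as keywords.
    return "keyword"


if __name__ == "__main__":
    for example in ["USA president", "Who is the president of the USA", "Last year results were good."]:
        print(example, "->", classify_query(example))
```

A heuristic like this breaks down quickly on real queries, which is why a model-based classifier is the more robust choice.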
You can configure your pipeline so that each query type is routed to the node that's best at handling it, which also saves GPU resources. For example, you can route questions and statements to a dense Retriever, such as DensePassageRetriever, and keywords to a sparse Retriever, such as BM25Retriever. deepset Cloud offers a node called QueryClassifier that's designed to do just that.
Here's the code for an example pipeline with this setup:
name: 'QueryClassifierPipeline'
version: '1.21.0'
components:
  # Here's how you specify QueryClassifier:
  - name: QueryClassifier
    type: TransformersQueryClassifier
    params:
      model_name_or_path: shahrukhx01/bert-mini-finetune-question-detection
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: DenseRetriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1
      model_format: sentence_transformers
      top_k: 20
  - name: SparseRetriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
  - name: Reader
    type: FARMReader
    params:
      model: deepset/deberta-v3-base-squad2
      use_gpu: True
pipelines:
  - name: query
    nodes:
      - name: QueryClassifier
        inputs: [Query]
      - name: DenseRetriever
        inputs: [QueryClassifier.output_1]
      - name: SparseRetriever
        inputs: [QueryClassifier.output_2]
      - name: Reader
        inputs: [DenseRetriever]
  ...
  # Here you'd need to specify the indexing pipeline
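The routing step in the pipeline above can be pictured with a small Python sketch. It assumes, based on the configuration and the description above, that `output_1` carries questions and statements and `output_2` carries keyword queries. The names `ROUTES`, `route_query`, and `toy_is_keyword` are hypothetical and exist only in this sketch; the real decision is made by the classifier model, not by a word count.

```python
from typing import Callable

# Output edge -> retriever, mirroring the `inputs` entries in the pipeline definition.
ROUTES = {
    "output_1": "DenseRetriever",   # questions and statements (assumed mapping)
    "output_2": "SparseRetriever",  # keyword queries (assumed mapping)
}


def route_query(query: str, is_keyword: Callable[[str], bool]) -> str:
    """Return the name of the retriever that would receive the query."""
    edge = "output_2" if is_keyword(query) else "output_1"
    return ROUTES[edge]


def toy_is_keyword(query: str) -> bool:
    # Hypothetical stand-in for the classifier model: treat very short queries as keywords.
    return len(query.split()) <= 3


if __name__ == "__main__":
    for example in ["USA president", "Who is the president of the USA"]:
        print(example, "->", route_query(example, toy_is_keyword))
```

Passing the classifier in as a function keeps the routing logic separate from the classification logic, so the stand-in can be swapped for a real model without touching `route_query`.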