QdrantHybridRetriever

Retrieve documents from a QdrantDocumentStore using both dense and sparse vectors, and fuse the two result sets with Reciprocal Rank Fusion.

Key Features

  • Hybrid retrieval combining dense and sparse vector search from Qdrant.
  • Uses Reciprocal Rank Fusion to combine results from both retrieval methods.
  • Configurable number of results (top_k) and minimum score threshold (score_threshold).
  • Supports metadata filtering to narrow down the search space.
  • Supports document grouping by a payload field.
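
As an illustration of how Reciprocal Rank Fusion combines two rankings, here is a minimal sketch in plain Python. It uses the standard RRF formula, where each document receives the sum of 1 / (k + rank) over the lists it appears in. Qdrant performs the fusion on the server side, so this is not the retriever's actual code. The function name, the document IDs, and the constant k = 60 are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k = 60 is a commonly used constant, assumed here.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense ranking
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical sparse ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (doc_b, doc_a) end up ahead of documents that appear in only one list, even when a single-list document holds a higher position there.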

Configuration

  1. Drag the QdrantHybridRetriever component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    • Configure the QdrantDocumentStore with your Qdrant instance details. Make sure your Qdrant index uses sparse embeddings (use_sparse_embeddings=True).
    • Set top_k to control the maximum number of documents to retrieve.
  4. Go to the Advanced tab to configure filter_policy, score_threshold, return_embedding, group_by, and group_size.

Connections

QdrantHybridRetriever receives a dense query embedding (query_embedding) and a sparse query embedding (query_sparse_embedding) as inputs. It sends the retrieved documents to downstream components such as PromptBuilder or a ranker.

Source Code

To check this component's source code, open retriever.py in the Haystack Core Integrations repository.

Usage Examples

Basic Configuration

QdrantHybridRetriever:
  type: qdrant.src.haystack_integrations.components.retrievers.qdrant.retriever.QdrantHybridRetriever
  init_parameters: {}

components:
  QdrantHybridRetriever:
    type: qdrant.src.haystack_integrations.components.retrievers.qdrant.retriever.QdrantHybridRetriever
    init_parameters:

Parameters

Inputs

| Parameter | Type | Description |
|---|---|---|
| query_embedding | List[float] | Dense embedding of the query. |
| query_sparse_embedding | SparseEmbedding | Sparse embedding of the query. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. |
| top_k | Optional[int] | The maximum number of documents to return. If using group_by parameters, maximum number of groups to return. |
| return_embedding | Optional[bool] | Whether to return the embedding of the retrieved Documents. |
| score_threshold | Optional[float] | A minimal score threshold for the result. Score of the returned result might be higher or smaller than the threshold depending on the Distance function used. |
| group_by | Optional[str] | Payload field to group by, must be a string or number field. If the field contains more than one value, all values will be used for grouping. One point can be in multiple groups. |
| group_size | Optional[int] | Maximum amount of points to return per group. Default is 3. |
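
To make the grouping parameters more concrete, the sketch below imitates their described behavior on a list of hits that is already sorted by score: top_k caps the number of groups, group_size caps the hits per group, and a multi-valued field places one hit in several groups. The grouping itself happens inside Qdrant, so the function group_hits, the hit dictionaries, and the payload field source are all hypothetical and serve only as an illustration.

```python
from typing import Any, Dict, List


def group_hits(
    hits: List[Dict[str, Any]], group_by: str, top_k: int, group_size: int = 3
) -> Dict[Any, List[Dict[str, Any]]]:
    """Group score-sorted hits by a payload field.

    Keeps at most top_k groups and at most group_size hits per group.
    """
    groups: Dict[Any, List[Dict[str, Any]]] = {}
    for hit in hits:  # hits are assumed to be sorted by descending score
        value = hit["payload"].get(group_by)
        if value is None:
            continue  # hits without the field belong to no group
        # A multi-valued field places the hit in one group per value.
        values = value if isinstance(value, list) else [value]
        for key in values:
            if key not in groups:
                if len(groups) >= top_k:
                    continue  # group limit reached, ignore new groups
                groups[key] = []
            if len(groups[key]) < group_size:
                groups[key].append(hit)
    return groups


hits = [
    {"id": 1, "score": 0.9, "payload": {"source": "a"}},
    {"id": 2, "score": 0.8, "payload": {"source": ["a", "b"]}},
    {"id": 3, "score": 0.7, "payload": {"source": "b"}},
    {"id": 4, "score": 0.6, "payload": {"source": "c"}},
    {"id": 5, "score": 0.5, "payload": {"source": "a"}},
]
grouped = group_hits(hits, group_by="source", top_k=2, group_size=2)
for key, members in grouped.items():
    print(key, [hit["id"] for hit in members])
```

In this example, hit 2 lands in both groups because its source field holds two values, hit 4 is dropped because the limit of two groups is already reached, and hit 5 is dropped because group a is full.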

Outputs

| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | The retrieved documents. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | QdrantDocumentStore | | An instance of QdrantDocumentStore. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | None | A dictionary with filters to narrow down the search space. |
| top_k | int | 10 | The maximum number of documents to retrieve. If using group_by parameters, maximum number of groups to return. |
| return_embedding | bool | False | Whether to return the embeddings of the retrieved Documents. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy to determine how filters are applied. |
| score_threshold | Optional[float] | None | A minimal score threshold for the result. Score of the returned result might be higher or smaller than the threshold depending on the Distance function used. For example, for cosine similarity only higher scores will be returned. |
| group_by | Optional[str] | None | Payload field to group by, must be a string or number field. If the field contains more than one value, all values will be used for grouping. One point can be in multiple groups. |
| group_size | Optional[int] | None | Maximum amount of points to return per group. Default is 3. |
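
The following sketch illustrates how a filter policy can decide between the filters set at initialization and the filters passed at query time. It is a simplified stand-in, not the library's code. The local FilterPolicy enum mirrors the REPLACE default from the table and assumes that a MERGE option also exists. The function resolve_filters and the AND-based merge are assumptions for illustration, and the real merge logic may be more nuanced.

```python
from enum import Enum
from typing import Any, Dict, Optional

Filters = Dict[str, Any]


class FilterPolicy(Enum):
    """Local stand-in for the library's FilterPolicy (assumed values)."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: FilterPolicy,
    init_filters: Optional[Filters],
    runtime_filters: Optional[Filters],
) -> Optional[Filters]:
    """Pick the effective filters for one query (hypothetical helper)."""
    if policy is FilterPolicy.MERGE and init_filters and runtime_filters:
        # Simplified merge: a document must satisfy both filter sets.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE: runtime filters win when given, otherwise fall back to init filters.
    return runtime_filters or init_filters


init_filters = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2023}

replaced = resolve_filters(FilterPolicy.REPLACE, init_filters, runtime_filters)
merged = resolve_filters(FilterPolicy.MERGE, init_filters, runtime_filters)
fallback = resolve_filters(FilterPolicy.REPLACE, init_filters, None)
```

With REPLACE, the language filter from initialization is discarded as soon as a runtime filter is passed. With the assumed MERGE behavior, both conditions apply.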

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Dense embedding of the query. |
| query_sparse_embedding | SparseEmbedding | | Sparse embedding of the query. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See the filter_policy init parameter for details. |
| top_k | Optional[int] | None | The maximum number of documents to return. If using group_by parameters, maximum number of groups to return. |
| return_embedding | Optional[bool] | None | Whether to return the embedding of the retrieved Documents. |
| score_threshold | Optional[float] | None | A minimal score threshold for the result. Score of the returned result might be higher or smaller than the threshold depending on the Distance function used. For example, for cosine similarity only higher scores will be returned. |
| group_by | Optional[str] | None | Payload field to group by, must be a string or number field. If the field contains more than one value, all values will be used for grouping. One point can be in multiple groups. |
| group_size | Optional[int] | None | Maximum amount of points to return per group. Default is 3. |
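
The direction in which score_threshold cuts depends on whether the metric is a similarity (higher is better, as with cosine) or a distance (lower is better). The sketch below shows both cases on made-up scores. The helper apply_score_threshold, the higher_is_better flag, and the inclusive comparison at the boundary are assumptions for illustration. Qdrant applies the threshold itself.

```python
from typing import List, Optional, Tuple

ScoredDoc = Tuple[str, float]


def apply_score_threshold(
    scored: List[ScoredDoc], threshold: Optional[float], higher_is_better: bool
) -> List[ScoredDoc]:
    """Keep only results on the 'good' side of the threshold (hypothetical helper)."""
    if threshold is None:
        return list(scored)  # no threshold set, keep everything
    if higher_is_better:
        # Similarity metrics such as cosine: keep scores at or above the threshold.
        return [(doc, score) for doc, score in scored if score >= threshold]
    # Distance metrics: keep scores at or below the threshold.
    return [(doc, score) for doc, score in scored if score <= threshold]


hits = [("doc_a", 0.91), ("doc_b", 0.55), ("doc_c", 0.20)]
similar = apply_score_threshold(hits, 0.5, higher_is_better=True)
near = apply_score_threshold(hits, 0.5, higher_is_better=False)
print(similar)  # [('doc_a', 0.91), ('doc_b', 0.55)]
print(near)     # [('doc_c', 0.2)]
```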