QdrantHybridRetriever
Retrieves documents from a QdrantDocumentStore using both dense and sparse vectors, fusing results with Reciprocal Rank Fusion.
Key Features
- Hybrid retrieval combining dense and sparse vector search.
- Reciprocal Rank Fusion (RRF) for merging dense and sparse results.
- Optional document grouping by payload fields.
- Score threshold filtering to limit returned results.
- Optional return of document embeddings alongside content.
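To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is an illustration only: the fusion itself happens inside Qdrant, and the function name, the sample document IDs, and the constant k=60 (a commonly used value) are assumptions, not values taken from this component.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    `k` is an assumed smoothing constant; the value used by the backend may differ.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Every list contributes 1 / (k + rank) for each document it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the dense and the sparse search.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists, such as doc_b and doc_a here, end up ahead of documents that appear in only one list.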
Configuration
- Drag the QdrantHybridRetriever component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Configure the document_store with your Qdrant connection details. Make sure the document store has use_sparse_embeddings enabled.
- Go to the Advanced tab to configure top_k, filters, return_embedding, and scale_score.
Connections
QdrantHybridRetriever accepts both a dense query_embedding and a sparse query_sparse_embedding as inputs. It outputs a list of retrieved documents.
Connect a text embedder to query_embedding and a sparse text embedder to query_sparse_embedding. Connect the documents output to a PromptBuilder, Ranker, or answer builder.
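The wiring described above can be sketched as a list of sender and receiver pairs. Only the retriever's input and output names (query_embedding, query_sparse_embedding, documents) come from this page. The component names dense_embedder, sparse_embedder, and prompt_builder, their socket names, and the helper missing_inputs are hypothetical and depend on the components you actually use.

```python
from typing import Dict, List

# Hypothetical component and socket names, except for the retriever's own sockets.
connections: List[Dict[str, str]] = [
    {"sender": "dense_embedder.embedding", "receiver": "retriever.query_embedding"},
    {"sender": "sparse_embedder.sparse_embedding", "receiver": "retriever.query_sparse_embedding"},
    {"sender": "retriever.documents", "receiver": "prompt_builder.documents"},
]

# The retriever needs both embeddings to run a hybrid query.
REQUIRED_INPUTS = {"retriever.query_embedding", "retriever.query_sparse_embedding"}


def missing_inputs(wiring: List[Dict[str, str]]) -> List[str]:
    """Return the required retriever inputs that no connection feeds."""
    wired = {connection["receiver"] for connection in wiring}
    return sorted(REQUIRED_INPUTS - wired)


print(missing_inputs(connections))
```

A check like this is a quick way to spot a pipeline where only the dense embedder is connected.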
Usage Example
components:
  QdrantHybridRetriever:
    type: qdrant.src.haystack_integrations.components.retrievers.qdrant.retriever.QdrantHybridRetriever
    init_parameters:
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Dense embedding of the query. |
| query_sparse_embedding | SparseEmbedding | | Sparse embedding of the query. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details. |
| top_k | Optional[int] | None | The maximum number of documents to return. If using group_by parameters, maximum number of groups to return. |
| return_embedding | Optional[bool] | None | Whether to return the embedding of the retrieved Documents. |
| score_threshold | Optional[float] | None | Minimum score threshold for results. Depending on the distance function used, returned scores may be higher or lower than the threshold. For example, with cosine similarity only results scoring higher than the threshold are returned. |
| group_by | Optional[str] | None | Payload field to group by. Must be a string or number field. If the field contains more than one value, all values are used for grouping, so one point can belong to multiple groups. |
| group_size | Optional[int] | None | Maximum number of points to return per group. Default is 3. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | The retrieved documents. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | QdrantDocumentStore | | An instance of QdrantDocumentStore. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | None | A dictionary with filters to narrow down the search space. |
| top_k | int | 10 | The maximum number of documents to retrieve. If using group_by parameters, maximum number of groups to return. |
| return_embedding | bool | False | Whether to return the embeddings of the retrieved Documents. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy to determine how filters are applied. |
| score_threshold | Optional[float] | None | Minimum score threshold for results. Depending on the distance function used, returned scores may be higher or lower than the threshold. For example, with cosine similarity only results scoring higher than the threshold are returned. |
| group_by | Optional[str] | None | Payload field to group by. Must be a string or number field. If the field contains more than one value, all values are used for grouping, so one point can belong to multiple groups. |
| group_size | Optional[int] | None | Maximum number of points to return per group. Default is 3. |
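The interaction between the filters set at initialization and the filters passed at run time depends on filter_policy. The following is a simplified sketch of that idea, not the library's implementation: the function resolve_filters, the string policy names, and the sample filters are assumptions, and the real merge logic may handle more cases.

```python
from typing import Any, Dict, Optional

Filters = Dict[str, Any]


def resolve_filters(
    policy: str,
    init_filters: Optional[Filters],
    runtime_filters: Optional[Filters],
) -> Optional[Filters]:
    """Pick the effective filters for a query (simplified illustration)."""
    if init_filters is None or runtime_filters is None:
        # With at most one filter set, both policies just use whichever exists.
        return runtime_filters if runtime_filters is not None else init_filters
    if policy == "replace":
        # Runtime filters override the init filters entirely.
        return runtime_filters
    if policy == "merge":
        # Assumed merge behavior: both filter sets must match.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    raise ValueError(f"Unknown filter policy: {policy}")


# Hypothetical filters on document metadata.
init_filters = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2023}

print(resolve_filters("replace", init_filters, runtime_filters))
print(resolve_filters("merge", init_filters, runtime_filters))
```

With the default replace policy, a filter passed at query time means the init filter on meta.lang is no longer applied.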
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Dense embedding of the query. |
| query_sparse_embedding | SparseEmbedding | | Sparse embedding of the query. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details. |
| top_k | Optional[int] | None | The maximum number of documents to return. If using group_by parameters, maximum number of groups to return. |
| return_embedding | Optional[bool] | None | Whether to return the embedding of the retrieved Documents. |
| score_threshold | Optional[float] | None | Minimum score threshold for results. Depending on the distance function used, returned scores may be higher or lower than the threshold. For example, with cosine similarity only results scoring higher than the threshold are returned. |
| group_by | Optional[str] | None | Payload field to group by. Must be a string or number field. If the field contains more than one value, all values are used for grouping, so one point can belong to multiple groups. |
| group_size | Optional[int] | None | Maximum number of points to return per group. Default is 3. |
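To show how top_k, group_by, group_size, and score_threshold combine, here is a rough sketch in plain Python. It assumes that higher scores are better and that each point has a single value in the grouping field. The function group_hits and the sample hits are hypothetical, and the actual grouping is performed by Qdrant and may differ in detail.

```python
from typing import Dict, List, Optional, Tuple

# (document ID, score, value of the group_by payload field)
Hit = Tuple[str, float, str]


def group_hits(
    hits: List[Hit],
    top_k: int,
    group_size: int = 3,
    score_threshold: Optional[float] = None,
) -> Dict[str, List[str]]:
    """Return at most top_k groups with at most group_size documents each."""
    groups: Dict[str, List[str]] = {}
    # Walk the hits from best to worst score.
    for doc_id, score, key in sorted(hits, key=lambda hit: hit[1], reverse=True):
        if score_threshold is not None and score < score_threshold:
            continue  # Below the threshold (assuming higher is better).
        if key not in groups:
            if len(groups) == top_k:
                continue  # With grouping, top_k limits the number of groups.
            groups[key] = []
        if len(groups[key]) < group_size:
            groups[key].append(doc_id)
    return groups


# Hypothetical hits grouped by a "source" payload field.
hits = [
    ("a1", 0.9, "report"),
    ("b1", 0.8, "faq"),
    ("a2", 0.7, "report"),
    ("c1", 0.6, "blog"),
    ("a3", 0.5, "report"),
    ("b2", 0.2, "faq"),
]

grouped = group_hits(hits, top_k=2, group_size=2, score_threshold=0.3)
print(grouped)
```

Here the blog group is dropped because top_k allows only two groups, a3 is dropped because the report group is already full, and b2 falls below the score threshold.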