QdrantSparseEmbeddingRetriever
Retrieve documents from a QdrantDocumentStore using sparse vector embeddings.
Key Features
- Sparse vector-based retrieval from a Qdrant vector database.
- Configurable number of results with top_k and minimum score threshold.
- Supports metadata filtering to narrow down the search space.
- Supports document grouping by a payload field.
- Optional score scaling and sparse embedding return.
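To make these features concrete, here is a minimal, library-independent sketch of sparse retrieval. It assumes each sparse vector can be modeled as a dict mapping a token index to a weight, scores documents by the dot product over shared indices, and then applies score_threshold and top_k. It is not how Qdrant implements search internally; the helper names (sparse_dot, retrieve) and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative representation: token index -> weight.
SparseVector = Dict[int, float]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product over the indices that both sparse vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(weight * b[index] for index, weight in a.items() if index in b)


def retrieve(
    query: SparseVector,
    docs: Dict[str, SparseVector],
    top_k: int = 10,
    score_threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Score every document, drop low scores, and keep the best top_k."""
    scored = [(doc_id, sparse_dot(query, vector)) for doc_id, vector in docs.items()]
    if score_threshold is not None:
        # Assumes a similarity where higher scores are better.
        scored = [(doc_id, score) for doc_id, score in scored if score >= score_threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


docs = {
    "doc_a": {1: 0.5, 7: 1.0},
    "doc_b": {2: 2.0, 7: 0.5},
    "doc_c": {9: 1.0},
}
query = {7: 2.0, 2: 1.0}
results = retrieve(query, docs, top_k=2, score_threshold=0.5)
print(results)
```

In this toy data, doc_c shares no indices with the query, so it scores 0 and falls below the threshold.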
Configuration
- Drag the QdrantSparseEmbeddingRetriever component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- On the General tab:
  - Configure the QdrantDocumentStore with your Qdrant instance details. Make sure your Qdrant index uses sparse embeddings (use_sparse_embeddings=True).
  - Set top_k to control the maximum number of documents to retrieve.
- Go to the Advanced tab to configure filter_policy, score_threshold, scale_score, return_embedding, group_by, and group_size.
Connections
QdrantSparseEmbeddingRetriever receives sparse query embeddings from a sparse text embedder. It sends retrieved documents to downstream components such as PromptBuilder or a ranker.
Source Code
To check this component's source code, open retriever.py in the Haystack Core Integrations repository.
Usage Examples
Basic Configuration
QdrantSparseEmbeddingRetriever:
  type: qdrant.src.haystack_integrations.components.retrievers.qdrant.retriever.QdrantSparseEmbeddingRetriever
  init_parameters: {}

components:
  QdrantSparseEmbeddingRetriever:
    type: qdrant.src.haystack_integrations.components.retrievers.qdrant.retriever.QdrantSparseEmbeddingRetriever
    init_parameters:
Parameters
Inputs
| Parameter | Type | Description |
|---|---|---|
| query_sparse_embedding | SparseEmbedding | Sparse Embedding of the query. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. |
| top_k | Optional[int] | The maximum number of documents to return. If using group_by parameters, maximum number of groups to return. |
| scale_score | Optional[bool] | Whether to scale the scores of the retrieved documents or not. |
| return_embedding | Optional[bool] | Whether to return the embedding of the retrieved Documents. |
| score_threshold | Optional[float] | A minimal score threshold for the result. Score of the returned result might be higher or smaller than the threshold depending on the Distance function used. |
| group_by | Optional[str] | Payload field to group by, must be a string or number field. If the field contains more than one value, all values will be used for grouping. One point can be in multiple groups. |
| group_size | Optional[int] | Maximum amount of points to return per group. Default is 3. |
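The interaction between group_by, group_size, and top_k can be pictured with a small plain-Python sketch. It is a simplification, not the Qdrant implementation: hits are assumed to arrive sorted by descending score, each hit is a hypothetical (payload, score) pair, hits without the field are skipped, and group_hits is an illustrative helper.

```python
from typing import Any, Dict, List, Tuple

Hit = Tuple[Dict[str, Any], float]  # (payload, score), illustrative only


def group_hits(
    hits: List[Hit], group_by: str, group_size: int = 3, top_k: int = 10
) -> Dict[Any, List[Hit]]:
    """Keep at most top_k groups and at most group_size hits per group."""
    groups: Dict[Any, List[Hit]] = {}
    for payload, score in hits:
        value = payload.get(group_by)
        if value is None:
            continue  # this sketch skips hits that lack the grouping field
        # A multi-valued field places the hit in every matching group.
        keys = value if isinstance(value, list) else [value]
        for key in keys:
            if key not in groups and len(groups) >= top_k:
                continue  # top_k caps the number of groups
            bucket = groups.setdefault(key, [])
            if len(bucket) < group_size:
                bucket.append((payload, score))
    return groups


hits = [
    ({"id": 1, "author": "ann"}, 0.9),
    ({"id": 2, "author": "bob"}, 0.8),
    ({"id": 3, "author": "ann"}, 0.7),
    ({"id": 4, "author": ["ann", "cy"]}, 0.6),
    ({"id": 5, "author": "ann"}, 0.5),
]
groups = group_hits(hits, group_by="author", group_size=2, top_k=2)
ids_per_group = {key: [payload["id"] for payload, _ in bucket] for key, bucket in groups.items()}
print(ids_per_group)
```

With top_k=2, only the two best-scoring groups survive, and group_size=2 limits how many hits each of them keeps.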
Outputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | The retrieved documents. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | QdrantDocumentStore | | An instance of QdrantDocumentStore. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | None | A dictionary with filters to narrow down the search space. |
| top_k | int | 10 | The maximum number of documents to retrieve. If using group_by parameters, maximum number of groups to return. |
| scale_score | bool | False | Whether to scale the scores of the retrieved documents or not. |
| return_embedding | bool | False | Whether to return the sparse embedding of the retrieved Documents. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy to determine how filters are applied. Defaults to "replace". |
| score_threshold | Optional[float] | None | A minimal score threshold for the result. Score of the returned result might be higher or smaller than the threshold depending on the Distance function used. For example, for cosine similarity only higher scores will be returned. |
| group_by | Optional[str] | None | Payload field to group by, must be a string or number field. If the field contains more than one value, all values will be used for grouping. One point can be in multiple groups. |
| group_size | Optional[int] | None | Maximum amount of points to return per group. Default is 3. |
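As a rough illustration of how filter_policy decides between the filters set at initialization and the filters passed at query time, consider the sketch below. It assumes Haystack's two policies, replace and merge, and uses a stand-in enum and a hypothetical resolve_filters helper rather than the library's own code. The merge branch is deliberately simplified to a single AND over both filters; the real merging logic handles more cases.

```python
from enum import Enum
from typing import Any, Dict, Optional

Filters = Dict[str, Any]


class FilterPolicy(Enum):
    """Stand-in for Haystack's FilterPolicy, for illustration only."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: FilterPolicy,
    init_filters: Optional[Filters],
    runtime_filters: Optional[Filters],
) -> Optional[Filters]:
    """Pick or combine filters according to the policy (simplified)."""
    if policy is FilterPolicy.MERGE and init_filters and runtime_filters:
        # Simplification: require both filters to match.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE: runtime filters win when provided, otherwise keep the init filters.
    return runtime_filters or init_filters


init_filters = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2023}

replaced = resolve_filters(FilterPolicy.REPLACE, init_filters, runtime_filters)
merged = resolve_filters(FilterPolicy.MERGE, init_filters, runtime_filters)
fallback = resolve_filters(FilterPolicy.REPLACE, init_filters, None)
```

The field names meta.lang and meta.year are made up for the example.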
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_sparse_embedding | SparseEmbedding | | Sparse Embedding of the query. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details. |
| top_k | Optional[int] | None | The maximum number of documents to return. If using group_by parameters, maximum number of groups to return. |
| scale_score | Optional[bool] | None | Whether to scale the scores of the retrieved documents or not. |
| return_embedding | Optional[bool] | None | Whether to return the embedding of the retrieved Documents. |
| score_threshold | Optional[float] | None | A minimal score threshold for the result. Score of the returned result might be higher or smaller than the threshold depending on the Distance function used. For example, for cosine similarity only higher scores will be returned. |
| group_by | Optional[str] | None | Payload field to group by, must be a string or number field. If the field contains more than one value, all values will be used for grouping. One point can be in multiple groups. |
| group_size | Optional[int] | None | Maximum amount of points to return per group. Default is 3. |
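Because every optional run() parameter defaults to None, a reasonable mental model is that a value left unset at query time falls back to the value chosen at initialization. The sketch below illustrates that model only. RetrieverSettings and effective_settings are hypothetical names, and the integration's actual fallback logic may differ, for example in how it treats falsy values such as False or 0.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class RetrieverSettings:
    """Hypothetical container mirroring a few init defaults from the table above."""

    top_k: int = 10
    scale_score: bool = False
    return_embedding: bool = False
    score_threshold: Optional[float] = None


def effective_settings(init: RetrieverSettings, **overrides: Any) -> RetrieverSettings:
    """Apply query-time overrides; None means 'keep the init value' (assumption)."""
    provided = {name: value for name, value in overrides.items() if value is not None}
    return replace(init, **provided)


init = RetrieverSettings(top_k=10, score_threshold=0.2)
at_query = effective_settings(init, top_k=3, score_threshold=None)
print(at_query)
```

Here top_k is overridden for this one query, while score_threshold keeps the value set at initialization.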