QdrantSparseEmbeddingRetriever

A component for retrieving documents from a QdrantDocumentStore using sparse vectors.

Basic Information

  • Type: haystack_integrations.components.retrievers.qdrant.retriever.QdrantSparseEmbeddingRetriever

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query_sparse_embedding | SparseEmbedding | | Sparse embedding of the query. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details. |
| top_k | Optional[int] | None | The maximum number of documents to return. If using group_by parameters, the maximum number of groups to return. |
| scale_score | Optional[bool] | None | Whether to scale the scores of the retrieved documents. |
| return_embedding | Optional[bool] | None | Whether to return the embedding of the retrieved Documents. |
| score_threshold | Optional[float] | None | Minimum score threshold for the result. Whether returned scores are higher or lower than the threshold depends on the distance function used. For example, with cosine similarity only higher scores are returned. |
| group_by | Optional[str] | None | Payload field to group by. Must be a string or number field. If the field contains more than one value, all values are used for grouping, so one point can be in multiple groups. |
| group_size | Optional[int] | None | Maximum number of points to return per group. Default is 3. |
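
To make the interplay of top_k, group_by, and group_size concrete, here is a minimal pure-Python sketch of the grouping semantics described in the table. It does not call Qdrant or Haystack: `group_hits` and the sample payloads are hypothetical, and it assumes that higher scores are better and that groups are ranked by their best hit.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Hit = Tuple[float, Dict[str, Any]]  # (score, payload); illustrative type only


def group_hits(hits: List[Hit], group_by: str, top_k: int, group_size: int) -> Dict[Any, List[Hit]]:
    """Group scored hits by a payload field (assumes group_size >= 1)."""
    groups: Dict[Any, List[Hit]] = defaultdict(list)
    # Visit hits from best to worst so each group fills with its best hits first.
    for score, payload in sorted(hits, key=lambda h: h[0], reverse=True):
        value = payload.get(group_by)
        if value is None:
            continue  # hits without the field belong to no group
        # A multi-valued field puts the same hit into several groups.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if len(groups[v]) < group_size:
                groups[v].append((score, payload))
    # Rank groups by their best hit and keep at most top_k groups.
    ranked = sorted(groups.items(), key=lambda kv: kv[1][0][0], reverse=True)
    return dict(ranked[:top_k])


hits = [
    (0.9, {"id": "a", "topic": "cats"}),
    (0.8, {"id": "b", "topic": ["cats", "dogs"]}),
    (0.7, {"id": "c", "topic": "cats"}),
    (0.6, {"id": "d", "topic": "birds"}),
    (0.5, {"id": "e", "topic": "dogs"}),
]
grouped = group_hits(hits, group_by="topic", top_k=2, group_size=2)
for topic, members in grouped.items():
    print(topic, [payload["id"] for _, payload in members])
```

With group_by set, top_k limits the number of groups rather than the number of documents, and group_size caps how many hits each group holds. Hit "b" appears in two groups because its field has two values.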

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | The retrieved documents. |

Overview

Work in Progress

Bear with us while we add pipeline examples and the most common component connections.

A component for retrieving documents from a QdrantDocumentStore using sparse vectors.

Usage example:

from haystack_integrations.components.retrievers.qdrant import QdrantSparseEmbeddingRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.dataclasses import Document, SparseEmbedding

# In-memory document store with sparse-vector support enabled.
document_store = QdrantDocumentStore(
    ":memory:",
    use_sparse_embeddings=True,
    recreate_index=True,
    return_embedding=True,
)

# Index one document that carries a sparse embedding.
doc = Document(content="test", sparse_embedding=SparseEmbedding(indices=[0, 3, 5], values=[0.1, 0.5, 0.12]))
document_store.write_documents([doc])

# Query the store with a sparse query embedding.
retriever = QdrantSparseEmbeddingRetriever(document_store=document_store)
sparse_embedding = SparseEmbedding(indices=[0, 1, 2, 3], values=[0.1, 0.8, 0.05, 0.33])
retriever.run(query_sparse_embedding=sparse_embedding)
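
For intuition about what the query in this example matches on: a sparse embedding stores only its non-zero dimensions as parallel lists of indices and values, and a common way to compare two sparse vectors is the dot product over the indices they share. The sketch below computes that by hand for the document and query vectors used in the example. `sparse_dot` is a hypothetical helper, and the assumption that the score is a plain dot product is ours; this page does not state how scores are computed.

```python
from typing import List


def sparse_dot(
    indices_a: List[int], values_a: List[float],
    indices_b: List[int], values_b: List[float],
) -> float:
    """Dot product of two sparse vectors given as (indices, values) pairs."""
    b = dict(zip(indices_b, values_b))
    # Only dimensions present in both vectors contribute to the sum.
    return sum(v * b[i] for i, v in zip(indices_a, values_a) if i in b)


# Document and query vectors from the usage example.
doc_score = sparse_dot(
    [0, 3, 5], [0.1, 0.5, 0.12],
    [0, 1, 2, 3], [0.1, 0.8, 0.05, 0.33],
)
print(round(doc_score, 3))  # 0.175
```

The two vectors overlap only at indices 0 and 3, so the result is 0.1 * 0.1 + 0.5 * 0.33. Dimensions that appear in just one vector (1, 2, and 5 here) contribute nothing.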

Usage Example

components:
  QdrantSparseEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.qdrant.retriever.QdrantSparseEmbeddingRetriever
    init_parameters:

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_store | QdrantDocumentStore | | An instance of QdrantDocumentStore. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | None | A dictionary with filters to narrow down the search space. |
| top_k | int | 10 | The maximum number of documents to retrieve. If using group_by parameters, the maximum number of groups to return. |
| scale_score | bool | False | Whether to scale the scores of the retrieved documents. |
| return_embedding | bool | False | Whether to return the sparse embedding of the retrieved Documents. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy to determine how filters are applied. Defaults to "replace". |
| score_threshold | Optional[float] | None | Minimum score threshold for the result. Whether returned scores are higher or lower than the threshold depends on the distance function used. For example, with cosine similarity only higher scores are returned. |
| group_by | Optional[str] | None | Payload field to group by. Must be a string or number field. If the field contains more than one value, all values are used for grouping, so one point can be in multiple groups. |
| group_size | Optional[int] | None | Maximum number of points to return per group. Default is 3. |
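
The filter_policy parameter decides how filters passed at run time interact with the filters set at initialization. The following is a simplified sketch of the two policies as we understand them ("replace" and "merge"), written in plain Python so it runs without Haystack. `resolve_filters` is a hypothetical helper, and the real merge logic in Haystack handles more cases than the simple AND wrapper assumed here.

```python
from typing import Any, Dict, Optional

Filters = Dict[str, Any]


def resolve_filters(
    policy: str,
    init_filters: Optional[Filters],
    runtime_filters: Optional[Filters],
) -> Optional[Filters]:
    """Return the filters that would be applied to a query (simplified)."""
    if policy == "replace":
        # Run-time filters, when given, take the place of the init filters.
        return runtime_filters if runtime_filters else init_filters
    if policy == "merge":
        if init_filters and runtime_filters:
            # Assumption: both sets must hold, so combine them with AND.
            return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
        return runtime_filters or init_filters
    raise ValueError(f"Unknown filter policy: {policy}")


init = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime = {"field": "meta.year", "operator": ">=", "value": 2023}

print(resolve_filters("replace", init, runtime))  # only the run-time filter
print(resolve_filters("merge", init, runtime))    # both, combined with AND
```

With the default "replace" policy, a run-time filter discards the init filter entirely, so restrictions that must always hold are safer under "merge".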

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query_sparse_embedding | SparseEmbedding | | Sparse embedding of the query. |
| filters | Optional[Union[Dict[str, Any], models.Filter]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details. |
| top_k | Optional[int] | None | The maximum number of documents to return. If using group_by parameters, the maximum number of groups to return. |
| scale_score | Optional[bool] | None | Whether to scale the scores of the retrieved documents. |
| return_embedding | Optional[bool] | None | Whether to return the embedding of the retrieved Documents. |
| score_threshold | Optional[float] | None | Minimum score threshold for the result. Whether returned scores are higher or lower than the threshold depends on the distance function used. For example, with cosine similarity only higher scores are returned. |
| group_by | Optional[str] | None | Payload field to group by. Must be a string or number field. If the field contains more than one value, all values are used for grouping, so one point can be in multiple groups. |
| group_size | Optional[int] | None | Maximum number of points to return per group. Default is 3. |
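
Every optional run() parameter defaults to None, which we read as "use the value configured at initialization". The sketch below illustrates that fallback in plain Python. `effective` and `resolve_run_params` are hypothetical names, and the integration's own fallback logic may differ in detail, for instance in how it treats falsy values such as 0 or False.

```python
from typing import Any, Dict, Optional, TypeVar

T = TypeVar("T")


def effective(runtime_value: Optional[T], init_value: T) -> T:
    """Prefer the run-time value; fall back to the init value when it is None."""
    return init_value if runtime_value is None else runtime_value


def resolve_run_params(init_params: Dict[str, Any], **runtime: Any) -> Dict[str, Any]:
    """Apply the fallback to every init parameter (illustrative only)."""
    return {name: effective(runtime.get(name), value) for name, value in init_params.items()}


# Init defaults taken from the Init Parameters table.
init_params = {"top_k": 10, "scale_score": False, "return_embedding": False, "score_threshold": None}

# Override two parameters for a single query; the rest keep their init values.
resolved = resolve_run_params(init_params, top_k=3, score_threshold=0.2)
print(resolved)
```

Here only top_k and score_threshold change for this one query, while scale_score and return_embedding keep the values chosen at initialization.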