DeepsetCloudDocumentStoreEmbeddingRetriever
Retrieves documents from the deepset AI Platform based on their semantic similarity to the query, using the deepset Query API.
Basic Information
- Type: dc_custom_component.components.retrievers.deepsetcloud_embedding.DeepsetCloudDocumentStoreEmbeddingRetriever
- Components it can connect with:
  - Embedders: The Retriever receives the query embedding from a Text Embedder.
  - Rankers: The Retriever can send the retrieved documents to a Ranker.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Vector representation of the user query. |
| filters | Optional[Dict[str, Any]] | None | Filters to narrow down the search. |
| top_k | Optional[int] | 10 | The maximum number of documents to retrieve. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | The retrieved documents that match the query. |
Overview
DeepsetCloudDocumentStoreEmbeddingRetriever queries documents stored in the deepset AI Platform. It sends a query
to the deepset API and retrieves the most relevant documents based on their semantic similarity to the query.
For details, see the Query Documents endpoint.
DeepsetCloudDocumentStoreEmbeddingRetriever works with DeepsetCloudDocumentStore. You can use it, for example, to query production data with pipelines that are in a different workspace.
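To make "semantic similarity" concrete, here is a minimal, self-contained sketch of ranking documents against a query embedding. It assumes cosine similarity as the metric; this page does not state which metric the platform uses, and the function names and toy vectors are hypothetical. It illustrates the idea only and is not the platform's implementation.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_embedding: List[float],
    doc_embeddings: Dict[str, List[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Return up to top_k (document id, score) pairs, best match first."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in doc_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy two-dimensional embeddings (hypothetical data).
toy_docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
top_hits = rank_by_similarity([1.0, 0.0], toy_docs, top_k=2)
```

In this toy run, doc_a points in the same direction as the query and ranks first, doc_c is partially aligned and ranks second, and doc_b is cut off by top_k.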
Usage Example
Initializing the Component
components:
  DeepsetCloudDocumentStoreEmbeddingRetriever:
    type: dc_custom_component.components.retrievers.deepsetcloud_embedding.DeepsetCloudDocumentStoreEmbeddingRetriever
    init_parameters:
Using the Component in a Pipeline
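This is an example of a document search pipeline that uses both DeepsetCloudDocumentStoreBM25Retriever and DeepsetCloudDocumentStoreEmbeddingRetriever to retrieve documents from the workspace called generative-q4-24 using a pipeline called v2_genjus-chat-dCv2. A DocumentJoiner merges the two result lists, a Ranker reorders the merged documents, and a MetaFieldGroupingRanker groups the final results.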
components:
  bm25_retriever:
    type: deepset_cloud_custom_nodes.retrievers.deepsetcloud_bm25.DeepsetCloudDocumentStoreBM25Retriever
    init_parameters:
      document_store:
        type: dc_custom_component.components.document_stores.deepsetcloud.DeepsetCloudDocumentStore
        init_parameters:
          workspace_name: generative-q4-24
          pipeline_name: v2_genjus-chat-dCv2
          dc_api_key:
            type: env_var
            env_vars:
              - MANZ_GEN_TOKEN_2
            strict: false
          timeout: 10
      top_k: 30
  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: deepset-manz/e5-multilingual-manz-v1
      progress_bar: false
  embedding_retriever:
    type: deepset_cloud_custom_nodes.retrievers.deepsetcloud_embedding.DeepsetCloudDocumentStoreEmbeddingRetriever
    init_parameters:
      document_store:
        type: dc_custom_component.components.document_stores.deepsetcloud.DeepsetCloudDocumentStore
        init_parameters:
          workspace_name: generative-q4-24
          pipeline_name: v2_genjus-chat-dCv2
          dc_api_key:
            type: env_var
            env_vars:
              - MANZ_GEN_TOKEN_2
            strict: false
          timeout: 10
      top_k: 30
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      top_k: 60
      sort_by_score: false
  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: svalabs/cross-electra-ms-marco-german-uncased
      top_k: 15
      model_kwargs:
        torch_dtype: torch.float16
  DeepsetMetadataGrouper:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: dokid
      subgroup_by:
      sort_docs_by: tokennr
connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: DeepsetMetadataGrouper.documents
max_runs_per_component: 100
metadata: {}
inputs:
  query:
    - query_embedder.text
    - bm25_retriever.query
    - ranker.query
  filters:
    - bm25_retriever.filters
    - embedding_retriever.filters
outputs:
  documents: DeepsetMetadataGrouper.documents
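The joiner step in this pipeline (join_mode: concatenate, sort_by_score: false, top_k: 60) can be pictured with the simplified model below. It is a sketch under assumptions, not Haystack's DocumentJoiner code: documents are reduced to hypothetical (id, score) tuples, and a document returned by both retrievers is assumed to be kept once, with its higher score.

```python
from typing import Dict, List, Optional, Tuple

# Stand-in for a retrieved document: (document id, retriever score).
ScoredDoc = Tuple[str, float]


def concatenate_join(
    result_lists: List[List[ScoredDoc]],
    top_k: Optional[int] = None,
) -> List[ScoredDoc]:
    """Merge result lists, keep each document once, and do not sort by score."""
    best: Dict[str, float] = {}
    for results in result_lists:
        for doc_id, score in results:
            # For a duplicate, keep the higher of the two scores.
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    merged = list(best.items())  # first-seen order is preserved
    return merged if top_k is None else merged[:top_k]


# Hypothetical results from the two retrievers.
bm25_hits = [("doc_1", 7.2), ("doc_2", 5.1)]
embedding_hits = [("doc_2", 0.91), ("doc_3", 0.84)]
joined = concatenate_join([bm25_hits, embedding_hits], top_k=60)
```

Keyword and embedding scores generally sit on different scales, so the merged scores are not directly comparable. In the pipeline above, the ranker rescores the joined documents against the query before they are grouped.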
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | DeepsetCloudDocumentStore | | The document store instance to use for retrieving documents. |
| top_k | int | 10 | The maximum number of top documents to retrieve. |
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_embedding | List[float] | | Vector representation of the user query. |
| filters | Dict[str, Any] | None | Filters to narrow down the search. |
| top_k | int | 10 | The maximum number of documents to retrieve. |
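As a rough illustration of the argument shapes in the table above, the sketch below assembles and validates a set of run() arguments. The helper build_run_kwargs is hypothetical and not part of the component. The filter uses Haystack's field/operator/value comparison syntax, which is an assumption about what this component accepts; the field name meta.language is made up.

```python
from typing import Any, Dict, List, Optional


def build_run_kwargs(
    query_embedding: List[float],
    filters: Optional[Dict[str, Any]] = None,
    top_k: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate and assemble arguments for run()."""
    if not query_embedding:
        raise ValueError("query_embedding must be a non-empty list of floats")
    for value in query_embedding:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("query_embedding must contain only numbers")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be a positive integer")

    kwargs: Dict[str, Any] = {"query_embedding": [float(v) for v in query_embedding]}
    # Optional arguments are left out when unset so the component's
    # own configured values apply.
    if filters is not None:
        kwargs["filters"] = filters
    if top_k is not None:
        kwargs["top_k"] = top_k
    return kwargs


# Assumed filter syntax and a made-up metadata field.
example_filters = {"field": "meta.language", "operator": "==", "value": "de"}
run_kwargs = build_run_kwargs([0.12, -0.03, 0.88], filters=example_filters, top_k=5)
```

In a real pipeline, query_embedding would come from the connected Text Embedder rather than being written by hand.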