
DeepsetCloudDocumentStoreBM25Retriever

Retrieves documents from the deepset AI Platform using keyword search through the deepset Query API.

Basic Information

  • Type: dc_custom_component.components.retrievers.deepsetcloud_bm25.DeepsetCloudDocumentStoreBM25Retriever
  • Components it can connect with:
    • Query: The Retriever receives the user query and searches for documents based on it.
    • Rankers: The Retriever can send the retrieved documents to a ranker.

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The user query. |
| all_terms_must_match | Optional[bool] | None | Specifies whether all terms in the query must match. |
| custom_query | Optional[dict[str, Any]] | None | A custom query to use for retrieval. |
| filters | Optional[dict[str, Any]] | None | Filters to narrow down the search. |
| top_k | Optional[int] | 10 | The maximum number of documents to retrieve. |
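
The effect of all_terms_must_match can be illustrated with a toy keyword matcher. This is a minimal sketch, not the platform's implementation: the function names `matches` and `keyword_filter` are hypothetical, tokenization is plain whitespace splitting, and a default of False is assumed here, whereas the component's default is None.

```python
def matches(query: str, text: str, all_terms_must_match: bool = False) -> bool:
    """Toy keyword match (hypothetical helper, not the deepset API).

    With all_terms_must_match=True, every query term must occur in the text.
    Otherwise, a single shared term is enough.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(text.lower().split())
    if not query_terms:
        return False
    hits = query_terms & doc_terms
    if all_terms_must_match:
        return hits == query_terms
    return bool(hits)


def keyword_filter(query: str, texts: list[str], all_terms_must_match: bool = False) -> list[str]:
    """Return the texts that match the query under the chosen mode."""
    return [t for t in texts if matches(query, t, all_terms_must_match)]


texts = ["tax law reform", "tax advice", "weather report"]
strict = keyword_filter("tax law", texts, all_terms_must_match=True)
loose = keyword_filter("tax law", texts, all_terms_must_match=False)
```

In this sketch, the strict mode keeps only texts containing both "tax" and "law", while the loose mode also keeps texts containing just one of them.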

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | The documents matching the query. |

Overview

DeepsetCloudDocumentStoreBM25Retriever queries documents stored in deepset AI Platform. It sends a query to the deepset API and retrieves the most relevant documents based on the specified parameters. For details, see Query Documents endpoint.

DeepsetCloudDocumentStoreBM25Retriever works with DeepsetCloudDocumentStore. You can use it, for example, to query production data with pipelines that are in a different workspace.

Usage Example

Initializing the Component

components:
  DeepsetCloudDocumentStoreBM25Retriever:
    type: dc_custom_component.components.retrievers.deepsetcloud_bm25.DeepsetCloudDocumentStoreBM25Retriever
    init_parameters:

Using the Component in a Pipeline

This is an example of a document search pipeline that uses both DeepsetCloudDocumentStoreBM25Retriever and DeepsetCloudDocumentStoreEmbeddingRetriever to retrieve documents from the workspace called generative-q4-24 using a pipeline called v2_genjus-chat-dCv2. The results of both retrievers are joined, ranked, and then grouped by metadata.

components:
  bm25_retriever:
    type: deepset_cloud_custom_nodes.retrievers.deepsetcloud_bm25.DeepsetCloudDocumentStoreBM25Retriever
    init_parameters:
      document_store:
        type: dc_custom_component.components.document_stores.deepsetcloud.DeepsetCloudDocumentStore
        init_parameters:
          workspace_name: generative-q4-24
          pipeline_name: v2_genjus-chat-dCv2
          dc_api_key:
            type: env_var
            env_vars:
              - MANZ_GEN_TOKEN_2
            strict: false
          timeout: 10
      top_k: 30
  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: deepset-manz/e5-multilingual-manz-v1
      progress_bar: false
  embedding_retriever:
    type: deepset_cloud_custom_nodes.retrievers.deepsetcloud_embedding.DeepsetCloudDocumentStoreEmbeddingRetriever
    init_parameters:
      document_store:
        type: dc_custom_component.components.document_stores.deepsetcloud.DeepsetCloudDocumentStore
        init_parameters:
          workspace_name: generative-q4-24
          pipeline_name: v2_genjus-chat-dCv2
          dc_api_key:
            type: env_var
            env_vars:
              - MANZ_GEN_TOKEN_2
            strict: false
          timeout: 10
      top_k: 30
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      top_k: 60
      sort_by_score: false
  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: svalabs/cross-electra-ms-marco-german-uncased
      top_k: 15
      model_kwargs:
        torch_dtype: torch.float16
  DeepsetMetadataGrouper:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: dokid
      subgroup_by:
      sort_docs_by: tokennr

connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: DeepsetMetadataGrouper.documents

max_runs_per_component: 100

metadata: {}

inputs:
  query:
    - query_embedder.text
    - bm25_retriever.query
    - ranker.query
  filters:
    - bm25_retriever.filters
    - embedding_retriever.filters

outputs:
  documents: DeepsetMetadataGrouper.documents
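
To make the joining step in this pipeline easier to follow, here is a minimal plain-Python sketch of what a concatenate join with a top_k limit and no score sorting might look like. It assumes that duplicates are detected by document ID and that the highest-scoring copy is kept. The `Doc` class and the `concatenate` function are hypothetical stand-ins for illustration, not the Haystack DocumentJoiner itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    id: str
    score: Optional[float] = None


def _score(doc: Doc) -> float:
    # Documents without a score sort below every scored document.
    return doc.score if doc.score is not None else float("-inf")


def concatenate(doc_lists: list[list[Doc]], top_k: Optional[int] = None) -> list[Doc]:
    """Merge several result lists, keeping one copy per document ID.

    Assumption: when the same ID appears more than once, the copy with the
    highest score wins. Results are not re-sorted by score.
    """
    best: dict[str, Doc] = {}
    for docs in doc_lists:
        for doc in docs:
            current = best.get(doc.id)
            if current is None or _score(doc) > _score(current):
                best[doc.id] = doc
    joined = list(best.values())  # first-seen order of IDs is preserved
    return joined[:top_k] if top_k is not None else joined


bm25_docs = [Doc("a", 0.2), Doc("b", 0.9)]
embedding_docs = [Doc("b", 0.5), Doc("c", 0.7)]
joined = concatenate([bm25_docs, embedding_docs], top_k=60)
```

With two retrievers that each return up to 30 documents, a joiner limit of 60 passes every unique document on to the ranker, which then reduces the list to its own top_k.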

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| document_store | DeepsetCloudDocumentStore | | The DeepsetCloudDocumentStore document store instance to use for retrieving documents. |
| all_terms_must_match | bool | None | Whether all terms in the query must match. |
| top_k | int | 10 | The maximum number of documents to retrieve. |

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The user query to use for retrieving documents. |
| all_terms_must_match | bool | None | Specifies whether all terms in the query must match the retrieved documents. |
| custom_query | dict[str, Any] | None | A custom query for retrieval. |
| filters | dict[str, Any] | None | Filters to narrow down the search. |
| top_k | int | 10 | The maximum number of documents to fetch. |
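
As an illustration of the run method parameters above, the following sketch collects them into a plain dictionary and drops any that are left unset. The helper name `build_retriever_params`, the idea of keying the result by the component name `bm25_retriever`, and the filter syntax in the example are assumptions made for this sketch. They are not taken from the platform's API reference.

```python
from typing import Any, Optional


def build_retriever_params(
    query: str,
    top_k: int = 10,
    all_terms_must_match: Optional[bool] = None,
    filters: Optional[dict[str, Any]] = None,
    custom_query: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Collect run-time parameters for the retriever (hypothetical helper).

    Parameters left as None are omitted so the component's defaults apply.
    """
    if not query:
        raise ValueError("query must be a non-empty string")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    params = {
        "query": query,
        "top_k": top_k,
        "all_terms_must_match": all_terms_must_match,
        "filters": filters,
        "custom_query": custom_query,
    }
    return {name: value for name, value in params.items() if value is not None}


# Assumed filter syntax and component name, for illustration only.
overrides = {
    "bm25_retriever": build_retriever_params(
        "tax law reform",
        top_k=5,
        all_terms_must_match=True,
        filters={"field": "meta.dokid", "operator": "==", "value": "123"},
    )
}
```

Omitting unset values keeps the override small and avoids replacing a component default with an explicit None.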