WeaviateBM25Retriever
Retrieve documents from a Weaviate document store using the BM25 algorithm for keyword-based search.
Key Features
- Keyword-based BM25 retrieval from a Weaviate vector database.
- Configurable number of results with top_k.
- Supports metadata filtering to narrow down the search space.
- Configurable filter policy (replace or merge) for runtime filters.
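To make the keyword-based ranking concrete, here is a minimal, self-contained sketch of Okapi BM25 scoring with a top_k cutoff. It is an illustration only, not Weaviate's implementation: the whitespace tokenizer, the function names bm25_scores and retrieve, and the k1=1.2 and b=0.75 values are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    if not docs:
        return []
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query, docs, top_k=10):
    """Return up to top_k (score, document) pairs with a positive score, best first."""
    ranked = sorted(zip(bm25_scores(query, docs), docs), key=lambda pair: pair[0], reverse=True)
    return [(score, doc) for score, doc in ranked if score > 0][:top_k]


docs = [
    "Weaviate is a vector database",
    "BM25 ranks documents by keyword overlap",
    "Keyword search with BM25 in Weaviate",
]
results = retrieve("BM25 keyword", docs, top_k=2)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

Documents that share no term with the query score zero and are dropped, which is why BM25 suits exact keyword matching rather than semantic similarity.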
Configuration
- Drag the WeaviateBM25Retriever component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- On the General tab:
  - Configure the WeaviateDocumentStore with your Weaviate instance URL.
  - Set top_k to control the maximum number of documents to retrieve.
- Go to the Advanced tab to configure filter_policy and default filters.
Connections
WeaviateBM25Retriever receives a query text string as input. It sends retrieved documents to downstream components such as PromptBuilder or a ranker.
Source Code
To check this component's source code, open bm25_retriever.py in the Haystack Core Integrations repository.
Usage Examples
Basic Configuration
WeaviateBM25Retriever:
  type: weaviate.src.haystack_integrations.components.retrievers.weaviate.bm25_retriever.WeaviateBM25Retriever
  init_parameters: {}
Connect the pipeline's query input to the retriever's query input. Connect the retriever's documents output to a PromptBuilder, a Ranker, or an answer builder.
components:
  WeaviateBM25Retriever:
    type: weaviate.src.haystack_integrations.components.retrievers.weaviate.bm25_retriever.WeaviateBM25Retriever
    init_parameters:
Parameters
Inputs
| Parameter | Type | Description |
|---|---|---|
| query | str | The query text. |
| filters | Optional[Dict[str, Any]] | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. |
| top_k | Optional[int] | The maximum number of documents to return. |
Outputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | Retrieved documents. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | WeaviateDocumentStore | | Instance of WeaviateDocumentStore used by this retriever. |
| filters | Optional[Dict[str, Any]] | None | Custom filters applied when running the retriever. |
| top_k | int | 10 | Maximum number of documents to return. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy to determine how filters are applied. |
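The difference between the two filter_policy values can be sketched in plain Python. This is a simplified model, not the library's code: the FilterPolicy enum below is a local stand-in, apply_policy is a hypothetical helper, and treating merge as a logical AND of both filters is an assumption (the actual merge logic may handle more cases, such as filters on the same field).

```python
from enum import Enum
from typing import Any, Dict, Optional


class FilterPolicy(Enum):
    """Local stand-in for the library's FilterPolicy (assumption)."""
    REPLACE = "replace"
    MERGE = "merge"


def apply_policy(
    policy: FilterPolicy,
    init_filters: Optional[Dict[str, Any]],
    runtime_filters: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Decide which filters a query would use (simplified model)."""
    if not runtime_filters:
        # Nothing passed at query time: keep the filters set at initialization.
        return init_filters
    if policy is FilterPolicy.REPLACE or not init_filters:
        # Runtime filters take over completely.
        return runtime_filters
    # MERGE (simplified): both sets of conditions must hold.
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}


init_filters = {"field": "meta.language", "operator": "==", "value": "en"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2023}

replaced = apply_policy(FilterPolicy.REPLACE, init_filters, runtime_filters)
merged = apply_policy(FilterPolicy.MERGE, init_filters, runtime_filters)
print(replaced)
print(merged)
```

With replace, a query-time filter discards the default language restriction; with merge, both the language and the year conditions apply.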
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | | The query text. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details. |
| top_k | Optional[int] | None | The maximum number of documents to return. |
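Because top_k defaults to None at query time but to 10 at initialization, a reasonable reading is that the run-time value overrides the init value only when it is provided. The sketch below models that precedence with a hypothetical RetrieverSketch class and a made-up in-memory corpus; it does not call Weaviate and the fallback rule itself is an assumption about the component's behavior.

```python
from typing import Dict, List, Optional


class RetrieverSketch:
    """Hypothetical stand-in that models only the top_k fallback."""

    def __init__(self, top_k: int = 10):
        self.top_k = top_k
        # Made-up corpus so the example runs without a document store.
        self._corpus = [f"doc-{i}" for i in range(25)]

    def run(self, query: str, top_k: Optional[int] = None) -> Dict[str, List[str]]:
        # Assumption: a run-time top_k wins; None falls back to the init value.
        effective_top_k = self.top_k if top_k is None else top_k
        return {"documents": self._corpus[:effective_top_k]}


retriever = RetrieverSketch(top_k=10)
default_result = retriever.run(query="weaviate")
override_result = retriever.run(query="weaviate", top_k=3)
print(len(default_result["documents"]), len(override_result["documents"]))
```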