MxbaiV2Ranker
Ranks documents by their semantic similarity to the query using MxbaiRerank models.
Key Features
- Ranks documents using MxbaiRerank models running on deepset AI Platform's infrastructure.
- No external API key required: models run locally within the platform.
- Configurable `top_k` to return only the most relevant documents.
- Supports metadata fields in the ranking process for richer context.
- Configurable score threshold to filter out low-relevance results.
- Adjustable batch size for performance optimization.
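The way these features fit together can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the component's implementation: `Doc`, `toy_score`, and `rank` are hypothetical names, `toy_score` stands in for the MxbaiRerank model (it only measures word overlap), and the threshold is treated as inclusive, which is an assumption. Documents are scored in chunks of `batch_size`, sorted by score, filtered by `score_threshold`, and truncated to `top_k`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    # Hypothetical stand-in for a Haystack Document: content plus a score slot.
    content: str
    score: Optional[float] = None


def toy_score(query: str, text: str) -> float:
    # Hypothetical relevance function: fraction of query words found in the text.
    # The real component scores (query, document) pairs with an MxbaiRerank model.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rank(query: str, docs: list[Doc], top_k: int = 10,
         score_threshold: Optional[float] = None,
         batch_size: int = 16) -> list[Doc]:
    scored: list[Doc] = []
    # Score documents in chunks of batch_size, the way a model would batch them.
    for start in range(0, len(docs), batch_size):
        for doc in docs[start:start + batch_size]:
            # Copy each document with its relevance score attached.
            scored.append(Doc(doc.content, toy_score(query, doc.content)))
    # Highest score first; Python's sort keeps ties in their original order.
    scored.sort(key=lambda d: d.score, reverse=True)
    # Keep only documents scoring at least the threshold, if one is set.
    if score_threshold is not None:
        scored = [d for d in scored if d.score >= score_threshold]
    return scored[:top_k]
```

Because the list is sorted before filtering, applying the threshold before or after the `top_k` cut yields the same documents.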
Configuration
- Drag the `MxbaiV2Ranker` component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Enter the name of the MxbaiRerank model to use, such as `mixedbread-ai/mxbai-rerank-base-v2`.
- Go to the Advanced tab to configure `top_k`, `score_threshold`, `batch_size`, and the other optional parameters listed under Init Parameters.
Connections
MxbaiV2Ranker accepts a query string, a documents list, and optional top_k and score_threshold values as inputs. It outputs the re-ranked documents list sorted by relevance.
Connect a Retriever or DocumentJoiner to its documents input. Connect its documents output to PromptBuilder or use it as the pipeline's final output.
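Since `top_k` and `score_threshold` exist both as init parameters and as optional run inputs, a plausible reading is that a value passed at run time overrides the configured default for that call only, which is the usual pattern in Haystack components. The helper below sketches that resolution rule; `resolve_params` is a hypothetical name, and both the fallback behavior and the `top_k` check are assumptions rather than confirmed behavior of `MxbaiV2Ranker`.

```python
from typing import Optional


def resolve_params(init_top_k: int, init_threshold: Optional[float],
                   run_top_k: Optional[int] = None,
                   run_threshold: Optional[float] = None) -> tuple[int, Optional[float]]:
    # A run-time value wins when provided; otherwise fall back to the init value.
    top_k = run_top_k if run_top_k is not None else init_top_k
    threshold = run_threshold if run_threshold is not None else init_threshold
    # Assumed sanity check: a ranker cannot return a non-positive number of documents.
    if top_k <= 0:
        raise ValueError(f"top_k must be > 0, got {top_k}")
    return top_k, threshold
```

With the example pipeline's `top_k: 10`, calling `resolve_params(10, None, run_top_k=3)` gives `(3, None)`: three documents for that query, no threshold.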
Usage Example
This is an example of a document search pipeline with hybrid retrieval, where the Ranker receives documents from both retrievers via DocumentJoiner and outputs the ranked documents as the final result.
```yaml
components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  DeepsetNvidiaTextEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      model: intfloat/multilingual-e5-base
      prefix: ''
      suffix: ''
      truncate:
      normalize_embeddings: false
      timeout:
      backend_kwargs:
  MxbaiV2Ranker:
    type: deepset_cloud_custom_nodes.rankers.mxbai.mxbaiv2_ranker.MxbaiV2Ranker
    init_parameters:
      model: mixedbread-ai/mxbai-rerank-base-v2
      top_k: 10
      max_length: 8192
      meta_fields_to_embed:
      embedding_separator: "\n"
      score_threshold:
      disable_transformers_warnings: false
      model_kwargs:
      tokenizer_kwargs:
      batch_size: 16

connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: DeepsetNvidiaTextEmbedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: document_joiner.documents
    receiver: MxbaiV2Ranker.documents

max_runs_per_component: 100

metadata: {}

inputs:
  query:
    - bm25_retriever.query
    - DeepsetNvidiaTextEmbedder.text
    - MxbaiV2Ranker.query
  filters:
    - bm25_retriever.filters
    - embedding_retriever.filters

outputs:
  documents: MxbaiV2Ranker.documents
```
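In this pipeline, each retriever returns up to 20 documents (`top_k: 20`), `DocumentJoiner` merges them, and `MxbaiV2Ranker` narrows the merged list to at most 10. The sketch below models the merge step, assuming `concatenate` mode behaves like Haystack's `DocumentJoiner`: a document found by both retrievers (same ID) is kept once, with the higher of its two scores. `Doc` and `concatenate` are hypothetical stand-ins, not platform APIs.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Optional


@dataclass
class Doc:
    # Hypothetical stand-in for a Haystack Document.
    id: str
    content: str
    score: Optional[float] = None


def _score(doc: Doc) -> float:
    # Treat a missing score as the lowest possible value.
    return doc.score if doc.score is not None else float("-inf")


def concatenate(*doc_lists: list[Doc]) -> list[Doc]:
    # Merge all retriever outputs; for duplicate IDs keep the highest-scoring copy.
    best: dict[str, Doc] = {}
    for doc in chain.from_iterable(doc_lists):
        current = best.get(doc.id)
        if current is None or _score(doc) > _score(current):
            best[doc.id] = doc
    return list(best.values())
```

If the two retrievers' 20 results overlap by 10 documents, the Ranker receives 30 unique candidates and re-scores all of them against the query before applying `top_k: 10`.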
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | | The query used for ranking documents by their similarity to the query. |
| documents | list[Document] | | The documents to be ranked. |
| top_k | Optional[int] | None | The maximum number of documents to return. |
| score_threshold | Optional[float] | None | The minimum score for documents to be included in the results. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | list[Document] | | The ranked documents. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | mixedbread-ai/mxbai-rerank-base-v2 | The name or path of the model used for ranking. |
| top_k | int | 10 | The maximum number of documents to return. |
| max_length | int | 8192 | The maximum length of the input sequence. |
| meta_fields_to_embed | List[str] | None | The metadata fields to include with the document content when ranking. |
| embedding_separator | str | \n | The separator to use between metadata fields and document content. |
| score_threshold | float | None | The minimum score for documents to be included in the results. |
| disable_transformers_warnings | bool | False | Whether to suppress warnings from the Transformers library. |
| model_kwargs | dict[str, Any] | None | Additional keyword arguments to pass to the model. |
| tokenizer_kwargs | dict[str, Any] | None | Additional keyword arguments to pass to the tokenizer. |
| batch_size | int | 16 | Batch size for processing documents. |
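How `meta_fields_to_embed` and `embedding_separator` combine into the text the model scores can be sketched as follows. This assumes the component follows the convention of Haystack's similarity rankers: the values of the listed metadata fields that are present and non-empty come first, followed by the document content, all joined with the separator. `text_for_ranking` is a hypothetical helper, not part of the component.

```python
from typing import Any, Optional


def text_for_ranking(content: Optional[str], meta: dict[str, Any],
                     meta_fields_to_embed: Optional[list[str]] = None,
                     embedding_separator: str = "\n") -> str:
    # Collect values of the requested fields that exist and are non-empty.
    fields = meta_fields_to_embed or []
    values = [str(meta[key]) for key in fields if meta.get(key)]
    # Metadata values first, then the content, joined by the separator.
    return embedding_separator.join(values + [content or ""])
```

For a document with `meta={"title": "Release notes", "year": 2024}` and `meta_fields_to_embed=["title", "year"]`, the model would see `Release notes`, `2024`, and the content on separate lines, so a title match can lift a document whose body alone scores poorly.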
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | | The query used for ranking documents by their similarity to the query. |
| documents | list[Document] | | The documents to be ranked. |
| top_k | Optional[int] | None | The maximum number of documents to return. |
| score_threshold | Optional[float] | None | The minimum score for documents to be included in the results. |
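A query-time override for this component might be sent as a JSON request body like the one built below. The `queries`/`params` layout, with overrides nested under the component's name from the pipeline YAML, is an assumption made for illustration; check the platform's API reference for the exact endpoint and schema. `build_search_body` is a hypothetical helper.

```python
import json
from typing import Any, Optional


def build_search_body(query: str, top_k: Optional[int] = None,
                      score_threshold: Optional[float] = None,
                      component: str = "MxbaiV2Ranker") -> str:
    # Only include overrides that were actually set, keyed by component name.
    overrides: dict[str, Any] = {}
    if top_k is not None:
        overrides["top_k"] = top_k
    if score_threshold is not None:
        overrides["score_threshold"] = score_threshold
    body: dict[str, Any] = {"queries": [query]}
    if overrides:
        body["params"] = {component: overrides}
    return json.dumps(body)
```

Omitting both arguments leaves the configured init values (`top_k: 10`, no threshold) in effect for that query.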