MxbaiV2Ranker

Ranks documents by their semantic similarity to the query using MxbaiRerank models.

Key Features

Ranks documents by semantic similarity to the query using MxbaiRerank models.
Assigns similarity scores to each document.
Configurable top_k to limit the number of returned documents.
Configurable score_threshold to filter out low-relevance documents.
Supports custom model and tokenizer keyword arguments.

Configuration

Drag the MxbaiV2Ranker component onto the canvas from the Component Library.
Click on the component to open the configuration panel.
On the General tab:
1. Select the MxbaiRerank model to use for ranking.
2. Set top_k to control the number of documents returned.
Go to the Advanced tab to configure max_length, meta_fields_to_embed, embedding_separator, score_threshold, batch_size, model_kwargs, tokenizer_kwargs, and disable_transformers_warnings.

Connections

MxbaiV2Ranker accepts a query string through its query input and a list of documents through its documents input. It outputs the ranked documents through its documents output, sorted from most to least relevant.

Connect a Retriever's documents output to MxbaiV2Ranker's documents input. Then connect MxbaiV2Ranker's documents output to PromptBuilder or use it as the pipeline's final output.
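
The wiring described above might look like this in the pipeline YAML. This is only a sketch: `retriever` and `prompt_builder` are placeholder component names, and the `prompt_builder.documents` receiver assumes the prompt template declares a `documents` variable.

```yaml
connections:
# Hypothetical retriever component feeding candidate documents to the Ranker
- sender: retriever.documents
  receiver: MxbaiV2Ranker.documents
# Ranked documents passed on to a hypothetical PromptBuilder component
- sender: MxbaiV2Ranker.documents
  receiver: prompt_builder.documents
```

The query itself is not wired through `connections`. As in the full pipeline example on this page, it is mapped to `MxbaiV2Ranker.query` under `inputs`.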

Usage Examples

Basic Configuration

  MxbaiV2Ranker:
    type: deepset_cloud_custom_nodes.rankers.mxbai.mxbaiv2_ranker.MxbaiV2Ranker
    init_parameters:
      model: mixedbread-ai/mxbai-rerank-base-v2
      top_k: 10
      max_length: 8192
      embedding_separator: "\n"
      disable_transformers_warnings: false
      batch_size: 16
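
To also drop low-scoring documents and include a metadata field alongside the document content, the basic configuration could be extended along these lines. The values are illustrative assumptions: `0.5` is not a recommended threshold (a sensible value depends on the model's score range), and `title` stands in for whatever metadata field your documents actually carry.

```yaml
  MxbaiV2Ranker:
    type: deepset_cloud_custom_nodes.rankers.mxbai.mxbaiv2_ranker.MxbaiV2Ranker
    init_parameters:
      model: mixedbread-ai/mxbai-rerank-base-v2
      top_k: 5
      # Illustrative value only; tune it against your own data
      score_threshold: 0.5
      # Hypothetical metadata field name
      meta_fields_to_embed:
      - title
```

Parameters left out of `init_parameters` fall back to the defaults listed in the Init Parameters table.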

Using the Component in a Pipeline

This is an example of a document search pipeline with hybrid retrieval. DocumentJoiner combines the documents from both retrievers and passes them to the Ranker, whose ranked documents are the final output of the pipeline.

# haystack-pipeline
components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
          - ${OPENSEARCH_HOST}
          http_auth:
          - ${OPENSEARCH_USER}
          - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
          - ${OPENSEARCH_HOST}
          http_auth:
          - ${OPENSEARCH_USER}
          - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  DeepsetNvidiaTextEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      model: intfloat/multilingual-e5-base
      prefix: ''
      suffix: ''
      truncate:
      normalize_embeddings: false
      timeout:
      backend_kwargs:
  MxbaiV2Ranker:
    type: deepset_cloud_custom_nodes.rankers.mxbai.mxbaiv2_ranker.MxbaiV2Ranker
    init_parameters:
      model: mixedbread-ai/mxbai-rerank-base-v2
      top_k: 10
      max_length: 8192
      meta_fields_to_embed:
      embedding_separator: "\n"
      score_threshold:
      disable_transformers_warnings: false
      model_kwargs:
      tokenizer_kwargs:
      batch_size: 16

connections:
- sender: bm25_retriever.documents
  receiver: document_joiner.documents
- sender: embedding_retriever.documents
  receiver: document_joiner.documents
- sender: DeepsetNvidiaTextEmbedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: document_joiner.documents
  receiver: MxbaiV2Ranker.documents

max_runs_per_component: 100

metadata: {}

inputs:
  query:
  - bm25_retriever.query
  - DeepsetNvidiaTextEmbedder.text
  - MxbaiV2Ranker.query
  filters:
  - bm25_retriever.filters
  - embedding_retriever.filters

outputs:
  documents: MxbaiV2Ranker.documents

Parameters

Inputs

Parameter	Type	Description
`query`	str	The query to rank the documents against.
`documents`	list[Document]	The documents to be ranked.
`top_k`	Optional[int]	The maximum number of documents to return.
`score_threshold`	Optional[float]	Returns only documents with a score above this threshold.

Outputs

Parameter	Type	Description
`documents`	list[Document]	The ranked documents.

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
model	str	`mixedbread-ai/mxbai-rerank-base-v2`	The name of or path to the model used for ranking.
top_k	int	10	The maximum number of documents to return.
max_length	int	8192	The maximum length of the input sequence.
meta_fields_to_embed	List[str]	None	The list of metadata fields to include in the document embeddings.
embedding_separator	str	\n	The separator to use between metadata fields and document content.
score_threshold	float	None	The minimum score for documents to be included in the results.
disable_transformers_warnings	bool	False	Whether to disable transformers warnings.
model_kwargs	dict[str, Any]	None	Additional keyword arguments to pass to the model.
tokenizer_kwargs	dict[str, Any]	None	Additional keyword arguments to pass to the tokenizer.
batch_size	int	16	Batch size for processing documents.

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
query	str		The query to rank the documents against.
documents	list[Document]		The documents to be ranked.
top_k	Optional[int]	None	The maximum number of documents to return.
score_threshold	Optional[float]	None	The minimum score for documents to be included in the results.
