MxbaiV2Ranker

Rank documents based on their semantic similarity to the query using MxbaiRerank models.

Basic Information

  • Type: deepset_cloud_custom_nodes.rankers.mxbai.mxbaiv2_ranker.MxbaiV2Ranker
  • Components it can connect with:
    • Retrievers: MxbaiV2Ranker can receive documents from a Retriever and rank them.
    • PromptBuilder: MxbaiV2Ranker can send the ranked documents to PromptBuilder which adds them to the prompt for the LLM.
    • Any component that outputs a list of documents or accepts a list of documents as input.

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The query to rank the documents against. |
| documents | list[Document] | | The documents to be ranked. |
| top_k | Optional[int] | None | The maximum number of documents to return. |
| score_threshold | Optional[float] | None | Returns only documents with a score above this threshold. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | list[Document] | | The ranked documents. |

Overview

MxbaiV2Ranker uses MxbaiRerank models to rank documents based on their semantic similarity to the query. It assigns a similarity score to each document.
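
The ranking flow can be sketched in plain Python. This is a minimal illustration, not the component's implementation: `Doc`, `toy_score`, and `rank` are hypothetical names, the word-overlap scorer only stands in for an MxbaiRerank model, and treating `score_threshold` as an inclusive minimum is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: text content plus a score."""
    content: str
    score: Optional[float] = None


def toy_score(query: str, text: str) -> float:
    """Placeholder scorer: fraction of query words that appear in the text.
    The real component would get this number from an MxbaiRerank model."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rank(
    query: str,
    documents: List[Doc],
    top_k: Optional[int] = None,
    score_threshold: Optional[float] = None,
    scorer: Callable[[str, str], float] = toy_score,
) -> List[Doc]:
    # Score every document against the query.
    scored = [Doc(d.content, scorer(query, d.content)) for d in documents]
    # Highest score first.
    scored.sort(key=lambda d: d.score, reverse=True)
    # Assumption: the threshold is an inclusive minimum score.
    if score_threshold is not None:
        scored = [d for d in scored if d.score >= score_threshold]
    # Keep at most top_k documents; None means no limit.
    return scored if top_k is None else scored[:top_k]


docs = [
    Doc("Paris is the capital of France"),
    Doc("Berlin is a city in Germany"),
    Doc("The capital of Italy is Rome"),
]
ranked = rank("capital of France", docs, top_k=2)
strict = rank("capital of France", docs, score_threshold=0.9)
```

Because the list is sorted before filtering, applying `score_threshold` and `top_k` in either order gives the same result in this sketch.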

Usage Example

Initializing the Component

components:
  MxbaiV2Ranker:
    type: deepset_cloud_custom_nodes.rankers.mxbai.mxbaiv2_ranker.MxbaiV2Ranker
    init_parameters:

Using the Component in a Pipeline

This is an example of a document search pipeline with hybrid retrieval. DocumentJoiner combines the documents from both retrievers and passes them to the Ranker, which returns the ranked documents as the final output of the pipeline.

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
          - ${OPENSEARCH_HOST}
          http_auth:
          - ${OPENSEARCH_USER}
          - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
          - ${OPENSEARCH_HOST}
          http_auth:
          - ${OPENSEARCH_USER}
          - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  DeepsetNvidiaTextEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      model: intfloat/multilingual-e5-base
      prefix: ''
      suffix: ''
      truncate:
      normalize_embeddings: false
      timeout:
      backend_kwargs:
  MxbaiV2Ranker:
    type: deepset_cloud_custom_nodes.rankers.mxbai.mxbaiv2_ranker.MxbaiV2Ranker
    init_parameters:
      model: mixedbread-ai/mxbai-rerank-base-v2
      top_k: 10
      max_length: 8192
      meta_fields_to_embed:
      embedding_separator: "\n"
      score_threshold:
      disable_transformers_warnings: false
      model_kwargs:
      tokenizer_kwargs:
      batch_size: 16

connections:
- sender: bm25_retriever.documents
  receiver: document_joiner.documents
- sender: embedding_retriever.documents
  receiver: document_joiner.documents
- sender: DeepsetNvidiaTextEmbedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: document_joiner.documents
  receiver: MxbaiV2Ranker.documents

max_runs_per_component: 100

metadata: {}

inputs:
  query:
  - bm25_retriever.query
  - DeepsetNvidiaTextEmbedder.text
  - MxbaiV2Ranker.query
  filters:
  - bm25_retriever.filters
  - embedding_retriever.filters

outputs:
  documents: MxbaiV2Ranker.documents

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | mixedbread-ai/mxbai-rerank-base-v2 | The name or path of the model used for ranking. |
| top_k | int | 10 | The maximum number of documents to return. |
| max_length | int | 8192 | The maximum length of the input sequence. |
| meta_fields_to_embed | List[str] | None | The list of metadata fields to include in the document embeddings. |
| embedding_separator | str | \n | The separator to use between metadata fields and document content. |
| score_threshold | float | None | The minimum score for documents to be included in the results. |
| disable_transformers_warnings | bool | False | Whether to disable transformers warnings. |
| model_kwargs | dict[str, Any] | None | Additional keyword arguments to pass to the model. |
| tokenizer_kwargs | dict[str, Any] | None | Additional keyword arguments to pass to the tokenizer. |
| batch_size | int | 16 | Batch size for processing documents. |
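
To illustrate how `meta_fields_to_embed` and `embedding_separator` could work together, here is a hedged sketch. It assumes the component follows the common ranker pattern of joining the selected metadata values and the document content with the separator before scoring. `build_ranker_text` is a hypothetical helper, not part of the component's API.

```python
from typing import Any, Dict, List, Optional


def build_ranker_text(
    content: str,
    meta: Dict[str, Any],
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
) -> str:
    """Hypothetical helper: prepend selected metadata values to the content."""
    fields = meta_fields_to_embed or []
    # Assumption: fields that are missing or set to None are skipped.
    meta_values = [
        str(meta[key]) for key in fields if key in meta and meta[key] is not None
    ]
    # Metadata values come first, then the content, joined by the separator.
    return embedding_separator.join(meta_values + [content])


text = build_ranker_text(
    content="Reranking reorders retrieved documents.",
    meta={"title": "Rerankers", "page": 3},
    meta_fields_to_embed=["title", "author"],  # "author" is absent and skipped
)
```

With the default separator, the model in this sketch would see the title on the first line and the document content on the second.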

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The query used for ranking documents by their similarity to the query. |
| documents | list[Document] | | The documents to be ranked. |
| top_k | Optional[int] | None | The maximum number of documents to return. |
| score_threshold | Optional[float] | None | The minimum score for documents to be included in the results. |
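
As a rough sketch of overriding these parameters at query time, the snippet below builds a request body that targets the Ranker by the name it has in the pipeline YAML (`MxbaiV2Ranker` in the example above). The body shape, with `queries` and `params` keys, is an assumption, and `build_query_payload` is a hypothetical helper. Check Modify Pipeline Parameters at Query Time for the authoritative format.

```python
import json
from typing import Any, Dict, Optional


def build_query_payload(
    query: str,
    component_name: str = "MxbaiV2Ranker",
    top_k: Optional[int] = None,
    score_threshold: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a search request body with run() overrides."""
    overrides: Dict[str, Any] = {}
    if top_k is not None:
        overrides["top_k"] = top_k
    if score_threshold is not None:
        overrides["score_threshold"] = score_threshold

    # Assumed body shape: a list of queries plus per-component parameters.
    payload: Dict[str, Any] = {"queries": [query]}
    if overrides:
        payload["params"] = {component_name: overrides}
    return payload


payload = build_query_payload("What is reranking?", top_k=5)
print(json.dumps(payload, indent=2))
```

Parameters left as `None` are omitted from the payload, so the values set in the pipeline's init parameters would still apply.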