Basic Information

- Type: deepset_cloud_custom_nodes.rankers.nvidia.nim_ranker.DeepsetNvidiaNIMRanker
- Components it most often connects to:
  - Retrievers: DeepsetNvidiaNIMRanker can receive documents from a Retriever and then rank them.
  - PromptBuilder: DeepsetNvidiaNIMRanker can send the ranked documents to PromptBuilder, which adds them to the prompt for the LLM.
  - Any component that outputs a list of documents or accepts a list of documents as input.

Inputs

Parameter	Type	Default	Description
query	str		The input query to compare the documents to.
documents	List[Document]		A list of documents to be ranked.
top_k	int \| None	None	The maximum number of documents to return.
scale_score	bool \| None	None	If `True`, scales the raw logit predictions with a sigmoid activation function. If `False`, returns the raw logit predictions unscaled.
calibration_factor	float \| None	None	Use this factor to calibrate probabilities with `sigmoid(logits * calibration_factor)`. Used only if `scale_score` is `True`.
score_threshold	float \| None	None	The minimum score a document must have to be included in the result.
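
The way `scale_score`, `calibration_factor`, `score_threshold`, and `top_k` interact can be sketched in plain Python. This is only an illustration, not the component's actual code: the function name `postprocess`, the order of the steps (scale, sort, filter, truncate), and the use of `>=` for the threshold are assumptions. Only the formula `sigmoid(logits * calibration_factor)` comes from the table above.

```python
import math
from typing import List, Optional, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def postprocess(
    logits: List[float],
    top_k: Optional[int] = None,
    scale_score: bool = True,
    calibration_factor: float = 1.0,
    score_threshold: Optional[float] = None,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best first (hypothetical helper)."""
    # Scale raw logits into the 0..1 range only when scale_score is enabled.
    scores = [
        sigmoid(logit * calibration_factor) if scale_score else logit
        for logit in logits
    ]
    # Sort from most similar to least similar.
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    # Assumption: the threshold is applied to the final (possibly scaled) score.
    if score_threshold is not None:
        ranked = [pair for pair in ranked if pair[1] >= score_threshold]
    # Assumption: top_k is applied last, after filtering.
    return ranked if top_k is None else ranked[:top_k]
```

With logits `[2.0, -1.0, 0.0]` and `top_k=2`, this sketch keeps the first and third documents. Note that a threshold chosen for scaled scores (between 0 and 1) is not meaningful for raw logits, so `score_threshold` and `scale_score` should be set together.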

Outputs

Parameter	Type	Default	Description
documents	List[Document]		Documents closest to the query, sorted from most similar to least similar.

Overview

DeepsetNvidiaNIMRanker uses NVIDIA NIM models to rank documents by their similarity to the query. It expects a model that takes query and document text as input and returns similarity scores.

This component runs on hosted models served from hardware optimized for performance. Unlike models hosted on platforms like Hugging Face, these models are not downloaded at query time. Instead, you choose a model upfront on the component card.

The optimized models are available only on the hosted platform. To run this component on your own hardware, use TransformersSimilarityRanker instead.

Usage Example

Initializing the Component

components:
  DeepsetNvidiaNIMRanker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.nim_ranker.DeepsetNvidiaNIMRanker
    init_parameters: {}

Using the Component in a Pipeline

In this example, the Ranker receives joined documents from a keyword and a semantic retriever and returns the ranked documents as pipeline output:

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
          - ${OPENSEARCH_HOST}
          http_auth:
          - ${OPENSEARCH_USER}
          - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
          - ${OPENSEARCH_HOST}
          http_auth:
          - ${OPENSEARCH_USER}
          - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  DeepsetNvidiaTextEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      model: intfloat/multilingual-e5-base
      prefix: ''
      suffix: ''
      truncate:
      normalize_embeddings: false
      timeout:
      backend_kwargs:
  DeepsetNvidiaNIMRanker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.nim_ranker.DeepsetNvidiaNIMRanker
    init_parameters:
      model: nvidia/llama-3.2-nv-rerankqa-1b-v2
      query_prefix: ''
      document_prefix: ''
      top_k: 10
      batch_size: 40
      score_threshold:
      meta_fields_to_embed:
      embedding_separator: "\n"
      scale_score: true
      calibration_factor: 1
      timeout:
      truncate:
      backend_kwargs:

connections:
- sender: bm25_retriever.documents
  receiver: document_joiner.documents
- sender: embedding_retriever.documents
  receiver: document_joiner.documents
- sender: DeepsetNvidiaTextEmbedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: document_joiner.documents
  receiver: DeepsetNvidiaNIMRanker.documents

max_runs_per_component: 100

metadata: {}

inputs:
  query:
  - bm25_retriever.query
  - DeepsetNvidiaTextEmbedder.text
  - DeepsetNvidiaNIMRanker.query
  filters:
  - bm25_retriever.filters
  - embedding_retriever.filters

outputs:
  documents: DeepsetNvidiaNIMRanker.documents
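
The data flow of this pipeline can be mimicked in a few lines of plain Python. This is a rough sketch, not how the components are implemented: `Doc`, `join_concatenate`, `rerank`, and `overlap_score` are hypothetical names, the word-overlap scorer is a toy stand-in for the NIM model, and collapsing duplicates by ID while keeping the higher score is an assumption about how `join_mode: concatenate` treats a document returned by both retrievers.

```python
from typing import Callable, Dict, List, NamedTuple


class Doc(NamedTuple):
    id: str
    content: str
    score: float


def join_concatenate(*result_lists: List[Doc]) -> List[Doc]:
    """Merge retriever outputs; assumed to keep the best-scored copy per ID."""
    best: Dict[str, Doc] = {}
    for docs in result_lists:
        for doc in docs:
            if doc.id not in best or doc.score > best[doc.id].score:
                best[doc.id] = doc
    return list(best.values())


def rerank(
    query: str,
    docs: List[Doc],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> List[Doc]:
    """Replace retriever scores with ranker scores and keep the top_k best."""
    rescored = [doc._replace(score=score_fn(query, doc.content)) for doc in docs]
    rescored.sort(key=lambda d: d.score, reverse=True)
    return rescored[:top_k]


def overlap_score(query: str, text: str) -> float:
    """Toy scorer: fraction of query words found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


bm25_hits = [Doc("a", "cats purr loudly", 3.1), Doc("b", "dogs bark", 2.0)]
embedding_hits = [Doc("a", "cats purr loudly", 0.9), Doc("c", "why cats purr", 0.8)]

joined = join_concatenate(bm25_hits, embedding_hits)
ranked = rerank("why cats purr", joined, overlap_score, top_k=2)
```

The point of the ranker in this setup is that BM25 and embedding scores are on different scales and cannot be compared directly. Rescoring every joined document with a single model gives one consistent ordering.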

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
model	DeepsetNvidiaNIMRankingModels	DeepsetNvidiaNIMRankingModels.NVIDIA_LLAMA_3_2_NV_RERANKQA_1B_V2	The model to use for ranking.
query_prefix	str		String to prepend to queries.
document_prefix	str		String to prepend to documents.
top_k	int	10	Maximum number of documents to return.
batch_size	int	40	The number of documents to rank at once.
score_threshold	float \| None	None	Minimum score a document must have to be included in the results.
meta_fields_to_embed	List[str] \| None	None	List of metadata fields to embed along with the document content.
embedding_separator	str	\n	Separator for concatenating metadata fields.
scale_score	bool	True	Whether to scale the scores using a sigmoid function.
calibration_factor	float \| None	1.0	Factor to calibrate probabilities when scaling scores.
timeout	float \| None	None	Timeout in seconds for the Triton server requests.
truncate	Optional[EmbeddingTruncateMode]	None	Specifies how to truncate inputs longer than the maximum token length. Possible options are: `START`, `END`, `NONE`. If set to `START`, the input is truncated from the start. If set to `END`, the input is truncated from the end. If set to `NONE`, returns an error if the input is too long.
backend_kwargs	Dict[str, Any] \| None	None	Additional keyword arguments to pass to the backend.
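
To make `meta_fields_to_embed`, `embedding_separator`, `document_prefix`, and `batch_size` more concrete, here is a minimal sketch of how the text sent to the model might be assembled and batched. It follows the pattern used by other Haystack rankers. Whether this component does exactly the same is an assumption, and `build_ranking_text` and `batches` are hypothetical helper names.

```python
from typing import Any, Dict, Iterator, List, Optional


def build_ranking_text(
    content: str,
    meta: Dict[str, Any],
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
    document_prefix: str = "",
) -> str:
    """Join the selected metadata values and the content into one string."""
    fields = meta_fields_to_embed or []
    # Assumption: missing or empty metadata values are skipped.
    meta_values = [str(meta[key]) for key in fields if meta.get(key)]
    return document_prefix + embedding_separator.join(meta_values + [content])


def batches(items: List[str], batch_size: int = 40) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


text = build_ranking_text(
    "Body text",
    {"title": "Intro", "author": ""},
    meta_fields_to_embed=["title", "author", "missing"],
)
batch_sizes = [len(batch) for batch in batches([text] * 85, batch_size=40)]
```

In this sketch, 85 documents with the default `batch_size` of 40 are sent in three requests of 40, 40, and 5 documents.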

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
query	str		The input query to compare the documents to.
documents	List[Document]		A list of documents to be ranked.
top_k	int \| None	None	The maximum number of documents to return.
scale_score	bool \| None	None	If `True`, scales the raw logit predictions with a sigmoid activation function. If `False`, returns the raw logit predictions unscaled.
calibration_factor	float \| None	None	Use this factor to calibrate probabilities with `sigmoid(logits * calibration_factor)`. Used only if `scale_score` is `True`.
score_threshold	float \| None	None	Returns only documents with a score above this threshold.
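
Haystack pipelines take run-time inputs as a dictionary keyed by component name, so query-time overrides for this component can be pictured as below. This is a sketch under assumptions: `build_run_overrides` is a hypothetical helper, and how the resulting dictionary is wrapped in an API request is not shown here. See Modify Pipeline Parameters at Query Time for the actual request format.

```python
from typing import Any, Dict

# Parameters accepted by run(), taken from the table above.
RUN_PARAMS = {
    "query",
    "documents",
    "top_k",
    "scale_score",
    "calibration_factor",
    "score_threshold",
}


def build_run_overrides(component_name: str, **overrides: Any) -> Dict[str, Dict[str, Any]]:
    """Nest run-time overrides under the component name (hypothetical helper)."""
    unknown = set(overrides) - RUN_PARAMS
    if unknown:
        # Init-only parameters such as batch_size cannot be changed at query time.
        raise ValueError(f"Not run() parameters: {sorted(unknown)}")
    return {component_name: dict(overrides)}


params = build_run_overrides("DeepsetNvidiaNIMRanker", top_k=5, score_threshold=0.3)
```

The check against `RUN_PARAMS` reflects the split between the two tables: init parameters are fixed when the pipeline is deployed, while run parameters can change with every query.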