DeepsetNvidiaNIMRanker

Rank documents by their relevance to the query using NVIDIA NIM models on optimized hardware.

Key Features

  • Ranks documents by similarity to the query using NVIDIA NIM models running on hardware optimized for performance by deepset.
  • Unlike models hosted on platforms like Hugging Face, these models are not downloaded at query time. Instead, you choose a model upfront on the component card.
  • The optimized models are only available on Haystack Enterprise Platform. To run this component on your own hardware, use TransformersSimilarityRanker instead.
  • Configurable score scaling, calibration factor, and score threshold.
  • Supports truncation of long inputs.

Configuration

  1. Drag the DeepsetNvidiaNIMRanker component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    1. Select the NVIDIA NIM ranking model from the list on the component card.
    2. Set top_k to control the number of documents returned.
  4. Go to the Advanced tab to configure query_prefix, document_prefix, batch_size, score_threshold, meta_fields_to_embed, embedding_separator, scale_score, calibration_factor, timeout, truncate, and backend_kwargs.

Connections

DeepsetNvidiaNIMRanker accepts a query string through its query input and a list of documents through its documents input. It outputs ranked documents through its documents output, sorted from most to least relevant.

Connect a Retriever's (or DocumentJoiner's) documents output to DeepsetNvidiaNIMRanker's documents input. Then connect its documents output to PromptBuilder or use it as the pipeline's final output.

Usage Examples

Basic Configuration

  DeepsetNvidiaNIMRanker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.nim_ranker.DeepsetNvidiaNIMRanker
    init_parameters:
      model: nvidia/llama-3.2-nv-rerankqa-1b-v2
      query_prefix: ''
      document_prefix: ''
      top_k: 10
      batch_size: 40
      embedding_separator: \n
      scale_score: true
      calibration_factor: 1

In this example, the Ranker receives joined documents from a keyword and a semantic retriever and returns the ranked documents as pipeline output:

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  DeepsetNvidiaTextEmbedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      model: intfloat/multilingual-e5-base
      prefix: ''
      suffix: ''
      truncate:
      normalize_embeddings: false
      timeout:
      backend_kwargs:
  DeepsetNvidiaNIMRanker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.nim_ranker.DeepsetNvidiaNIMRanker
    init_parameters:
      model: nvidia/llama-3.2-nv-rerankqa-1b-v2
      query_prefix: ''
      document_prefix: ''
      top_k: 10
      batch_size: 40
      score_threshold:
      meta_fields_to_embed:
      embedding_separator: \n
      scale_score: true
      calibration_factor: 1
      timeout:
      truncate:
      backend_kwargs:

connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: DeepsetNvidiaTextEmbedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: document_joiner.documents
    receiver: DeepsetNvidiaNIMRanker.documents

max_runs_per_component: 100

metadata: {}

inputs:
  query:
    - bm25_retriever.query
    - DeepsetNvidiaTextEmbedder.text
    - DeepsetNvidiaNIMRanker.query
  filters:
    - bm25_retriever.filters
    - embedding_retriever.filters

outputs:
  documents: DeepsetNvidiaNIMRanker.documents

Parameters

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | The input query to compare the documents to. |
| documents | List[Document] | A list of documents to be ranked. |
| top_k | int \| None | The maximum number of documents to return. |
| scale_score | bool \| None | If True, scales the raw logit predictions using a Sigmoid activation function. If False, disables scaling of the raw logit predictions. |
| calibration_factor | float \| None | Use this factor to calibrate probabilities with sigmoid(logits * calibration_factor). Used only if scale_score is True. |
| score_threshold | float \| None | The minimum score for documents to be included in the result. |
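
The way scale_score, calibration_factor, score_threshold, and top_k interact can be pictured with a small, self-contained Python sketch. It is not the component's actual implementation: ScoredDoc and postprocess are hypothetical names, and the order of the steps (scale, sort, filter by threshold, cut to top_k) as well as the inclusive threshold comparison are assumptions. Only the formula sigmoid(logits * calibration_factor) comes from the table above.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a ranked document (not the Haystack Document class)."""
    content: str
    score: float


def postprocess(raw_logits, contents, top_k=None, scale_score=True,
                calibration_factor=1.0, score_threshold=None):
    """Illustrative post-processing: scale, sort, filter by threshold, cut to top_k."""
    docs = []
    for logit, content in zip(raw_logits, contents):
        if scale_score:
            # sigmoid(logits * calibration_factor), as described in the table
            score = 1.0 / (1.0 + math.exp(-logit * calibration_factor))
        else:
            # Scaling disabled: keep the raw logit as the score
            score = logit
        docs.append(ScoredDoc(content, score))

    # Most relevant first
    docs.sort(key=lambda d: d.score, reverse=True)

    # Assumption: documents scoring exactly at the threshold are kept
    if score_threshold is not None:
        docs = [d for d in docs if d.score >= score_threshold]

    return docs if top_k is None else docs[:top_k]


ranked = postprocess([2.0, -1.0, 0.0], ["a", "b", "c"], top_k=2, score_threshold=0.4)
print([(d.content, round(d.score, 3)) for d in ranked])
# prints [('a', 0.881), ('c', 0.5)]
```

With scaling enabled, scores fall between 0 and 1, so a score_threshold such as 0.4 is easy to reason about. With scale_score set to False, the threshold would be compared against raw logits instead.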

Outputs

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[Document] | Documents closest to the query, sorted from most similar to least similar. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | DeepsetNvidiaNIMRankingModels | DeepsetNvidiaNIMRankingModels.NVIDIA_LLAMA_3_2_NV_RERANKQA_1B_V2 | The model to use for ranking. |
| query_prefix | str | | String to prepend to queries. |
| document_prefix | str | | String to prepend to documents. |
| top_k | int | 10 | Maximum number of documents to return. |
| batch_size | int | 40 | The number of documents to rank at once. |
| score_threshold | float \| None | None | Minimum score threshold for returned documents to be included in the results. |
| meta_fields_to_embed | List[str] \| None | None | List of metadata fields to include in embedding. |
| embedding_separator | str | \n | Separator for concatenating metadata fields. |
| scale_score | bool | True | Whether to scale the scores using a sigmoid function. |
| calibration_factor | float \| None | 1.0 | Factor to calibrate probabilities when scaling scores. |
| timeout | float \| None | None | Timeout in seconds for the Triton server requests. |
| truncate | Optional[EmbeddingTruncateMode] | None | Specifies how to truncate inputs longer than the maximum token length. Possible options are: START, END, NONE. If set to START, the input is truncated from the start. If set to END, the input is truncated from the end. If set to NONE, returns an error if the input is too long. |
| backend_kwargs | Dict[str, Any] \| None | None | Additional keyword arguments to pass to the backend. |
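
To make document_prefix, meta_fields_to_embed, embedding_separator, truncate, and batch_size more concrete, here is a rough Python sketch of how a document's text might be prepared before ranking. Everything in it is illustrative: build_text, truncate_tokens, batches, and TruncateMode are hypothetical names, whitespace-free token lists stand in for the model's real tokenizer, and placing metadata values before the content is an assumption. START is read here as dropping tokens from the beginning of the input.

```python
from enum import Enum


class TruncateMode(Enum):
    """Hypothetical stand-in for EmbeddingTruncateMode."""
    START = "START"
    END = "END"
    NONE = "NONE"


def build_text(content, meta, meta_fields_to_embed=None,
               embedding_separator="\n", document_prefix=""):
    """Join selected metadata values and the content, then prepend the prefix."""
    fields = meta_fields_to_embed or []
    parts = [str(meta[f]) for f in fields if meta.get(f) is not None]
    # Assumption: metadata values come first, followed by the document content
    return document_prefix + embedding_separator.join(parts + [content])


def truncate_tokens(tokens, max_tokens, mode):
    """Shorten a token list according to the chosen truncation mode."""
    if len(tokens) <= max_tokens:
        return tokens
    if mode is TruncateMode.START:
        # Drop tokens from the start, keep the tail
        return tokens[len(tokens) - max_tokens:]
    if mode is TruncateMode.END:
        # Drop tokens from the end, keep the head
        return tokens[:max_tokens]
    # NONE: an input that is too long is an error
    raise ValueError(f"Input has {len(tokens)} tokens, maximum is {max_tokens}")


def batches(items, batch_size=40):
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


text = build_text("Berlin is the capital.", {"title": "Germany"}, ["title"])
print(repr(text))
print(truncate_tokens(["a", "b", "c", "d", "e"], 3, TruncateMode.START))
print([len(b) for b in batches(list(range(85)), batch_size=40)])
```

In this sketch, 85 documents with the default batch_size of 40 would be sent in three batches of 40, 40, and 5 documents.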

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The input query to compare the documents to. |
| documents | List[Document] | | A list of documents to be ranked. |
| top_k | int \| None | None | The maximum number of documents to return. |
| scale_score | bool \| None | None | If True, scales the raw logit predictions using a Sigmoid activation function. If False, disables scaling of the raw logit predictions. |
| calibration_factor | float \| None | None | Use this factor to calibrate probabilities with sigmoid(logits * calibration_factor). Used only if scale_score is True. |
| score_threshold | float \| None | None | Use it to return documents only with a score above this threshold. |
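
As a rough illustration of overriding these values at query time, the sketch below builds a request body in Python. The payload shape (a "queries" list plus a "params" object keyed by component name) is an assumption and is not confirmed by this page, and build_query_payload is a hypothetical helper. Check Modify Pipeline Parameters at Query Time for the exact format before relying on it.

```python
import json


def build_query_payload(query, component_name="DeepsetNvidiaNIMRanker", **run_params):
    """Hypothetical helper: nest run() overrides under the component's name."""
    # Assumed payload shape; the real API may differ
    return {
        "queries": [query],
        "params": {component_name: run_params},
    }


payload = build_query_payload(
    "Which models does the ranker support?",
    top_k=5,
    score_threshold=0.3,
)
print(json.dumps(payload, indent=2))
```

The component name used as the key must match the name given to the Ranker in the pipeline YAML, which is DeepsetNvidiaNIMRanker in the examples on this page.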