NvidiaRanker

Ranks documents by their semantic similarity to the query using NVIDIA NIM ranking models.

Key Features

  • Ranks documents by semantic similarity to the query using NVIDIA NIM ranking models.
  • Documents are ordered from most to least semantically relevant to the query.
  • Default model: nvidia/nv-rerankqa-mistral-4b-v3.
  • Configurable top_k to limit the number of returned documents.
  • Supports custom query and document prefixes.

Configuration

  1. Drag the NvidiaRanker component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    1. Connect Haystack Platform to your NVIDIA account on the Integrations page. For instructions, see Use NVIDIA Models.
    2. Select the ranking model.
    3. Set top_k to control the number of documents returned.
  4. Go to the Advanced tab to configure truncate, query_prefix, document_prefix, meta_fields_to_embed, embedding_separator, timeout, and api_url.

Connections

NvidiaRanker accepts a query string through its query input and a list of documents through its documents input, with an optional top_k override at runtime. It outputs ranked documents through its documents output, sorted from most to least relevant.

Connect a Retriever's (or DocumentJoiner's) documents output to NvidiaRanker's documents input. Then connect its documents output to PromptBuilder or use it as the pipeline's final output.

top_k parameter

In pipelines with both a retriever and a ranker, each component has its own top_k value. The retriever's top_k specifies how many documents it returns. The ranker then reorders these documents.

Set the ranker's top_k to the same value as the retriever's or to a smaller one. The ranker's top_k is the number of documents it returns (if it's the last component in the pipeline) or forwards to the next component.

Adjusting the top_k values can help you optimize performance. A smaller top_k for the retriever means fewer documents for the ranker to process, which can speed up the pipeline.
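The interplay of the two top_k values can be sketched in plain Python. This is only an illustration: retrieve, rank, and the word-overlap scores are hypothetical stand-ins, not the retriever's or NvidiaRanker's actual logic, and the sample corpus is made up.

```python
from typing import List


def overlap(query: str, text: str) -> int:
    """Count distinct query words that also appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, corpus: List[str], top_k: int) -> List[str]:
    """Toy first stage (assumption): keep the top_k texts by word overlap."""
    return sorted(corpus, key=lambda t: overlap(query, t), reverse=True)[:top_k]


def rank(query: str, documents: List[str], top_k: int) -> List[str]:
    """Toy second stage (assumption): reorder candidates, then keep top_k."""
    def score(text: str) -> float:
        return overlap(query, text) / len(text.split())
    return sorted(documents, key=score, reverse=True)[:top_k]


corpus = [
    "NVIDIA NIM hosts ranking models",
    "Ranking models reorder retrieved documents by relevance",
    "OpenSearch stores documents for keyword retrieval",
    "A retriever returns candidate documents",
    "Bananas are yellow",
]
query = "ranking models documents"

# The retriever's top_k bounds how many documents the ranker has to process.
candidates = retrieve(query, corpus, top_k=4)
# The ranker's top_k bounds how many documents leave the ranker.
final = rank(query, candidates, top_k=2)
print(len(candidates), len(final))
```

Lowering the retriever's top_k shrinks the candidate list the ranker has to score, while the ranker's top_k only trims the reordered result.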

Source Code

To check this component's source code, open ranker.py in the Haystack Core Integrations repository.

Usage Examples

Basic Configuration

NvidiaRanker:
  type: haystack_integrations.components.rankers.nvidia.ranker.NvidiaRanker
  init_parameters:
    api_key:
      type: env_var
      env_vars:
        - NVIDIA_API_KEY
      strict: true
    model: nvidia/nv-rerankqa-mistral-4b-v3
    api_url: https://integrate.api.nvidia.com/v1
    top_k: 10
    query_prefix: ''
    document_prefix: ''
    embedding_separator: "\n"

This is an example of a document search pipeline where NvidiaRanker receives joined documents from both a keyword and a semantic retriever and returns the ranked documents as the final result.

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 1024
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 1024
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  NvidiaTextEmbedder:
    type: haystack_integrations.components.embedders.nvidia.text_embedder.NvidiaTextEmbedder
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - NVIDIA_API_KEY
        strict: true
      model: nvidia/nv-embedqa-e5-v5
      api_url: https://integrate.api.nvidia.com/v1
      prefix: ''
      suffix: ''
      truncate:
      timeout:
  NvidiaRanker:
    type: haystack_integrations.components.rankers.nvidia.ranker.NvidiaRanker
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - NVIDIA_API_KEY
        strict: true
      model: nvidia/nv-rerankqa-mistral-4b-v3
      api_url: https://integrate.api.nvidia.com/v1
      top_k: 10
      truncate:
      query_prefix: ''
      document_prefix: ''
      meta_fields_to_embed:
      embedding_separator: "\n"
      timeout:

connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: NvidiaTextEmbedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: document_joiner.documents
    receiver: NvidiaRanker.documents

max_runs_per_component: 100

metadata: {}

inputs:
  query:
    - bm25_retriever.query
    - NvidiaTextEmbedder.text
    - NvidiaRanker.query
  filters:
    - bm25_retriever.filters
    - embedding_retriever.filters

outputs:
  documents: NvidiaRanker.documents

Parameters

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| query | str | The query to rank the documents against. |
| documents | List[Document] | The list of documents to rank. |
| top_k | Optional[int] | The number of documents to return. |

Outputs

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[Document] | List of documents most similar to the query in descending order of similarity. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | Optional[str] | None | Ranking model to use. |
| truncate | Optional[Union[RankerTruncateMode, str]] | None | Truncation strategy to use. Can be "NONE", "END", or RankerTruncateMode. Defaults to NIM's default. |
| api_key | Optional[Secret] | Secret.from_env_var('NVIDIA_API_KEY') | API key for the NVIDIA NIM. |
| api_url | str | os.getenv('NVIDIA_API_URL', DEFAULT_API_URL) | Custom API URL for the NVIDIA NIM. |
| top_k | int | 5 | Number of documents to return. |
| query_prefix | str | '' | A string to add at the beginning of the query text before ranking. Use it to prepend the text with an instruction, as required by reranking models like bge. |
| document_prefix | str | '' | A string to add at the beginning of each document before ranking. You can use it to prepend the document with an instruction, as required by embedding models like bge. |
| meta_fields_to_embed | Optional[List[str]] | None | List of metadata fields to embed with the document. |
| embedding_separator | str | \n | Separator to concatenate metadata fields to the document. |
| timeout | Optional[float] | None | Timeout for request calls. If not set, it is inferred from the NVIDIA_TIMEOUT environment variable or defaults to 60. |
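To make the text-related parameters more concrete, here is a minimal sketch of how query_prefix, document_prefix, meta_fields_to_embed, and embedding_separator might combine into the strings sent to the ranking model. It assumes the selected metadata values are joined with the document content using the separator and that the prefix is prepended afterward. The helper names build_query_text and build_document_text are hypothetical and not part of the component's API.

```python
from typing import Any, Dict, List, Optional


def build_query_text(query: str, query_prefix: str = "") -> str:
    """Prepend the query prefix (assumed behavior)."""
    return query_prefix + query


def build_document_text(
    content: str,
    meta: Dict[str, Any],
    document_prefix: str = "",
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
) -> str:
    """Join selected metadata values and content, then prepend the prefix.

    Metadata fields that are missing or empty are skipped (assumption).
    """
    fields = meta_fields_to_embed or []
    meta_values = [str(meta[f]) for f in fields if meta.get(f)]
    return document_prefix + embedding_separator.join(meta_values + [content])


query_text = build_query_text("what is NIM?", query_prefix="query: ")
document_text = build_document_text(
    content="NIM microservices expose ranking models.",
    meta={"title": "NVIDIA NIM", "page": 3},
    document_prefix="passage: ",
    meta_fields_to_embed=["title", "author"],  # "author" is absent and skipped
)
print(repr(query_text))
print(repr(document_text))
```

With the default empty prefixes and no metadata fields selected, both helpers return the query and the document content unchanged.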

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The query to rank the documents against. |
| documents | List[Document] | | The list of documents to rank. |
| top_k | Optional[int] | None | The number of documents to return. |
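Because the run-time top_k defaults to None, a reasonable reading is that it overrides the init value only when you pass it. The sketch below illustrates that fallback with a hypothetical RankerSketch class. It is not the real NvidiaRanker: it does no ranking and assumes the documents are already sorted.

```python
from typing import Dict, List, Optional


class RankerSketch:
    """Hypothetical stand-in that only models how top_k is resolved."""

    def __init__(self, top_k: int = 5):
        # Init-time value, as configured in Pipeline Builder.
        self.top_k = top_k

    def run(
        self, query: str, documents: List[str], top_k: Optional[int] = None
    ) -> Dict[str, List[str]]:
        # A run-time value wins; otherwise fall back to the init-time value.
        effective_top_k = top_k if top_k is not None else self.top_k
        # Assumption: documents arrive already ordered by relevance.
        return {"documents": documents[:effective_top_k]}


docs = [f"doc {i}" for i in range(8)]
ranker = RankerSketch(top_k=5)

default_result = ranker.run(query="example", documents=docs)
override_result = ranker.run(query="example", documents=docs, top_k=2)
print(len(default_result["documents"]), len(override_result["documents"]))
```

The override applies to a single call only, so the next query without top_k falls back to the configured value.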