
TopPSampler

Filter documents using top-p (nucleus) sampling based on cumulative probability scores. The component ranks documents by score and keeps the highest-scoring ones whose cumulative probability falls within the top_p threshold (a value between 0 and 1), which focuses on high-probability documents and filters out less relevant ones.
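The selection rule described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the component's source code: it assumes scores are turned into probabilities with a softmax, that documents are kept while the running total stays at or below top_p, and that the best document is returned if nothing else qualifies. The function name top_p_select and the (id, score) tuples are hypothetical stand-ins for real documents.

```python
import math


def top_p_select(scored_docs, top_p):
    """Keep the highest-scoring docs whose cumulative probability is <= top_p.

    scored_docs: list of (doc_id, score) tuples (hypothetical document stand-ins).
    Assumption: scores are normalized with a softmax before accumulation.
    """
    if not scored_docs:
        return []
    # Rank by descending score.
    ranked = sorted(scored_docs, key=lambda d: d[1], reverse=True)
    # Numerically stable softmax: subtract the maximum score first.
    max_score = ranked[0][1]
    exps = [math.exp(score - max_score) for _, score in ranked]
    total = sum(exps)

    selected, cumulative = [], 0.0
    for doc, e in zip(ranked, exps):
        cumulative += e / total
        # Small tolerance so top_p=1.0 is not defeated by float rounding.
        if cumulative > top_p + 1e-6:
            break
        selected.append(doc)
    # Assumption: fall back to the single best document if none fit.
    return selected or ranked[:1]


docs = [("a", 2.0), ("b", 1.0), ("c", 0.0), ("d", -1.0)]
kept = top_p_select(docs, top_p=0.95)
```

With these scores the softmax probabilities are roughly 0.64, 0.24, 0.09, and 0.03, so a threshold of 0.95 keeps the first two documents, while 1.0 keeps all four.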

Key Features

  • Filters documents based on cumulative probability thresholds (nucleus sampling).
  • Configurable probability threshold to control how many documents are retained.
  • Optional minimum document count that guarantees at least that many results.
  • Supports custom score metadata fields.

Configuration

  1. Drag the TopPSampler component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    • Set top_p to the cumulative probability threshold (0 to 1). A value of 1.0 retains all documents.
    • Optionally set score_field to specify which metadata field contains document scores.
    • Optionally set min_top_k to guarantee that at least that many documents are returned.

Connections

TopPSampler accepts a list of documents through its documents input, typically from a retriever or ranker. It outputs a filtered documents list. Connect the output to an LLM or prompt builder for downstream processing.

Source Code

To check this component's source code, open top_p.py in the Haystack repository.

Usage Examples

Basic Configuration

TopPSampler:
  type: components.samplers.top_p.TopPSampler
  init_parameters: {}

Typically, you place TopPSampler after a retriever or ranker that assigns scores to documents, and before a PromptBuilder or generator to reduce the number of documents passed to the LLM.

components:
  TopPSampler:
    type: components.samplers.top_p.TopPSampler
    init_parameters: {}

Parameters

Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to filter. |
| top_p | Optional[float] | None | If specified, overrides the cumulative probability threshold set during initialization. |

Outputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents selected based on the top-p sampling. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| top_p | float | 1.0 | Float between 0 and 1 representing the cumulative probability threshold for document selection. A value of 1.0 means no filtering (all documents are retained). |
| score_field | Optional[str] | None | Name of the field in each document's metadata that contains the score. If None, the default document score field is used. |
| min_top_k | Optional[int] | None | If specified, the minimum number of documents to return. If the top-p sampling selects fewer documents, additional ones with the next highest scores are added to the selection. |
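To make the score_field and min_top_k behavior concrete, here is a small plain-Python sketch. It is an illustration only, not the component's implementation: documents are modeled as plain dicts, the helper names read_score and enforce_min_top_k are hypothetical, and similarity_score is an invented metadata field name.

```python
def read_score(doc, score_field=None):
    """Return the score used for ranking.

    doc is a plain dict standing in for a document (hypothetical shape).
    With score_field=None the default "score" key is used; otherwise the
    value is read from the document's metadata.
    """
    if score_field is None:
        return doc["score"]
    return doc["meta"][score_field]


def enforce_min_top_k(ranked_docs, selected_docs, min_top_k=None):
    """Pad a top-p selection so it holds at least min_top_k documents.

    Assumes selected_docs is a prefix of ranked_docs and that both are
    sorted by descending score, so padding means taking a longer prefix.
    """
    if min_top_k is None or len(selected_docs) >= min_top_k:
        return selected_docs
    return ranked_docs[:min_top_k]


docs = [
    {"id": "a", "score": 0.1, "meta": {"similarity_score": 0.9}},
    {"id": "b", "score": 0.8, "meta": {"similarity_score": 0.2}},
    {"id": "c", "score": 0.5, "meta": {"similarity_score": 0.5}},
]
# Rank by the custom metadata field instead of the default score.
ranked = sorted(docs, key=lambda d: read_score(d, "similarity_score"), reverse=True)
selected = ranked[:1]  # pretend top-p sampling kept only the best document
padded = enforce_min_top_k(ranked, selected, min_top_k=2)
```

Here the custom field reverses the order that the default score would give, and min_top_k=2 adds the next highest-scoring document to the single one that survived sampling.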

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to filter. |
| top_p | Optional[float] | None | If specified, overrides the cumulative probability threshold set during initialization. |
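The override described in the table can be pictured with a minimal sketch. The class SamplerSketch and its effective_top_p method are hypothetical and exist only to show the precedence rule; the range check is an added assumption based on the documented 0 to 1 range.

```python
class SamplerSketch:
    """Hypothetical stand-in showing init-time vs. run-time top_p precedence."""

    def __init__(self, top_p=1.0):
        # Threshold configured at initialization.
        self.top_p = top_p

    def effective_top_p(self, top_p=None):
        # A run-time value, when given, takes precedence over the init value.
        value = self.top_p if top_p is None else top_p
        # Assumption: reject values outside the documented 0 to 1 range.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"top_p must be between 0 and 1, got {value}")
        return value


sampler = SamplerSketch(top_p=0.9)
default_value = sampler.effective_top_p()      # uses the init value
override_value = sampler.effective_top_p(0.5)  # run-time override
```

Leaving top_p unset at query time falls back to the value chosen in the configuration panel.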