
TopPSampler

Filter documents using top-p (nucleus) sampling based on cumulative probability scores. TopPSampler ranks documents by score and keeps the highest-scoring ones whose cumulative probability stays within the top_p threshold, discarding the less relevant rest.
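The selection step can be sketched in Python on a plain list of scores. This is a minimal sketch, not the component's source: the function name `top_p_indices` is hypothetical, and both the softmax that turns scores into probabilities and the fallback to the single best document are assumptions of the sketch.

```python
import math
from typing import List


def top_p_indices(scores: List[float], top_p: float) -> List[int]:
    """Return the indices kept by top-p (nucleus) selection, best first.

    Hypothetical helper. Assumption: scores become probabilities via softmax.
    """
    if not 0.0 <= top_p <= 1.0:
        raise ValueError(f"top_p must be between 0 and 1, got {top_p}")
    if not scores:
        return []
    # Softmax, shifted by the maximum score for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Visit documents from highest to lowest probability.
    order = sorted(range(len(scores)), key=lambda i: probs[i], reverse=True)
    selected: List[int] = []
    cumulative = 0.0
    for i in order:
        cumulative += probs[i]
        # Keep documents while the running total stays within top_p
        # (small tolerance so that top_p=1.0 keeps everything).
        if cumulative > top_p + 1e-6:
            break
        selected.append(i)
    # Assumption of this sketch: if the best document alone exceeds top_p, keep it.
    return selected or [order[0]]


print(top_p_indices([2.0, 1.0, 0.1], 0.95))  # [0, 1]
```

For scores `[2.0, 1.0, 0.1]` the softmax probabilities are roughly 0.66, 0.24, and 0.10, so `top_p=0.95` keeps the first two, `top_p=1.0` keeps all three, and in this sketch `top_p=0.5` falls back to the single best document.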

Key Features

  • Filters documents based on a cumulative probability threshold (top-p / nucleus sampling).
  • The top_p threshold can be overridden at query time.
  • Supports a custom metadata field for scores, or uses the default document score.
  • Optionally enforces a minimum number of returned documents via min_top_k.
  • A top_p value of 1.0 retains all documents (no filtering).
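The score_field option decides where each document's score is read from. A minimal sketch, assuming a hypothetical `Doc` dataclass as a stand-in for Document (with `score` and `meta` attributes) and a hypothetical `get_score` helper; treating missing or non-numeric values as "no score" is also an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for Document: content, an optional score, and metadata."""

    content: str
    score: Optional[float] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def get_score(doc: Doc, score_field: Optional[str] = None) -> Optional[float]:
    """Return the score a top-p sampler would rank this document by.

    With score_field set, the score is read from doc.meta[score_field];
    otherwise the document's own score is used. Missing or non-numeric
    values yield None (an assumption of this sketch).
    """
    value = doc.meta.get(score_field) if score_field else doc.score
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)


doc = Doc("Nucleus sampling", score=0.7, meta={"relevance": 0.2})
print(get_score(doc))               # 0.7
print(get_score(doc, "relevance"))  # 0.2
```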

Configuration

  1. Drag the TopPSampler component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. Configure the parameters as needed.

Connections

TopPSampler accepts a list of documents and an optional top_p override as inputs. It outputs documents, a filtered list containing only the documents selected by top-p sampling.

Typically, you place TopPSampler after a retriever or ranker that assigns scores to documents, and before a PromptBuilder or generator to reduce the number of documents passed to the LLM.

Usage Example

components:
  TopPSampler:
    type: components.samplers.top_p.TopPSampler
    init_parameters:

Parameters

Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Document objects to be filtered. |
| top_p | Optional[float] | None | If specified, a float to override the cumulative probability threshold set during initialization. |

Outputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Document objects selected by top-p sampling, returned in a dictionary under the documents key. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| top_p | float | 1.0 | Float between 0 and 1 representing the cumulative probability threshold for document selection. A value of 1.0 indicates no filtering (all documents are retained). |
| score_field | Optional[str] | None | Name of the field in each document's metadata that contains the score. If None, the default document score field is used. |
| min_top_k | Optional[int] | None | If specified, the minimum number of documents to return. If top-p sampling selects fewer documents, the documents with the next-highest scores are added until this minimum is reached. |
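The min_top_k rule acts after top-p selection: it can only add documents, never remove them. A minimal sketch with a hypothetical helper `pad_to_min_top_k`, assuming the top-p step returns a prefix of the document indices ranked by score (highest first).

```python
from typing import List, Optional


def pad_to_min_top_k(selected: List[int], ranked: List[int],
                     min_top_k: Optional[int]) -> List[int]:
    """Hypothetical helper: enforce min_top_k after top-p selection.

    selected: indices chosen by top-p, highest score first.
    ranked:   all indices sorted by score, highest first.
    """
    # Enough documents already (or no minimum set): leave the selection alone.
    if min_top_k is None or len(selected) >= min_top_k:
        return selected
    # selected is a prefix of ranked, so taking the first min_top_k ranked
    # indices adds the next-highest-scoring documents (or all of them).
    return ranked[:min_top_k]


print(pad_to_min_top_k([0], [0, 1, 2], 2))  # [0, 1]
```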

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Document objects to be filtered. |
| top_p | Optional[float] | None | If specified, a float to override the cumulative probability threshold set during initialization. |
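Putting the parameters together, the following component-shaped sketch shows how a query-time top_p overrides the value set at initialization and how the result comes back in a dictionary under the documents key. Everything here is hypothetical: `TopPSamplerSketch` and `Doc` are illustrative stand-ins, not the real TopPSampler or Document classes, and the softmax scoring, the skipping of unscored documents, and the fallback to the best document are assumptions of the sketch.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for Document."""

    content: str
    score: Optional[float] = None
    meta: Dict[str, Any] = field(default_factory=dict)


class TopPSamplerSketch:
    """Hypothetical sketch of the component's parameters; not the real class."""

    def __init__(self, top_p: float = 1.0, score_field: Optional[str] = None,
                 min_top_k: Optional[int] = None):
        self.top_p = top_p
        self.score_field = score_field
        self.min_top_k = min_top_k

    def _score(self, doc: Doc) -> Optional[float]:
        # Read from metadata when score_field is set, else use doc.score.
        value = doc.meta.get(self.score_field) if self.score_field else doc.score
        return float(value) if isinstance(value, (int, float)) else None

    def run(self, documents: List[Doc], top_p: Optional[float] = None) -> Dict[str, List[Doc]]:
        # A query-time top_p overrides the init value; None keeps the default.
        threshold = self.top_p if top_p is None else top_p
        if not 0.0 <= threshold <= 1.0:
            raise ValueError(f"top_p must be between 0 and 1, got {threshold}")
        # Assumption: documents without a numeric score are ignored.
        scored = [(d, s) for d in documents if (s := self._score(d)) is not None]
        if not scored:
            return {"documents": []}
        scored.sort(key=lambda pair: pair[1], reverse=True)
        top = scored[0][1]
        exps = [math.exp(s - top) for _, s in scored]  # softmax numerators (assumed)
        total = sum(exps)
        kept: List[Doc] = []
        cumulative = 0.0
        for (doc, _), e in zip(scored, exps):
            cumulative += e / total
            if cumulative > threshold + 1e-6:
                break
            kept.append(doc)
        # min_top_k pads with the next-highest-scoring documents.
        if self.min_top_k is not None and len(kept) < self.min_top_k:
            kept = [doc for doc, _ in scored[: self.min_top_k]]
        return {"documents": kept or [scored[0][0]]}


docs = [Doc("a", 2.0), Doc("b", 1.0), Doc("c", 0.1)]
sampler = TopPSamplerSketch(top_p=0.95)
print([d.content for d in sampler.run(docs)["documents"]])             # ['a', 'b']
print([d.content for d in sampler.run(docs, top_p=1.0)["documents"]])  # ['a', 'b', 'c']
```

Because the override is applied per call, one configured sampler can serve both a strict and a permissive query without being rebuilt.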