TopPSampler
Filter documents using top-p (nucleus) sampling based on cumulative probability scores. TopPSampler ranks documents by score and keeps the highest-scoring ones as long as their cumulative probability stays within the top_p threshold, discarding the less relevant rest.
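The selection rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the component's source. It assumes that raw scores are turned into probabilities with a softmax (a common choice for nucleus sampling), that a document is kept only while the running total stays at or below top_p, and that at least one document is always returned. The function name `top_p_select` is hypothetical.

```python
import math
from typing import List


def top_p_select(scores: List[float], top_p: float) -> List[int]:
    """Return the indices of the scores kept by top-p (nucleus) selection.

    Hypothetical helper. Assumption: scores are softmax-normalized before
    the cumulative sum; the real component may normalize differently.
    """
    if not scores:
        return []
    # Softmax, shifted by the max score for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Visit documents from the highest to the lowest probability.
    order = sorted(range(len(scores)), key=lambda i: probs[i], reverse=True)

    selected = []
    cumulative = 0.0
    for i in order:
        cumulative += probs[i]
        # Keep the document while the running total stays within top_p;
        # the tolerance lets top_p=1.0 keep everything despite rounding.
        if cumulative <= top_p or math.isclose(cumulative, top_p, abs_tol=1e-6):
            selected.append(i)
        else:
            break
    # Assumption: always return at least the best-scoring document.
    return selected or [order[0]]


scores = [2.0, 1.0, 0.1, -1.0]
print(top_p_select(scores, 0.9))  # → [0, 1]
print(top_p_select(scores, 1.0))  # → [0, 1, 2, 3]
```

Some nucleus-sampling variants also keep the document whose probability pushes the total past the threshold; this sketch drops it instead.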
Key Features
- Filters documents based on cumulative probability threshold (top-p / nucleus sampling).
- The top_p threshold can be overridden at query time.
- Supports a custom metadata field for scores, or uses the default document score.
- Optionally enforces a minimum number of returned documents via min_top_k.
- A top_p value of 1.0 retains all documents (no filtering).
Configuration
- Drag the TopPSampler component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- Configure the parameters as needed.
Connections
TopPSampler accepts a list of documents and an optional top_p override as inputs. It outputs documents, a filtered list containing only the documents selected by top-p sampling.
Typically, you place TopPSampler after a retriever or ranker that assigns scores to documents, and before a PromptBuilder or generator to reduce the number of documents passed to the LLM.
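To make this placement concrete, the toy flow below passes hypothetical retriever output through a top-p step before building a prompt, so fewer documents reach the LLM. All names (`fake_retriever`, `sample_top_p`, `build_prompt`, and the `Document` dataclass) are illustrative stand-ins rather than pipeline components, and the softmax normalization is an assumption.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    # Hypothetical stand-in for a pipeline document.
    content: str
    score: float


def fake_retriever(query: str) -> List[Document]:
    # Hypothetical retriever output: documents already carry scores.
    return [
        Document("Nucleus sampling keeps the most probable items.", score=3.0),
        Document("Top-p is also called nucleus sampling.", score=2.5),
        Document("Unrelated note about tokenizers.", score=0.2),
        Document("Another weakly related passage.", score=-0.5),
    ]


def sample_top_p(documents: List[Document], top_p: float) -> List[Document]:
    # Softmax over scores (assumption), then keep the highest-probability
    # documents while the cumulative probability stays within top_p.
    if not documents:
        return []
    ranked = sorted(documents, key=lambda d: d.score, reverse=True)
    exps = [math.exp(d.score - ranked[0].score) for d in ranked]
    total = sum(exps)
    kept, cumulative = [], 0.0
    for doc, e in zip(ranked, exps):
        cumulative += e / total
        if cumulative > top_p + 1e-6:  # small tolerance for rounding
            break
        kept.append(doc)
    return kept or ranked[:1]  # assumption: never return an empty list


def build_prompt(query: str, documents: List[Document]) -> str:
    # Hypothetical prompt builder: only the sampled documents reach the LLM.
    context = "\n".join(f"- {d.content}" for d in documents)
    return f"Answer using the context.\nContext:\n{context}\nQuestion: {query}"


retrieved = fake_retriever("What is top-p?")
sampled = sample_top_p(retrieved, top_p=0.95)
print(len(retrieved), "->", len(sampled))  # → 4 -> 2
print(build_prompt("What is top-p?", sampled))
```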
Usage Example
```yaml
components:
  TopPSampler:
    type: components.samplers.top_p.TopPSampler
    init_parameters:
```
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Document objects to be filtered. |
| top_p | Optional[float] | None | If specified, a float to override the cumulative probability threshold set during initialization. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Document objects selected by top-p sampling, returned under the documents key of the output dictionary. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| top_p | float | 1 | Float between 0 and 1 representing the cumulative probability threshold for document selection. A value of 1.0 indicates no filtering (all documents are retained). |
| score_field | Optional[str] | None | Name of the field in each document's metadata that contains the score. If None, the default document score field is used. |
| min_top_k | Optional[int] | None | If specified, the minimum number of documents to return. If top-p sampling selects fewer documents, the documents with the next-highest scores are added to reach this minimum. |
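The effect of score_field and min_top_k can be sketched as follows. This is a simplified, hypothetical stand-in rather than the component's code: the `Document` dataclass and `sample` function are illustrative, scores are assumed to be softmax-normalized, and documents with no usable score are assumed to be skipped.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    # Hypothetical stand-in for a pipeline document.
    content: str
    score: Optional[float] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def get_score(doc: Document, score_field: Optional[str]) -> Optional[float]:
    # score_field=None: use the document's own score;
    # otherwise read the score from the named metadata field.
    if score_field is None:
        return doc.score
    return doc.meta.get(score_field)


def sample(documents: List[Document], top_p: float,
           score_field: Optional[str] = None,
           min_top_k: Optional[int] = None) -> List[Document]:
    # Assumption: documents without a score are skipped.
    scored = [(d, get_score(d, score_field)) for d in documents]
    scored = [(d, s) for d, s in scored if s is not None]
    if not scored:
        return []
    scored.sort(key=lambda pair: pair[1], reverse=True)
    ranked = [d for d, _ in scored]

    # Softmax over the scores (assumption), highest first.
    max_s = scored[0][1]
    exps = [math.exp(s - max_s) for _, s in scored]
    total = sum(exps)

    kept, cumulative = [], 0.0
    for doc, e in zip(ranked, exps):
        cumulative += e / total
        if cumulative > top_p + 1e-6:  # small tolerance for rounding
            break
        kept.append(doc)

    # min_top_k: kept is always a prefix of ranked, so taking the first
    # min_top_k ranked documents only adds the next-highest-scoring ones.
    if min_top_k is not None and len(kept) < min_top_k:
        kept = ranked[:min_top_k]
    return kept or ranked[:1]


docs = [
    Document("a", meta={"relevance": 4.0}),
    Document("b", meta={"relevance": 1.0}),
    Document("c", meta={"relevance": 0.5}),
]
print([d.content for d in sample(docs, top_p=0.95, score_field="relevance")])
# → ['a']
print([d.content for d in sample(docs, top_p=0.95, score_field="relevance", min_top_k=2)])
# → ['a', 'b']
```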
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Document objects to be filtered. |
| top_p | Optional[float] | None | If specified, a float to override the cumulative probability threshold set during initialization. |
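A query-time top_p acts as an optional argument that falls back to the value set at initialization. The hypothetical `SketchTopPSampler` below illustrates only that fallback and the dictionary returned under the documents key. It omits score_field and min_top_k and assumes softmax-normalized scores, so treat it as a sketch, not the component's implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Document:
    # Hypothetical stand-in for a pipeline document.
    content: str
    score: float


class SketchTopPSampler:
    """Hypothetical, simplified sampler used only to illustrate run-time overrides."""

    def __init__(self, top_p: float = 1.0):
        self.top_p = top_p  # default threshold, set at initialization

    def run(self, documents: List[Document],
            top_p: Optional[float] = None) -> Dict[str, List[Document]]:
        # A value passed at query time wins; otherwise use the init value.
        threshold = self.top_p if top_p is None else top_p
        if not 0.0 <= threshold <= 1.0:
            raise ValueError(f"top_p must be between 0 and 1, got {threshold}")
        if not documents:
            return {"documents": []}
        ranked = sorted(documents, key=lambda d: d.score, reverse=True)
        exps = [math.exp(d.score - ranked[0].score) for d in ranked]
        total = sum(exps)
        kept, cumulative = [], 0.0
        for doc, e in zip(ranked, exps):
            cumulative += e / total  # softmax probability (assumption)
            if cumulative > threshold + 1e-6:
                break
            kept.append(doc)
        return {"documents": kept or ranked[:1]}


sampler = SketchTopPSampler(top_p=1.0)
docs = [Document("a", 2.0), Document("b", 1.0), Document("c", 0.0)]
print(len(sampler.run(docs)["documents"]))             # → 3 (init value 1.0)
print(len(sampler.run(docs, top_p=0.7)["documents"]))  # → 1 (query-time override)
```

The override leaves the instance untouched, so the next call without top_p uses the initialization value again.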