TopPSampler
Filters documents using top-p (nucleus) sampling. The component keeps the highest-scoring documents whose cumulative probability stays within the top_p threshold and drops the remaining, less relevant ones.
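The selection described above can be illustrated with a minimal, dependency-free sketch. It assumes that raw scores are turned into probabilities with a softmax and that a small tolerance is applied at the threshold; the function name `top_p_indices` is hypothetical, and the actual component may differ in its details.

```python
import math


def top_p_indices(scores, top_p):
    """Return indices of the highest-scoring items whose cumulative
    probability stays within top_p (illustrative sketch only)."""
    if not scores:
        return []

    # Assumption: scores are normalized into probabilities with a softmax.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk the items from highest to lowest probability.
    order = sorted(range(len(scores)), key=lambda i: probs[i], reverse=True)
    selected, cumulative = [], 0.0
    for i in order:
        cumulative += probs[i]
        # Assumed tolerance so rounding error does not drop an item
        # that lands exactly on the threshold.
        if cumulative > top_p + 1e-6:
            break
        selected.append(i)
    return selected


scores = [2.0, 1.0, 0.1, -1.0]
print(top_p_indices(scores, top_p=0.9))  # [0, 1]
print(top_p_indices(scores, top_p=1.0))  # [0, 1, 2, 3]
```

With these scores, the two best items already cover about 87% of the probability mass, so a threshold of 0.9 keeps only them, while 1.0 keeps everything.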
Key Features
- Filters documents based on cumulative probability thresholds (nucleus sampling).
- Configurable probability threshold to control how many documents are retained.
- Optional minimum document count to ensure a minimum number of results.
- Supports custom score metadata fields.
Configuration
- Drag the TopPSampler component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- On the General tab:
  - Set top_p to the cumulative probability threshold (0 to 1). A value of 1.0 retains all documents.
  - Optionally set score_field to specify which metadata field contains document scores.
  - Optionally set min_top_k to ensure a minimum number of documents are returned.
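To make the role of score_field more concrete, here is a small sketch of how a score could be looked up. The `Doc` dataclass is a hypothetical stand-in, not the Haystack Document class, and `get_score` and the `similarity_score` field name are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document; not the Haystack class."""
    content: str
    score: Optional[float] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def get_score(doc: Doc, score_field: Optional[str] = None) -> Optional[float]:
    # With no score_field, fall back to the document's own score.
    if score_field is None:
        return doc.score
    # Otherwise read the score from the named metadata field.
    return doc.meta.get(score_field)


doc = Doc(
    "Berlin is the capital of Germany.",
    score=0.42,
    meta={"similarity_score": 0.91},
)
print(get_score(doc))                      # 0.42
print(get_score(doc, "similarity_score"))  # 0.91
```

Setting score_field is mainly useful when an upstream component stores its scores in metadata instead of the default score field.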
Connections
TopPSampler accepts a list of documents through its documents input, typically from a retriever or ranker. It outputs a filtered documents list. Connect the output to an LLM or prompt builder for downstream processing.
Source Code
To check this component's source code, open top_p.py in the Haystack repository.
Usage Examples
Basic Configuration
TopPSampler:
  type: haystack.components.samplers.top_p.TopPSampler
  init_parameters: {}
Typically, you place TopPSampler after a retriever or ranker that assigns scores to documents, and before a PromptBuilder or generator to reduce the number of documents passed to the LLM.
components:
  TopPSampler:
    type: haystack.components.samplers.top_p.TopPSampler
    init_parameters:
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to filter. |
| top_p | Optional[float] | None | If specified, overrides the cumulative probability threshold set during initialization. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents selected based on the top-p sampling. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| top_p | float | 1.0 | Float between 0 and 1 representing the cumulative probability threshold for document selection. A value of 1.0 means no filtering (all documents are retained). |
| score_field | Optional[str] | None | Name of the field in each document's metadata that contains the score. If None, the default document score field is used. |
| min_top_k | Optional[int] | None | If specified, the minimum number of documents to return. If the top-p sampling selects fewer documents, additional ones with the next highest scores are added to the selection. |
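The min_top_k behavior described in the table can be sketched as a simple top-up step. This is an illustration under stated assumptions, not the component's code: `apply_min_top_k` is a hypothetical helper, `ranked` is assumed to be sorted from best to worst, and `selected_count` is assumed to be the number of documents the top-p step kept.

```python
def apply_min_top_k(ranked, selected_count, min_top_k=None):
    """Extend a top-p selection so it has at least min_top_k items.

    ranked: items sorted from highest to lowest score.
    selected_count: how many leading items the top-p step kept.
    """
    if min_top_k is not None and selected_count < min_top_k:
        # Top up with the next highest-scoring items, without
        # asking for more items than exist.
        selected_count = min(min_top_k, len(ranked))
    return ranked[:selected_count]


ranked = ["doc_a", "doc_b", "doc_c", "doc_d"]
print(apply_min_top_k(ranked, selected_count=1, min_top_k=3))
# ['doc_a', 'doc_b', 'doc_c']
print(apply_min_top_k(ranked, selected_count=2))
# ['doc_a', 'doc_b']
```

A floor like this helps when a single document dominates the probability mass and a strict threshold would otherwise leave the LLM with too little context.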
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of documents to filter. |
| top_p | Optional[float] | None | If specified, overrides the cumulative probability threshold set during initialization. |
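The relationship between the init-time and run-time top_p values can be pictured with the sketch below. `SamplerSketch` and `effective_top_p` are hypothetical names that only model how the threshold might be resolved; the range check is an assumption based on the documented 0 to 1 range, not a description of the component's actual validation.

```python
from typing import Optional


class SamplerSketch:
    """Hypothetical class that only models threshold resolution."""

    def __init__(self, top_p: float = 1.0):
        self._check(top_p)
        self.top_p = top_p

    @staticmethod
    def _check(value: float) -> None:
        # Assumed validation, based on the documented 0 to 1 range.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"top_p must be between 0 and 1, got {value}")

    def effective_top_p(self, top_p: Optional[float] = None) -> float:
        # A run-time value, when given, takes precedence over the
        # value set during initialization.
        value = self.top_p if top_p is None else top_p
        self._check(value)
        return value


sampler = SamplerSketch(top_p=0.9)
print(sampler.effective_top_p())     # 0.9
print(sampler.effective_top_p(0.5))  # 0.5
```

The sketch compares against None explicitly so that a deliberate run-time value of 0.0 is not mistaken for "not provided".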