TopPSampler
Implements top-p (nucleus) sampling for document filtering based on cumulative probability scores.
Basic Information
- Type: haystack.components.samplers.top_p.TopPSampler
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Document objects to be filtered. |
| top_p | Optional[float] | None | If specified, a float to override the cumulative probability threshold set during initialization. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | Document objects selected by top-p sampling, returned in a dictionary under the key documents. |
Overview
This component filters a list of documents by keeping the highest-scoring ones whose cumulative probability stays within the top_p threshold, a float between 0 and 1. Use it to focus on high-probability documents and drop less relevant ones based on their assigned scores.
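The selection rule can be sketched in plain Python. This is a minimal illustration, not the component's actual code: it assumes scores are turned into probabilities with a softmax and that the single best document is returned when nothing fits under the threshold. The helper name top_p_indices is hypothetical.

```python
import math
from typing import List, Sequence


def top_p_indices(scores: Sequence[float], top_p: float) -> List[int]:
    """Return indices of the top-scoring items whose cumulative
    softmax probability stays within top_p (hypothetical helper)."""
    if not scores:
        return []

    # Softmax, subtracting the maximum score for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk the items from most to least probable, accumulating mass.
    order = sorted(range(len(scores)), key=lambda i: probs[i], reverse=True)
    selected: List[int] = []
    cumulative = 0.0
    for i in order:
        cumulative += probs[i]
        # Small tolerance so that top_p=1.0 keeps every item despite rounding.
        if cumulative > top_p + 1e-6:
            break
        selected.append(i)

    # Assumption: never return an empty result; fall back to the best item.
    return selected or order[:1]


scores = [-10.6, -8.9, -4.6]  # Berlin, Belgrade, Sarajevo
print(top_p_indices(scores, top_p=0.95))  # [2]
print(top_p_indices(scores, top_p=1.0))   # [2, 1, 0]
```

With these scores the top item alone carries about 0.98 of the probability mass, which already exceeds 0.95, so under the stated assumptions only that item is returned through the fallback.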
Usage example:
from haystack import Document
from haystack.components.samplers import TopPSampler

# Read each document's score from meta["similarity_score"].
sampler = TopPSampler(top_p=0.95, score_field="similarity_score")
docs = [
    Document(content="Berlin", meta={"similarity_score": -10.6}),
    Document(content="Belgrade", meta={"similarity_score": -8.9}),
    Document(content="Sarajevo", meta={"similarity_score": -4.6}),
]

output = sampler.run(documents=docs)
docs = output["documents"]

# Only the highest-scoring document is returned.
assert len(docs) == 1
assert docs[0].content == "Sarajevo"
Usage Example
components:
  TopPSampler:
    type: haystack.components.samplers.top_p.TopPSampler
    init_parameters:
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| top_p | float | 1 | Float between 0 and 1 representing the cumulative probability threshold for document selection. A value of 1.0 indicates no filtering (all documents are retained). |
| score_field | Optional[str] | None | Name of the field in each document's metadata that contains the score. If None, the default document score field is used. |
| min_top_k | Optional[int] | None | If specified, the minimum number of documents to return. If the top_p selects fewer documents, additional ones with the next highest scores are added to the selection. |
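The min_top_k behavior described in the table can be illustrated with a short sketch. It assumes the documents are already ranked from highest to lowest score and that the top-p step has produced a list of selected indices. The helper pad_to_min_top_k is hypothetical and only mirrors the description above.

```python
from typing import List


def pad_to_min_top_k(selected: List[int], ranked: List[int], min_top_k: int) -> List[int]:
    """Extend `selected` with the next-best indices from `ranked`
    until it holds at least min_top_k items (hypothetical helper)."""
    if len(selected) >= min_top_k:
        return selected

    padded = list(selected)
    for idx in ranked:
        if len(padded) >= min_top_k:
            break
        # Skip indices the top-p step already kept.
        if idx not in padded:
            padded.append(idx)
    return padded


ranked = [2, 1, 0]  # indices sorted from highest to lowest score
selected = [2]      # what the top-p step kept
print(pad_to_min_top_k(selected, ranked, min_top_k=2))  # [2, 1]
```

If min_top_k exceeds the number of available documents, this sketch simply returns all of them.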
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Document objects to be filtered. |
| top_p | Optional[float] | None | If specified, a float to override the cumulative probability threshold set during initialization. |