TopPSampler

Implements top-p (nucleus) sampling for document filtering based on cumulative probability scores.

Basic Information

Type: haystack.components.samplers.top_p.TopPSampler

Inputs

Parameter	Type	Default	Description
documents	List[Document]		List of Document objects to be filtered.
top_p	Optional[float]	None	If specified, a float to override the cumulative probability threshold set during initialization.

Outputs

Parameter	Type	Default	Description
documents	List[Document]		List of Document objects selected by top-p sampling.

Overview

This component filters a list of documents by ranking them by score and keeping the highest-scoring documents whose cumulative probability falls within the top_p threshold, a float between 0 and 1. It is useful for keeping high-probability documents and discarding less relevant ones based on their assigned scores.
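The selection step can be sketched in plain Python. This is a minimal illustration, not the component's actual code. It assumes that scores are turned into probabilities with a softmax, that documents are kept while the cumulative probability stays at or below top_p, and that the top-scoring document is returned when nothing qualifies. The function name top_p_filter and the (content, score) tuples are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def top_p_filter(
    scored_items: List[Tuple[str, float]],
    top_p: float = 1.0,
    min_top_k: Optional[int] = None,
) -> List[str]:
    """Hypothetical sketch of top-p (nucleus) filtering over (content, score) pairs."""
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0 and 1")
    if not scored_items:
        return []

    # Rank items from highest to lowest score.
    ranked = sorted(scored_items, key=lambda item: item[1], reverse=True)

    # Softmax over the scores (assumption); subtract the max for numerical stability.
    max_score = ranked[0][1]
    weights = [math.exp(score - max_score) for _, score in ranked]
    total = sum(weights)

    # Keep items while the cumulative probability stays within top_p.
    selected: List[str] = []
    cumulative = 0.0
    for (content, _), weight in zip(ranked, weights):
        cumulative += weight / total
        if cumulative <= top_p + 1e-6:  # small tolerance for float rounding
            selected.append(content)
        else:
            break

    # Top up to min_top_k with the next highest-scoring items.
    if min_top_k and len(selected) < min_top_k:
        selected = [content for content, _ in ranked[:min_top_k]]

    # Assumed fallback: never return an empty selection for non-empty input.
    if not selected:
        selected = [ranked[0][0]]
    return selected


scored = [("Berlin", -10.6), ("Belgrade", -8.9), ("Sarajevo", -4.6)]
print(top_p_filter(scored, top_p=0.95))               # ['Sarajevo']
print(top_p_filter(scored, top_p=0.95, min_top_k=2))  # ['Sarajevo', 'Belgrade']
```

With these scores, the top document alone carries roughly 98% of the softmax probability mass. That already exceeds 0.95, so under the assumptions above only that document is returned unless min_top_k asks for more.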

Usage example:

from haystack import Document
from haystack.components.samplers import TopPSampler

# Read each document's score from meta["similarity_score"].
sampler = TopPSampler(top_p=0.95, score_field="similarity_score")
docs = [
    Document(content="Berlin", meta={"similarity_score": -10.6}),
    Document(content="Belgrade", meta={"similarity_score": -8.9}),
    Document(content="Sarajevo", meta={"similarity_score": -4.6}),
]
output = sampler.run(documents=docs)
docs = output["documents"]
# Only the highest-scoring document is kept.
assert len(docs) == 1
assert docs[0].content == "Sarajevo"

Usage Example

components:
  TopPSampler:
    type: haystack.components.samplers.top_p.TopPSampler
    init_parameters:
      # Defaults from the Init Parameters table.
      top_p: 1.0
      score_field: null
      min_top_k: null

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
top_p	float	1	Float between 0 and 1 representing the cumulative probability threshold for document selection. A value of 1.0 indicates no filtering (all documents are retained).
score_field	Optional[str]	None	Name of the field in each document's metadata that contains the score. If None, the default document score field is used.
min_top_k	Optional[int]	None	If specified, the minimum number of documents to return. If the top_p selects fewer documents, additional ones with the next highest scores are added to the selection.

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
documents	List[Document]		List of Document objects to be filtered.
top_p	Optional[float]	None	If specified, a float to override the cumulative probability threshold set during initialization.
