TextCleaner

Cleans text strings.

Basic Information

Type: haystack.components.preprocessors.text_cleaner.TextCleaner

Inputs

Parameter	Type	Default	Description
texts	List[str]		List of strings to clean.

Outputs

Parameter	Type	Default	Description
texts	List[str]		The cleaned list of strings.

Overview

TextCleaner can remove substrings matching a list of regular expressions, convert text to lowercase, remove punctuation, and remove numbers. Use it to clean up text data before evaluation.
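To make the four cleaning options concrete, here is a minimal standard-library sketch of the kind of processing they describe. It is an illustration only, not the component's source: the function name `clean_texts` is hypothetical, and the order of the steps (regex removal, then lowercasing, then stripping punctuation and digits) is an assumption.

```python
import re
import string
from typing import List, Optional


def clean_texts(
    texts: List[str],
    remove_regexps: Optional[List[str]] = None,
    convert_to_lowercase: bool = False,
    remove_punctuation: bool = False,
    remove_numbers: bool = False,
) -> List[str]:
    """Hypothetical stand-in for the cleaning steps described above."""
    # Join all patterns into one alternation so one pass removes every match.
    pattern = re.compile("|".join(remove_regexps)) if remove_regexps else None

    # Build a translation table that deletes ASCII punctuation and/or digits.
    to_delete = ""
    if remove_punctuation:
        to_delete += string.punctuation
    if remove_numbers:
        to_delete += string.digits
    table = str.maketrans("", "", to_delete)

    cleaned = []
    for text in texts:
        if pattern is not None:
            text = pattern.sub("", text)
        if convert_to_lowercase:
            text = text.lower()
        cleaned.append(text.translate(table))
    return cleaned


result = clean_texts(
    ["Hello, World! 123", "Answer: 42 (approx.)"],
    remove_regexps=[r"\(.*?\)"],  # drop parenthesized asides
    convert_to_lowercase=True,
    remove_punctuation=True,
    remove_numbers=True,
)
print(result)
```

Note that this sketch does not collapse the whitespace left behind by removed characters, so the cleaned strings can contain trailing or doubled spaces.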

Usage Example

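components:
  TextCleaner:
    type: haystack.components.preprocessors.text_cleaner.TextCleaner
    init_parameters:
      remove_regexps: null
      convert_to_lowercase: false
      remove_punctuation: false
      remove_numbers: false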

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
remove_regexps	Optional[List[str]]	None	A list of regex patterns to remove matching substrings from the text.
convert_to_lowercase	bool	False	If `True`, converts all characters to lowercase.
remove_punctuation	bool	False	If `True`, removes punctuation from the text.
remove_numbers	bool	False	If `True`, removes numerical digits from the text.

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
texts	List[str]		List of strings to clean.
