TextCleaner

Cleans text strings.

Basic Information

  • Type: haystack.components.preprocessors.text_cleaner.TextCleaner

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| texts | List[str] | | List of strings to clean. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| texts | List[str] | | The cleaned list of strings, returned in a dictionary under the key texts. |

Overview

Work in Progress

Bear with us while we add pipeline examples and the most common component connections.

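TextCleaner cleans a list of text strings. It can remove substrings that match a list of regular expressions, convert text to lowercase, remove punctuation, and remove numbers. Use it to clean up text data before evaluation, for example to normalize predicted and expected answers before comparing them.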
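To illustrate why cleaning helps before evaluation, here is a minimal sketch in plain Python. It does not use TextCleaner itself: `normalize` is a hypothetical helper that only mimics lowercasing and punctuation removal with the standard library, and the example strings are made up.

```python
import string


# Hypothetical helper for illustration; not part of TextCleaner's API.
def normalize(text: str) -> str:
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


prediction = "The capital of France is Paris."
ground_truth = "the capital of france is paris"

# Without cleaning, a strict string comparison fails on case and punctuation.
raw_match = prediction == ground_truth

# After cleaning both sides the same way, the comparison succeeds.
clean_match = normalize(prediction) == normalize(ground_truth)

print(raw_match, clean_match)  # False True
```

Applying the same cleaning to both the prediction and the reference keeps surface differences from being counted as wrong answers.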

Usage Example

components:
  TextCleaner:
    type: haystack.components.preprocessors.text_cleaner.TextCleaner
    init_parameters: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| remove_regexps | Optional[List[str]] | None | A list of regex patterns. Substrings matching any of them are removed from the text. |
| convert_to_lowercase | bool | False | If True, converts all characters to lowercase. |
| remove_punctuation | bool | False | If True, removes punctuation from the text. |
| remove_numbers | bool | False | If True, removes numerical digits from the text. |
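The following sketch approximates what these four parameters do, using only the Python standard library. It is an illustration and not the component's actual implementation: `clean_texts` is a hypothetical function, and the order of the steps (regex removal, then lowercasing, then character removal) and the use of ASCII punctuation are assumptions.

```python
import re
import string
from typing import Dict, List, Optional


def clean_texts(
    texts: List[str],
    remove_regexps: Optional[List[str]] = None,
    convert_to_lowercase: bool = False,
    remove_punctuation: bool = False,
    remove_numbers: bool = False,
) -> Dict[str, List[str]]:
    """Illustrative stand-in for TextCleaner; the step order is an assumption."""
    # Combine all patterns into one alternation so each text is scanned once.
    pattern = re.compile("|".join(remove_regexps)) if remove_regexps else None

    # Collect the characters to delete in a single translation table.
    to_delete = ""
    if remove_punctuation:
        to_delete += string.punctuation
    if remove_numbers:
        to_delete += string.digits
    table = str.maketrans("", "", to_delete)

    cleaned = []
    for text in texts:
        if pattern:
            text = pattern.sub("", text)
        if convert_to_lowercase:
            text = text.lower()
        cleaned.append(text.translate(table))
    # Mirror the documented output shape: a dictionary with a "texts" key.
    return {"texts": cleaned}


result = clean_texts(
    ["Order #123 shipped!", "<b>Hello</b>, World"],
    remove_regexps=[r"<[^>]+>"],
    convert_to_lowercase=True,
    remove_punctuation=True,
    remove_numbers=True,
)
print(result["texts"])  # ['order  shipped', 'hello world']
```

In this sketch, removed characters leave the surrounding whitespace untouched, so the first string keeps two spaces between its words. If that matters for your evaluation, a pattern such as `\s+` cannot fix it here because matches are deleted rather than replaced, so whitespace would need to be collapsed in a separate step.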

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| texts | List[str] | | List of strings to clean. |