TextCleaner
Cleans text strings.
Basic Information
- Type: haystack_integrations.preprocessors.text_cleaner.TextCleaner
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| texts | List[str] |  | List of strings to clean. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| texts | List[str] |  | The cleaned list of strings, returned under the texts key of the output dictionary. |
Overview
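TextCleaner cleans text strings. It can remove substrings matching a list of regular expressions, convert text to lowercase, remove punctuation, and remove numbers. Use it to clean up text data before evaluation, for example to normalize predicted and reference answers so that differences in case or punctuation don't count as mismatches.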
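The sketch below reproduces the four operations in plain Python to make their effect concrete. It is a minimal illustration under stated assumptions, not TextCleaner's source: the function name `clean_texts` is hypothetical, the steps are assumed to run in the order listed above (regex removal, lowercasing, then punctuation and digit removal), and "punctuation" and "numbers" are assumed to mean the ASCII characters in Python's `string.punctuation` and `string.digits`.

```python
import re
import string
from typing import List, Optional


def clean_texts(
    texts: List[str],
    remove_regexps: Optional[List[str]] = None,
    convert_to_lowercase: bool = False,
    remove_punctuation: bool = False,
    remove_numbers: bool = False,
) -> List[str]:
    """Hypothetical stand-in for TextCleaner's cleaning logic (step order assumed)."""
    cleaned = list(texts)

    # 1. Delete every substring that matches any of the given patterns.
    #    Each pattern is wrapped in a non-capturing group before joining.
    if remove_regexps:
        pattern = re.compile("|".join(f"(?:{p})" for p in remove_regexps))
        cleaned = [pattern.sub("", t) for t in cleaned]

    # 2. Lowercase all characters.
    if convert_to_lowercase:
        cleaned = [t.lower() for t in cleaned]

    # 3. Delete ASCII punctuation and/or ASCII digits in a single pass.
    to_delete = ""
    if remove_punctuation:
        to_delete += string.punctuation
    if remove_numbers:
        to_delete += string.digits
    if to_delete:
        table = str.maketrans("", "", to_delete)
        cleaned = [t.translate(table) for t in cleaned]

    return cleaned


result = clean_texts(
    ["Hello, World! [1]", "Room 101 is OPEN."],
    remove_regexps=[r"\s*\[\d+\]"],  # hypothetical: strip citation markers such as " [1]"
    convert_to_lowercase=True,
    remove_punctuation=True,
    remove_numbers=True,
)
print(result)  # → ['hello world', 'room  is open']
```

In this sketch, deleting characters leaves the surrounding whitespace untouched, which is why "Room 101 is OPEN." ends up with a double space. Post-process the output if your comparison is sensitive to that.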
Usage Example
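```yaml
components:
  TextCleaner:
    type: components.preprocessors.text_cleaner.TextCleaner
    init_parameters:
      remove_regexps: null
      convert_to_lowercase: false
      remove_punctuation: false
      remove_numbers: false
```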
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| remove_regexps | Optional[List[str]] | None | A list of regular expression patterns. Substrings matching any of them are removed from the text. |
| convert_to_lowercase | bool | False | If True, converts all characters to lowercase. |
| remove_punctuation | bool | False | If True, removes punctuation from the text. |
| remove_numbers | bool | False | If True, removes numerical digits from the text. |
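Order matters when remove_regexps is combined with remove_punctuation. The sketch below assumes, for illustration only, that all patterns are joined into a single alternation and applied before punctuation is stripped; the helper names `strip_patterns` and `strip_punctuation` and the HTML-tag pattern are hypothetical. Under that order, a pattern that depends on punctuation characters, such as `<[^>]+>` for HTML tags, still matches. Reversing the order strips the `<` and `>` delimiters first, so the tag names survive as stray text.

```python
import re
import string
from typing import List


def strip_patterns(text: str, patterns: List[str]) -> str:
    # Join all patterns into one alternation (each in a non-capturing group)
    # so a single pass removes every match.
    return re.sub("|".join(f"(?:{p})" for p in patterns), "", text)


def strip_punctuation(text: str) -> str:
    # Delete ASCII punctuation characters (string.punctuation).
    return text.translate(str.maketrans("", "", string.punctuation))


html = "<b>Refund</b> approved."
patterns = [r"<[^>]+>"]  # hypothetical pattern: remove HTML tags

regex_first = strip_punctuation(strip_patterns(html, patterns))
punct_first = strip_patterns(strip_punctuation(html), patterns)

print(regex_first)  # → Refund approved
print(punct_first)  # → bRefundb approved
```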
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| texts | List[str] |  | List of strings to clean. |
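The split between init parameters (fixed when the pipeline is built) and the texts run parameter (supplied at query time) can be illustrated with a small stand-in class. `SimpleTextCleaner` is hypothetical and not part of Haystack; it only mirrors the interface documented on this page: four init parameters, one texts input, and a dictionary output with a single texts key. The step order and the ASCII-only punctuation and digit sets are assumptions. The usage shows the evaluation scenario from the overview, cleaning predicted and reference answers with the same settings before an exact-match comparison.

```python
import re
import string
from typing import Dict, List, Optional


class SimpleTextCleaner:
    """Hypothetical stand-in mirroring TextCleaner's documented interface."""

    def __init__(
        self,
        remove_regexps: Optional[List[str]] = None,
        convert_to_lowercase: bool = False,
        remove_punctuation: bool = False,
        remove_numbers: bool = False,
    ) -> None:
        # Init parameters are fixed once, so compile the pattern and the
        # character-deletion table up front.
        self._regex = (
            re.compile("|".join(f"(?:{p})" for p in remove_regexps))
            if remove_regexps
            else None
        )
        self._lowercase = convert_to_lowercase
        to_delete = (string.punctuation if remove_punctuation else "") + (
            string.digits if remove_numbers else ""
        )
        self._table = str.maketrans("", "", to_delete) if to_delete else None

    def run(self, texts: List[str]) -> Dict[str, List[str]]:
        # Run parameter: the strings to clean, supplied at query time.
        if self._regex is not None:
            texts = [self._regex.sub("", t) for t in texts]
        if self._lowercase:
            texts = [t.lower() for t in texts]
        if self._table is not None:
            texts = [t.translate(self._table) for t in texts]
        return {"texts": texts}


cleaner = SimpleTextCleaner(
    remove_regexps=[r"^Answer:\s*"],  # hypothetical: drop a model's answer prefix
    convert_to_lowercase=True,
    remove_punctuation=True,
)

predictions = cleaner.run(texts=["Answer: Paris.", "Answer: Lyon"])["texts"]
references = cleaner.run(texts=["paris", "Marseille"])["texts"]
exact_match = sum(p == r for p, r in zip(predictions, references)) / len(references)

print(predictions)  # → ['paris', 'lyon']
print(exact_match)  # → 0.5
```

Note that in this sketch the regex runs before lowercasing and is case-sensitive, so a prefix written as "ANSWER:" would not be removed.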