Customize DocumentCleaner to preprocess your text documents.
YAML Init Parameters
These are the parameters you can pass to this component in the pipeline YAML configuration:
| Parameter | Type | Possible values | Description |
|---|---|---|---|
| | Boolean | | Removes empty lines. |
| | Boolean | | Removes extra whitespaces. |
| | Boolean | | Removes repeated substrings (headers and footers) from pages. Pages in the text must be separated by the form feed character. |
| | Boolean | | Keeps the IDs of the original documents. |
| | List of strings | Default: | List of substrings to remove from the text. |
| | String | Default: | Regular expression whose matches are replaced with an empty string (""). |
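
To make the effect of these options concrete, here is a minimal plain-Python sketch of the cleaning steps described in the table. It is not the component's implementation: the function `clean_text` and its argument names are hypothetical and chosen only for illustration, and the order in which the steps run is an assumption. Header and footer detection is left out; the sketch only keeps pages separated by the form feed character.

```python
import re
from typing import List, Optional


def clean_text(
    text: str,
    remove_empty_lines: bool = True,        # hypothetical argument name
    remove_extra_whitespaces: bool = True,  # hypothetical argument name
    remove_substrings: Optional[List[str]] = None,
    remove_regex: Optional[str] = None,
) -> str:
    """Illustrative cleaner; step order is an assumption, not the component's."""
    cleaned_pages = []
    # Pages are assumed to be separated by the form feed character.
    for page in text.split("\f"):
        if remove_regex:
            # Replace every regex match with an empty string.
            page = re.sub(remove_regex, "", page)
        for substring in remove_substrings or []:
            # Remove each listed substring verbatim.
            page = page.replace(substring, "")
        if remove_extra_whitespaces:
            # Collapse runs of spaces and tabs, then trim each line.
            page = "\n".join(
                re.sub(r"[ \t]+", " ", line).strip() for line in page.split("\n")
            )
        if remove_empty_lines:
            # Drop lines that contain only whitespace.
            page = "\n".join(line for line in page.split("\n") if line.strip())
        cleaned_pages.append(page)
    return "\f".join(cleaned_pages)


if __name__ == "__main__":
    raw = "Hello   world\n\n\nPage 1 of 2\fSecond   page\n\nDRAFT"
    print(repr(clean_text(raw, remove_substrings=["DRAFT"],
                          remove_regex=r"Page \d+ of \d+")))
```

Running the regex and substring removal before the whitespace and empty-line steps means that lines emptied by a removal are also dropped.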
REST API Runtime Parameters
There are no runtime parameters you can pass to this component when making a request to the Search REST API endpoint.