PreProcessors
These components are used in indexing pipelines to prepare your data for search by normalizing whitespace, removing empty lines, or splitting documents into smaller chunks.
- DocumentCleaner: Makes document text more readable by removing extra whitespace, empty lines, and other unwanted content such as specified substrings or repeated headers and footers.
- DocumentSplitter: Splits documents into shorter chunks.
- TextCleaner: Removes substrings that match regular expressions, as well as punctuation and numbers, from text.
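To make the three operations above concrete, here is a minimal plain-Python sketch of what cleaning, splitting, and text cleaning do to a string. It does not call the components themselves. The helper names `clean_document_text`, `split_by_words`, and `clean_text` and their parameters are hypothetical illustrations, not the components' API. The sketch also assumes word-based splitting with a fixed overlap between consecutive chunks.

```python
import re
import string
from typing import List, Sequence


def clean_document_text(text: str) -> str:
    """Hypothetical stand-in for document cleaning: drop empty lines and
    collapse runs of spaces/tabs inside each remaining line."""
    lines = []
    for line in text.splitlines():
        collapsed = re.sub(r"[ \t]+", " ", line).strip()
        if collapsed:  # skip lines that are empty after cleaning
            lines.append(collapsed)
    return "\n".join(lines)


def split_by_words(text: str, split_length: int, split_overlap: int = 0) -> List[str]:
    """Hypothetical stand-in for word-based splitting: emit windows of
    `split_length` words, each sharing `split_overlap` words with the
    previous window."""
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")
    words = text.split()
    step = split_length - split_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # this window already reaches the last word
    return chunks


def clean_text(
    text: str,
    remove_regexps: Sequence[str] = (),
    remove_punctuation: bool = False,
    remove_numbers: bool = False,
) -> str:
    """Hypothetical stand-in for text cleaning: strip regex matches, then
    optionally ASCII punctuation and digits."""
    for pattern in remove_regexps:
        text = re.sub(pattern, "", text)
    if remove_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    if remove_numbers:
        text = text.translate(str.maketrans("", "", string.digits))
    return text


raw = "Haystack   pipelines\n\n\n  index   documents.\t\tThey  split text."
cleaned = clean_document_text(raw)
# cleaned == "Haystack pipelines\nindex documents. They split text."
chunks = split_by_words(cleaned, split_length=3, split_overlap=1)
# chunks == ["Haystack pipelines index", "index documents. They", "They split text."]
```

The overlap repeats the boundary word in adjacent chunks, so text near a split point stays available in both chunks. A real splitter may also split by sentence or page rather than by word.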