Clean & Split Group

Components in this group help you prepare raw documents for retrieval. Use them to remove boilerplate and noise, normalize content, and split large documents into smaller chunks that are easier to embed, index, and retrieve.

This subgroup is especially useful when your inputs vary in format or quality, or when you want more control over chunk size and boundaries (for example, splitting by document structure, recursively, or by CSV rows). Well-chosen cleaning and splitting improve retrieval quality and reduce the amount of irrelevant text passed to downstream steps.
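To make "splitting recursively" concrete, here is a minimal, hypothetical sketch of a recursive character splitter in plain Python. It is not the implementation used by any component in this group. The function name `recursive_split`, the default separator order, and the character-based `chunk_size` are illustrative assumptions. The idea: try the coarsest separator first (paragraphs), and only fall back to finer separators (lines, words, then raw characters) for pieces that are still too long.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Hypothetical sketch: split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and recurses with finer separators
    only on pieces that are still longer than chunk_size.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks = []
    current = ""
    for piece in text.split(sep):
        # Greedily merge neighbouring pieces while they still fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too long on its own: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, tuple(rest)))
    if current:
        chunks.append(current)
    return chunks


text = "para one here.\n\npara two is a bit longer than ten chars."
print(recursive_split(text, 20))
```

With `chunk_size=20`, the first paragraph stays whole while the second is broken at word boundaries. Note that this sketch drops the separator at each chunk boundary and uses no overlap between chunks; real splitters often measure size in tokens rather than characters and add overlap so context is not lost at the cut points.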

Available Components