Data Processing Components

Data Processing components help you prepare data before you store it, retrieve it, or pass it to AI components. Use this group to turn raw inputs into clean, well-structured documents, enrich them with metadata, and shape content so it works well with downstream steps such as embedding, retrieval, ranking, and generation.

The group is split into four subgroups that cover the most common stages of preprocessing:

  • Clean & split: Remove noise and split large documents into smaller chunks that fit your retrieval strategy.
  • Convert: Turn files or links into content you can process in a pipeline (for example, fetch content from a URL or wrap file contents).
  • Extract: Pull structured information from text, such as metadata, entities, language, or targeted fields.
  • Transform: Validate, reshape, or build outputs from pipeline results (for example, validate JSON outputs or build answers).
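
To make the four stages concrete, here is a minimal plain-Python sketch of how they might chain together. It does not use the actual components: the `Document` class and the `convert`, `clean_and_split`, `extract`, and `transform` functions are hypothetical names invented for illustration, and the word-count chunking is only one simple splitting strategy among many.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a pipeline document: text plus metadata."""
    content: str
    meta: dict = field(default_factory=dict)


def convert(raw: bytes, source: str) -> Document:
    """Convert: wrap raw file contents in a Document and record the source."""
    return Document(content=raw.decode("utf-8"), meta={"source": source})


def clean_and_split(doc: Document, max_words: int = 4) -> list[Document]:
    """Clean & split: normalize whitespace, then chunk by word count."""
    words = doc.content.split()  # split() also drops repeated whitespace
    chunks = []
    for start in range(0, len(words), max_words):
        text = " ".join(words[start:start + max_words])
        meta = dict(doc.meta, chunk=start // max_words)  # copy, then tag chunk index
        chunks.append(Document(content=text, meta=meta))
    return chunks


def extract(doc: Document) -> Document:
    """Extract: enrich the document with simple metadata (here, a word count)."""
    doc.meta["word_count"] = len(doc.content.split())
    return doc


def transform(docs: list[Document]) -> str:
    """Transform: build a JSON output and validate that it parses."""
    payload = json.dumps([{"content": d.content, "meta": d.meta} for d in docs])
    json.loads(payload)  # raises ValueError if the output is not valid JSON
    return payload


if __name__ == "__main__":
    doc = convert(b"  Raw   text with\n\nextra whitespace to clean and split ", "notes.txt")
    chunks = [extract(c) for c in clean_and_split(doc)]
    print(transform(chunks))
```

In a real pipeline each function would be replaced by a component from the matching subgroup; the sketch only shows the order in which the stages typically run and what each one contributes.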

Subgroups

Use these pages to explore the components in each subgroup: