
Tutorial: Building a Data Cleaning Agent

Build a data cleaning agent that automatically fixes messy CSV files using your custom functions. You'll learn how to add custom code as a tool the agent can use. The agent follows a set of rules to decide which function to run for each cleanup task. It doesn't rely on any external data—just upload your CSV file, and the agent will handle the rest.


  • Level: Beginner
  • Time to complete: 10 minutes
  • Prerequisites:
    • A basic knowledge of pipelines and indexes in Haystack Enterprise Platform. For more information, see Pipelines and Indexes.
    • A workspace where you'll create the pipeline. For help, see Quick Start Guide.
  • Goal: After completing this tutorial, you will have built a data cleaning agent that brings data in line with quality standards. The agent uses custom functions to clean the data and decides which function to run based on the quality issues it finds.
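
To make the idea of a custom cleaning function concrete before you start, here is a minimal sketch of two such functions in plain Python. The function names (strip_whitespace, drop_duplicate_rows) and the choice to pass CSV content as text are assumptions made for illustration; they are not the functions this tutorial or the platform provides, and your own tools may look different.

```python
import csv
import io


def strip_whitespace(csv_text: str) -> str:
    """Hypothetical tool: trim surrounding whitespace from every header and cell."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()


def drop_duplicate_rows(csv_text: str) -> str:
    """Hypothetical tool: remove exact duplicate data rows, keeping the first one."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    seen = set()
    for index, row in enumerate(rows):
        key = tuple(row)
        # The first row is treated as the header and is always kept.
        if index > 0:
            if key in seen:
                continue
            seen.add(key)
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    messy = " name , city \nAda , London\nAda , London\n"
    cleaned = drop_duplicate_rows(strip_whitespace(messy))
    print(cleaned)
```

Each function handles one kind of cleanup and returns the cleaned CSV as text, so an agent can pick whichever one matches the problem it finds and chain them when a file has more than one issue.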

Create a Basic Agent

Let's create a simple agent that we'll later modify by adding custom code as its tools and a system prompt with instructions.

  1. In Haystack Enterprise Platform, make sure you're in the correct workspace, then go to Pipelines > Create Pipeline.
  2. On the Templates page, click Create empty pipeline.
  3. Type data-cleaning-agent as the pipeline name and click Create Pipeline.
  4. In Builder, drag the Input component onto the canvas.