Designing Your Pipeline

The prospect of creating a user-facing app powered by AI may be intimidating. This guide helps you get started, understand the decisions you’ll need to make, and navigate the process ahead.

Creating an AI system involves several stages, from conceptualizing the use case to fine-tuning the app’s performance. deepset Cloud simplifies this process by providing all the building blocks and infrastructure you need, along with a robust API for easy integration.

Define the Use Case

Understanding your use case is the first step in the process. Knowing how your system will be used determines your pipeline architecture and your data requirements:

  1. Identify the problem your app will solve or the opportunity it will address. This might be the need to find company resources in an internal database or to provide a support system for customers.
    If there are any existing solutions, analyze them, focusing on their limitations, and try to understand how an AI-powered approach could improve on them.
  2. Understand the target audience. Who will use your app? How will they benefit from it? Where will they use it, and how? What are their pain points? What outcomes will they expect to achieve through the app?
  3. Define your app's goals and determine the features and functionalities it should have to achieve them. This includes your app's AI capabilities, such as summarizing or prioritizing certain information or having a human-like conversation.

Some questions to consider are:

  • What type of app do you need: question answering, document search, a summarization system, a chatbot, a system for finding similar documents, or something else?
  • Do you want the system to generate novel answers based on your data, as in a retrieval-augmented generation (RAG) system?
  • Do you want the system to find exact information in your data and highlight it, as in an extractive system? (The sketch after this list illustrates how the two approaches differ.)
  • Do you expect users to ask full questions or just type in keywords?
  • Are there any additional considerations?
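
To make the difference between a generative (RAG) and an extractive system more concrete, here is a minimal sketch using the open-source Haystack library that deepset Cloud builds on. The model names, the pre-filled document store, and the prompt wording are illustrative assumptions, not recommendations; in deepset Cloud you would normally pick a matching pipeline template instead of writing this code yourself.

    # Hedged sketch with the open-source Haystack library (1.x); model names and the
    # pre-filled document store are placeholder assumptions.
    from haystack.document_stores import InMemoryDocumentStore
    from haystack.nodes import BM25Retriever, FARMReader, PromptNode
    from haystack.pipelines import ExtractiveQAPipeline

    document_store = InMemoryDocumentStore(use_bm25=True)  # assume documents are already indexed here
    retriever = BM25Retriever(document_store=document_store, top_k=5)

    # Extractive: a reader highlights the exact answer spans found in your documents.
    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
    extractive_qa = ExtractiveQAPipeline(reader=reader, retriever=retriever)
    extractive_result = extractive_qa.run(query="What is our refund policy?")

    # Generative (RAG): an LLM writes a novel answer grounded in the retrieved documents.
    generator = PromptNode(model_name_or_path="gpt-3.5-turbo", api_key="YOUR_OPENAI_API_KEY")
    retrieved_docs = retriever.retrieve(query="What is our refund policy?")
    generative_result = generator.prompt(
        prompt_template="Answer the question using only these documents: {join(documents)} "
                        "Question: {query} Answer:",
        documents=retrieved_docs,
        query="What is our refund policy?",
    )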

Think About Your Data

Once you decide on the system you’re building, it’s time to think about the data it will run on. The quality of your data and how you preprocess it have a great impact on your system’s performance. At this stage:

  • Consider any information that should be included in the data. Do your files contain metadata? Do you want to use this metadata in your app, for example, to prioritize certain documents or as search filters? Have a look at Working with Metadata to understand how you can benefit from the metadata in your files. (The request sketch after this list shows metadata used as a search filter.)
    Also think about whether the app should prioritize certain types of documents, like the most recent ones or ones with specific metadata values.
  • Understand how your data is organized. Is the same information scattered across different files, or does each file contain different information? This can impact how you preprocess your files. If the information is scattered across files, you may want to split your documents into smaller chunks and set the retriever’s top_k to a higher value (see the indexing sketch after this list).
  • Identify file types. Are all your files of one type, or is it a mixture? Make sure you use appropriate Converters in your indexing pipeline or use a pipeline template. The templates include common converters that should be able to handle your files.
  • Consider the language your data is in. Is it English, German, or any other language? Is it multilingual? Make sure you choose a model that can handle the language of your data or an appropriate pipeline template.
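
As a sketch of how metadata can act as search filters, the request below queries a deployed pipeline over the deepset Cloud REST API and restricts the search to documents whose files carry a specific metadata value. The endpoint path, request body, and field names are assumptions for illustration; check the deepset Cloud API reference for the exact format.

    # Hedged sketch: querying a deployed deepset Cloud pipeline with a metadata filter.
    # The endpoint path and body shape are assumptions; confirm them in the API reference.
    import requests

    API_KEY = "YOUR_DEEPSET_CLOUD_API_KEY"  # placeholder
    WORKSPACE = "default"                   # placeholder workspace name
    PIPELINE = "my-first-pipeline"          # placeholder pipeline name

    response = requests.post(
        f"https://api.cloud.deepset.ai/api/v1/workspaces/{WORKSPACE}/pipelines/{PIPELINE}/search",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "queries": ["How do I reset my password?"],
            # Only search documents whose files were uploaded with category=support metadata.
            "filters": {"category": ["support"]},
        },
    )
    response.raise_for_status()
    print(response.json())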

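For a concrete picture of how mixed file types and document splitting fit together, here is a minimal indexing sketch using the open-source Haystack library that deepset Cloud builds on. The file names, metadata, and split settings are illustrative assumptions; in deepset Cloud you would typically configure the equivalent converter and preprocessing nodes in your indexing pipeline or start from a template that already includes them.

    # Hedged sketch of an indexing flow with the open-source Haystack library (1.x).
    # File names, metadata, and split settings are placeholder assumptions.
    from haystack.document_stores import InMemoryDocumentStore
    from haystack.nodes import PDFToTextConverter, PreProcessor, TextConverter

    document_store = InMemoryDocumentStore(use_bm25=True)

    # Use a converter that matches each file type.
    pdf_converter = PDFToTextConverter()
    txt_converter = TextConverter()

    docs = []
    docs += pdf_converter.convert(file_path="handbook.pdf", meta={"category": "hr"})
    docs += txt_converter.convert(file_path="faq.txt", meta={"category": "support"})

    # Split long files into smaller chunks so scattered information becomes retrievable;
    # a higher retriever top_k then lets the pipeline pull in chunks from several files.
    preprocessor = PreProcessor(
        clean_whitespace=True,
        split_by="word",
        split_length=250,
        split_overlap=30,
        split_respect_sentence_boundary=True,
    )
    document_store.write_documents(preprocessor.process(docs))
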
Get Started

Building an AI app is an iterative process. It’s best to start testing as soon as possible and then tweak things based on the results.

  1. Upload your data. You don’t need to think about preprocessing it at this point, but it’s good to make sure your data is clean and usable. In this step, you’re transferring your data to deepset Cloud, where your pipeline can use it. (For an example of uploading over the REST API, see the sketch after these steps.)
  2. Create a pipeline. We recommend starting with a ready-made template that best matches the type of system you want to build. The templates use various models. If you’re unsure which one to choose, start with gpt-3.5-turbo; you can always switch to another model later.
    For more information about models, see Language models in deepset Cloud.
  3. Deploy your pipeline. Once it’s deployed, you can test it in the Playground.
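
As an example of the upload step over the REST API, the snippet below sends one file together with its metadata to a workspace. The endpoint path, form fields, and metadata keys are assumptions for illustration; see Upload Files and the API reference for the exact options, or upload through the deepset Cloud interface or the Python SDK instead.

    # Hedged sketch: uploading a single file with metadata over the deepset Cloud REST API.
    # The endpoint path and form fields are assumptions; confirm them in the API reference.
    import json
    import requests

    API_KEY = "YOUR_DEEPSET_CLOUD_API_KEY"  # placeholder
    WORKSPACE = "default"                   # placeholder workspace name

    with open("faq.txt", "rb") as file_handle:
        response = requests.post(
            f"https://api.cloud.deepset.ai/api/v1/workspaces/{WORKSPACE}/files",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": file_handle},
            # Metadata attached here can later be used for filtering or prioritizing documents.
            data={"meta": json.dumps({"category": "support"})},
        )
    response.raise_for_status()
    print(response.json())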

Now you have a starting point for your system. It’s time to test it and see how it performs. Ask your colleagues to help you test the pipeline. You can easily share your prototype with anyone without requiring them to create an account or log in.

While testing the pipeline, try asking questions your target users would ask and keep notes on the results. A couple of things to keep in mind:

  • Check the source documents to verify the pipeline retrieves relevant documents. If it doesn’t, try changing your retrieval approach. For more tips, see Improving your document search pipeline.
  • If you’re using an LLM and the pipeline retrieves the correct documents but the answers are poor, work on your prompt (see the prompt sketch after this list for an example).
  • Run experiments to get objective metrics on your pipeline’s performance.
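
As an example of what “working on your prompt” can look like, the snippet below contrasts a vague instruction with a more constrained one that tells the model to stick to the retrieved documents. The wording and the placeholder syntax ({join(documents)}, {query}) are illustrative assumptions; adapt them to the prompt format your pipeline’s template uses.

    # Hedged sketch: tightening a RAG prompt. The placeholder syntax follows the style
    # used in many RAG prompt templates and is an assumption here.
    vague_prompt = "Answer the question. Documents: {join(documents)} Question: {query}"

    grounded_prompt = (
        "You are a support assistant. Answer the question using only the information "
        "in the documents below. If the documents do not contain the answer, say that "
        "you don't know instead of guessing. Keep the answer under three sentences.\n"
        "Documents: {join(documents)}\n"
        "Question: {query}\n"
        "Answer:"
    )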

For more ideas on how to improve your pipeline, see Optimizing Your Pipeline.


More in This Section

Upload Files - Upload your data through deepset Cloud, the REST API, or the Python SDK.

Working with Metadata - Add metadata to your files and learn how to use it in your app.

PreProcessing Data with Pipeline Nodes - Clean and split your files using pipeline components available in deepset Cloud.

Create a Pipeline - Use a template or create a pipeline from an empty file.

Edit a Pipeline - Update a pipeline you created using the deepset Cloud interface or the REST API.

Change the Pipeline's Service Level - Set your pipeline to production or development service level, depending on where you want to use it.

Deploy a Pipeline - You must deploy a pipeline to use it.

Troubleshoot Pipeline Deployment - Check how to fix problems that may occur when deploying a pipeline.