Designing Your Pipeline
The prospect of creating a user-facing app powered by AI may be intimidating. This guide is meant to help you get started, understand the decisions to be made, and navigate the process you’re about to go through.
Creating an AI system involves several stages, from conceptualizing the use case to fine-tuning the app's performance. To simplify the process, deepset Cloud provides all the building blocks and infrastructure you need, along with a robust API for easy integration.
Define the Use Case
Understanding your use case is the first step in the process. Knowing how your system is going to be used determines your pipeline architecture and your data requirements.
- Identify the problem your app will solve or the opportunity it will address. This may be the need to find company resources in an internal database or to provide customers with a support system. If there are existing solutions, analyze them, focusing on their limitations, and consider how an AI-powered approach could improve on them.
- Understand the target audience. Who will use your app? How will they benefit from it? Where and how will they use it? What are their pain points? What outcomes do they expect to achieve with the app?
- Define your app's goals and determine the features and functionalities it should have to achieve them. This includes your app's AI capabilities, such as summarizing or prioritizing certain information or having a human-like conversation.
Some questions to consider are:
- What type of app do you need: question answering, document search, summarization, a chatbot, a system for finding similar documents, or something else?
- Do you want the system to generate novel answers based on your data (like RAG systems)?
- Do you want the system to find exact information in your data and highlight it (like an extractive system)?
- Do you expect users to ask full questions or to type in keywords?
- Are there any additional considerations?
Think About Your Data
Once you decide on the system you’re building, it’s time to think about the data it will run on. The quality of your data and how you preprocess it has a great impact on your system’s performance. At this stage:
- Consider what information the data should include. Do your files contain metadata? Do you want to use this metadata in your app, for example, to prioritize certain documents or as search filters? Should the app prioritize certain types of documents, like the most recent ones or ones with specific metadata values? Have a look at Working with Metadata to understand how you can benefit from the metadata in your files.
- Understand how your data is organized. Is the same information scattered across different files, or does each file contain different information? This can impact how you preprocess your files. If the information is scattered across files, you may want to split your documents into smaller chunks and set top_k to a higher number, so that the retriever returns enough chunks to cover all the relevant pieces.
- Identify file types. Are all your files of one type, or is it a mixture? Make sure you use appropriate Converters in your indexing pipeline, or use a pipeline template. The templates include common converters that should be able to handle your files.
- Consider the language your data is in. Is it English, German, or another language? Is it multilingual? Make sure you choose a model that can handle the language of your data, or choose an appropriate pipeline template.
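To make the chunking and top_k advice more concrete, here is a minimal, self-contained Python sketch. It is not deepset Cloud code: in deepset Cloud, comparable settings typically live on your pipeline's splitter and retriever components. The function names (split_into_chunks, retrieve), the word-window splitting, and the keyword-overlap scoring are illustrative assumptions standing in for a real splitter and retriever. The sketch shows three ideas: overlapping chunks so that text near a chunk boundary isn't lost, a top_k limit on how many chunks are passed on, and a metadata filter that narrows the search before scoring.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical, simplified stand-ins for a splitter and a retriever.
# Real pipelines use embedding or keyword (BM25) retrieval, not raw word overlap.


@dataclass
class Chunk:
    text: str
    meta: Dict = field(default_factory=dict)


def split_into_chunks(text: str, meta: Dict, split_length: int = 50, split_overlap: int = 10) -> List[Chunk]:
    """Split text into windows of split_length words; consecutive windows share split_overlap words."""
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be >= 0 and smaller than split_length")
    words = text.split()
    step = split_length - split_overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + split_length]
        chunks.append(Chunk(" ".join(window), dict(meta)))  # each chunk inherits the file's metadata
        if start + split_length >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def _terms(text: str) -> set:
    """Lowercase words with surrounding punctuation stripped."""
    return {t for t in (w.strip(".,;:!?").lower() for w in text.split()) if t}


def retrieve(query: str, chunks: List[Chunk], top_k: int = 3, filters: Optional[Dict] = None) -> List[Chunk]:
    """Return at most top_k chunks ranked by the number of words they share with the query.

    Only chunks whose metadata matches every key/value pair in filters are considered.
    """
    candidates = [
        c for c in chunks
        if not filters or all(c.meta.get(k) == v for k, v in filters.items())
    ]
    query_terms = _terms(query)
    scored = [(len(query_terms & _terms(c.text)), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep document order
    return [c for score, c in scored[:top_k] if score > 0]


# Usage: two small files with metadata, split into 8-word chunks that overlap by 2 words.
files = [
    ("Refunds are issued within 14 days. Contact support to request a refund.", {"type": "policy", "year": 2024}),
    ("Our office is closed on public holidays.", {"type": "faq", "year": 2023}),
]
chunks = []
for text, meta in files:
    chunks.extend(split_into_chunks(text, meta, split_length=8, split_overlap=2))

for chunk in retrieve("How do I request a refund?", chunks, top_k=2, filters={"type": "policy"}):
    print(chunk.meta["type"], "|", chunk.text)
```

With smaller chunks, each chunk holds less text, so you typically need a higher top_k to pass all the relevant pieces on to the next step. The trade-off is more text for the model to process.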
Get Started
Building an AI app is an iterative process. It’s best to start testing as soon as possible and then tweak things based on the results.
- Upload your data. You don’t need to think about preprocessing them at this point, but it’s good to make sure your data are clean and usable. In this step, you’re transferring your data to deepset Cloud, where your pipeline can use them.
- Create a Pipeline. We recommend starting with a ready-made template that best matches the type of system you want to build. The templates use various models. If you’re unsure which one to choose, we recommend starting with gpt-4-turbo. You can always switch to another model later. For more information about models, see Language Models in deepset Cloud.
- Deploy your pipeline. Once it’s deployed, you can test it in the Playground.
Now, you have a starting point for your system. It’s time to test it and see how it performs. Ask your colleagues to help you test the pipeline. You can easily share your prototype with anyone without the need to create accounts or log in. Prototypes are customizable; you can add your logo and brand colors.
While testing the pipeline, ask questions your target users would ask and take notes on the results. A couple of things to keep in mind:
- Check the source documents to verify the pipeline retrieves relevant documents. If it doesn’t, try changing your retrieval approach. For more tips, see Improving Your Document Search Pipeline.
- If you’re using an LLM and the pipeline retrieves the correct documents but the answers are poor, try changing your prompt.
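The two checks above form a simple decision rule: confirm retrieval first, then look at the prompt. The sketch below is a hypothetical Python helper, not a deepset Cloud feature, showing one way to keep structured notes while testing in the Playground. Each ReviewNote records a question, a file a reviewer expects among the sources, the source files the pipeline actually showed, and whether the answer was acceptable. The class, field, and function names and the sample file names are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical note-taking helper for manual Playground testing; not a deepset Cloud API.


@dataclass
class ReviewNote:
    question: str
    expected_file: str          # a file the reviewer considers relevant to the question
    retrieved_files: List[str]  # source files the pipeline showed, in rank order
    answer_ok: bool             # the reviewer's judgment of the answer


def diagnose(note: ReviewNote, k: int = 5) -> str:
    """Apply the two checks in order: retrieval first, then the prompt."""
    if note.expected_file not in note.retrieved_files[:k]:
        return "retrieval"  # relevant document missing: revisit the retrieval approach
    if not note.answer_ok:
        return "prompt"     # right documents but a poor answer: revisit the prompt
    return "ok"


def summarize(notes: List[ReviewNote], k: int = 5) -> Dict[str, int]:
    """Count how many test questions fall into each category."""
    counts = {"retrieval": 0, "prompt": 0, "ok": 0}
    for note in notes:
        counts[diagnose(note, k)] += 1
    return counts


notes = [
    ReviewNote("How long do refunds take?", "refund_policy.pdf", ["refund_policy.pdf", "faq.md"], True),
    ReviewNote("Is the office open on holidays?", "faq.md", ["pricing.pdf"], False),
    ReviewNote("Can I pay by invoice?", "pricing.pdf", ["pricing.pdf"], False),
]
print(summarize(notes))
```

Tallying results this way shows whether most failures point to retrieval or to the prompt, which suggests what to change first.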
For more ideas on improving your pipeline, see Optimizing Your Pipeline.
More in This Section
- Create a Pipeline - Use a template or create a pipeline from an empty file.
- Create a Custom Component - Create your own components and use them in your pipelines.
- Using Hosted Models and External Services - Use deepset Cloud integrations, like Snowflake or DeepL, in your pipelines.
- Edit a Pipeline - Update a pipeline you created using the deepset Cloud interface or the REST API.
- Building with Large Language Models (LLMs) - Learn about additional considerations when building systems that use LLMs.
- Change the Pipeline's Service Level - Set your pipeline to production or development service level, depending on where you want to use it.
- Deploy a Pipeline - You must deploy a pipeline to use it.
- Troubleshoot Pipeline Deployment - Check how to fix problems that may occur when deploying a pipeline.