Tutorial: Building Your First Document Retrieval App
This tutorial teaches you how to build a document retrieval system quickly and with minimal setup. You upload the sample files through the UI and create the document retrieval pipeline from a template.
- Level: Beginner
- Time to complete: 10 minutes
- Prerequisites:
  - This tutorial assumes a basic knowledge of NLP.
  - You must be an Admin to complete this tutorial.
  - Make sure you have a deepset Cloud workspace where the information retrieval pipeline will run.
- Goal: After completing this tutorial, you will have built a complete English document retrieval system from scratch that can fetch NHS documents.
Upload Files
First, let's get the files the search will run on into deepset Cloud.
- Download the .zip file from Google Drive and unzip it to a location on your computer.
- Log in to deepset Cloud, switch to the right workspace, and go to Data>Files.
- Click Upload Files, drag the files you unpacked in step 1, and drop them into the Upload Files window.
- Click Upload.
- Wait until the upload finishes. You should have around 900 files. You can check the number of files on the Dashboard.
Result: Your files have been uploaded and are shown on the Files page.
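If the file count on the Dashboard looks off, one way to cross-check it is to count the files in the folder you unpacked. The following is a minimal sketch, not part of deepset Cloud: the folder name nhs_sample_files and the helper count_files are hypothetical, so adjust the path to wherever you unzipped the sample files.

```python
from pathlib import Path


def count_files(folder: Path) -> int:
    """Count regular files under folder, including files in subfolders."""
    return sum(1 for path in folder.rglob("*") if path.is_file())


if __name__ == "__main__":
    # Hypothetical location; replace with the folder you unzipped the files to.
    sample_dir = Path("nhs_sample_files")
    if sample_dir.is_dir():
        print(f"{count_files(sample_dir)} files found in {sample_dir}")
    else:
        print(f"Folder {sample_dir} not found; adjust the path above.")
```

The number printed should roughly match the count shown on the Dashboard once the upload has finished.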
Create a Pipeline
The next step is to define the components of your search app. We'll use a document retrieval template with an embedding-based retriever to create the pipeline.
- Go to Pipelines>New Pipeline.
- Under YAML Editor, click Create Pipeline and select From Template.

- When the templates show up, find the Semantic Document Search template and click Use Template.
- When the Pipeline Designer opens, change the pipeline name in line 7 to NHS_doc_retrieval and save the pipeline.
    # If you need help with the YAML format, have a look at https://docs.cloud.deepset.ai/docs/create-a-pipeline#create-a-pipeline-using-yaml.
    # This is a friendly editor that helps you create your pipelines with autosuggestions. To use them, press Control + Space on your keyboard.
    # Whenever you need to specify a model, this editor helps you out as well. Just type your Hugging Face organization and a forward slash (/) to see available models.
    # This is a document search pipeline that searches for documents based on semantic similarity. It uses a vector-based search.
    version: '1.21.0'
    name: 'NHS_doc_retrieval'
- Click Deploy to start indexing and ready your pipeline for running a search.
- Return to the Pipelines page and wait until the status of your pipeline changes to Indexed. This can take a couple of minutes.
Tip: When you hover your mouse over the status, you can see how many files have already been indexed.
Result: You created and deployed a pipeline, which means your documents have been indexed, and you can now run a search. Your pipeline shows on the Pipelines page with the status Indexed.
Search
Let's see what the pipeline can do.
- Go to Search.
- Choose NHS_doc_retrieval as the pipeline.
- Type "How do I treat atopic skin?" and search for relevant documents. You should get a list of documents sorted by relevance, with the most relevant ones first.
Result: Congratulations! You have built a search system that can retrieve documents related to health. You can now run health-related queries, and it will find relevant documents.
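To get a feel for what an embedding-based retriever does when it sorts results by relevance, here is a toy sketch in plain Python. It is not the code the template runs: the three-dimensional vectors and the file names are made up for illustration, whereas a real pipeline gets its vectors from an embedding model and uses far more dimensions. The sketch only shows the general idea of scoring documents by cosine similarity to the query vector and returning them in descending order.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction, so treat it as unrelated.
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, doc_vectors):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in doc_vectors.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up vectors standing in for document embeddings (hypothetical file names).
toy_docs = {
    "eczema.txt": [0.9, 0.1, 0.0],
    "asthma.txt": [0.1, 0.9, 0.1],
    "sprains.txt": [0.0, 0.2, 0.9],
}
# Made-up vector standing in for the embedded query.
toy_query = [0.8, 0.2, 0.1]

for name, score in rank_documents(toy_query, toy_docs):
    print(f"{name}: {score:.3f}")
```

With these toy values, the document whose vector points in nearly the same direction as the query vector comes first, which mirrors the ordering you see on the Search page.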