Tutorial: Building Your First Document Search App
This tutorial teaches you how to build a document search system in the easiest and fastest possible way. It uses the UI for uploading the sample files and a template for creating the document retrieval pipeline.
- Level: Beginner
- Time to complete: 10 minutes
- Prerequisites:
  - This tutorial assumes a basic knowledge of language models.
  - You must be an Admin to complete this tutorial.
  - Make sure you have a deepset workspace where the information retrieval pipeline will run.
- Goal: After completing this tutorial, you will have built a complete English document retrieval system from scratch that can fetch NHS documents.
Upload Files
First, let's get the files the search will run on into deepset AI Platform.
- Download the .zip file from gdrive and unzip it to a location on your computer.
- Log in to deepset AI Platform, switch to the right workspace, and go to Files.
- Click Upload Files, then drag the files you unpacked in step 1 and drop them into the Upload Files window. (You must select all files in a folder; deepset AI Platform doesn't support uploading folders.)
- Click Upload and wait until the upload finishes. Even after the upload finishes, the files may take a while to show up in deepset AI Platform. That's expected; just wait a while and refresh the page if needed.
Result: Your files have been uploaded and are shown on the Files page. You should have 953 files.
[Image: The Files page with the NHS files uploaded.]
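If you prefer to prepare the files with a script before dragging them into the Upload Files window, the unzip step can be sketched in Python. This is a minimal sketch that uses only the standard library; the function name `extract_flat` and any paths you pass to it are illustrative and not part of the tutorial. It extracts every file into one flat folder, since folders can't be uploaded, and returns the number of files written so you can compare it with the count shown on the Files page.

```python
import shutil
import zipfile
from pathlib import Path


def extract_flat(zip_path: str, target_dir: str) -> int:
    """Extract all files from zip_path into target_dir without subfolders.

    Returns the number of files written. If two archive entries share a
    file name, later ones get a numeric suffix so nothing is overwritten.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = 0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip folder entries; only files can be uploaded
            name = Path(info.filename).name
            destination = target / name
            counter = 1
            while destination.exists():
                # Hypothetical naming scheme for duplicates: name_1.ext, name_2.ext, ...
                destination = target / f"{Path(name).stem}_{counter}{Path(name).suffix}"
                counter += 1
            with archive.open(info) as source, open(destination, "wb") as out:
                shutil.copyfileobj(source, out)
            written += 1
    return written
```

For example, `extract_flat("sample_files.zip", "sample_files")` would unpack a downloaded archive into a `sample_files` folder; both names are placeholders for wherever you saved the download.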
Create a Pipeline
The next step is to define the components of your search app. We'll use a document search pipeline template with a vector retriever to create the pipeline.
- Go to Pipeline Templates.
- Choose Document Search as the category, find Semantic Document Search, and click Use Template.
[Image: The Document Search category on the Pipeline Templates page, where each template has View Details and Use Template options.]
- Type NHS_doc_search as the pipeline name and click Create Pipeline. You're redirected to Pipeline Builder.
- Click Deploy. This triggers indexing and prepares your pipeline for search.
- Wait until the status of your pipeline changes to Indexed. This can take a couple of minutes.
Tip: When you hover your mouse over the status, you can see the number of files already indexed.
Result: You created and deployed a pipeline, which means your documents have been indexed, and you can now run a search. Your pipeline status is Indexed.
Your pipeline is at the development service level. We recommend you test it before setting it to the production service level.
[Image: The Pipelines page with the NHS doc retrieval pipeline shown as indexed and deployed.]
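The "wait until the status changes to Indexed" step is a polling pattern, and it can be sketched generically in Python. This is only an illustration: `fetch_status` is a hypothetical callable that you would have to supply yourself, and the tutorial does not define a programmatic way to read the pipeline status. The interval and timeout defaults are arbitrary assumptions as well.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    expected: str = "Indexed",
    interval_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
) -> bool:
    """Call fetch_status repeatedly until it returns `expected`.

    Returns True once the expected status is seen, or False if the
    timeout elapses first. fetch_status is a caller-supplied placeholder.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if fetch_status() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)
```

Passing the status source in as a function keeps the waiting logic separate from however the status is actually obtained, so the same helper works with a stub during testing.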
Try Your Pipeline
Let's see what the pipeline can do.
- Go to Playground.
- Choose NHS_doc_search as the pipeline.
- Type "How do I treat atopic skin?" and search for relevant documents. You should get several documents, sorted with the most relevant ones first.
Result: Congratulations! You have built a search system that can retrieve documents related to health. You can now ask health-related queries, and it will find relevant documents.
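To build intuition for why results come back sorted by relevance, here is a toy Python sketch of the idea behind a vector retriever: the query and each document are represented as vectors, and documents are ranked by cosine similarity to the query. It is not the platform's implementation. The file names and the three-dimensional vectors are made up for illustration; a real pipeline gets its vectors from an embedding model.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: List[float],
    doc_vecs: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up vectors standing in for document embeddings (hypothetical file names).
toy_docs = {
    "eczema.txt": [0.9, 0.1, 0.0],
    "asthma.txt": [0.1, 0.9, 0.0],
    "migraine.txt": [0.0, 0.2, 0.9],
}
# Made-up vector standing in for the embedded query.
toy_query = [0.8, 0.2, 0.1]

ranking = rank_documents(toy_query, toy_docs, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, `eczema.txt` ranks first because its vector points in nearly the same direction as the query vector.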
What's Next
Your pipeline is now a development pipeline. Once it's ready for production, change its service level to Production. You can do this on the Pipeline Details page shown after clicking a pipeline name. To learn more, see Pipeline Service Levels.