Tutorial: Building Your First Document Search App
This tutorial shows you the quickest way to build a document search system. You upload the sample files in the UI and create the document retrieval pipeline from a template.
- Level: Beginner
- Time to complete: 10 minutes
- Prerequisites:
  - This tutorial assumes a basic knowledge of language models.
  - You must be an Admin to complete this tutorial.
  - Make sure you have a deepset Cloud workspace where the information retrieval pipeline will run.
- Goal: After completing this tutorial, you will have built a complete English document retrieval system from scratch that can fetch NHS documents.
Upload Files
First, let's get the files the search will run on into deepset Cloud.
- Download the .zip file from Google Drive and unzip it to a location on your computer.
- Log in to deepset Cloud, switch to the right workspace, and go to Files.
- Click Upload Files, then drag the files you unzipped and drop them into the Upload Files window.
- Click Upload and wait until the upload finishes. Even after the upload is finished, the files may take a while to show up in deepset Cloud. That's expected; just wait a while and refresh the page.
Result: Your files have been uploaded and are shown on the Files page. You should have 953 files.
![The Files page with the NHS files uploaded.](https://files.readme.io/cd7a1b1-files_page.png)
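If the Files page shows fewer files than expected, it can help to check the unpacked sample set locally. The sketch below is one way to do that with the Python standard library only. The function name `unzip_and_count` and the paths in the usage note are hypothetical, and skipping `__MACOSX` and `.DS_Store` entries is an assumption about what an archive created on macOS might contain.

```python
import zipfile
from pathlib import Path


def unzip_and_count(zip_path: str, target_dir: str) -> int:
    """Extract zip_path into target_dir and return how many files were extracted."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)

    # Count regular files only. Directories and macOS metadata entries
    # (an assumption about the archive) are not counted.
    return sum(
        1
        for path in target.rglob("*")
        if path.is_file()
        and "__MACOSX" not in path.parts
        and path.name != ".DS_Store"
    )
```

For example, `unzip_and_count("nhs_files.zip", "nhs_files")` (both names are placeholders) returns the number of files to upload. If the archive contains only the sample files, that number should match the 953 files this tutorial expects.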
Create a Pipeline
The next step is to define the components of your search app. We'll use a document search pipeline template with an embedding-based Retriever to create the pipeline.
- Go to Pipeline Templates and choose Document Search as the category.
- Find Semantic Document Search, and click Use Template.
![Screenshot displaying a section of 'Document Search' from a webpage offering pipeline templates for professional use. There are four templates listed, with two detailed descriptions visible: 'Date-Driven Hybrid Document Search' and 'Hybrid Document Search (German)', both described as pipelines that combine keyword-matching and semantic searches to return relevant documents. The sidebar on the left highlights 'Document Search' with a red notification bubble indicating '11'. Each template listing has options to 'View Details' or 'Use Template', and the latter is accompanied by a red notification bubble with the number '2'. The page features a clean interface with a color scheme of gray, red, and blue elements.](https://files.readme.io/8607774-semantic_doc_search_template.png)
- Type NHS_doc_search as the pipeline name and click Create Pipeline. You're redirected to the Pipelines page. You can find your pipeline in the All tab.
Info: Newly created pipelines that are not yet deployed are automatically classified as drafts, so you can also find your pipeline on the Drafts tab. Once you deploy it, it becomes a development pipeline and moves from the Drafts tab to the Development tab.
- Click Deploy next to your pipeline. This triggers indexing and makes your pipeline ready to run a search.
- Wait until the status of your pipeline changes to Indexed. This can take a couple of minutes.
Tip: When you hover your mouse over the status, you can see how many files have already been indexed.
Result: You created and deployed a pipeline, which means your documents have been indexed, and you can now run a search. Your pipeline status is Indexed.
Your pipeline is at the development service level. We recommend you test it before setting it to the production service level.
![The Pipelines page with the NHS doc retrieval pipeline shown as indexed and deployed](https://files.readme.io/d6a699f-indexed.png)
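Waiting for the Indexed status is a simple polling loop, and it can be scripted if you check the status programmatically. The sketch below is a generic helper, not part of deepset Cloud: `wait_for_status` and its parameters are hypothetical, and `get_status` stands for whatever function you write to look up the pipeline status. Only the status name Indexed comes from this tutorial.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    wanted: str = "Indexed",
    timeout_s: float = 600.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call get_status() repeatedly until it returns `wanted` or time runs out.

    Returns True if the wanted status was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        # Pause between checks so the status source is not queried constantly.
        sleep(poll_interval_s)
```

The `sleep` parameter is injectable so the helper can be tried out without real waiting. The default timeout of ten minutes is an arbitrary choice; the tutorial only says indexing can take a couple of minutes.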
Try Your Pipeline
Let's see what the pipeline can do.
- Go to Playground.
- Choose NHS_doc_search as the pipeline.
- Type "How do I treat atopic skin?" and search for relevant documents. You should get a list of documents sorted by relevance, with the most relevant first.
Result: Congratulations! You have built a search system that can retrieve documents related to health. You can now run health-related queries, and it will find relevant documents.
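The same query can also be prepared for a REST call instead of Playground. The sketch below only builds the request and does not send it. The base URL, the `/search` endpoint path, the `queries` payload field, and Bearer-token authentication are assumptions about the deepset Cloud REST API, so check them against the API reference before relying on them. The helper name `build_search_request` is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the deepset Cloud REST API; verify in the API reference.
API_BASE = "https://api.cloud.deepset.ai/api/v1"


def build_search_request(
    workspace: str, pipeline: str, query: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs one search query."""
    # Percent-encode the names so unusual characters cannot break the URL path.
    ws = urllib.parse.quote(workspace, safe="")
    pl = urllib.parse.quote(pipeline, safe="")
    url = f"{API_BASE}/workspaces/{ws}/pipelines/{pl}/search"

    # Assumed payload shape: a list of query strings under "queries".
    body = json.dumps({"queries": [query]}).encode("utf-8")

    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To run the search, pass the returned object to `urllib.request.urlopen` with a valid API key, your workspace name, and `NHS_doc_search` as the pipeline name. The response format is not shown here because it is not covered in this tutorial.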
What's Next
Your pipeline is now a development pipeline. Once it's ready for production, change its service level to Production. You can do this on the Pipeline Details page, which opens when you click the pipeline name. To learn more, see Pipeline Service Levels.