Tutorial: Building Your First Document Retrieval App
This tutorial shows you the quickest way to build a document retrieval system. You upload the sample files through the UI and create the document retrieval pipeline from a template.
- Level: Beginner
- Time to complete: 10 minutes
- This tutorial assumes a basic knowledge of NLP.
- You must be an Admin to complete this tutorial.
- Goal: After completing this tutorial, you will have built a complete English document retrieval system from scratch that can fetch NHS documents.
First, let's get the files the search will run on into deepset Cloud.
- Download the .zip file from Google Drive and unzip it to a location on your computer.
- Log in to deepset Cloud and go to Data>Files.
- Click Upload Files.
- Click Browse and select the files you unpacked in the first step.
Note: It can take a few seconds for the selected files to appear, so don't worry if you can't see them right away.
- Wait until the files show up on the page. When they do, scroll down to the bottom and click Upload.
- Wait until the upload finishes. You should have around 900 files. You can check the number of files on the Dashboard.
Result: Your files have been uploaded and are shown on the Files page.
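If you prefer to unpack the archive from a script and check the file count before uploading, the following is a minimal sketch using only the Python standard library. The archive name `nhs_files.zip` and the target folder `nhs_files` are hypothetical placeholders, not names taken from the download; replace them with your own paths. The count should be roughly the 900 files mentioned above.

```python
import zipfile
from pathlib import Path


def extract_and_count(zip_path, target_dir):
    """Extract a .zip archive into target_dir and return the number of files found there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Count regular files only, including files in subfolders.
    return sum(1 for path in target.rglob("*") if path.is_file())


if __name__ == "__main__":
    # Hypothetical paths: point these at the archive you downloaded.
    archive_path = Path("nhs_files.zip")
    if archive_path.exists():
        count = extract_and_count(archive_path, "nhs_files")
        print(f"Extracted {count} files")
    else:
        print(f"{archive_path} not found; adjust the path first")
```

This only prepares the files locally. You still upload them through the Files page as described in the steps above.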
Create a Pipeline
The next step is to define the components of your search app. We'll use a document retrieval template with an embedding-based retriever to create the pipeline.
- Go to Pipelines>New Pipeline.
- Under YAML Editor, click Create Pipeline and select From Template.
- When the templates show up, find the Semantic Document Search template and click Use Template.
- When the Pipeline Designer opens, change the pipeline name in line 8 to NHS_doc_retrieval and save the pipeline.
- Click Deploy to start indexing and prepare your pipeline for search.
- Return to the Pipelines page and wait until the status of your pipeline changes to Indexed. This can take a couple of minutes.
Tip: When you hover your mouse over the status, you can see how many files have already been indexed.
Result: You created and deployed a pipeline, which means your documents have been indexed and you can now run a search. Your pipeline shows on the Pipelines page with the status Indexed.
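For intuition about what an embedding-based retriever does, here is a toy sketch. It is not the template's actual implementation: a real retriever uses a trained model to turn text into high-dimensional vectors, whereas the three-dimensional vectors and file names below are made up for illustration. The idea it shows is that the query and each document are represented as vectors, and documents are ranked by how similar their vector is to the query vector (cosine similarity is assumed here as the similarity measure).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Score every document against the query and return the top_k, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up vectors standing in for real document embeddings (hypothetical file names).
toy_docs = {
    "eczema.txt": [0.9, 0.1, 0.0],
    "asthma.txt": [0.1, 0.9, 0.1],
    "dry_skin.txt": [0.8, 0.2, 0.1],
}
# Made-up vector standing in for the embedding of a skin-related query.
toy_query = [1.0, 0.0, 0.0]

for name, score in rank_documents(toy_query, toy_docs):
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the two skin-related documents rank above the unrelated one because their vectors point in nearly the same direction as the query vector.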
Let's see what the pipeline can do.
- Go to Search.
- Choose NHS_doc_retrieval as the pipeline.
- Type "How do I treat atopic skin?" and run the search. You should get a list of documents sorted by relevance, with the most relevant ones first.
Result: Congratulations! You have built a search system that can retrieve documents related to health. You can now run health-related queries, and it will find relevant documents.
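The same query can in principle be sent programmatically instead of through the Search page. The sketch below only builds the HTTP request and does not send it. Treat the endpoint path, the `queries` payload field, the workspace name `default`, and the `YOUR_API_KEY` placeholder as assumptions to verify against the deepset Cloud API reference before use.

```python
import json
import urllib.request

# Assumed base URL of the REST API; confirm it in the API reference.
API_BASE = "https://api.cloud.deepset.ai/api/v1"


def build_search_request(workspace, pipeline, query, api_key):
    """Build (but do not send) a POST request that runs one query against a pipeline."""
    # Assumed endpoint shape: /workspaces/<workspace>/pipelines/<pipeline>/search
    url = f"{API_BASE}/workspaces/{workspace}/pipelines/{pipeline}/search"
    # Assumed payload shape: a list of query strings under "queries".
    payload = json.dumps({"queries": [query]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request(
    workspace="default",            # hypothetical workspace name
    pipeline="NHS_doc_retrieval",   # pipeline name used in this tutorial
    query="How do I treat atopic skin?",
    api_key="YOUR_API_KEY",         # placeholder, not a real key
)
print(request.get_method(), request.full_url)
```

To actually run the query, you would pass the request to `urllib.request.urlopen` with a valid API key and parse the JSON response. The response format is not covered in this tutorial, so inspect it before relying on specific fields.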