Tutorial: Building Your First Document Retrieval App

This tutorial shows you the quickest way to build a document retrieval system: you upload the sample files through the deepset Cloud UI and create the document retrieval pipeline from a template.

  • Level: Beginner
  • Time to complete: 10 minutes
  • Prerequisites:
    • This tutorial assumes basic knowledge of NLP.
    • You must be an Admin to complete this tutorial.
  • Goal: After completing this tutorial, you will have built, from scratch, a complete English document retrieval system that fetches NHS documents relevant to your queries.

Upload Files

First, upload the files that your search will run on to deepset Cloud.

  1. Download the .zip file from Google Drive and unzip it to a location on your computer.
  2. Log in to deepset Cloud and go to Data>Files.
  3. Click Upload Files.
  4. Click Browse and select the files you unpacked in step 1.
    Note: It usually takes a few seconds for the files to appear, so don't worry if you can't see anything right away.
  5. Wait until the files show up on the page. When they do, scroll down to the bottom and click Upload.
  6. Wait until the upload finishes. You should have around 900 files. You can check the number of files on the Dashboard.
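To double-check the count, you can also count the files in the folder you unzipped in step 1 and compare the result with the number shown on the Dashboard. The Python sketch below does this with the standard library only. The script name and the decision to skip hidden files (for example, `.DS_Store` or the `._*` files that macOS archives sometimes contain) are assumptions for illustration, not part of deepset Cloud.

```python
import sys
from pathlib import Path


def count_files(folder: str) -> int:
    """Count regular files under `folder`, including files in subfolders."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    # rglob("*") walks the whole tree; keep only regular files and skip
    # hidden files (names starting with "."), which are usually OS debris.
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and not path.name.startswith(".")
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the folder you unzipped the .zip file into as the first argument.
    print(count_files(sys.argv[1]))
```

Saved as, for example, `count_files.py` (a hypothetical name), you would run it with `python count_files.py path/to/unzipped_folder`.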

Result: Your files have been uploaded and are shown on the Files page.

A screenshot of the Files page with the NHS files uploaded.

Create a Pipeline

The next step is to define the components of your search app. We'll use a document retrieval template with an embedding-based retriever to create the pipeline.

  1. Go to Pipelines>New Pipeline.
  2. Under YAML Editor, click Create Pipeline and select From Template.
A screenshot of the YAML Editor component with the From Template option underlined.
  3. When the templates show up, find the Semantic Document Search template and click Use Template.
  4. When the Pipeline Designer opens, change the pipeline name in line 8 to NHS_doc_retrieval and save the pipeline.
A screenshot of the YAML editor with the pipeline name updated.
  5. Click Deploy to start indexing your files and prepare the pipeline for search.
  6. Return to the Pipelines page and wait until the status of your pipeline changes to Indexed. This can take a couple of minutes.
    Tip: When you hover your mouse over the status, you can see how many files have already been indexed.

Result: You have created and deployed a pipeline. Your documents are indexed, and you can now run a search. The pipeline appears on the Pipelines page with the status Indexed.

A screenshot of the Pipelines page with the NHS_doc_retrieval pipeline shown as indexed and deployed.

Search

Let's see what the pipeline can do.

  1. Go to Search.
  2. Choose NHS_doc_retrieval as the pipeline.
  3. Type "How do I treat atopic skin?" and run the search. You should get a list of documents sorted by relevance, with the most relevant first.
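The ranking comes from the embedding-based retriever in the template. Roughly, it compares a vector for your query with the vectors computed for each document during indexing and returns the closest documents first. The plain-Python sketch below illustrates that ranking step with cosine similarity. The three-dimensional vectors and document IDs are made up; a real pipeline gets much longer vectors from its embedding model, and the similarity function it uses (for example, cosine similarity or dot product) depends on the pipeline configuration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return (doc_id, score) pairs for the top_k documents, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real ones come from the pipeline's embedding model.
documents = {
    "eczema_treatment": [0.9, 0.1, 0.0],
    "skin_care_basics": [0.7, 0.3, 0.1],
    "broken_arm": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # stands in for "How do I treat atopic skin?"

for doc_id, score in rank_documents(query, documents, top_k=2):
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors, the two skin-related documents score close to 1 and rank above `broken_arm`, which is the same kind of ordering you see in the search results.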

Result: Congratulations! You have built a search system that retrieves health-related documents. You can now ask it health-related questions, and it will find relevant documents.