Tutorial: Building Your First Document Search App

This tutorial shows you the quickest way to build a document search system. You upload the sample files through the UI and use templates to create the index and the document retrieval pipeline.

  • Level: Beginner
  • Time to complete: 10 minutes
  • Prerequisites:
    • This tutorial assumes a basic knowledge of language models.
    • You must be an Admin to complete this tutorial.
    • Make sure you have a deepset workspace where the document search pipeline will run. For details, see the Quick Start Guide.
  • Goal: After completing this tutorial, you will have built a complete English document retrieval system from scratch that can fetch NHS documents.

Upload Files

First, let's get the files the search will run on into deepset AI Platform.

  1. Download the .zip file from Google Drive and unzip it to a location on your computer.

  2. Log in to deepset AI Platform, switch to the right workspace, and go to Files.

    The left navigation with step one on the workspace name and step two on the Files option.
  3. Click Upload Files, drag the files you unpacked in step 1, and drop them into the Upload Files window. (You must select all the files inside the folder; deepset AI Platform doesn't support uploading folders.)

  4. Click Upload and wait until the upload finishes. Even when the upload is finished, the files may take a while to show up in deepset AI Platform. That's expected; just wait a while and refresh the page if needed.

Result: Your files have been uploaded and are shown on the Files page. You should have 953 files.

The Files page with the NHS files uploaded.
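
If you prefer to prepare the files with a script, the sketch below unpacks the archive into a single flat folder, since folders can't be uploaded, and lets you count the files before dragging them into the Upload Files window. It is only an illustration: the archive name `nhs_files.zip` and the target folder `nhs_files_flat` are hypothetical placeholders, and the function name `extract_flat` is made up for this example.

```python
import zipfile
from pathlib import Path


def extract_flat(zip_path: Path, target_dir: Path) -> list[Path]:
    """Extract every file from the archive directly into target_dir, dropping subfolders.

    Note: files that share a name in different subfolders overwrite each other.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            name = Path(member.filename).name
            # Skip hidden helper files, for example macOS metadata entries.
            if name.startswith("."):
                continue
            destination = target_dir / name
            destination.write_bytes(archive.read(member))
            extracted.append(destination)
    return extracted


if __name__ == "__main__":
    zip_path = Path("nhs_files.zip")  # hypothetical name of the downloaded archive
    if zip_path.exists():
        files = extract_flat(zip_path, Path("nhs_files_flat"))
        # The tutorial expects 953 files after upload; compare with this count.
        print(f"Extracted {len(files)} files")
```

You can then select everything inside the flat folder and drop it into the Upload Files window.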

Create an Index

An index prepares your files for search by chunking them and storing the chunks in a document store, where the query pipeline can access them.
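
To make "chunking" concrete, here is a minimal sketch of splitting text into overlapping, word-based chunks. It is a toy illustration, not the platform's implementation: the function name `split_into_chunks` and the default sizes are assumptions chosen for readability, and the index template configures its own splitting.

```python
def split_into_chunks(text: str, chunk_size: int = 5, overlap: int = 1) -> list[str]:
    """Split text into chunks of chunk_size words, with neighboring chunks sharing overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = "Atopic eczema causes the skin to become itchy, dry and cracked"
    for chunk in split_into_chunks(sample, chunk_size=5, overlap=1):
        print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighboring chunks, which is a common reason to use it.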

  1. Go to Indexes and click Create Index.

  2. Click the Standard Index (English) template.

  3. Type standard-index as the index name and click Create Index. The index opens in Pipeline Builder.

  4. Click Enable in the top right corner of the Builder.

    The Enable button in Pipeline Builder

Result: You created and enabled an index you can now connect to your query pipelines to give them access to your files.

Create a Pipeline

The next step is to define the components of your search app. We'll use a document search pipeline template with a vector retriever to create the pipeline.
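
As background on what a vector retriever does, the sketch below ranks documents by cosine similarity between a query vector and document vectors. It is a simplified illustration under stated assumptions: the two-dimensional vectors and file names are invented, whereas a real pipeline gets its vectors from an embedding model and searches them in the document store.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    query_vector: list[float],
    document_vectors: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every document against the query and return the top_k, most similar first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in document_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Invented toy vectors; real embeddings have hundreds of dimensions.
    documents = {
        "eczema.txt": [0.9, 0.1],
        "flu.txt": [0.1, 0.9],
        "dry_skin.txt": [0.7, 0.3],
    }
    query = [1.0, 0.0]
    for name, score in rank_documents(query, documents):
        print(f"{name}: {score:.3f}")
```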

  1. Go to Pipeline Templates.

  2. Choose Document Search as the category, find Semantic Document Search, and click Use Template.

    The Pipeline Templates page with step one on the Document Search category and step two on the Use Template button.
  3. Type NHS_doc_search as the pipeline name and click Create Pipeline. You're redirected to Pipeline Builder.

  4. Find the OpenSearchDocumentStore component (it has a warning message prompting you to choose an index) and choose the standard-index index from the list on the component card.

    The document store component card with the index list expanded
  5. Click Save and then Deploy. This prepares your pipeline for search.

    The Deploy button in Studio highlighted
  6. Wait until the status of your pipeline changes to Deployed. This can take a couple of minutes.

Result: You created and deployed a pipeline that uses a standard index, and you can now run a search. Your pipeline status is Deployed.
Your pipeline is at the development service level. We recommend you test it before setting it to the production service level.

The Pipelines page with the NHS doc retrieval pipeline shown as indexed and deployed

Try Your Pipeline

Let's see what the pipeline can do.

  1. Go to Playground.
  2. Choose NHS_doc_search as the pipeline.
  3. Type "How do I treat atopic skin?" and search for relevant documents. You should get several documents, sorted with the most relevant first.

Result: Congratulations! You have built a search system that retrieves health-related documents. You can now run health-related queries, and it will find the relevant documents.
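
The same query can also be sent from code instead of Playground. The sketch below only builds the HTTP request and does not send it. Treat the details as assumptions to check against the deepset API reference: the base URL, the `/search` endpoint path, and the `queries` field reflect my understanding of the REST API, while `my-workspace` and `YOUR_API_KEY` are placeholders and `build_search_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed base URL of the deepset REST API; verify it for your deployment.
API_BASE = "https://api.cloud.deepset.ai/api/v1"


def build_search_request(
    workspace: str, pipeline: str, query: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs one query against a deployed pipeline."""
    # Assumed endpoint layout: workspace name and pipeline name are part of the path.
    url = f"{API_BASE}/workspaces/{workspace}/pipelines/{pipeline}/search"
    # Assumed payload shape: a list of query strings under "queries".
    body = json.dumps({"queries": [query]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_search_request(
        workspace="my-workspace",  # placeholder workspace name
        pipeline="NHS_doc_search",
        query="How do I treat atopic skin?",
        api_key="YOUR_API_KEY",  # placeholder; never hard-code a real key
    )
    print(request.get_method(), request.full_url)
```

To actually run the search, you would pass the request to `urllib.request.urlopen` and parse the JSON response. The pipeline must be deployed first.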

What's Next

Your pipeline is now a development pipeline. Once it's ready for production, change its service level to Production. You can do this on the Pipeline Details page shown after clicking a pipeline name. To learn more, see Pipeline Service Levels.