Quick Start Guide

Use one of our templates to get a head start and immediately understand the benefits of deepset Cloud.

📘 You must be an Admin to perform this task.

In a nutshell, here are the steps you must take to create a search app:

A diagram showing the four steps needed to create a search app: 1. Upload your files, 2. Create a pipeline, 3. Evaluate your pipeline, 4. Share it with others.

Upload Your Files

First, add the files on which you want to run your search:

  1. Log in to deepset Cloud.
  2. On the Dashboard, go to Data > Files > Upload Files.
  3. Drag the files from your computer and drop them on the Upload Your Files page. deepset Cloud accepts PDF and TXT files.
  4. Click Upload. Your files are now listed on the Files page.
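
If you prefer to upload files programmatically, you can also do it through the deepset Cloud REST API. The sketch below is a minimal Python example; the endpoint path and upload format are assumptions, so check the API reference for the exact details before relying on it.

# A minimal sketch of uploading a file through the deepset Cloud REST API.
# The endpoint path and upload format below are assumptions; verify them
# in the API reference.
import requests

API_KEY = "YOUR_API_KEY"   # generate an API key in deepset Cloud
WORKSPACE = "default"      # the workspace to upload your files into
URL = f"https://api.cloud.deepset.ai/api/v1/workspaces/{WORKSPACE}/files"

with open("my_document.pdf", "rb") as f:
    response = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("my_document.pdf", f)},
    )
response.raise_for_status()
print(response.json())  # metadata of the uploaded file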

Create a Pipeline

Now, add one of the pipelines that we created for you:

  1. In deepset Cloud, go to Pipelines > Create Pipeline.
  2. Choose YAML Editor.
  3. Copy a pipeline from the Sample Pipelines section of this document and paste it into the deepset Cloud Pipeline Designer. Don’t forget to save it.
  4. Click Deploy so that you can select this pipeline for your search.
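
You can also create and deploy a pipeline programmatically. Here is a hedged sketch using the REST API; both endpoint paths are assumptions, so verify them in the API reference before using this approach.

# A sketch of creating and deploying a pipeline through the REST API.
# Both endpoint paths are assumptions; check the API reference first.
import requests

API_KEY = "YOUR_API_KEY"
WORKSPACE = "default"
BASE = f"https://api.cloud.deepset.ai/api/v1/workspaces/{WORKSPACE}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Create the pipeline from one of the sample YAML definitions below
with open("my_pipeline.yaml") as f:
    requests.post(f"{BASE}/pipelines", headers=HEADERS, data=f.read()).raise_for_status()

# Deploy it so that it can be selected for search
requests.post(f"{BASE}/pipelines/DenseDocSearch/deploy", headers=HEADERS).raise_for_status()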

Sample Pipelines

Here are a few pipelines ready for you to use in your search system. You can take them as they are or adjust them for your use case:

A sample document retrieval pipeline with a dense retriever
This pipeline returns documents as answers. It uses a dense, embedding-based retriever.
# If you need help with the YAML format, have a look at https://docs.cloud.deepset.ai/docs/create-a-pipeline-using-a-yaml-file.
# This is a friendly editor that helps you create your pipelines with autosuggestions. To use them, press control + space on your keyboard.
# Whenever you need to specify a model, this editor also helps you. Just type your Hugging Face organization and a forward slash (/) to see available models.

# This is a default document search pipeline with a good embedding-based Retriever
version: '1.8.0'
name: 'DenseDocSearch'

# This section defines the nodes that you want to use in your pipelines. Each node must have a name and a type. You can also set the node's parameters here.
# The name is up to you, you can give your component a familiar name. You then use components' names when specifying their order in the pipeline.
# Type is the class name of the component. 
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore #the only supported document store in deepset Cloud
  - name: Retriever #selects the most relevant documents from the document store
    type: EmbeddingRetriever #uses one Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 #model optimized for semantic search
      model_format: sentence_transformers
      top_k: 20 #the number of results to return
  - name: FileTypeClassifier #routes files based on their extension to appropriate converters
    type: FileTypeClassifier
  - name: TextConverter #converts .txt files into documents
    type: TextConverter
  - name: PDFConverter #converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor #splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      #With a dense retriever, it's good to split your documents into smaller ones
      split_by: word #the unit by which you want to split the documents
      split_length: 250 #the max number of words in a document
      split_overlap: 30 #enables the sliding window approach
      split_respect_sentence_boundary: True #retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
  - name: indexing
    nodes:
      # Depending on the file type we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] #ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] #ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
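
For intuition, here is what the EmbeddingRetriever does at query time, shown standalone with the sentence-transformers library and the same model as above: it encodes the query and the documents with one Transformer model and ranks the documents by vector similarity. This is an illustrative sketch, not deepset Cloud code.

# Illustrative sketch of dense, embedding-based retrieval: encode query and
# documents with the same model, then rank documents by dot-product similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1")

documents = [
    "Berlin is the capital of Germany.",
    "A dense retriever encodes text into embedding vectors.",
    "BM25 is a keyword-based ranking function.",
]
doc_embeddings = model.encode(documents)
query_embedding = model.encode("What does a dense retriever do?")

# This model was trained for dot-product similarity
scores = util.dot_score(query_embedding, doc_embeddings)[0]

# Print the documents from most to least relevant, like top_k results
for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.2f}  {doc}")
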
A sample document retrieval pipeline with a sparse retriever
This pipeline is a good starting point for document search. It returns documents as answers. The retriever here is keyword-based and uses the Elasticsearch BM25 algorithm.
# If you need help with the YAML format, have a look at https://docs.cloud.deepset.ai/docs/create-a-pipeline-using-a-yaml-file.
# This is a friendly editor that helps you create your pipelines with autosuggestions. To use them, press control + space on your keyboard.
# Whenever you need to specify a model, this editor also helps you. Just type your Hugging Face organization and a forward slash (/) to see available models.


# A baseline pipeline for document search that uses a traditional, sparse retriever (using Elasticsearch's BM25 algorithm).
# It relies on matching keywords between query and document and is often a solid baseline to start with
version: '1.8.0'
name: 'SparseDocSearch_BM25'

# This section defines the nodes that you want to use in your pipelines. Each node must have a name and a type. You can also set the node's parameters here.
# The name is up to you, you can give your component a familiar name. You then use components' names when specifying their order in the pipeline.
# Type is the class name of the component. 
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore #this is the only supported document store in deepset Cloud
  - name: Retriever #selects the most relevant documents from the document store
    type: ElasticsearchRetriever #sparse retriever
    params:
      document_store: DocumentStore
      top_k: 20 #the number of results to return
  - name: FileTypeClassifier #routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter #converts files into documents
    type: TextConverter
  - name: PDFConverter #converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor #splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a sparse retriever, you can keep slightly longer documents
      split_by: word #the unit by which you want to split the documents
      split_length: 500 #the max number of words in a document
      split_overlap: 30 #enables the sliding window approach
      split_respect_sentence_boundary: True #retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines: 
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
  - name: indexing
    nodes:
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] #ensures that this converter gets txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] #ensures that this converter gets pdf files
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
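
To make the difference to the dense pipeline concrete: BM25 scores a document by how often the query's keywords occur in it, weighted by how rare each keyword is in the corpus and dampened by document length. The sketch below illustrates the formula in simplified form; it is not Elasticsearch's exact implementation, and the helper name bm25_score is just for illustration.

# Simplified BM25 scoring for intuition. Not Elasticsearch's exact
# implementation, but the same idea: rare-keyword overlap, dampened by length.
import math

def bm25_score(query, document, corpus, k1=1.2, b=0.75):
    doc_terms = document.lower().split()
    avgdl = sum(len(d.split()) for d in corpus) / len(corpus)
    score = 0.0
    for term in query.lower().split():
        tf = doc_terms.count(term)                                 # term frequency in this document
        df = sum(1 for d in corpus if term in d.lower().split())   # documents containing the term
        idf = math.log(1 + (len(corpus) - df + 0.5) / (df + 0.5))  # rarer terms weigh more
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [
    "BM25 is a keyword-based ranking function.",
    "A dense retriever encodes text into embedding vectors.",
]
for doc in corpus:
    print(f"{bm25_score('keyword ranking', doc, corpus):.2f}  {doc}")
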
A sample hybrid document retrieval pipeline
This pipeline combines the advantages of a keyword-based retriever and an embedding-based retriever, using the JoinDocuments node to merge their results. It returns documents as answers.
# If you need help with the YAML format, have a look at https://docs.cloud.deepset.ai/docs/create-a-pipeline-using-a-yaml-file.
# This is a friendly editor that helps you create your pipelines with autosuggestions. To use them, press control + space on your keyboard.
# Whenever you need to specify a model, this editor also helps you. Just type your Hugging Face organization and a forward slash (/) to see available models.


# This is a document search pipeline that combines a dense (embedding-based) Retriever with a keyword-based Retriever (Elasticsearch's BM25)
version: '1.8.0'
name: 'HybridDocSearch'

# This section defines the nodes that you want to use in your pipelines. Each node must have a name and a type. You can also set the node's parameters here.
# The name is up to you, you can give your component a familiar name. You then use components' names when specifying their order in the pipeline.
# Type is the class name of the component. 
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore #the only supported document store in deepset Cloud
  - name: ESRetriever # the keyword-based retriever
    type: ElasticsearchRetriever
    params:
      document_store: DocumentStore
      top_k: 20 #the number of results to return
  - name: EmbeddingRetriever # the dense retriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 #model optimized for semantic search
      model_format: sentence_transformers
      top_k: 20 #the number of results to return
  - name: JoinResults #joins the results from both retrievers
    type: JoinDocuments
    params:
      join_mode: reciprocal_rank_fusion #applies rank-based scoring to the results
  - name: FileTypeClassifier #routes files based on their extension to appropriate converters, useful if you have different file types
    type: FileTypeClassifier
  - name: TextConverter #converts files into documents
    type: TextConverter
  - name: PDFConverter #converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor #splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      #With a dense retriever, it's good to split your documents into smaller ones
      split_by: word #the unit by which you want to split the documents
      split_length: 250 #the max number of words in a document
      split_overlap: 30 #enables the sliding window approach
      split_respect_sentence_boundary: True #retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: ESRetriever
        inputs: [Query]
      - name: EmbeddingRetriever
        inputs: [Query]
      - name: JoinResults
        inputs: [ESRetriever, EmbeddingRetriever]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] #ensures that this converter gets txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] #ensures that this converter gets pdf files
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
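
The join_mode reciprocal_rank_fusion is worth a quick illustration. Because the two retrievers produce scores on different scales, this mode ranks by position instead: each document's fused score is the sum of 1 / (k + rank) over every result list it appears in, where k is a smoothing constant (60 in the original paper). A minimal sketch of the idea, with an illustrative helper name:

# Minimal sketch of reciprocal rank fusion: a document's fused score is the
# sum of 1 / (k + rank) over every ranked result list it appears in.
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["doc_3", "doc_1", "doc_7"]       # from the keyword retriever
embedding_results = ["doc_1", "doc_9", "doc_3"]  # from the dense retriever

# doc_1 and doc_3 appear high in both lists, so they come out on top
print(reciprocal_rank_fusion([bm25_results, embedding_results]))
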
A sample German question-answering pipeline
This pipeline is a good starting point. It uses a dense EmbeddingRetriever and a German question-answering model. Thanks to the reader, it highlights the answers within text passages.
# If you need help with the YAML format, have a look at https://docs.cloud.deepset.ai/docs/create-a-pipeline-using-a-yaml-file.
# This is a friendly editor that helps you create your pipelines with autosuggestions. To use them, press control + space on your keyboard.
# Whenever you need to specify a model, this editor also helps you. Just type your Hugging Face organization and a forward slash (/) to see available models.


# This is a default Question Answering pipeline for German with a dense, multilingual EmbeddingRetriever and a German QA model
version: '1.8.0'
name: 'QuestionAnswering_de'

# This section defines the nodes that you want to use in your pipelines. Each node must have a name and a type. You can also set the node's parameters here.
# The name is up to you, you can give your component a familiar name. You then use components' names when specifying their order in the pipeline.
# Type is the class name of the component. 
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore #the only supported document store in deepset Cloud
    params:
      similarity: cosine #recommended for sentence transformer models
  - name: Retriever #selects the most relevant documents from the document store and then passes them on to the Reader
    type: EmbeddingRetriever #uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/msmarco-distilbert-multilingual-en-de-v2-tmp-lng-aligned
      model_format: sentence_transformers
      top_k: 20 #the number of results to return
  # A "Reader" model that goes through those 20 candidate documents and identifies the exact answer
  - name: Reader #the component that actually fetches answers from the 20 documents returned by the retriever    
    type: FARMReader #Transformer-based reader, specializes in extractive QA
    params:
      model_name_or_path: deepset/gelectra-large-germanquad
      context_window_size: 700 #the size of the window around the answer span
  - name: FileTypeClassifier #routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter #converts files into documents
    type: TextConverter
  - name: PDFConverter #converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor #splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      #With a dense retriever, it's good to split your documents into smaller ones
      split_by: word #the unit by which you want to split the documents
      split_length: 250 #the max number of words in a document
      split_overlap: 50 #enables the sliding window approach
      split_respect_sentence_boundary: True #retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader
        inputs: [Retriever]
  - name: indexing
    nodes:
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] #ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] #ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
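
To see what the Reader adds on top of retrieval, here is the same German model run standalone with the Hugging Face transformers library. The Reader returns the exact answer span with its character offsets within the passage, which is what enables answer highlighting. This is an illustrative sketch, not deepset Cloud code.

# Standalone sketch of the extractive QA step, using the same German model
# as the Reader above through the Hugging Face transformers library.
from transformers import pipeline

reader = pipeline("question-answering", model="deepset/gelectra-large-germanquad")

result = reader(
    question="Wo hat die Europäische Zentralbank ihren Sitz?",
    context="Die Europäische Zentralbank hat ihren Sitz in Frankfurt am Main.",
)

# The answer span plus its character offsets within the context; this is
# what makes highlighting the answer in the text passage possible
print(result)  # e.g. {'answer': 'Frankfurt am Main', 'start': 46, 'end': 63, 'score': ...}
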
A sample English question-answering pipeline

This is a good starting point for a question-answering system. It uses an embedding-based retriever and a reader, which highlights the answers within text passages.

# If you need help with the YAML format, have a look at https://docs.cloud.deepset.ai/docs/create-a-pipeline-using-a-yaml-file.
# This is a friendly editor that helps you create your pipelines with autosuggestions. To use them, press control + space on your keyboard.
# Whenever you need to specify a model, this editor also helps you. Just type your Hugging Face organization and a forward slash (/) to see available models.


# This is a default Question Answering pipeline for English with a good embedding-based Retriever and a small, fast Reader
version: '1.8.0'
name: 'QuestionAnswering_en'

# This section defines the nodes that you want to use in your pipelines. Each node must have a name and a type. You can also set the node's parameters here.
# The name is up to you, you can give your component a familiar name. You then use components' names when specifying their order in the pipeline.
# Type is the class name of the component.
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore #the only supported document store in deepset Cloud
  - name: Retriever #selects the most relevant documents from the document store and passes them on to the Reader
    type: EmbeddingRetriever #uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 #model optimized for semantic search
      model_format: sentence_transformers
      top_k: 20 #the number of results to return
  - name: Reader #the component that actually fetches answers from among the 20 documents returned by the retriever 
    type: FARMReader #Transformer-based reader, specializes in extractive QA
    params:
      model_name_or_path: deepset/roberta-base-squad2-distilled #an optimized variant of BERT, a strong all-round model
      context_window_size: 700 #the size of the window around the answer span
  - name: FileTypeClassifier #routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter #converts files into documents
    type: TextConverter
  - name: PDFConverter #converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor #splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      #With a dense retriever, it's good to split your documents into smaller ones
      split_by: word #the unit by which you want to split the documents
      split_length: 250 #the max number of words in a document
      split_overlap: 50 #enables the sliding window approach
      split_respect_sentence_boundary: True #retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader
        inputs: [Retriever]
  - name: indexing
    nodes:
    # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] #ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] #ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
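
The PreProcessor settings above deserve a short explanation: split_length caps the number of words per resulting document, and split_overlap makes consecutive documents share words, so an answer that sits near a split boundary still appears complete in at least one of them. The sketch below shows the sliding-window splitting in its simplest form; the real PreProcessor additionally keeps sentences intact, and the helper name is just for illustration.

# Simplified sliding-window splitting, as configured by split_by: word,
# split_length, and split_overlap above. The real PreProcessor additionally
# keeps sentences intact (split_respect_sentence_boundary: True).
def split_by_word(text, split_length=250, split_overlap=50):
    words = text.split()
    step = split_length - split_overlap
    return [
        " ".join(words[start:start + split_length])
        for start in range(0, len(words), step)
    ]

docs = split_by_word("word " * 600, split_length=250, split_overlap=50)
print(len(docs))             # 3 documents
print(len(docs[1].split()))  # 250 words; the last 50 are shared with the next document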

Search

  1. In deepset Cloud, click Search.
  2. Select the pipeline that you just deployed.
  3. Type your question and search for the answer. That’s it!
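
You can also query a deployed pipeline programmatically. The sketch below uses the REST API; the search endpoint path and request body are assumptions, so check the API reference for the exact format.

# Minimal sketch of querying a deployed pipeline through the REST API.
# The endpoint path and request body are assumptions; verify them in the
# deepset Cloud API reference.
import requests

API_KEY = "YOUR_API_KEY"
WORKSPACE = "default"
PIPELINE = "DenseDocSearch"  # the name of the pipeline you deployed
URL = f"https://api.cloud.deepset.ai/api/v1/workspaces/{WORKSPACE}/pipelines/{PIPELINE}/search"

response = requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"queries": ["What is a dense retriever?"]},
)
response.raise_for_status()
print(response.json())  # the answers or documents returned by the pipeline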

What to Do Next?

Now you can start experimenting with your pipeline to check if it's the best one for your use case. You can also invite other users to use it for search.

