Create a Pipeline

Create a pipeline from scratch or choose one of the available templates for an easy start. You can use the guided workflow, code editor, or REST API.

📘

You must be an Admin to perform this task.

About This Task

Each pipeline file defines two pipelines:

  • An indexing pipeline that defines how your files are preprocessed. Whenever you add a file, it is preprocessed by all deployed pipelines.
  • A query pipeline that describes how the query is run.

There are multiple ways to create a pipeline:

  • Using the guided workflow: Choose this method if you're new to pipelines and want to be guided through the process of creating one. Tell us what you want to do, and we'll create a pipeline that best matches your goals. You can use it right away.
  • Using Pipeline Designer: Choose this method to create your own pipeline from one of the ready-made templates or from scratch using the code editor in Pipeline Designer. The editor comes with a pipeline preview that shows the pipeline structure as a diagram, making it easier to understand how data flows through your pipeline.
  • Using the API: Choose this method if you already have a pipeline YAML file and want to programmatically upload it to deepset Cloud.

Prerequisites

  • To learn about how pipelines and nodes work in deepset Cloud, see Pipeline Nodes and About Pipelines.
  • To use a hosted model, Connect to Model Providers first so that you don't have to pass the API key within the pipeline. For Hugging Face, this is only required for private models. Once deepset Cloud is connected to a model provider, just pass the model name in the model_name_or_path parameter of the node that uses it in the pipeline. deepset Cloud will download and load the model. For more information, see Language Models in deepset Cloud.
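
For example, once deepset Cloud is connected to a model provider, a node can reference a hosted model by name alone. This is a minimal sketch, assuming a PromptNode with an OpenAI model; the node name and prompt template are illustrative:

```yaml
components:
  - name: Generator # Illustrative name for a node that uses a hosted model
    type: PromptNode
    params:
      model_name_or_path: gpt-3.5-turbo # Just the model name; the API key comes from the connected provider
      default_prompt_template: question-answering # Illustrative template name
```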

Create a Pipeline Using Guided Workflow

Go to the home page. You can access the guided workflow at the top of the page. Click Create Pipeline and follow the steps in the workflow.

The guided workflow text on the home page telling the user to create a pipeline in five easy steps.

Create a Pipeline Using Pipeline Designer

If you already know what your pipeline should look like or want to use one of the ready-made templates, this is the method for you. A basic understanding of YAML is recommended, as that's the language in which you define pipelines.

Pipeline Format

Your pipeline definition file is in the YAML format. Make sure that you follow the same indentation structure as in this example:


components: # This part defines your pipeline components and their settings
  - name: MyPreprocessor # This is the name that you want to give to the pipeline node
    type: PreProcessor # This is the node type (class). For more information, see "Pipeline Nodes"
    params: # These are the node's settings
      split_by: passage
      split_length: 1
  - name: DocumentStore
    type: DeepsetCloudDocumentStore # Currently, this is the only supported document store type
    # Continue until you define all components

# After you define all the components that you want to use, define your query and indexing pipelines:
pipelines:
  - name: query
    nodes: # List the nodes that you want to use in this pipeline; each node must have a name and an input
      - name: ESRetriever # This is the name of a node that you defined in the "components" section above
        inputs: [Query] # Query is always the input of the first node in a query pipeline
      # Continue defining the nodes

  # Next, specify the indexing pipeline:
  - name: indexing
    nodes:
      - name: MyPreprocessor
        inputs: [File] # File is always the input of the first node in an indexing pipeline
      - name: ESRetriever
        inputs: [MyPreprocessor]
      # Continue defining the nodes

Create a Pipeline

  1. Log in to deepset Cloud and go to Pipelines.

  2. Click Create Pipeline and choose if you want to create a pipeline from an empty file or use a template.
    There are pipeline templates available for different types of tasks. All of them work out of the box, but you can also use them as a starting point for your pipeline.

    1. If you chose an empty file, give your pipeline a name and click Create Pipeline. You're redirected to the Pipelines page. You can find your pipeline in the All tab. To edit the pipeline, click the More Actions menu next to it and choose Edit.

      The In Development section with the ellipsis button expanded and the Edit option showing.
    2. If you chose a template, you're redirected to the Pipeline Templates page.

      1. Choose a template that best matches your use case, hover over it, and click Use Template.

        Pipeline templates with a template selected
      2. Give your pipeline a name and click Create Pipeline. You land on the Pipelines page. Your pipeline is a draft, which you can find in the Drafts tab. You can now modify the pipeline or use it as it is.

      3. Depending on what you want to do:

        1. To modify the template, click the More Actions menu next to your pipeline and choose Edit. You're redirected to the Designer, where you can edit and save your pipeline. Follow the instructions in step 3 below.
        2. To use the template as is, go directly to Step 4 below.
  3. (Optional) To modify the template in Designer:

      1. In the components section of the file, configure all the nodes you want to use in your indexing and query pipelines. Each node should have the following parameters:
        • name - This is a custom name you give to the node.
        • type - This is the node's class. You can check it in Pipeline Nodes if you're unsure.
        • params - This section is the node's configuration. It lists the parameters for the node and their settings. If you don't configure any parameters, the node uses its default settings for the mandatory parameters. Here's an example:
        components:
          - name: Retriever
            type: EmbeddingRetriever
            params:
              document_store: DocumentStore
              embedding_model: intfloat/e5-base-v2
              model_format: sentence_transformers
              top_k: 10

    2. In the pipelines section, define your query and indexing pipelines:
      1. For the query pipeline, set the name to query.
      2. For the indexing pipeline, set the name to indexing.
      3. For each pipeline, add the nodes section to define the order of the nodes in your pipeline. Each node has a name (the custom name you gave it in the components section) and inputs (the names of the nodes whose output it takes for further processing; this can be one or more nodes).
        The input of the first node in the indexing pipeline is always File.
        The input of the first node in the query pipeline is always Query.
        Example:
        Tip: Use ctrl + space to see autosuggestions. To see a list of available models, type the Hugging Face organization + / (slash).
        Tip: To revert your changes to the last saved version, click Reset.
        pipelines:
          - name: query
            nodes:
              - name: BM25Retriever
                inputs: [Query] # Query is always the input of the first node in a query pipeline
              - name: EmbeddingRetriever
                inputs: [Query]
              - name: JoinResults
                inputs: [BM25Retriever, EmbeddingRetriever]
              - name: Reranker
                inputs: [JoinResults]
              - name: PromptNode
                inputs: [Reranker]
          - name: indexing
            nodes:
              - name: FileTypeClassifier
                inputs: [File] # File is always the input of the first node in an indexing pipeline
              - name: TextConverter
                inputs: [FileTypeClassifier.output_1]
              - name: PDFConverter
                inputs: [FileTypeClassifier.output_2]
              - name: Preprocessor
                inputs: [TextConverter, PDFConverter]
              - name: EmbeddingRetriever
                inputs: [Preprocessor]
              - name: DocumentStore
                inputs: [EmbeddingRetriever]

    3. Save your pipeline. deepset Cloud checks that your pipeline design is correct.
  4. To use your pipeline, you must first deploy it. Click Deploy next to the pipeline on the Pipelines page or in the top right corner of the Designer. This triggers indexing.

  5. To test your pipeline, wait until it's indexed and then go to Playground. Make sure your pipeline is selected, and type your query.

An Explained Example of a Pipeline

First, define the components that you want to use in your pipelines. For each component, specify its name, type, and any parameters that you want to use.

After you define your components, define your pipelines. For each pipeline, specify its name (query or indexing) and the nodes that it consists of. For each node, specify its input.


components:   # This section defines nodes that we want to use in our pipelines
  - name: DocumentStore
    type: DeepsetCloudDocumentStore # This is the only supported document store
  - name: Retriever # Selects the most relevant documents from the document store and then passes them on to the Reader
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search
      model_format: sentence_transformers
      top_k: 20 # The number of results to return
  - name: Reader # The component that actually fetches answers            
    type: FARMReader # Transformer-based reader, specializes in extractive QA
    params:
      model_name_or_path: deepset/roberta-large-squad2 # An optimized variant of BERT, a strong all-round model
      context_window_size: 700 # The size of the window around the answer span
  - name: TextFileConverter # Converts files to documents
    type: TextConverter
  - name: Preprocessor # Splits documents into smaller ones, and cleans them up
    type: PreProcessor
    params:
      split_by: word # The unit by which you want to split your documents
      split_length: 250 # The maximum number of words in a document
      split_overlap: 30 # Enables the sliding window approach
      split_respect_sentence_boundary: True # Retains complete sentences in split documents
      language: en # Used by NLTK to best detect the sentence boundaries for that language

pipelines: # Here you define the pipelines. For each component, specify its input.
  - name: query 
    nodes:
      - name: Retriever
        inputs: [Query] # The input for the first node is always a query
      - name: Reader
        inputs: [Retriever] # The input is the name of the node that you defined in the "components" section
  - name: indexing
    nodes:
      - name: TextFileConverter
        inputs: [File]
      - name: Preprocessor
        inputs: [TextFileConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]

Create a Pipeline with REST API

This method works well if you have a pipeline YAML ready and want to upload it to deepset Cloud. You need to Generate an API Key first.

Use the following code:

curl --request POST \
     --url https://api.cloud.deepset.ai/api/v1/workspaces/<YOUR_WORKSPACE>/pipelines \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer <YOUR_API_KEY>' \
     --data-binary "@path/to/pipeline.yaml"

See the REST API endpoint documentation.
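
If you prefer Python, you can build the same request with the standard library. This is a sketch under the same assumptions as the curl call above; the workspace name, API key, and YAML content are placeholders:

```python
import urllib.request

API_BASE = "https://api.cloud.deepset.ai/api/v1"

def build_upload_request(workspace: str, api_key: str, pipeline_yaml: bytes) -> urllib.request.Request:
    """Builds a POST request that uploads a pipeline YAML file to deepset Cloud."""
    url = f"{API_BASE}/workspaces/{workspace}/pipelines"
    return urllib.request.Request(
        url,
        data=pipeline_yaml,  # The raw bytes of your pipeline YAML file
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

To send the request, read your pipeline file as bytes and pass the resulting Request object to urllib.request.urlopen.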

What To Do Next

  • If you want to use your newly created pipeline for search, you must deploy it.
  • To view pipeline details, such as statistics or feedback, click the pipeline name. This opens the Pipeline Details page.
  • To let others test your pipeline, share your pipeline prototype.