Pipeline Nodes

Nodes are the components that make up your pipeline. Choosing the right nodes is crucial for getting the most relevant search results.

Nodes define how data flows through your pipeline. Some nodes, such as Retrievers and Readers, come in more than one type, so you can choose the type that best fits your task. You can also set parameters for each node to make it work exactly as you need.

When choosing a node, make sure it is well suited to the type of data you want to search.

Nodes Used in Indexing Pipelines

These are the nodes you can use to prepare your data in an indexing pipeline:

  • FileTypeClassifier
    Useful if you have files of different types, for example PDF and TXT. It classifies the files by type and routes each file to the appropriate file converter, which prepares it for search.
  • TextConverter
    Necessary in an indexing pipeline if you have TXT files. It converts them into Document objects, the format that deepset Cloud pipelines search.
  • PDFToTextConverter
    Necessary in an indexing pipeline if you have PDF files. It converts them into Document objects, the format that deepset Cloud pipelines search.
  • PreProcessor
    Cleans Documents and splits them into smaller chunks, which makes the work of Retrievers and Readers easier and faster. Use it after the file converters.
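To make the indexing steps above concrete, here is a minimal Python sketch of what a file type classifier, a TXT converter, and a preprocessor do with your files. It is not deepset Cloud's implementation or API: the `Document` dataclass and the `route_files`, `convert_txt`, and `split_document` functions are hypothetical, routing by file extension is an assumption made for simplicity, and PDF text extraction is left out because it needs a PDF parsing library.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Hypothetical stand-in for the Document objects a pipeline searches."""
    content: str
    meta: dict = field(default_factory=dict)


def route_files(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths by extension, the way a file type classifier routes files."""
    routes: dict[str, list[str]] = {"txt": [], "pdf": []}
    for path in paths:
        file_type = Path(path).suffix.lower().lstrip(".")
        if file_type not in routes:
            raise ValueError(f"No converter for file type {file_type!r}: {path}")
        routes[file_type].append(path)
    return routes


def convert_txt(name: str, raw: bytes) -> Document:
    """Toy TXT converter: decode the bytes and wrap them in a Document."""
    return Document(content=raw.decode("utf-8"), meta={"name": name})


def split_document(doc: Document, split_length: int, split_overlap: int = 0) -> list[Document]:
    """Toy preprocessing step: cut the text into chunks of at most split_length
    words, with consecutive chunks sharing split_overlap words."""
    if not 0 <= split_overlap < split_length:
        raise ValueError("need 0 <= split_overlap < split_length")
    words = doc.content.split()  # splitting on whitespace also drops extra spaces
    step = split_length - split_overlap
    chunks = []
    for start in range(0, len(words), step):
        text = " ".join(words[start:start + split_length])
        chunks.append(Document(content=text, meta={**doc.meta, "chunk": len(chunks)}))
        if start + split_length >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

For example, `route_files(["report.pdf", "notes.txt"])` sends `notes.txt` to the TXT route and `report.pdf` to the PDF route. The overlap between chunks keeps a sentence that straddles a chunk boundary visible in both chunks.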

See also Data Preparation with Pipeline Nodes.

Nodes Used in Query Pipelines

Here are all the nodes you can use in your query pipelines, grouped by their function.

Semantic Search Nodes

  • Retriever
    Goes through the documents in the DocumentStore and fetches the ones most relevant to the query. You can use it on its own for document retrieval, in which case it returns whole documents as answers.
    You can also combine it with a Reader for question answering. The Retriever narrows the DocumentStore down to a few candidate documents, so the Reader has less text to process, which speeds up the search.
  • Reader
    The core component of question answering. It extracts answers by highlighting the exact passages in the documents that answer the query.
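The retriever-then-reader flow can be illustrated with a small, self-contained Python sketch. It is not deepset Cloud's implementation: `ToyKeywordRetriever` and `toy_read` are hypothetical names, the retriever ranks documents with the standard BM25 keyword formula (one possible retrieval method), and the reader is a crude term-overlap heuristic standing in for the trained model a real Reader would use. What it does show is the division of labor: the retriever picks candidate documents, and the reader returns an answer together with its character offsets, which is what highlighting relies on.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; deliberately simple."""
    return re.findall(r"\w+", text.lower())


class ToyKeywordRetriever:
    """Ranks documents by BM25 score; k1 and b are the usual BM25 parameters."""

    def __init__(self, documents: list[str], k1: float = 1.5, b: float = 0.75):
        self.documents = documents
        self.k1, self.b = k1, b
        self.term_counts = [Counter(tokenize(d)) for d in documents]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.avg_length = (sum(self.lengths) / len(documents)) or 1.0
        # Document frequency: how many documents contain each term.
        self.doc_freq = Counter(term for c in self.term_counts for term in c)

    def _idf(self, term: str) -> float:
        n, df = len(self.documents), self.doc_freq[term]
        return math.log((n - df + 0.5) / (df + 0.5) + 1)

    def retrieve(self, query: str, top_k: int = 2) -> list[str]:
        """Return up to top_k documents that share at least one term with the query."""
        query_terms = tokenize(query)
        scored = []
        for i, counts in enumerate(self.term_counts):
            norm = 1 - self.b + self.b * self.lengths[i] / self.avg_length
            score = sum(
                self._idf(t) * counts[t] * (self.k1 + 1) / (counts[t] + self.k1 * norm)
                for t in query_terms
                if counts[t] > 0
            )
            scored.append((score, i))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [self.documents[i] for score, i in scored[:top_k] if score > 0]


def toy_read(query: str, documents: list[str]) -> dict | None:
    """Stand-in for a Reader: pick the sentence sharing the most terms with the
    query and report its character offsets in the source document."""
    query_terms = set(tokenize(query))
    best = None
    for doc in documents:
        for match in re.finditer(r"[^.!?]+[.!?]?", doc):
            sentence = match.group().strip()
            overlap = len(query_terms & set(tokenize(sentence)))
            if overlap and (best is None or overlap > best["overlap"]):
                start = doc.index(sentence, match.start())
                best = {"answer": sentence, "start": start, "end": start + len(sentence),
                        "document": doc, "overlap": overlap}
    return best
```

Because `toy_read` only sees the documents that `retrieve` returned, it never scans documents with no matching terms, which is where the speed-up of combining a Retriever with a Reader comes from.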

Nodes Using Large Language Models (LLMs)

  • PromptNode
    A versatile node that uses an LLM to perform a variety of NLP tasks, such as generative question answering, summarization, and translation. It comes with a set of out-of-the-box prompts you can use for the most common tasks.
  • AnswerGenerator
    Generates an answer based on the documents you provide to it. You can use it to combine information from multiple documents into a single answer.
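A common way for LLM-based nodes like these to work is to fill a prompt template with the query and the documents, then send the result to the model. The Python sketch below shows that prompt-building step. The template wording, the `{documents}` and `{query}` placeholders, and the `build_prompt` and `generate_answer` helpers are hypothetical and are not PromptNode's built-in prompts; the `llm` argument is any function that takes a prompt string and returns text, so you can plug in a real model client or a stub.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical question answering template; the placeholder names are assumptions.
QA_TEMPLATE = (
    "Answer the question using only the documents below. "
    "If the documents do not contain the answer, say so.\n\n"
    "{documents}\n\nQuestion: {query}\nAnswer:"
)


def build_prompt(template: str, query: str, documents: list[str]) -> str:
    """Number the documents and substitute them and the query into the template."""
    joined = "\n".join(f"Document [{i}]: {doc}" for i, doc in enumerate(documents, start=1))
    return template.format(documents=joined, query=query)


def generate_answer(query: str, documents: list[str], llm: Callable[[str], str]) -> str:
    """Combine several documents into one prompt and ask the LLM for a single answer."""
    return llm(build_prompt(QA_TEMPLATE, query, documents))
```

Putting all documents into one prompt is what lets the model combine information from several of them into a single answer; the trade-off is that the prompt grows with every document you pass in.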

Routing Nodes

  • QueryClassifier
    Distinguishes between keyword queries and natural language queries and routes each query to the node best suited to handle it. For example, you can use it to route keyword queries to a keyword-based retriever, such as BM25Retriever, and natural language queries to a vector-based retriever, such as DensePassageRetriever.
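To illustrate the routing decision, here is a rule-based Python sketch. It swaps in a simple heuristic (a trailing question mark or a leading question word marks a natural language query) and is not how the QueryClassifier node decides; `is_natural_language`, `route_query`, the `QUESTION_WORDS` set, and the route names are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical set of words that typically open a natural language question.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which",
                  "is", "are", "do", "does", "can"}


def is_natural_language(query: str) -> bool:
    """Heuristic: a query is natural language if it ends with '?' or starts
    with a question word; anything else is treated as keywords."""
    tokens = re.findall(r"\w+", query.lower())
    return query.rstrip().endswith("?") or (bool(tokens) and tokens[0] in QUESTION_WORDS)


def route_query(query: str) -> str:
    """Name the retriever that should handle the query."""
    return "dense_retriever" if is_natural_language(query) else "keyword_retriever"
```

For example, `route_query("Who wrote Faust?")` picks the dense retriever, while `route_query("goethe faust publication year")` picks the keyword retriever.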

Utility Nodes

  • JoinDocuments
    Combines the outputs of two or more retrievers into a single list of documents. Useful if you want to use both a keyword-based and a dense retriever in one pipeline.
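One way to merge ranked result lists from a keyword-based and a dense retriever is reciprocal rank fusion (RRF), which scores each document by summing 1 / (k + rank) over every list it appears in. The Python sketch below implements RRF over lists of document IDs. It illustrates the idea only and is not necessarily the method JoinDocuments uses; the function name is hypothetical, and k = 60 is the value commonly used with RRF.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs; ranks start at 1 in each list."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep the order in which IDs were first seen.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

For example, `reciprocal_rank_fusion([["d1", "d2", "d3"], ["d3", "d1", "d4"]])` returns `["d1", "d3", "d2", "d4"]`: documents that both retrievers found outrank documents that only one of them returned. Because RRF uses only ranks, it sidesteps the problem that keyword and dense retrievers produce scores on different scales.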