Pipeline Nodes
Nodes are the components that make up your pipeline. Choosing the right nodes for your pipeline is crucial to achieving the most relevant search results.
Nodes define how data flows through your pipeline. Some nodes have more than one type. For example, Retrievers can be keyword-based or vector-based. You can choose the type that best fits the task at hand. You can also specify parameters for your nodes to make them work exactly as you need.
When choosing a node for your pipeline, make sure it's optimal for the type of data you want to run your search on.
How Do the Nodes Combine?
To combine two nodes in a pipeline, the output type of the first node must match the input type of the next node. For example, as in the picture below, TextConverter takes Files as input and returns Documents as output. You can combine it with PreProcessor, which takes Documents as input, so the output and input of these two nodes are compatible.
When connecting the nodes, you pass the name of a compatible node as the input for the node that follows it. For example:
```yaml
components:
  - name: Converter # here you give your node a name
    type: TextConverter
  - name: Processor
    type: PreProcessor
    params:
      ....
pipelines:
  - name: indexing
    nodes:
      - name: Converter
        inputs: [File]
      - name: Processor
        inputs: [Converter] # this means Processor takes the output of Converter as its input
```
See a node's documentation page for information about compatible nodes. For instructions on creating pipelines, see Create a Pipeline.
Nodes Used in Indexing Pipelines for Processing Data
These are the nodes you can use to perform tasks on your data in an indexing pipeline:
- CNAzureConverter
Extracts text and tables from PDF, JPEG, PNG, BMP, and TIFF files using Microsoft Azure Form Recognizer. You must have an active Azure account and a Form Recognizer or Cognitive Services resource to use it.
- EntityExtractor
Extracts entities out of all documents in the Document Store and stores them in the documents' metadata.
- FileTypeClassifier
Useful if you have different types of files, for example PDF and TXT. It classifies the files by type and then routes them to appropriate file converters, which further prepare them for search.
- TextConverter
Necessary in an indexing pipeline if you have TXT files. It converts them to Document objects that deepset Cloud pipelines search on.
- PDFToTextConverter
Necessary in an indexing pipeline if you have PDF files. It converts the files to Document objects that deepset Cloud pipelines search on.
- PreProcessor
Cleans and splits Documents into smaller chunks to make Readers' and Retrievers' work easier and faster. Used after file converters.
- Vector-Based Retrievers
Vector-based retrievers in indexing pipelines calculate vector representations (embeddings) of Documents and store these embeddings in DocumentStore.
Click here to see a flowchart combining these nodes into an indexing pipeline
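For illustration, here is a minimal sketch of how these nodes might combine into an indexing pipeline for TXT and PDF files. The node names, embedding model, and parameter values are assumptions for the example, not a verbatim configuration:

```yaml
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: FileClassifier
    type: FileTypeClassifier
  - name: TextConverter
    type: TextConverter
  - name: PDFConverter
    type: PDFToTextConverter
  - name: Preprocessor
    type: PreProcessor
    params:
      split_by: word        # split Documents into chunks of roughly 250 words
      split_length: 250
  - name: Retriever         # a vector-based retriever embeds Documents at indexing time
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/all-MiniLM-L6-v2

pipelines:
  - name: indexing
    nodes:
      - name: FileClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileClassifier.output_1]   # TXT files (routing depends on classifier settings)
      - name: PDFConverter
        inputs: [FileClassifier.output_2]   # PDF files
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
```

Note how FileTypeClassifier routes each file type to its matching converter, and both converters feed into the same PreProcessor.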
See also Data Preparation with Pipeline Nodes.
Nodes Used in Query Pipelines
Here are all the nodes you can use in your query pipelines, grouped by their function.
Semantic Search Nodes
- EntityExtractor
Extracts entities from documents fetched by the Retriever and stores them in the documents' metadata.
- Retriever
Goes through the documents in the DocumentStore and fetches the ones that are most relevant to the query. You can use it on its own for document retrieval. It then returns whole documents as answers.
You can combine it with a Reader for question answering to highlight the answer in the document.
- Ranker
Prioritizes documents based on the criteria you specify. For example, you can prioritize the newest documents.
- Reader
The core component that fetches the answers by highlighting them in the documents.
- RetrievalScoreAdjuster
Adjusts the scores the Ranker or Retriever assigned to the retrieved documents.
Click here to see a flowchart of an extractive question answering pipeline
Extractive QA pipeline
Here's a basic extractive question answering pipeline:
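A minimal sketch of such a pipeline in YAML could look like the following; the model names and parameter values are illustrative assumptions:

```yaml
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: Retriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/all-MiniLM-L6-v2
      top_k: 20             # fetch the 20 most relevant Documents
  - name: Reader
    type: FARMReader
    params:
      model_name_or_path: deepset/roberta-base-squad2
      top_k: 5              # return the 5 best answer spans

pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader
        inputs: [Retriever]   # the Reader highlights answers in the retrieved Documents
```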
RAG with a Ranker
Here's a RAG pipeline using a hybrid document search and a Ranker:
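Sketched in YAML, a hybrid RAG pipeline of this shape might be configured as follows. The node names, models, and parameter values are assumptions for illustration:

```yaml
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: KeywordRetriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
  - name: VectorRetriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/all-MiniLM-L6-v2
  - name: Joiner
    type: JoinDocuments
    params:
      join_mode: concatenate   # merge the results of both retrievers
  - name: Ranker
    type: SentenceTransformersRanker
    params:
      model_name_or_path: cross-encoder/ms-marco-MiniLM-L-6-v2
  - name: PromptNode
    type: PromptNode
    params:
      model_name_or_path: gpt-3.5-turbo
      default_prompt_template: question-answering

pipelines:
  - name: query
    nodes:
      - name: KeywordRetriever
        inputs: [Query]
      - name: VectorRetriever
        inputs: [Query]
      - name: Joiner
        inputs: [KeywordRetriever, VectorRetriever]
      - name: Ranker
        inputs: [Joiner]       # the Ranker reorders the joined Documents
      - name: PromptNode
        inputs: [Ranker]       # the LLM generates an answer from the top-ranked Documents
```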
Nodes Using LLMs
- PromptNode
A versatile node that can perform a variety of NLP tasks using an LLM, such as retrieval-augmented generation (RAG) question answering, summarization, and translation. It comes with a set of out-of-the-box prompts you can use for the most common tasks.
Click here for a flowchart of a basic RAG pipeline with a PromptNode
Click here for a flowchart of a basic document search pipeline
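To illustrate, a PromptNode with a custom prompt template could be declared roughly like this. The template name, prompt text, model, and parameter values are assumptions for the example:

```yaml
components:
  - name: qa_template          # hypothetical custom template
    type: PromptTemplate
    params:
      prompt: >
        Answer the question using only the given documents.
        Documents: {join(documents)}
        Question: {query}
        Answer:
      output_parser:
        type: AnswerParser     # turns the generated text into Answer objects
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      model_name_or_path: gpt-3.5-turbo
      max_length: 400          # maximum length of the generated answer
```

If you don't need a custom template, you can instead reference one of the out-of-the-box prompts by name in `default_prompt_template`.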
Routing Nodes
- QueryClassifier
Distinguishes between keyword queries and natural language queries and routes them to the node that can handle them best. For example, you can use it to route keyword queries to a keyword-based retriever, like BM25Retriever, and natural language queries to a vector-based retriever, like EmbeddingRetriever.
Click here to see a flowchart of a RAG pipeline with QueryClassifier in a query pipeline
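The query-pipeline half of such a setup could be sketched like this. The node names are illustrative, and which queries go to `output_1` versus `output_2` depends on the classifier model you use:

```yaml
pipelines:
  - name: query
    nodes:
      - name: QueryClassifier        # e.g., a TransformersQueryClassifier
        inputs: [Query]
      - name: VectorRetriever        # e.g., EmbeddingRetriever for natural language queries
        inputs: [QueryClassifier.output_1]
      - name: KeywordRetriever       # e.g., BM25Retriever for keyword queries
        inputs: [QueryClassifier.output_2]
      - name: Joiner                 # JoinDocuments passes on whichever branch ran
        inputs: [VectorRetriever, KeywordRetriever]
```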
Utility Nodes
- AnswerDeduplication
In extractive question answering pipelines, used after the FARMReader to remove duplicate answers the Reader returns.
- JoinDocuments
Combines the output of two or more retrievers. Useful if you want to use a keyword-based and a dense retriever in one pipeline.
- Shaper
Modifies values by renaming them or changing their type. Used with PromptNode to ensure it receives or outputs a specific value.
- ReferencePredictor
Used in retrieval-augmented generation (RAG) pipelines to predict references of the answers the LLM generates.
- ReturnError
Attaches an error message to the answer's metadata and ends the pipeline. Frequently used in RAG pipelines as a branch where prompt injection attempts are redirected. On receiving a prompt injection, ReturnError stops the pipeline, ensuring the prompt never reaches the PromptNode.
- InterleaveDocuments
Interleaves documents coming from different retrievers into a single list. Used for pre-filtering documents for labeling.
- SnowflakeExecutor
Establishes a connection to a Snowflake database. This way, you can query your data in Snowflake with your deepset Cloud pipeline.
Click here for a flowchart of a RAG pipeline combining JoinDocuments and ReferencePredictor
Click here for a flowchart of a RAG pipeline with ReturnError
Click here for a flowchart of an extractive question answering pipeline with AnswerDeduplication
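The AnswerDeduplication step slots in directly after the Reader; a sketch of the query-pipeline nodes, with illustrative node names, might look like this:

```yaml
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader                # e.g., FARMReader
        inputs: [Retriever]
      - name: AnswerDeduplication   # removes duplicate answers the Reader returns
        inputs: [Reader]
```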