Nodes define how data flows through your pipeline. Some nodes have more than one type. For example, Retrievers can be keyword-based or vector-based. You can choose the type that best fits the task at hand. You can also specify parameters for your nodes to make them work exactly as you need.
When choosing a node for your pipeline, make sure it's optimal for the type of data you want to run your search on.
To combine two nodes in a pipeline, the output of the first node must be the same as the input of the next node. For example, as in the picture below, TextConverter takes Files as input and returns Documents as output. You can combine it with PreProcessor because it takes Documents as input, so the output and input of these two nodes are compatible.
When connecting the nodes, you pass the name of a compatible node as the input for the node that follows it. For example:
```yaml
components:
  - name: Converter # here you give your node a name
  - name: Processor
pipelines:
  - name: indexing
    nodes:
      - name: Converter
      - name: Processor
        inputs: [Converter] # Processor takes the output of Converter as its input
```
See a node's documentation page for information about compatible nodes. For instructions on creating pipelines, see Create a Pipeline.
These are the nodes you can use to perform tasks on your data in an indexing pipeline:
Extracts text and tables from PDF, JPEG, PNG, BMP, and TIFF files using Microsoft Azure Form Recognizer. To use it, you must have an active Azure account and a Form Recognizer or Cognitive Services resource.
Extracts entities out of all documents in the Document Store and stores them in the documents' metadata.
Useful if you have different types of files, for example PDF and TXT. It classifies the files by type and then routes them to the appropriate file converters, which further prepare them for search.
Necessary in an indexing pipeline if you have TXT files. It converts them to Document objects that deepset Cloud pipelines search on.
Necessary in an indexing pipeline if you have PDF files. It converts the files to Document objects that deepset Cloud pipelines search on.
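This routing can be sketched in pipeline YAML. The node names are illustrative, and the mapping of file types to FileTypeClassifier outputs (`output_1` for TXT, `output_2` for PDF) assumes the node's default configuration, so verify it in the node's documentation:

```yaml
# Sketch: route files by type to the matching converter (names are illustrative)
components:
  - name: FileClassifier
    type: FileTypeClassifier
  - name: TxtConverter
    type: TextConverter
  - name: PdfConverter
    type: PDFToTextConverter
pipelines:
  - name: indexing
    nodes:
      - name: FileClassifier
        inputs: [File]
      - name: TxtConverter
        inputs: [FileClassifier.output_1] # TXT files, assuming default output order
      - name: PdfConverter
        inputs: [FileClassifier.output_2] # PDF files, assuming default output order
```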
Cleans and splits Documents into smaller chunks to make Readers' and Retrievers' work easier and faster. Used after file converters.
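The PreProcessor's cleaning and splitting behavior is controlled through parameters. A minimal sketch with illustrative values, not recommendations:

```yaml
# Example PreProcessor configuration; tune the values for your data
components:
  - name: Preprocessor
    type: PreProcessor
    params:
      clean_empty_lines: true
      clean_whitespace: true
      split_by: word      # split Documents into chunks of words
      split_length: 250   # target chunk size
      split_overlap: 30   # overlap between consecutive chunks
```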
- Vector-Based Retrievers
Vector-based retrievers in indexing pipelines calculate vector representations (embeddings) of Documents and store these embeddings in the DocumentStore.
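In pipeline YAML, a vector-based retriever is connected to the document store it writes embeddings to. A sketch, assuming a `DeepsetCloudDocumentStore` and an example embedding model:

```yaml
# Sketch: vector-based retriever in an indexing pipeline (model is an example)
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: Retriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1
```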
Click here to see a flowchart combining these nodes into an indexing pipeline
See also Data Preparation with Pipeline Nodes.
Here are all the nodes you can use in your query pipelines, grouped by their function.
Extracts entities from documents fetched by the Retriever and stores them in the documents' metadata.
Goes through the documents in the DocumentStore and fetches the ones that are most relevant to the query. You can use it on its own for document retrieval. It then returns whole documents as answers.
You can combine it with a Reader for question answering to highlight the answer in the document.
Prioritizes documents based on the criteria you specify. For example, you can prioritize the newest documents.
The core component that fetches the answers by highlighting them in the documents.
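Combining a Retriever with a Reader follows the same input-chaining pattern as any other pair of compatible nodes. A sketch of the query pipeline, with illustrative node and model names:

```yaml
# Sketch: extractive question answering query pipeline
components:
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
  - name: Reader
    type: FARMReader
    params:
      model_name_or_path: deepset/roberta-base-squad2 # example model
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader
        inputs: [Retriever] # Reader answers based on the retrieved documents
```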
Adjusts the scores that the Ranker or Retriever assigned to the retrieved documents.
Click here to see a flowchart of an extractive question answering pipeline
Here's a basic extractive question answering pipeline:
Here's a RAG pipeline using a hybrid document search and a Ranker:
A versatile node that can perform a variety of NLP tasks using an LLM, such as generative question answering, summarization, and translation. It comes with a set of out-of-the-box prompts you can use for the most common tasks.
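A minimal PromptNode configuration might look like this. The model and template names are examples only; check the PromptNode documentation for the prompt templates available to you:

```yaml
# Example PromptNode configuration (model and template are illustrative)
components:
  - name: PromptNode
    type: PromptNode
    params:
      model_name_or_path: google/flan-t5-large
      default_prompt_template: question-answering
```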
Click here for a flowchart of a basic RAG pipeline with a PromptNode
Click here for a flowchart of a basic document search pipeline
Distinguishes between keyword queries and natural language queries and routes them to the node that can handle them best. For example, you can use it to route keyword queries to a keyword-based retriever, like BM25Retriever, and natural language queries to a vector-based retriever, like EmbeddingRetriever.
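A sketch of this routing in pipeline YAML. The node names are illustrative, and the assignment of query types to `output_1` and `output_2` is an assumption that depends on the classifier you use, so verify it in the QueryClassifier documentation:

```yaml
# Sketch: route queries to the retriever best suited to handle them
components:
  - name: QueryClassifier
    type: TransformersQueryClassifier
  - name: KeywordRetriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
  - name: DenseRetriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1
pipelines:
  - name: query
    nodes:
      - name: QueryClassifier
        inputs: [Query]
      - name: DenseRetriever
        inputs: [QueryClassifier.output_1] # assumption: natural language queries
      - name: KeywordRetriever
        inputs: [QueryClassifier.output_2] # assumption: keyword queries
```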
Click here to see a flowchart of a RAG pipeline with QueryClassifier in a query pipeline
In extractive question answering pipelines, used after the FARMReader to remove duplicate answers the Reader returns.
Combines the output of two or more retrievers. Useful if you want to use a keyword-based and a dense retriever in one pipeline.
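A hybrid retrieval sketch using JoinDocuments; the node names and the `join_mode` value are illustrative:

```yaml
# Sketch: merge results from a keyword-based and a dense retriever
components:
  - name: JoinDocuments
    type: JoinDocuments
    params:
      join_mode: reciprocal_rank_fusion # example merge strategy; see the node's docs
pipelines:
  - name: query
    nodes:
      - name: KeywordRetriever
        inputs: [Query]
      - name: DenseRetriever
        inputs: [Query]
      - name: JoinDocuments
        inputs: [KeywordRetriever, DenseRetriever]
```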
Modifies values by renaming them or changing their type. Used with PromptNode to ensure it receives or outputs a specific value.
Used in retrieval-augmented generation (RAG) pipelines to predict references of the answers the LLM generates.
Attaches an error message to the answer's metadata and ends the pipeline. Frequently used in RAG pipelines as a branch where prompt injection attempts are redirected. On receiving a prompt injection, ReturnError stops the pipeline, ensuring the prompt never reaches the PromptNode.
Interleaves documents coming from different retrievers into a single list. Used for pre-filtering documents for labeling.
Click here for a flowchart of a RAG pipeline combining JoinDocuments and ReferencePredictor
Click here for a flowchart of a RAG pipeline with ReturnError
Click here for a flowchart of an extractive question answering pipeline with AnswerDeduplication