PromptNode

PromptNode is an easy-to-use, customizable node that brings you the power of large language models. You can use it directly in your pipelines for various NLP tasks.

What are large language models?

Large language models are huge models trained on enormous amounts of data. Interacting with such a model resembles talking to another person. These models have general knowledge of the world. You can ask them anything, and they'll be able to answer.

Large language models are trained to perform many NLP tasks with little training data. What's astonishing about them is that a single model can perform various NLP tasks with good accuracy.

Some examples of large language models include flan-t5-base, Flan-PaLM, Chinchilla, and GPT-3 variants, such as text-davinci-003.

Basic Information

PromptNode is a very versatile node. It's used in query pipelines, but its position depends on what you want it to do. You can pass a prompt template to specify the NLP task the PromptNode should perform, and a model to use. For more information, see the Usage section.

  • Pipeline type: Used in query pipelines.
  • Position in a pipeline: The position depends on the NLP task you want it to do. See Usage for examples.
  • Input and Output: Depends on the NLP task it performs. Some examples are query, documents, and the output of the preceding node. You define the input in the PromptTemplate you pass to PromptNode. If you're using one of the ready-made PromptTemplates, here's the input and output they take:
| Prompt Template | Input | Output |
| --- | --- | --- |
| question-answering | documents (list or string), questions (list or string) | answer |
| question-generation | documents (list or string) | question |
| conditioned-question-generation | documents (list or string), answers (list or string) | question |
| summarization | documents (list or string) | summary |
| question-answering-check | documents (list or string), questions (list or string) | answer |
| sentiment-analysis | documents (list or string) | answer |
| multiple-choice-question-answering | questions (list or string), options (list or list of lists) | answer |
| topic-classification | options (list or list of lists), documents (list or string) | answer |
| language-detection | documents (list or string) | answer |
| translation | target_language (list or string), documents (list or string) | translation |

The output is usually a string, but in the prompt, you can tell the model to generate a specific output type.
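For instance, here's a minimal sketch of calling the ready-made translation template from the table above. The document and target language are made up for illustration, and the exact output depends on the model:

from haystack.nodes import PromptNode

prompt_node = PromptNode()

# The translation template takes `documents` and `target_language` as input
# and outputs the translation:
prompt_node.prompt(prompt_template="translation",
          documents=["Berlin ist die Hauptstadt von Deutschland."],
          target_language="English")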

  • Available classes: PromptNode

Usage

You can use PromptNode as a stand-alone node or in a pipeline. If you don't specify the model you want to use for the node, it uses google/flan-t5-base.

Stand Alone

You can run PromptNode on its own. Just initialize the node and ask a question. The model has general knowledge about the world, so you can ask it anything.

from haystack.nodes import PromptNode

# Initialize the node:
prompt_node = PromptNode()

# Run a prompt
prompt_node("What is the capital of Germany?")

# Here's the output:
['berlin']

With a Prompt Template

PromptNode comes with out-of-the-box PromptTemplates. The templates contain instructions for the node to perform some of the most common NLP tasks. For better results, specify the template you want PromptNode to use. You can pass additional variables, like documents or questions, to the node. The template combines all inputs into a single prompt:

from haystack.nodes import PromptNode, PromptTemplate

# Initialize the node
prompt_node = PromptNode()

# Specify the template using the `prompt` method 
# and pass your documents and questions:
prompt_node.prompt(prompt_template="question-answering", 
          documents=["Berlin is the capital of Germany.", "Paris is the capital of France."],
          questions=["What is the capital of Germany?", "What is the capital of France"])

# Here's the output:
['Berlin', 'Paris']

To explore the real power of templates, see the Templates section.

With a Model Specified

By default, PromptNode uses the google/flan-t5-base model. You can also use other google/flan-t5 models and OpenAI's davinci models, such as text-davinci-003.

from haystack.nodes import PromptNode

# Initialize the node passing the model:
prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl")

# Go ahead and ask a question:
prompt_node("What is the best city in Europe to live in?")

In a Pipeline

The real power of PromptNode shows when you use it in a pipeline. Look at the example to get an idea of what's possible.

PromptNode and Shaper

When used in a pipeline, PromptNode often requires Shaper to make sure the input from the preceding node matches what PromptNode expects, or to make sure the output of PromptNode matches what the next node expects. Shaper has out-of-the-box functions you can use to modify the input and output of PromptNode's PromptTemplate.

To understand which Shaper function to use, let's look at how PromptNode and PromptTemplates work. When a PromptTemplate takes one parameter, PromptNode simply processes this parameter. If the parameter is a list of strings, PromptNode processes the list entries one by one. Examples of such templates are summarization and language-detection.

When a PromptTemplate takes more than one parameter, it gets a bit more complicated. If each parameter is a single string, the parameters simply get injected into the PromptTemplate and PromptNode executes the resulting prompt. For example, in the question-answering template, you may pass one question and one document; PromptNode then runs the question against the document. But if the parameters are lists, PromptNode processes the first item from the first list against the first item from the second list, then the second item from the first list against the second item from the second list, and so on. It stops when the shorter list runs out, and the remaining items from the longer list remain unprocessed.

(Image: each list item is processed against the corresponding item from the other list, and PromptNode arrives at an answer. When the shorter list finishes, the remaining items in the longer list remain unprocessed.)

For example, in the question-answering template, if you pass a question as a string and a list of 5 documents, PromptNode runs the question against the first document and then stops.

(Image: the question is processed only against the first document; PromptNode arrives at an answer, leaving the other documents unprocessed.)
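To see this behavior in code, here's a minimal sketch: one question but two documents, so only the first pairing gets processed. The documents are illustrative:

from haystack.nodes import PromptNode

prompt_node = PromptNode()

# One question, two documents: PromptNode pairs the question with the
# first document and stops, leaving the second document unprocessed.
prompt_node.prompt(prompt_template="question-answering",
          documents=["Berlin is the capital of Germany.", "Paris is the capital of France."],
          questions=["What is the capital of France?"])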

To run the question through all the documents, you must use Shaper. You can do it in two ways:

  • Using Shaper's value_to_list function. This turns the question into a list in which the question is repeated 5 times (because you have 5 documents). PromptNode then takes the first occurrence of the question and runs it against the first document, the second occurrence against the second document, and so on. It's a trick that repeats the question as many times as there are documents, making sure PromptNode answers the question against each document (see the pipeline sketch below).
  • Using Shaper's join_documents function. This function joins all the documents into one large document. PromptNode then runs the question against this document.

PromptNode always processes parameters this way. So if you pass a list as at least one of your parameters, use an appropriate Shaper function to control how PromptNode processes it.
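For example, here's a minimal sketch of the value_to_list approach in a pipeline. It assumes the default flan-t5-base model, and the inputs mapping mirrors the Shaper function names described above:

from haystack.pipelines import Pipeline
from haystack.nodes import Shaper, PromptNode
from haystack.schema import Document

# Repeat the query once per document so PromptNode runs it against each one:
shaper = Shaper(func="value_to_list", inputs={"value": "query", "target_list": "documents"}, outputs=["questions"])
prompt_node = PromptNode(default_prompt_template="question-answering")

pipe = Pipeline()
pipe.add_node(component=shaper, name="shaper", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["shaper"])

output = pipe.run(query="What is the capital of Germany?",
                  documents=[Document("Berlin is the capital of Germany."),
                             Document("Paris is the capital of France.")])
output["results"]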

Examples

Long-Form Question Answering

Long-form QA is one use of PromptNode, but certainly not the only one. In this QA type, PromptNode handles complex questions by synthesizing the answer from information spread across multiple documents.

from haystack.pipelines import Pipeline
from haystack.nodes import Shaper, PromptNode, PromptTemplate
from haystack.schema import Document

# Let's create a custom LFQA prompt using PromptTemplate
lfqa_prompt = PromptTemplate(name="lfqa",
                             prompt_text="""Synthesize a comprehensive answer from the following topk most relevant paragraphs and the given question. 
                             Provide a clear and concise response that summarizes the key points and information presented in the paragraphs. 
                             Your answer should be in your own words and be no longer than 50 words. 
                             \n\n Paragraphs: $documents \n\n Question: $query \n\n Answer:""") 

# These docs could also come from a retriever
# Here we explicitly specify them to avoid the setup steps for Retriever and DocumentStore
doc_1 = "Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere."
doc_2 = "Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds."

# Shaper concatenates the most relevant docs into one doc used as context for the generated answer
shaper = Shaper(func="join_documents", inputs={"documents": "documents"}, outputs=["documents"])

# Let's initialize the PromptNode
api_key = "<your OpenAI API key>"
node = PromptNode("text-davinci-003", default_prompt_template=lfqa_prompt, api_key=api_key)

# Let's create a pipeline with Shaper and PromptNode
pipe = Pipeline()
pipe.add_node(component=shaper, name="shaper", inputs=["Query"])
pipe.add_node(component=node, name="prompt_node", inputs=["shaper"])

output = pipe.run(query="Why do airplanes leave contrails in the sky?", documents=[Document(doc_1), Document(doc_2)])
output["results"]

# Here's the answer:
["Contrails are manmade clouds formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, creating a visible trail. Increased air traffic has been linked to the greater frequency and amount of these cirrus clouds in Earth's atmosphere."]
Here's a similar pipeline defined in YAML, including an indexing pipeline that prepares the documents:

version: 1.14.0
name: LongFormQA

components: 
 - name: Shaper
   type: Shaper
   params:
    func: join_documents
    inputs:
     - documents
    outputs:
     - documents
 - name: PromptNode
   type: PromptNode
   params:
    default_prompt_template: lfqa_prompt
    model_name_or_path: text-davinci-003
    api_key: api_key
 - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
   type: FileTypeClassifier
 - name: TextConverter # Converts files into documents
   type: TextConverter
 - name: PDFConverter # Converts PDFs into documents
   type: PDFToTextConverter
 - name: Preprocessor # Splits documents into smaller ones and cleans them up
   type: PreProcessor
   params:
     split_by: word # The unit by which you want to split the documents
     split_length: 250 # The max number of words in a document
     split_overlap: 20 # Enables the sliding window approach
     language: en
     split_respect_sentence_boundary: True 
     
pipelines:
 - name: query
   nodes:
    - name: Shaper
      inputs: [Query]
    - name: PromptNode
      inputs: [Shaper]
 - name: indexing
   nodes:
    - name: FileTypeClassifier
      inputs: [File]
    - name: TextConverter
      inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
    - name: PDFConverter
      inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
    - name: Preprocessor
      inputs: [TextConverter, PDFConverter]
    - name: Retriever
      inputs: [Preprocessor]
    - name: DocumentStore
      inputs: [Retriever]

To learn more about the template structure, see the Prompt Templates section below.

Arguments

Use these parameters to configure the PromptNode:

| Parameter | Type | Possible Values | Description |
| --- | --- | --- | --- |
| model_name_or_path | String | Different sizes of the flan-t5 model or a model by OpenAI. Default: google/flan-t5-base | The name of the model you want to use with the PromptNode. Mandatory. |
| default_prompt_template | String | Any of the out-of-the-box templates (question-answering, question-generation, summarization, conditioned-question-generation, question-answering-check, sentiment-analysis, topic-classification, multiple-choice-question-answering, language-detection, translation) or a template you created. | The prompt template you want to use with the PromptNode. The template contains instructions for the model. If you don't specify it, the model tries to guess what task you want it to do based on your query. For best results, we recommend specifying the template. Optional. |
| output_variable | String | - | The name of the output variable in which you want to store the inference results. Optional. |
| max_length | Integer | Default: 100 | The maximum length of the text output the PromptNode generates. Optional. |
| api_key | String | - | The API key for the model. Specify it to use a model by OpenAI. Optional. |
| use_auth_token | String | - | The Hugging Face authentication token for your private model. Optional. |
| use_gpu | Boolean | True/False | Specifies if you want to use GPU when running PromptNode. Optional. |
| devices | List of strings | - | The list of torch devices to use. Optional. |
| stop_words | List of strings | - | If the PromptNode encounters any of the words you specify here, it stops generating text. Optional. |
| top_k | Integer | Default: 1 | The number of answers (generated texts) you want PromptNode to return. Mandatory. |
| model_kwargs | Dictionary | - | Any additional keyword arguments you want to pass to the model. Optional. |
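To put these together, here's a minimal sketch that combines several of these arguments; the values are examples rather than recommendations:

from haystack.nodes import PromptNode

prompt_node = PromptNode(
    model_name_or_path="google/flan-t5-base",
    default_prompt_template="question-answering",
    output_variable="answers",
    max_length=100,
    use_gpu=True,
    top_k=1,
)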

Prompt Templates

PromptNode comes with out-of-the-box prompt templates ready for you to use. A prompt template corresponds to an NLP task. Each template contains the prompt text, which is the instruction for the model. Prompt text may contain variables that get filled in with actual values at runtime. Here are the templates currently available for the PromptNode:

question-answering
PromptTemplate(
    name="question-answering",
    prompt_text="Given the context please answer the question. Context: $documents; Question: "
    "$questions; Answer:",
)
question-generation
PromptTemplate(
    name="question-generation",
    prompt_text="Given the context please generate a question. Context: $documents; Question:",
)
summarization
PromptTemplate(
    name="summarization",
    prompt_text="Summarize this document: $documents Summary:",
)
conditioned-question-generation
PromptTemplate(
    name="conditioned-question-generation",
    prompt_text="Please come up with a question for the given context and the answer. "
    "Context: $documents; Answer: $answers; Question:",
)
question-answering-check
PromptTemplate(
    name="question-answering-check",
    prompt_text="Does the following context contain the answer to the question? "
    "Context: $documents; Question: $questions; Please answer yes or no! Answer:",
)
sentiment-analysis
PromptTemplate(
    name="sentiment-analysis",
    prompt_text="Please give a sentiment for this context. Answer with positive, "
    "negative or neutral. Context: $documents; Answer:",
)
topic-classification
PromptTemplate(
    name="topic-classification",
    prompt_text="Categories: $options; What category best describes: $documents; Answer:",
)
multiple-choice-question-answering
PromptTemplate(
    name="multiple-choice-question-answering",
    prompt_text="Question:$questions ; Choose the most suitable option to answer the above question. "
    "Options: $options; Answer:",
)
language-detection
PromptTemplate(
    name="language-detection",
    prompt_text="Detect the language in the following context and answer with the "
    "name of the language. Context: $documents; Answer:",
)
translation
PromptTemplate(
    name="translation",
    prompt_text="Translate the following context to $target_language. Context: $documents; Translation:",
)

If you don't specify the template, the node tries to guess what task you want it to perform. By indicating the template, you ensure it performs the right task.

Adding a New Template

You can also create your own template. Follow this structure:

from haystack.nodes import PromptTemplate, PromptNode

# In `prompt_text`, tell the model what you want it to do.
PromptNode.add_prompt_template(
    PromptTemplate(
        name="a meaningful template name"
        prompt_text="Instructions for the model. You can add variables here.'
    )
)

The prompt_text parameter contains the prompt template text for the task you want the model to do. It also specifies input variables. At runtime, these variables must be present in the execution context of the node.
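For example, here's a sketch of a hypothetical custom template; the name and prompt text are made up, and $documents and $questions must be present in the execution context at runtime:

from haystack.nodes import PromptTemplate, PromptNode

# A hypothetical template that asks for a one-sentence answer:
PromptNode.add_prompt_template(
    PromptTemplate(
        name="short-question-answering",
        prompt_text="Answer the question in a single sentence. "
        "Context: $documents; Question: $questions; Answer:",
    )
)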

When specifying parameters for your template, remember how PromptNode processes them. If there's more than one parameter and one of them is a list, PromptNode processes the first item from the list against the second parameter, and so on. You may need Shaper to use PromptNode in a pipeline. For more information, see PromptNode and Shaper.

Setting a Default Template

You can set a default template for a PromptNode instance. This way, you can reuse the same PromptNode in your pipeline for different tasks:

from haystack.nodes import PromptTemplate, PromptNode
from haystack.schema import Document

prompt_node = PromptNode()
sa = prompt_node.set_default_prompt_template("sentiment-analysis")
sa(documents=[Document("I am in love and I feel great!")])

# Node output:
['positive']

# You can then switch to another template:
summarizer = sa.set_default_prompt_template("summarization")

Models

The default model for PromptModel and PromptNode is google/flan-t5-base, but you can use any other supported LLM. To do this, specify the model's name and, for OpenAI models, the API key.

Using Another Model

You can replace the default model with a google/flan-t5 model of a different size or a model by OpenAI.
This example uses a version of the GPT-3 model:

from haystack.nodes import PromptModel, PromptNode

openai_api_key = "<type your OpenAI API key>"

# Specify the model you want to use:
prompt_open_ai = PromptModel(model_name_or_path="text-davinci-003", api_key=openai_api_key)

# Make PromptNode use the model:
pn_open_ai = PromptNode(prompt_open_ai)

pn_open_ai("What's the coolest city to live in Germany?")
And here's how you specify the model in YAML:

components:
 - name: PromptNode
   type: PromptNode
   params:
    default_prompt_template: question-answering
    model_name_or_path: text-davinci-003
    api_key: my_openai_key

Using Different Models in One Pipeline

You can also specify different LLMs for each PromptNode in your pipeline.

from haystack.nodes import PromptTemplate, PromptNode, PromptModel
from haystack.pipelines import Pipeline
from haystack.schema import Document
 
api_key = "<type your OpenAI API key>"

# Specify the model you want to use:
prompt_open_ai = PromptModel(model_name_or_path="text-davinci-003", api_key=api_key)

# This sets up the default model:
prompt_model = PromptModel()

# Now let's make one PromptNode use the default model and the other one the OpenAI model:
node_default_model = PromptNode(prompt_model, default_prompt_template="question-generation", output_variable="questions")
node_openai = PromptNode(prompt_open_ai, default_prompt_template="question-answering")

pipeline = Pipeline()
pipeline.add_node(component=node_default_model, name="prompt_node1", inputs=["Query"])
pipeline.add_node(component=node_openai, name="prompt_node_2", inputs=["prompt_node1"])
output = pipeline.run(query="not relevant", documents=[Document("Berlin is the capital of Germany")])
output["results"]
In YAML, you simply specify two PromptNodes, each with a different name and a different model. Bear in mind that this example is not a complete pipeline; you'd still need to create the indexing pipeline and define its components.

components:
 - name: PromptNodeOpenAI 
   type: PromptNode
   params:
    default_prompt_template: question-answering
    model_name_or_path: text-davinci-003
    api_key: my_openai_key
 - name: PromptNodeDefault
   type: PromptNode
   params:
    default_prompt_template: question-generation
    model_name_or_path: google/flan-t5-large

# And now you could put the two nodes together in the query pipeline:
pipelines:
 - name: query
   nodes:
    - name: PromptNodeDefault
      inputs: [Query]
    - name: PromptNodeOpenAI
      inputs: [PromptNodeDefault]
