AnswerGenerator
AnswerGenerator generates new text as an answer to your query, based on the documents you pass to it.
While extractive question answering highlights a span of existing text as the answer, AnswerGenerator writes a completely new text. It composes this text from the knowledge the model gained during pretraining and from the documents it receives from the retriever.
Basic Information
- Pipeline type: Used in query pipelines
- Position in a pipeline: After the retriever. You can use it as a substitute for the reader.
- Input: Query and Documents
- Output: Answer
- Available Classes: OpenAIAnswerGenerator
Usage
Initializing the Node
Here's the code to initialize OpenAIAnswerGenerator:
# Here's how you configure the node in YAML:
components:
  - name: AnswerGenerator
    type: OpenAIAnswerGenerator
    params:
      api_key: my_api_key
# Here's how you initialize the node in Python:
from haystack.nodes import OpenAIAnswerGenerator
generator = OpenAIAnswerGenerator(api_key=MY_API_KEY)
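The Python snippet above assumes that MY_API_KEY is already defined. Here is a minimal sketch of one way to supply it, assuming you keep the key in an environment variable. The variable name OPENAI_API_KEY and the helper load_api_key are illustrative choices for this sketch, not something the node requires:

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your OpenAI API key.")
    return key


# Usage (requires the variable to be set and Haystack to be installed):
# MY_API_KEY = load_api_key()
# generator = OpenAIAnswerGenerator(api_key=MY_API_KEY)
```

Reading the key at runtime keeps it out of source files and YAML that you might share or commit.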
To initialize Seq2SeqGenerator, run:
# Here's how you configure it in YAML:
components:
  - name: AnswerGenerator
    type: Seq2SeqGenerator
    params:
      model_name_or_path: your_locally_hosted_model
# Here's how you initialize it in Python:
from haystack.nodes import Seq2SeqGenerator
generator = Seq2SeqGenerator(model_name_or_path="vblagoje/bart_lfqa")
In a Pipeline
This is an example of how you can use OpenAIAnswerGenerator in a pipeline:
# This example contains only the query pipeline part.
# Normally, you'd also need to specify the indexing pipeline and its components here.
components:
  # In YAML, you must set up the DocumentStore and a Retriever to fetch the documents
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: Retriever # Selects the most relevant documents from the document store so that the OpenAI model can base its generation on them
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 # Model optimized for semantic search
      model_format: sentence_transformers
      top_k: 3 # The number of documents to return
  - name: AnswerGenerator # Generates candidate answers based on the documents it gets from the retriever
    type: OpenAIAnswerGenerator
    params:
      model: text-davinci-003
      api_key: your_openai_api_key # You can also set the API key in the Connections tab; then you don't need to add it here
      max_tokens: 50 # The maximum number of tokens allowed for each generated answer
      temperature: 0.9 # Determines the randomness of the model. Higher values mean the model takes more risks
      presence_penalty: 0.1 # Positive values penalize new tokens based on whether they have already appeared in the text
      top_k: 3 # The number of results to return
# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: AnswerGenerator
        inputs: [Retriever]
  - name: indexing
    # Here comes the indexing pipeline
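To build intuition for the temperature and presence_penalty values set in the configuration above, here is a minimal sketch of how such parameters are commonly applied to a model's token scores before sampling. It is an illustration, not OpenAI's implementation: the penalty arithmetic follows the formula described in the OpenAI API documentation, and the function names, toy scores, and token counts are made up for this example.

```python
import math


def apply_penalties(logits, counts, presence_penalty=0.0, frequency_penalty=0.0):
    """Lower the score of tokens that already appeared in the generated text.

    logits: dict of token -> raw score
    counts: dict of token -> number of times the token was generated so far
    """
    adjusted = {}
    for token, score in logits.items():
        count = counts.get(token, 0)
        already_seen = 1.0 if count > 0 else 0.0
        # The frequency penalty grows with each repetition;
        # the presence penalty is a flat, one-time deduction.
        adjusted[token] = score - count * frequency_penalty - already_seen * presence_penalty
    return adjusted


def softmax_with_temperature(logits, temperature):
    """Turn scores into probabilities. This sketch requires temperature > 0."""
    scaled = {token: score / temperature for token, score in logits.items()}
    peak = max(scaled.values())  # subtract the maximum for numerical stability
    exps = {token: math.exp(score - peak) for token, score in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Toy scores for three candidate next tokens (hypothetical values)
logits = {"trail": 2.0, "cloud": 1.0, "engine": 0.5}
counts = {"trail": 2}  # "trail" was already generated twice

penalized = apply_penalties(logits, counts, presence_penalty=0.1, frequency_penalty=0.1)
low_t = softmax_with_temperature(penalized, temperature=0.2)
high_t = softmax_with_temperature(penalized, temperature=0.9)
```

With the lower temperature, most of the probability goes to the top-scoring token. With the higher one, it spreads across the candidates, which is why higher values produce more varied answers.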
from haystack.pipelines import Pipeline
from haystack.nodes import OpenAIAnswerGenerator
from haystack.schema import Document
# These docs could also come from a retriever
# Here we explicitly specify them to avoid the setup steps for Retriever and DocumentStore
doc_1 = "Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere."
doc_2 = "Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds."
# Replace the placeholder with your OpenAI API key
api_key = "your_openai_api_key"
# Let's initialize the OpenAIAnswerGenerator
node = OpenAIAnswerGenerator(
    api_key=api_key,
    model="text-davinci-003",
    max_tokens=50,
    presence_penalty=0.1,
    frequency_penalty=0.1,
    top_k=3,
    temperature=0.9,
)
# Let's create a pipeline with OpenAIAnswerGenerator
pipe = Pipeline()
pipe.add_node(component=node, name="AnswerGenerator", inputs=["Query"])
output = pipe.run(query="Why do airplanes leave contrails in the sky?", documents=[Document(doc_1), Document(doc_2)])
print(output["answers"])
# Printed results
[<Answer {'answer': ' Contrails are created when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail.', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_id': None, 'meta': {'doc_ids': ['6a371f0bbb37c291befaaaf4704dc694', '2a2f7c49e1bec7864dd4bb447d5d0bfa'], 'doc_scores': [None, None], 'content': ["Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere.", 'Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. 
Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds.'], 'titles': ['', '']}}>,
<Answer {'answer': ' Airplanes leave contrails in the sky because water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail.', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_id': None, 'meta': {'doc_ids': ['6a371f0bbb37c291befaaaf4704dc694', '2a2f7c49e1bec7864dd4bb447d5d0bfa'], 'doc_scores': [None, None], 'content': ["Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere.", 'Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. 
Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds.'], 'titles': ['', '']}}>,
<Answer {'answer': ' Contrails are formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail.', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_id': None, 'meta': {'doc_ids': ['6a371f0bbb37c291befaaaf4704dc694', '2a2f7c49e1bec7864dd4bb447d5d0bfa'], 'doc_scores': [None, None], 'content': ["Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere.", 'Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. 
Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds.'], 'titles': ['', '']}}>]
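Each item in the printed list is an Answer object whose answer field holds the generated text (note the leading space) and whose meta field lists the source documents. Here is a minimal sketch of post-processing such results. To keep it runnable without Haystack or an API key, it uses a stand-in dataclass, SimpleAnswer (a hypothetical name), that mirrors only the answer and meta fields shown above. The sample texts and document IDs are shortened, illustrative values.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleAnswer:
    """Stand-in for the answer and meta fields of the objects printed above (hypothetical)."""
    answer: str
    meta: dict = field(default_factory=dict)


def unique_answer_texts(answers):
    """Strip surrounding whitespace and drop exact duplicates, keeping the original order."""
    seen = set()
    texts = []
    for item in answers:
        text = item.answer.strip()
        if text and text not in seen:
            seen.add(text)
            texts.append(text)
    return texts


# Shortened, illustrative answers; real ones come from output["answers"]
answers = [
    SimpleAnswer(" Contrails are formed when water vapor condenses and freezes.",
                 {"doc_ids": ["doc-a", "doc-b"]}),
    SimpleAnswer(" Contrails are formed when water vapor condenses and freezes.",
                 {"doc_ids": ["doc-a", "doc-b"]}),
]

for text in unique_answer_texts(answers):
    print(text)
```

With real results, you could pass output["answers"] to the same function, because the objects printed above also expose an answer field.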
Arguments
OpenAIAnswerGenerator uses the GPT-3 models hosted by OpenAI to generate answers. You need an API key from an active OpenAI account to use these models.
Argument | Type | Possible Values | Description |
---|---|---|---|
api_key | String | | Your API key from an active OpenAI account. Mandatory. |
model | String | Model name. Default: text-davinci-003 | The name of the OpenAI model you want to use. Mandatory. |
max_tokens | Integer | Default: 50 | The maximum number of tokens the generated answer can have. Setting a number higher than the default allows for longer answers without exceeding the maximum prompt length of the OpenAI model. Setting a number lower than the default allows for longer prompts with more documents passed as context, but the generated answer might be cut off once it reaches max_tokens. Mandatory. |
top_k | Integer | Default: 5 | The number of generated answers. Mandatory. |
temperature | Float | Default: 0.2 | The sampling temperature you want to use. Higher values mean the model takes more risks. A value of 0 works better for scenarios with a well-defined answer. Mandatory. |
presence_penalty | Float | A number between -2.0 and 2.0. Default: 0.1 | Positive values penalize new tokens based on whether they have already appeared in the text. This increases the model's likelihood of talking about new topics. For more information about frequency and presence penalties, see the parameter details in the OpenAI documentation. Mandatory. |
frequency_penalty | Float | A number between -2.0 and 2.0. Default: 0.1 | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim. For more information about frequency and presence penalties, see the parameter details in the OpenAI documentation. Mandatory. |
examples_context | String | | A text snippet containing the contextual information used to generate the answers for the examples you provide. If not supplied, the default from the OpenAI API docs is used: "In 2017, U.S. life expectancy was 78.6 years." Optional. |
examples | A list of strings | | A list of (question, answer) pairs that help steer the model towards the tone and answer format you'd like. We recommend adding 2 to 3 examples. If not supplied, the default from the OpenAI API docs is used. Optional. |
stop_words | A list of strings | | Up to four sequences where the API stops generating further tokens. The returned text does not contain the stop sequence. If you don't provide any stop words, the default value from the OpenAI API docs is used. Optional. |
progress_bar | Boolean | True/False. Default: True | Shows the progress bar indicating the progress of answer generation. Mandatory. |
prompt_template | PromptTemplate | | A PromptTemplate that tells the model how to generate answers given a context and query supplied at runtime. The context is automatically constructed at runtime from a list of provided documents. Use examples_context and a list of examples to steer the model towards the tone and answer format you want. If not supplied, the default prompt template is: PromptTemplate(name="question-answering-with-examples", prompt_text="Please answer the question according to the above context." "\\n===\\nContext: $examples_context\\n===\\n$examples\\n\\n" "===\\nContext: $context\\n===\\n$query", prompt_params=["examples_context", "examples", "context", "query"]). Optional. |
context_join_str | String | | The separation string used when joining the input documents to create the context used by the PromptTemplate. |
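To see how prompt_template, examples_context, examples, and context_join_str fit together, here is a minimal sketch that fills the default prompt text from the table using Python's string.Template. It illustrates the substitution only and is not Haystack's implementation. The build_prompt helper, the single-space join string, the layout of the examples and the query, and the sample example pair are all assumptions made for this sketch.

```python
from string import Template

# Default prompt text from the table above, written with real newlines
PROMPT_TEXT = (
    "Please answer the question according to the above context."
    "\n===\nContext: $examples_context\n===\n$examples\n\n"
    "===\nContext: $context\n===\n$query"
)


def build_prompt(query, documents, examples_context, examples, context_join_str=" "):
    """Fill the template. The Q:/A: layout used here is an assumption."""
    # Join the document texts into one context string
    context = context_join_str.join(documents)
    # Render each (question, answer) pair as a short Q/A block
    examples_text = "\n---\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return Template(PROMPT_TEXT).substitute(
        examples_context=examples_context,
        examples=examples_text,
        context=context,
        query=f"Q: {query}\nA:",
    )


prompt = build_prompt(
    query="Why do airplanes leave contrails in the sky?",
    documents=["Contrails form when water vapor condenses.", "Exhaust particles freeze."],
    examples_context="In 2017, U.S. life expectancy was 78.6 years.",
    # Hypothetical example pair written for this sketch
    examples=[("What was U.S. life expectancy in 2017?", "78.6 years.")],
)
print(prompt)
```

Printing the prompt this way makes it easier to judge how much room the documents take up, which matters when you balance the number of retrieved documents against max_tokens.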