AnswerGenerator
The AnswerGenerator generates new text as an answer to your query, based on the documents you feed to it.
While extractive question answering highlights a span of existing text as the answer, AnswerGenerator composes a completely new text from the knowledge it gained during pretraining and the documents it receives from the retriever.
Basic Information
- Pipeline type: Query pipeline.
- Position in a pipeline: After the retriever. You can use it as a substitute for the reader.
- Input: Query and Documents
- Output: Answers
- Available Classes: OpenAIAnswerGenerator, Seq2SeqGenerator
Usage
Initializing the Node
To initialize OpenAIAnswerGenerator, run:
from haystack.nodes import OpenAIAnswerGenerator
generator = OpenAIAnswerGenerator(api_key=MY_API_KEY)
# Here's how you configure the node in YAML:
components:
  - name: AnswerGenerator
    type: OpenAIAnswerGenerator
    params:
      api_key: my_api_key
To initialize Seq2SeqGenerator, run:
from haystack.nodes import Seq2SeqGenerator
generator = Seq2SeqGenerator(model_name_or_path="vblagoje/bart_lfqa")
# Here's how you configure it in YAML:
components:
  - name: AnswerGenerator
    type: Seq2SeqGenerator
    params:
      model_name_or_path: your_locally_hosted_model
In a Pipeline
This is an example of how you can use OpenAIAnswerGenerator in a pipeline:
from haystack.pipelines import Pipeline
from haystack.nodes import OpenAIAnswerGenerator
from haystack.schema import Document
# These docs could also come from a retriever
# Here we explicitly specify them to avoid the setup steps for Retriever and DocumentStore
doc_1 = "Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere."
doc_2 = "Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds."
# Let's initialize the OpenAIAnswerGenerator
api_key = "my_api_key"  # Replace with your OpenAI API key
node = OpenAIAnswerGenerator(
    api_key=api_key,
    model="text-davinci-003",
    max_tokens=50,
    presence_penalty=0.1,
    frequency_penalty=0.1,
    top_k=3,
    temperature=0.9,
)
# Let's create a pipeline with OpenAIAnswerGenerator
pipe = Pipeline()
pipe.add_node(component=node, name="prompt_node", inputs=["Query"])
output = pipe.run(query="Why do airplanes leave contrails in the sky?", documents=[Document(doc_1), Document(doc_2)])
output["answers"]
# Printed results
[<Answer {'answer': ' Contrails are created when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail.', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_id': None, 'meta': {'doc_ids': ['6a371f0bbb37c291befaaaf4704dc694', '2a2f7c49e1bec7864dd4bb447d5d0bfa'], 'doc_scores': [None, None], 'content': ["Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere.", 'Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. 
Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds.'], 'titles': ['', '']}}>,
<Answer {'answer': ' Airplanes leave contrails in the sky because water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail.', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_id': None, 'meta': {'doc_ids': ['6a371f0bbb37c291befaaaf4704dc694', '2a2f7c49e1bec7864dd4bb447d5d0bfa'], 'doc_scores': [None, None], 'content': ["Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere.", 'Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. 
Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds.'], 'titles': ['', '']}}>,
<Answer {'answer': ' Contrails are formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail.', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_id': None, 'meta': {'doc_ids': ['6a371f0bbb37c291befaaaf4704dc694', '2a2f7c49e1bec7864dd4bb447d5d0bfa'], 'doc_scores': [None, None], 'content': ["Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere.", 'Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. 
Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds.'], 'titles': ['', '']}}>]
# This example contains only the query pipeline.
# Normally, you'd also need to specify the indexing pipeline and its components here.
components:
  # In YAML, you must set up the DocumentStore and a Retriever to fetch the documents
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: Retriever # Selects the most relevant documents from the document store so that the OpenAI model can base its generation on them
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1 # Model optimized for semantic search
      model_format: sentence_transformers
      top_k: 3 # The number of documents to return
  - name: AnswerGenerator # Generates candidate answers based on the documents it gets from the retriever
    type: OpenAIAnswerGenerator
    params:
      model: text-davinci-003
      api_key: your_openai_api_key # You can also set the API key in the Connections tab, then you don't need to add it here
      max_tokens: 50 # The maximum number of tokens allowed for each generated answer
      temperature: 0.9 # Determines the randomness of the model. Higher values mean the model takes more risks
      presence_penalty: 0.1 # Positive values penalize new tokens based on whether they have already appeared in the text
      top_k: 3 # The number of results to return
# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: AnswerGenerator
        inputs: [Retriever]
  - name: indexing
    # Here comes the indexing pipeline
AnswerGenerator Types
OpenAIAnswerGenerator
This generator uses the GPT-3 models hosted by OpenAI. You need an API key from an active OpenAI account to use these models.
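Because the generator needs an API key, you may prefer not to hard-code it in your script. The following is a minimal sketch of one way to do that, assuming you store the key in an environment variable. The helper name `load_api_key` and the variable name `OPENAI_API_KEY` are illustrative choices, not part of the node's API.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable, or raise a clear error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your OpenAI API key.")
    return key


# Hypothetical usage (requires Haystack to be installed and a valid key):
# from haystack.nodes import OpenAIAnswerGenerator
# generator = OpenAIAnswerGenerator(api_key=load_api_key())
```

Failing early with an explicit message makes a missing key easier to diagnose than an authentication error returned by the first request.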
Arguments
Parameter | Type | Possible Values | Description |
---|---|---|---|
api_key | String | | Your API key from an active OpenAI account. Mandatory. |
model | String | Model name. Default: text-davinci-003 | The name of the OpenAI model you want to use. Mandatory. |
max_tokens | Integer | Default: 50 | The maximum number of tokens the generated answer can have. Setting a number higher than the default allows for longer answers without exceeding the maximum prompt length of the OpenAI model. Setting a number lower than the default allows for longer prompts with more documents passed as context, but the generated answer might be cut once it reaches max_tokens. Mandatory. |
top_k | Integer | Default: 5 | The number of generated answers. Mandatory. |
temperature | Float | Default: 0.2 | The sampling temperature you want to use. Higher values mean the model takes more risks. A value of 0 works better for scenarios with a well-defined answer. Mandatory. |
presence_penalty | Float | A number between -2.0 and 2.0. Default: 0.1 | Positive values penalize new tokens based on whether they have already appeared in the text. This increases the model's likelihood of talking about new topics. For more information about frequency and presence penalties, see the parameter details in the OpenAI documentation. Mandatory. |
frequency_penalty | Float | A number between -2.0 and 2.0. Default: 0.1 | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim. For more information about frequency and presence penalties, see the parameter details in the OpenAI documentation. Mandatory. |
examples_context | String | | A text snippet containing the contextual information used to generate the answers for the examples you provide. If not supplied, the default from the OpenAI API docs is used: "In 2017, U.S. life expectancy was 78.6 years." Optional. |
examples | A list of strings | | List of (question, answer) pairs that help steer the model towards the tone and answer format you'd like. We recommend adding 2 to 3 examples. If not supplied, the default from the OpenAI API docs is used: [["Q: What is human life expectancy in the United States?", "A: 78 years."]] Optional. |
stop_words | A list of strings | | Up to four sequences where the API stops generating further tokens. The returned text does not contain the stop sequence. If you don't provide any stop words, the default value from the OpenAI API docs is used: ["\n", "<|endoftext|>"]. Optional. |
progress_bar | Boolean | True/False Default: True | Shows the progress bar indicating the progress of answer generation. Mandatory. |
prompt_template | PromptTemplate | | A PromptTemplate that tells the model how to generate answers given a context and query supplied at runtime. The context is automatically constructed at runtime from a list of provided documents. Use examples_context and a list of examples to steer the model towards the tone and answer format you want. If not supplied, the default prompt template is: PromptTemplate( name="question-answering-with-examples", prompt_text="Please answer the question according to the above context." "\\n===\\nContext: $examples_context\\n===\\n$examples\\n\\n" "===\\nContext: $context\\n===\\n$query", prompt_params=["examples_context", "examples", "context", "query"], ) Optional. |
context_join_str | String | | The separation string used when joining the input documents to create the context used by the PromptTemplate. |
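To make the roles of prompt_template, examples_context, examples, and context_join_str more concrete, here is a rough sketch of how the four placeholders in the default prompt text could be filled. It uses Python's standard `string.Template` as a stand-in and is not Haystack's implementation. The function `render_prompt`, the way example pairs and the query are laid out, and the single-space default for `context_join_str` are all assumptions made for illustration. The prompt text is written with real newlines.

```python
from string import Template

# Stand-in for the default prompt text quoted in the table above.
DEFAULT_PROMPT = Template(
    "Please answer the question according to the above context."
    "\n===\nContext: $examples_context\n===\n$examples\n\n"
    "===\nContext: $context\n===\n$query"
)


def render_prompt(query, documents, examples_context, examples, context_join_str=" "):
    """Fill the four placeholders of the prompt text (illustrative only)."""
    # Join the document texts into a single context string.
    context = context_join_str.join(documents)
    # Flatten the (question, answer) pairs into one line per entry.
    examples_text = "\n".join(line for pair in examples for line in pair)
    return DEFAULT_PROMPT.substitute(
        examples_context=examples_context,
        examples=examples_text,
        context=context,
        query=query,
    )


prompt = render_prompt(
    query="Q: Why do airplanes leave contrails in the sky?",
    documents=[
        "Contrails form when water vapor from jet exhaust freezes.",
        "Weather affects aviation.",
    ],
    examples_context="In 2017, U.S. life expectancy was 78.6 years.",
    examples=[["Q: What is human life expectancy in the United States?", "A: 78 years."]],
)
print(prompt)
```

Printing the rendered prompt shows why max_tokens and the number of documents trade off against each other: every document added to the context lengthens the prompt that is sent to the model.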
Seq2SeqGenerator
This is a generic sequence-to-sequence generator that uses Hugging Face's transformers. It supports all Text2Text models from the Hugging Face hub. If AutoModelForSeq2SeqLM appears in a model's details on its model card, you can use that model with this generator.
Because language models prepare model input in their specific encoding, you must specify an accompanying model input converter for the model you use with Seq2SeqGenerator. This input converter takes documents and a query as input and formats them into a single sequence that the generator can use.
By default, we provide model input converters for the bart_lfqa and bart_eli5 models. For any other model, make sure an appropriate input converter is already registered or specified for this generator.
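As an illustration of what such an input converter might look like, here is a minimal sketch of a callable that follows the signature listed for the input_converter parameter. The class name `SimpleInputConverter`, the `Doc` stand-in for Haystack's Document, and the "question: ... context: <P> ..." layout are assumptions for this example. Check the input format your model was trained on before reusing it.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for haystack.schema.Document (only `content` is used)."""
    content: str


class SimpleInputConverter:
    """Sketch of a callable with the signature expected for `input_converter`."""

    def __call__(self, tokenizer: Any, query: str, documents: List[Doc],
                 top_k: Optional[int] = None) -> Any:
        # Optionally keep only the first top_k documents.
        selected = documents if top_k is None else documents[:top_k]
        text = self.build_text(query, selected)
        # Hugging Face tokenizers are callable and return a BatchEncoding.
        return tokenizer([text], truncation=True, padding=True, return_tensors="pt")

    @staticmethod
    def build_text(query: str, documents: List[Doc]) -> str:
        # Merge the query and documents into one sequence. The "<P>" separator
        # and the "question:/context:" layout are assumptions, not a fixed format.
        context = " ".join(f"<P> {d.content}" for d in documents)
        return f"question: {query} context: {context}"


text = SimpleInputConverter.build_text(
    "Why do airplanes leave contrails?",
    [Doc("Contrails form when exhaust vapor freezes."), Doc("Weather affects aviation.")],
)
print(text)
```

With Haystack installed, you would pass an instance of such a class through the input_converter parameter when you initialize Seq2SeqGenerator.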
Arguments
Parameter | Type | Possible Values | Description |
---|---|---|---|
model_name_or_path | String | Model name or path | The Hugging Face model name or path to the model you want to use with this generator. Mandatory. |
input_converter | Callable | | A callable that prepares model input for the language model you specified in the model_name_or_path parameter. The required __call__() method signature for this callable is: __call__(tokenizer: PreTrainedTokenizer, query: str, documents: List[Document], top_k: Optional[int] = None) -> BatchEncoding. Optional. |
top_k | Integer | Default: 1 | The number of independently generated answers. Mandatory. |
max_length | Integer | Default: 200 | The maximum length of the generated answer. Mandatory. |
min_length | Integer | Default: 2 | The minimum length of the generated answer. Mandatory. |
num_beams | Integer | Default: 8 | The number of beams for beam search. 1 means no beam search. Mandatory. |
use_gpu | Boolean | True/False Default: True | Indicates if you want to use GPU or CPU. If no GPU is available, the generator model uses CPU. Mandatory. |
progress_bar | Boolean | True/False Default: True | Shows a progress bar during generation. Mandatory. |
use_auth_token | Union of string and Boolean | Default: None | The API token you want to use to download a private model from Hugging Face. If you set it to True, it uses the token generated when running huggingface-cli login (stored in ~/.huggingface). For more information, see Hugging Face documentation. Optional. |
devices | List | Default: None | A list of torch devices (for example, cuda, cpu, mps) you want to limit inference to. Type a list containing torch device objects or strings, for example [torch.device('cuda:0'), "mps", "cuda:1"] . When specifying use_gpu=False , this parameter is not used and a single cpu device is used for inference. |