Retrieval Augmented Generation (RAG) Question Answering Pipelines
These pipelines use a large language model to generate answers based on the model's general knowledge and the documents you feed to it.
API Key
To reuse these pipelines, first make sure you have the API key needed to access the models. You'll need an API key for OpenAI, Cohere, and Hugging Face models. You can add them in the Connections tab in deepset Cloud.
How RAG Works
RAG pipelines generate answers based on your documents. The Retriever fetches the documents from the Document Store and passes them to the model in the prompt. RAG pipelines begin with a document search step and then combine it with the PromptNode that uses an LLM to generate answers.
You can also check Document Search Pipelines for examples.
RAG QA with GPT-3.5, Hybrid Retrieval, and a Custom Prompt
This pipeline uses OpenAI's gpt-3.5-turbo model and a combination of vector-based and keyword-based retrievers. You need an API key from an active OpenAI account to use this model.
It's a RAG pipeline, which means it uses the files you uploaded to deepset Cloud (or the files from your VPC connected to deepset Cloud), rather than the model's knowledge of the world, to generate the answers. It passes the files in the prompt using the documents variable. The pipeline has both a vector-based and a keyword-based retriever, plus a JoinDocuments node that joins the results of both retrievers. A ranker then reorders the joined documents and passes the top ones on to the model in the prompt.
This pipeline is available as a ready-made template in deepset Cloud. You can choose it from the list of templates that show up when you choose to create a pipeline.
It uses a custom prompt passed in the prompt parameter of the PromptTemplate component. You can modify the prompt directly in the prompt parameter.
# This section defines nodes that you want to use in your pipelines. Each node must have a name and a type. You can also set the node's parameters here.
# The name is up to you; you can give your component a friendly name. You then use components' names when specifying their order in the pipeline.
# Type is the class name of the component.
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
    params:
      embedding_dim: 768
      similarity: cosine
  - name: BM25Retriever # The keyword-based retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10 # The number of results to return
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
      model_format: sentence_transformers
      top_k: 10 # The number of results to return
  - name: JoinResults # Joins the results from both retrievers
    type: JoinDocuments
    params:
      join_mode: concatenate # Combines documents from multiple retrievers
  - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the two retrievers
    type: SentenceTransformersRanker
    params:
      model_name_or_path: intfloat/simlm-msmarco-reranker # Fast model optimized for reranking
      top_k: 4 # The number of results to return
      batch_size: 20 # Try to keep this number equal to or larger than the sum of the top_k of the two retrievers so all docs are processed at once
      model_kwargs: # Additional keyword arguments for the model
        torch_dtype: torch.float16
  - name: qa_template
    type: PromptTemplate
    params:
      output_parser:
        type: AnswerParser
      prompt: >
        You are a technical expert.
        {new_line}You answer questions truthfully based on provided documents.
        {new_line}For each document check whether it is related to the question.
        {new_line}Only use documents that are related to the question to answer it.
        {new_line}Ignore documents that are not related to the question.
        {new_line}If the answer exists in several documents, summarize them.
        {new_line}Only answer based on the documents provided. Don't make things up.
        {new_line}Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3].
        {new_line}The reference must only refer to the number that comes in square brackets after passage.
        {new_line}Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage.
        {new_line}If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
        {new_line}These are the documents:
        {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}
        {new_line}Question: {query}
        {new_line}Answer:
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: gpt-3.5-turbo
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents
# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: BM25Retriever
        inputs: [Query]
      - name: EmbeddingRetriever
        inputs: [Query]
      - name: JoinResults
        inputs: [BM25Retriever, EmbeddingRetriever]
      - name: Reranker
        inputs: [JoinResults]
      - name: PromptNode
        inputs: [Reranker]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
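To adapt the custom prompt, it should be enough to edit the prompt parameter of the qa_template component and leave the rest of the pipeline as it is. The fragment below is a minimal sketch of such a change, not a tested template: the shortened instructions and the three-sentence limit are illustrative assumptions, while the documents and query variables and the join() expression are copied from the template above.

```yaml
# Hypothetical variation of qa_template; it goes under `components:` in place of the original.
# Only the prompt wording differs from the template above.
- name: qa_template
  type: PromptTemplate
  params:
    output_parser:
      type: AnswerParser
    prompt: >
      You are a technical expert.
      {new_line}Answer the question in at most three sentences, using only the documents provided.
      {new_line}If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
      {new_line}These are the documents:
      {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}
      {new_line}Question: {query}
      {new_line}Answer:
```

Because the component keeps the name qa_template, the default_prompt_template setting of PromptNode does not need to change.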
Baseline RAG QA with Default Prompt and an Open Source Model
This pipeline uses the files you uploaded to deepset Cloud (or the files from your VPC connected to deepset Cloud) to generate answers. This is done by adding the Retriever node, which fetches the documents from the DeepsetCloudDocumentStore. The documents are then passed on to the model in the prompt.
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore # The only supported document store in deepset Cloud
  - name: Retriever # Selects the most relevant documents from the document store so that the LLM can base its generation on it.
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search
      model_format: sentence_transformers
      top_k: 1 # The number of documents to return
  - name: PromptNode # The component that generates the answer based on the documents it gets from the retriever
    type: PromptNode
    params:
      default_prompt_template: deepset/question-answering # A ready-made prompt that passes documents to the model as context
      model_name_or_path: google/flan-t5-large # A free large language model for PromptNode. For production scenarios, we recommend a paid model.
      top_k: 3 # The number of answers to generate, you can change this value.
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 30 # Enables the sliding window approach
      split_respect_sentence_boundary: True # Retains complete sentences in split documents
      language: en # Used by NLTK to best detect the sentence boundaries for that language
# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: PromptNode
        inputs: [Retriever]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures this converter receives TXT files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
This pipeline template is a good starting point. For production systems, we recommend changing the free FLAN-T5 model to a better-performing model, such as OpenAI's gpt-3.5-turbo.
You can modify the PromptNode to use a custom prompt or another ready-made prompt template. For more information, see PromptNode. You may also have a look at Prompt Engineering Guidelines for guidance on how to create prompts and then check Experimenting with Prompts to learn how to use Prompt Studio to work on your prompts.
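As a rough illustration of the model swap, the PromptNode component of the baseline pipeline could be changed along these lines. This is a sketch under assumptions rather than a verified configuration: the max_length and temperature values are borrowed from the other pipelines on this page, and the top_k setting of the baseline is left out so the node falls back to its default.

```yaml
# Sketch: the baseline PromptNode with gpt-3.5-turbo instead of google/flan-t5-large.
# It goes under `components:` in place of the original PromptNode entry.
- name: PromptNode
  type: PromptNode
  params:
    default_prompt_template: deepset/question-answering # The same ready-made prompt as in the baseline
    model_name_or_path: gpt-3.5-turbo # Needs an OpenAI API key added on the Connections tab
    max_length: 400 # Assumption: value taken from the other pipelines on this page
    model_kwargs:
      temperature: 0 # Assumption: low temperature, as in the other pipelines on this page
```

The query and indexing pipelines stay unchanged because the component name is still PromptNode.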
Pipeline That Shows Answer References
This pipeline uses the ReferencePredictor node to attach references to documents the LLM's answer is based on. You can open the reference right from the search page and check if the answer is there and the model didn't hallucinate:
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
      model_format: sentence_transformers
      top_k: 10 # The number of results to return
  - name: qa_template
    type: PromptTemplate
    params:
      prompt: deepset/question-answering # This uses one of the ready-made templates from Prompt Studio
      output_parser:
        type: AnswerParser # We need the output of PromptNode to be an answer object as ReferencePredictor accepts answers as input
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: gpt-3.5-turbo
  - name: ReferencePredictor
    type: ReferencePredictor
    params:
      language: en
      use_split_rules: True
      extend_abbreviations: True
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents
# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: EmbeddingRetriever
        inputs: [Query]
      - name: PromptNode
        inputs: [EmbeddingRetriever]
      - name: ReferencePredictor
        inputs: [PromptNode]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
Pipeline with a SpellChecker PromptNode
This pipeline chains two PromptNodes. The first one acts as a spell checker for the query. It corrects the query, if necessary, and then sends it to the next PromptNode, whose task is to generate an answer to the query.
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore # The only supported document store in deepset Cloud
    params:
      embedding_dim: 768
      similarity: cosine
  - name: query_spell_check
    type: PromptTemplate
    params:
      prompt: >
        You are a spelling correction system.
        {new_line}You receive a question and correct it.
        {new_line}Output only the corrected question
        {new_line}Question: {query}
        {new_line}Corrected Question:
  - name: SpellCheckPromptNode
    type: PromptNode
    params:
      default_prompt_template: query_spell_check
      max_length: 650 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based results
      model_name_or_path: gpt-3.5-turbo
  - name: ListToString # Converts the output from SpellCheckPromptNode into a single query string, which is the input type the retriever expects.
    type: Shaper
    params:
      func: join_strings
      inputs:
        strings: results
      outputs:
        - query
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
      model_format: sentence_transformers
      top_k: 20 # The number of results to return
  - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the retriever
    type: SentenceTransformersRanker
    params:
      model_name_or_path: intfloat/simlm-msmarco-reranker # Fast model optimized for reranking
      top_k: 4 # The number of results to return
      batch_size: 20 # Try to keep this number equal to or larger than the retriever's top_k so all docs are processed at once
      model_kwargs: # Additional keyword arguments for the model
        torch_dtype: torch.float16
  - name: qa_template
    type: PromptTemplate
    params:
      output_parser:
        type: AnswerParser
      prompt: >
        You are a technical expert.
        {new_line}You answer questions truthfully based on provided documents.
        {new_line}For each document check whether it is related to the question.
        {new_line}Only use documents that are related to the question to answer it.
        {new_line}Ignore documents that are not related to the question.
        {new_line}If the answer exists in several documents, summarize them.
        {new_line}Only answer based on the documents provided. Don't make things up.
        {new_line}Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3].
        {new_line}The reference must only refer to the number that comes in square brackets after passage.
        {new_line}Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage.
        {new_line}If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
        {new_line}These are the documents:
        {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}
        {new_line}Question: {query}
        {new_line}Answer:
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: gpt-3.5-turbo
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents
# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: SpellCheckPromptNode
        inputs: [Query]
      - name: ListToString
        inputs: [SpellCheckPromptNode]
      - name: EmbeddingRetriever
        inputs: [ListToString]
      - name: Reranker
        inputs: [EmbeddingRetriever]
      - name: PromptNode
        inputs: [Reranker]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
Pipeline with the Llama 2 Model Hosted on AWS Bedrock
This pipeline uses both a keyword-based and a vector-based retriever to fetch the documents from the document store and pass them on to the model in the prompt. The model is Llama 2, the 13b version hosted on AWS Bedrock. To use a model hosted on Bedrock, pass its model ID preceded by deepset-cloud-, as in the model_name_or_path parameter of PromptNode in the YAML below. You can also swap the Llama model for the 70b version. For a list of models with their IDs, see AWS Bedrock Base Model IDs.
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
    params:
      embedding_dim: 768
      similarity: cosine
  - name: BM25Retriever # The keyword-based retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10 # The number of results to return
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
      model_format: sentence_transformers
      top_k: 10 # The number of results to return
  - name: JoinResults # Joins the results from both retrievers
    type: JoinDocuments
    params:
      join_mode: concatenate # Combines documents from multiple retrievers
  - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the two retrievers
    type: CNSentenceTransformersRanker
    params:
      model_name_or_path: intfloat/simlm-msmarco-reranker # Fast model optimized for reranking
      top_k: 4 # The number of results to return
      batch_size: 20 # Try to keep this number equal to or larger than the sum of the top_k of the two retrievers so all docs are processed at once
      model_kwargs: # Additional keyword arguments for the model
        torch_dtype: torch.float16
  - name: qa_template
    type: PromptTemplate
    params:
      output_parser:
        type: AnswerParser
      prompt: >-
        <s>[INST] <<SYS>>
        {new_line}You are a technical expert.
        You answer questions truthfully based on provided documents.
        For each document check whether it is related to the question.
        Only use documents that are related to the question to answer it.
        Ignore documents that are not related to the question.
        If the answer exists in several documents, summarize them.
        Only answer based on the documents provided. Don't make things up.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3].
        The reference must only refer to the number that comes in square brackets after passage.
        Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage.
        If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
        {new_line}<</SYS>>
        {new_line}{new_line}These are the documents:
        {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}
        {new_line}Question: {query}
        {new_line}Answer:
        {new_line}[/INST]
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: deepset-cloud-meta.llama2-13b-chat-v1
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents
# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: BM25Retriever
        inputs: [Query]
      - name: EmbeddingRetriever
        inputs: [Query]
      - name: JoinResults
        inputs: [BM25Retriever, EmbeddingRetriever]
      - name: Reranker
        inputs: [JoinResults]
      - name: PromptNode
        inputs: [Reranker]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
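Switching to the 70b version should only require a different model ID in the PromptNode component. The fragment below is a sketch that assumes the Bedrock ID of that model is meta.llama2-70b-chat-v1; check the ID against the AWS Bedrock Base Model IDs list before using it.

```yaml
# Sketch: PromptNode pointing at the 70b Llama 2 chat model on Bedrock.
# It goes under `components:` in place of the original PromptNode entry.
- name: PromptNode
  type: PromptNode
  params:
    default_prompt_template: qa_template
    max_length: 400 # The maximum number of tokens the generated answer can have
    model_kwargs:
      temperature: 0
    # Assumed model ID (deepset-cloud- prefix + Bedrock model ID); verify it in the Bedrock model list
    model_name_or_path: deepset-cloud-meta.llama2-70b-chat-v1
```

The prompt template can stay as it is, since the 13b and 70b chat models share the same [INST] and <<SYS>> prompt format.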