Retrieval Augmented Generation (RAG) Question Answering Pipelines

These pipelines use a large language model to generate answers based on the model's general knowledge and the documents you feed to it.

πŸ“˜

API Key

To use these pipelines, first make sure you have the API keys needed to access the models. OpenAI, Cohere, and hosted Hugging Face models each require their own key. You can add your keys in the Connections tab in deepset Cloud.

How RAG Works

RAG pipelines generate answers based on your documents. They start with a document search step: the Retriever fetches relevant documents from the Document Store and passes them to a PromptNode, which uses an LLM to generate the answer from them in the prompt.
You can also check Document Search Pipelines for examples.
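The flow described above can be sketched in plain Python. This is an illustrative toy, not deepset Cloud code: the keyword scoring and prompt layout are simplified stand-ins for the real Retriever and PromptNode.

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy keyword retriever: rank documents by query-term overlap."""
    terms = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, documents: list[str]) -> str:
    """Pass the retrieved documents to the model inside the prompt."""
    context = "\n".join(f"Document[{i}]:\n{d}" for i, d in enumerate(documents, start=1))
    return f"These are the documents:\n{context}\nQuestion: {query}\nAnswer:"


docs = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
]
question = "What is the capital of France?"
prompt = build_prompt(question, retrieve(question, docs, top_k=1))
```

In a real pipeline, the prompt string is then sent to the LLM, which generates the answer from the documents it contains.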

RAG QA with GPT-3.5, Hybrid Retrieval, and a Custom Prompt

This pipeline uses OpenAI's gpt-3.5-turbo model and a combination of vector-based and keyword-based retrievers. You need an API key from an active OpenAI account to use this model.

It's a RAG pipeline, which means it generates answers from the files you uploaded to deepset Cloud (or the files from your VPC connected to deepset Cloud) rather than from the model's general knowledge of the world. The retrieved documents are passed to the model in the prompt through the documents variable. The pipeline runs both a vector-based and a keyword-based retriever, and a JoinDocuments node merges the results from the two retrievers before passing them on to the model in the prompt.
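The joining step can be sketched as follows. This is an illustrative sketch of the concatenate join mode, not the JoinDocuments implementation; the real node also handles document scores and metadata.

```python
def join_concatenate(bm25_results: list[str], embedding_results: list[str]) -> list[str]:
    """Merge the result lists from both retrievers, keeping one copy of duplicates."""
    seen = set()
    joined = []
    for doc in bm25_results + embedding_results:
        if doc not in seen:
            seen.add(doc)
            joined.append(doc)
    return joined


merged = join_concatenate(["doc A", "doc B"], ["doc B", "doc C"])
```

The merged list is what gets reranked and then inserted into the prompt.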

This pipeline is available as a ready-made template in deepset Cloud. You can choose it from the list of templates that show up when you choose to create a pipeline.

It uses a custom prompt passed in the prompt parameter of the PromptTemplate component. You can modify the prompt directly in the prompt parameter.


# This section defines nodes that you want to use in your pipelines. Each node must have a name and a type. You can also set the node's parameters here.
# The name is up to you, you can give your component a friendly name. You then use components' names when specifying their order in the pipeline.
# Type is the class name of the component. 
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
    params:
      embedding_dim: 768
      similarity: cosine
  - name: BM25Retriever # The keyword-based retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10 # The number of results to return
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
      model_format: sentence_transformers
      top_k: 10 # The number of results to return
  - name: JoinResults # Joins the results from both retrievers
    type: JoinDocuments
    params:
      join_mode: concatenate # Combines documents from multiple retrievers
  - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the two retrievers
    type: SentenceTransformersRanker
    params:
      model_name_or_path: intfloat/simlm-msmarco-reranker # Fast model optimized for reranking
      top_k: 4 # The number of results to return
      batch_size: 20  # Try to keep this number equal to or larger than the sum of the top_k values of the two retrievers so all documents are processed at once
      model_kwargs:  # Additional keyword arguments for the model
        torch_dtype: torch.float16
  - name: qa_template
    type: PromptTemplate
    params:
      output_parser:
        type: AnswerParser
      prompt: >
        You are a technical expert.
        {new_line}You answer questions truthfully based on provided documents.
        {new_line}For each document check whether it is related to the question.
        {new_line}Only use documents that are related to the question to answer it.
        {new_line}Ignore documents that are not related to the question.
        {new_line}If the answer exists in several documents, summarize them.
        {new_line}Only answer based on the documents provided. Don't make things up.
        {new_line}Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3].
        {new_line}The reference must only refer to the number that comes in square brackets after passage.
        {new_line}Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage.
        {new_line}If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
        {new_line}These are the documents:
        {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}
        {new_line}Question: {query}
        {new_line}Answer:
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: gpt-3.5-turbo
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: BM25Retriever
        inputs: [Query]
      - name: EmbeddingRetriever
        inputs: [Query]
      - name: JoinResults
        inputs: [BM25Retriever, EmbeddingRetriever]
      - name: Reranker
        inputs: [JoinResults]
      - name: PromptNode
        inputs: [Reranker]
  - name: indexing
    nodes:
    # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
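The join expression in the qa_template prompt above formats the retrieved documents as numbered passages the model can cite. A sketch of the string it produces, assuming $idx is a 1-based counter and $content is the document text:

```python
def join_documents(documents: list[str]) -> str:
    """Render documents as numbered passages, mimicking the prompt's join expression:
    join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')"""
    parts = [f"\nDocument[{idx}]:\n{content}" for idx, content in enumerate(documents, start=1)]
    return "\n".join(parts)


rendered = join_documents(["First passage.", "Second passage."])
```

The numbered Document[N] labels are what allow the model to emit references like [3] in its answer.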

Baseline RAG QA with Default Prompt and an Open Source Model

This pipeline uses the files you uploaded to deepset Cloud (or the files from your VPC connected to deepset Cloud) to generate answers. This is done by adding the Retriever node, which fetches the documents from the DeepsetCloudDocumentStore. The documents are then passed on to the model in the prompt.


components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore # The only supported document store in deepset Cloud
  - name: Retriever # Selects the most relevant documents from the document store so that the LLM can base its generation on them
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search 
      model_format: sentence_transformers
      top_k: 1 # The number of documents to return
  - name: PromptNode # The component that generates the answer based on the documents it gets from the retriever 
    type: PromptNode
    params:
      default_prompt_template: deepset/question-answering # A ready-made prompt that passes documents to the model as context 
      model_name_or_path: google/flan-t5-large # A free large language model for PromptNode. For production scenarios, we recommend a paid model.
      top_k: 3 # The number of answers to generate; you can change this value
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 30 # Enables the sliding window approach
      split_respect_sentence_boundary: True # Retains complete sentences in split documents
      language: en # Used by NLTK to best detect the sentence boundaries for that language

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: PromptNode
        inputs: [Retriever]
  - name: indexing
    nodes:
    # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures this converter receives TXT files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
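The PreProcessor settings above implement a sliding window over words: each chunk holds at most split_length words, and consecutive chunks share split_overlap words. A simplified sketch follows; it ignores sentence boundaries, which the real PreProcessor respects.

```python
def split_with_overlap(text: str, split_length: int, split_overlap: int) -> list[str]:
    """Split text into word chunks of at most split_length words,
    where each chunk repeats the last split_overlap words of the previous one."""
    words = text.split()
    step = split_length - split_overlap  # each new chunk starts this many words later
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break
    return chunks


# Ten numbered "words", chunks of 4 with an overlap of 1:
chunks = split_with_overlap(" ".join(str(i) for i in range(10)), split_length=4, split_overlap=1)
```

The overlap means a sentence fragment cut off at a chunk boundary still appears intact in the neighboring chunk, which helps the retriever match queries against it.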

This pipeline template is a good starting point. For production systems, we recommend changing the free FLAN T5 model to a better-performing model, such as OpenAI's gpt-3.5-turbo.

You can modify the PromptNode to use a custom prompt or another ready-made prompt template. For more information, see PromptNode. For guidance on writing prompts, have a look at Prompt Engineering Guidelines, and see Experimenting with Prompts to learn how to work on your prompts in Prompt Studio.

Pipeline That Shows Answer References

This pipeline uses the ReferencePredictor node to attach references to the documents the LLM's answer is based on. You can open a reference right from the search page and check that the answer is actually in the document and the model didn't hallucinate:


components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
      model_format: sentence_transformers
      top_k: 10 # The number of results to return
  - name: qa_template
    type: PromptTemplate
    params:
      prompt: deepset/question-answering #This uses one of the ready-made templates from Prompt Studio
      output_parser:
        type: AnswerParser # We need the output of PromptNode to be an answer object as ReferencePredictor accepts answers as input
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: gpt-3.5-turbo
  - name: ReferencePredictor
    type: ReferencePredictor
    params:
      language: en
      use_split_rules: True
      extend_abbreviations: True
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: EmbeddingRetriever
        inputs: [Query]
      - name: PromptNode
        inputs: [EmbeddingRetriever]
      - name: ReferencePredictor
        inputs: [PromptNode]
  - name: indexing
    nodes:
    # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
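The idea behind answer references can be sketched with a naive lexical-overlap matcher. This is not ReferencePredictor's actual algorithm, just an illustration of linking an answer back to its most similar source.

```python
def best_reference(answer_sentence: str, documents: list[str]) -> int:
    """Return the index of the document with the highest word overlap
    with the answer sentence (a toy stand-in for reference prediction)."""
    answer_words = set(answer_sentence.lower().split())
    overlaps = [len(answer_words & set(doc.lower().split())) for doc in documents]
    return overlaps.index(max(overlaps))


docs = ["The warranty lasts two years.", "Shipping takes five days."]
ref = best_reference("the warranty lasts two years", docs)
```

In deepset Cloud, the reference link lets you open the matched document and verify the answer against it.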

Pipeline with a SpellChecker PromptNode

This pipeline chains two PromptNodes. The first one acts as a spell checker for the query. It corrects the query, if necessary, and then sends it to the next PromptNode, whose task is to generate an answer to the query.

components:
    - name: DocumentStore
      type: DeepsetCloudDocumentStore # The only supported document store in deepset Cloud
      params:
        embedding_dim: 768
        similarity: cosine
    - name: query_spell_check
      type: PromptTemplate
      params:
        prompt: >
          You are a spelling correction system.
          {new_line}You receive a question and correct it.
          {new_line}Output only the corrected question
          {new_line}Question: {query}
          {new_line}Corrected Question:
    - name: SpellCheckPromptNode
      type: PromptNode
      params:
        default_prompt_template: query_spell_check
        max_length: 650 # The maximum number of tokens the generated answer can have
        model_kwargs: # Specifies additional model settings
          temperature: 0 # Lower temperature works best for fact-based results
        model_name_or_path: gpt-3.5-turbo
    - name: ListToString # Converts the output from SpellCheckPromptNode into a single query string, which is the input type the retriever expects.
      type: Shaper
      params:
        func: join_strings
        inputs:
          strings: results
        outputs:
          - query
    - name: EmbeddingRetriever # Selects the most relevant documents from the document store
      type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
      params:
        document_store: DocumentStore
        embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
        model_format: sentence_transformers
        top_k: 20 # The number of results to return
    - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the retriever
      type: SentenceTransformersRanker
      params:
        model_name_or_path: intfloat/simlm-msmarco-reranker # Fast model optimized for reranking
        top_k: 4 # The number of results to return
        batch_size: 20  # Try to keep this number equal to or larger than the retriever's top_k so all documents are processed at once
        model_kwargs:  # Additional keyword arguments for the model
          torch_dtype: torch.float16
    - name: qa_template
      type: PromptTemplate
      params:
        output_parser:
          type: AnswerParser
        prompt: >
          You are a technical expert.
          {new_line}You answer questions truthfully based on provided documents.
          {new_line}For each document check whether it is related to the question.
          {new_line}Only use documents that are related to the question to answer it.
          {new_line}Ignore documents that are not related to the question.
          {new_line}If the answer exists in several documents, summarize them.
          {new_line}Only answer based on the documents provided. Don't make things up.
          {new_line}Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3].
          {new_line}The reference must only refer to the number that comes in square brackets after passage.
          {new_line}Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage.
          {new_line}If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
          {new_line}These are the documents:
          {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}
          {new_line}Question: {query}
          {new_line}Answer:
    - name: PromptNode
      type: PromptNode
      params:
        default_prompt_template: qa_template
        max_length: 400 # The maximum number of tokens the generated answer can have
        model_kwargs: # Specifies additional model settings
          temperature: 0 # Lower temperature works best for fact-based qa
        model_name_or_path: gpt-3.5-turbo
    - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
      type: FileTypeClassifier
    - name: TextConverter # Converts files into documents
      type: TextConverter
    - name: PDFConverter # Converts PDFs into documents
      type: PDFToTextConverter
    - name: Preprocessor # Splits documents into smaller ones and cleans them up
      type: PreProcessor
      params:
        # With a vector-based retriever, it's good to split your documents into smaller ones
        split_by: word # The unit by which you want to split the documents
        split_length: 250 # The max number of words in a document
        split_overlap: 20 # Enables the sliding window approach
        language: en
        split_respect_sentence_boundary: True # Retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
    - name: query
      nodes:
        - name: SpellCheckPromptNode
          inputs: [Query]
        - name: ListToString
          inputs: [SpellCheckPromptNode]
        - name: EmbeddingRetriever
          inputs: [ListToString]
        - name: Reranker
          inputs: [EmbeddingRetriever]
        - name: PromptNode
          inputs: [Reranker]
    - name: indexing
      nodes:
      # Depending on the file type, we use a Text or PDF converter
        - name: FileTypeClassifier
          inputs: [File]
        - name: TextConverter
          inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
        - name: PDFConverter
          inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
        - name: Preprocessor
          inputs: [TextConverter, PDFConverter]
        - name: EmbeddingRetriever
          inputs: [Preprocessor]
        - name: DocumentStore
          inputs: [EmbeddingRetriever]
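The query flow of this pipeline can be sketched with stand-in functions for the two LLM calls. Here, spell_check and answer are hypothetical placeholders, and join_strings mirrors the Shaper step that turns the first node's list output into the single query string the retriever expects.

```python
def spell_check(query: str) -> list[str]:
    """Stand-in for SpellCheckPromptNode; a PromptNode returns a list of results."""
    return [query.replace("capitol", "capital")]


def join_strings(strings: list[str], delimiter: str = " ") -> str:
    """Mirrors the Shaper's join_strings step: list output -> single query string."""
    return delimiter.join(strings)


def answer(query: str) -> str:
    """Stand-in for the answer-generating PromptNode."""
    return f"Answering: {query}"


corrected = join_strings(spell_check("What is the capitol of France?"))
result = answer(corrected)
```

Chaining the nodes this way means the retriever and the answering model only ever see the corrected query.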

Pipeline with the Llama 2 Model Hosted on AWS Bedrock

This pipeline uses both a keyword-based and a vector-based retriever to fetch documents from the document store and pass them on to the model in the prompt. The model is Llama 2, the 13b version hosted on AWS Bedrock. To use a model hosted on Bedrock, pass its model ID prefixed with deepset-cloud-, as in the model_name_or_path parameter of the PromptNode in the YAML below. You can also swap the Llama model for the 70b version. For a list of models with their IDs, see AWS Bedrock Base Model IDs.


components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore
    params:
      embedding_dim: 768
      similarity: cosine
  - name: BM25Retriever # The keyword-based retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10 # The number of results to return 
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
      model_format: sentence_transformers
      top_k: 10 # The number of results to return
  - name: JoinResults # Joins the results from both retrievers
    type: JoinDocuments
    params:
      join_mode: concatenate # Combines documents from multiple retrievers
  - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the two retrievers
    type: SentenceTransformersRanker
    params:
      model_name_or_path: intfloat/simlm-msmarco-reranker # Fast model optimized for reranking
      top_k: 4 # The number of results to return
      batch_size: 20  # Try to keep this number equal to or larger than the sum of the top_k values of the two retrievers so all documents are processed at once
      model_kwargs:  # Additional keyword arguments for the model
        torch_dtype: torch.float16
  - name: qa_template
    type: PromptTemplate
    params:
      output_parser:
        type: AnswerParser
      prompt: >-
        <s>[INST] <<SYS>>
        {new_line}You are a technical expert.
        You answer questions truthfully based on provided documents.
        For each document check whether it is related to the question.
        Only use documents that are related to the question to answer it.
        Ignore documents that are not related to the question.
        If the answer exists in several documents, summarize them.
        Only answer based on the documents provided. Don't make things up.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3].
        The reference must only refer to the number that comes in square brackets after passage.
        Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage.
        If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
        {new_line}<</SYS>>
        {new_line}{new_line}These are the documents:
        {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}
        {new_line}Question: {query}
        {new_line}Answer:
        {new_line}[/INST]
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: deepset-cloud-meta.llama2-13b-chat-v1
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents

# Here you define how the nodes are organized in the pipelines
# For each node, specify its input
pipelines:
  - name: query
    nodes:
      - name: BM25Retriever
        inputs: [Query]
      - name: EmbeddingRetriever
        inputs: [Query]
      - name: JoinResults
        inputs: [BM25Retriever, EmbeddingRetriever]
      - name: Reranker
        inputs: [JoinResults]
      - name: PromptNode
        inputs: [Reranker]
  - name: indexing
    nodes:
    # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
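The qa_template above wraps the instructions in Llama 2's chat markup: system instructions go between <<SYS>> and <</SYS>>, and the whole turn is enclosed in [INST] ... [/INST]. A minimal sketch of assembling that format:

```python
def llama2_prompt(system: str, user: str) -> str:
    """Wrap system instructions and a user turn in Llama 2's chat markup."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


prompt = llama2_prompt(
    "You answer questions truthfully based on provided documents.",
    "Question: What is RAG?",
)
```

Using the format the model was fine-tuned on is what makes it follow the system instructions reliably; sending a plain prompt to a chat-tuned Llama 2 model tends to produce worse answers.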