Shaper

Shaper is most often used with PromptNode to ensure the input the PromptNode receives and the output it generates match the expected format.

Shaper can modify the output or input of nodes. It comes with ready-to-use functions that act on values, renaming them or changing their type, for example from a list to a string. You can choose the function you want to use when configuring the node in your pipeline. Shaper functions are handy when you want to use PromptNode in a pipeline, and you need it to generate a specific output.

Basic Information

  • Pipeline Type: Used in query pipelines with a PromptNode
  • Nodes that can precede it in a pipeline: PromptNode
  • Nodes that can follow it in a pipeline: PromptNode, Retriever
    It can also be used in between two PromptNodes.
  • Input/output: Differs depending on the function used. Check the function documentation below.
  • Available Classes: Shaper

Usage

When adding Shaper to your pipeline, specify the function you want it to use in Shaper's parameters. In this example, Shaper renames the value query to question. The resulting value question is then passed down the pipeline:

- name: shaper
  type: Shaper
  params:
    func: rename
    inputs:
      value: query
    outputs: [question]

For more information about functions, see the Functions section.
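
As a rough mental model (not the Shaper implementation), the values passed between nodes can be pictured as a dictionary of named entries. The Python sketch below shows what rename does in that picture. The helper name rename_value and the plain-dictionary context are assumptions made for illustration, and the sketch keeps the original key, which this page does not specify.

```python
from typing import Any, Dict


def rename_value(context: Dict[str, Any], old_key: str, new_key: str) -> Dict[str, Any]:
    """Return a copy of context with the value of old_key also stored under new_key."""
    updated = dict(context)  # leave the caller's dictionary untouched
    updated[new_key] = context[old_key]  # same value, new name
    return updated


context = {"query": "What is a Shaper?"}
result = rename_value(context, "query", "question")
print(result["question"])
```

The value itself is unchanged. Only the name under which the next node can find it differs.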

Shaper and PromptNode

When used with PromptNode, Shaper acts as a PromptNode helper. Let's recall how PromptNode works:

  • PromptNode uses PromptTemplate containing the prompt, or instruction, for the large language model.
  • PromptTemplate contains variables that are substituted with real values when PromptNode runs.

In a pipeline, PromptNode receives these variables from the preceding node. It may happen that the variable names or shapes the PromptTemplate expects differ from the ones the PromptNode receives. That's when Shaper comes in and resolves this issue.

You can also use Shaper in reverse situations. If the output of a PromptNode differs from the format the next node in the pipeline expects, Shaper can change it.

See also PromptNode documentation.

Example

Let's see how to use Shaper between PromptNode and a Retriever. This example is a RAG pipeline with an additional PromptNode that acts as the query spell checker. Here's how it works:

  1. The spell-checking PromptNode takes in the query and corrects it.
  2. The corrected query is sent to the Retriever, which fetches relevant documents from the document store.
  3. The Ranker then takes these documents, ranks them, and sends them to the answer generator PromptNode.

There's one problem with this flow: the output of the spell-checking PromptNode is incompatible with the input of the Retriever. In this case, PromptNode's PromptTemplate is not using any output_parser, so the PromptNode's output is stored under the key called "results", while the Retriever expects its input under the key "query". We can fix this by putting a Shaper with the join_strings function between the spell-checking PromptNode and the Retriever:

 
  components:
    - name: DocumentStore
      type: DeepsetCloudDocumentStore # The only supported document store in deepset Cloud
      params:
        embedding_dim: 768
        similarity: cosine
    - name: query_spell_check
      type: PromptTemplate
      params:
        prompt: >
          You are a spelling correction system.
          {new_line}You receive a question and correct it.
          {new_line}Output only the corrected question
          {new_line}Question: {query}
          {new_line}Corrected Question:
    - name: SpellCheckPromptNode
      type: PromptNode
      params:
        default_prompt_template: query_spell_check
        max_length: 650 # The maximum number of tokens the generated answer can have
        model_kwargs: # Specifies additional model settings
          temperature: 0 # Lower temperature works best for fact-based results
        model_name_or_path: gpt-3.5-turbo
    - name: StringToQuery # Converts the output from SpellCheckPromptNode into a single query string, which is the input type the retriever expects.
      type: Shaper
      params:
        func: join_strings
        inputs:
          strings: results #The default output from PromptNode
        outputs:
          - query #The input the Retriever expects
    - name: EmbeddingRetriever # Selects the most relevant documents from the document store
      type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
      params:
        document_store: DocumentStore
        embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
        model_format: sentence_transformers
        top_k: 20 # The number of results to return
    - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the retriever
      type: CNSentenceTransformersRanker
      params:
        model_name_or_path: intfloat/simlm-msmarco-reranker # Fast model optimized for reranking
        top_k: 4 # The number of results to return
        batch_size: 20  # Try to keep this number equal to or larger than the retriever's top_k so all docs are processed at once
        model_kwargs:  # Additional keyword arguments for the model
          torch_dtype: torch.float16
    - name: qa_template
      type: PromptTemplate
      params:
        output_parser:
          type: AnswerParser
        prompt: >
          You are a technical expert.
          {new_line}You answer questions truthfully based on provided documents.
          {new_line}For each document check whether it is related to the question.
          {new_line}Only use documents that are related to the question to answer it.
          {new_line}Ignore documents that are not related to the question.
          {new_line}If the answer exists in several documents, summarize them.
          {new_line}Only answer based on the documents provided. Don't make things up.
          {new_line}Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3].
          {new_line}The reference must only refer to the number that comes in square brackets after passage.
          {new_line}Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage.
          {new_line}If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
          {new_line}These are the documents:
          {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}
          {new_line}Question: {query}
          {new_line}Answer:
    - name: PromptNode
      type: PromptNode
      params:
        default_prompt_template: qa_template
        max_length: 400 # The maximum number of tokens the generated answer can have
        model_kwargs: # Specifies additional model settings
          temperature: 0 # Lower temperature works best for fact-based qa
        model_name_or_path: gpt-3.5-turbo
    - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
      type: FileTypeClassifier
    - name: TextConverter # Converts files into documents
      type: TextConverter
    - name: PDFConverter # Converts PDFs into documents
      type: PDFToTextConverter
    - name: Preprocessor # Splits documents into smaller ones and cleans them up
      type: PreProcessor
      params:
        # With a vector-based retriever, it's good to split your documents into smaller ones
        split_by: word # The unit by which you want to split the documents
        split_length: 250 # The max number of words in a document
        split_overlap: 20 # Enables the sliding window approach
        language: en
        split_respect_sentence_boundary: True # Retains complete sentences in split documents

  pipelines:
    - name: query
      nodes:
        - name: SpellCheckPromptNode
          inputs: [Query]
        - name: StringToQuery
          inputs: [SpellCheckPromptNode]
        - name: EmbeddingRetriever
          inputs: [StringToQuery]
        - name: Reranker
          inputs: [EmbeddingRetriever]
        - name: PromptNode
          inputs: [Reranker]
    - name: indexing
      nodes:
      # Depending on the file type, we use a Text or PDF converter
        - name: FileTypeClassifier
          inputs: [File]
        - name: TextConverter
          inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
        - name: PDFConverter
          inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
        - name: Preprocessor
          inputs: [TextConverter, PDFConverter]
        - name: EmbeddingRetriever
          inputs: [Preprocessor]
        - name: DocumentStore
          inputs: [EmbeddingRetriever]

Have a look at the Functions section to understand what functions are available.

Shaper Functions

Functions follow this format:

- name: shaper
  type: Shaper
  params:
    func: function_name
    inputs:
      <input>: <name of the input key> # this is the name of the output key of the preceding node
      <param_name>: <param_value> # not all functions have input parameters
    outputs:
      - <name of the output key> # this is the name of the key you want to use for the output
                                 # make sure this key name is compatible with the input of the subsequent node in the pipeline
    params:
      <param_name>: <param_value> # fixed values passed to the function; not all functions need them

You can check a node's input and output keys in the node's documentation. Shaper is often used between PromptNodes, in which case the input and output keys depend on the task each PromptNode performs.

These are the functions you can use with Shaper:

answers_to_strings

Extracts the content field of Answers and returns a list of strings.

  • Input: List of answer objects

  • Output: List of strings

  • Parameters:

    Name | Type | Possible values | Description
    answers | List of answers | The answer key returned by the preceding node. | An input argument. Specifies the answers you want to turn into a list of strings. Required.
    pattern | String | Default: None | An input argument. Specifies the regex pattern used for parsing the answer. You can use the following placeholders: $id (the ID of the answer), $META_FIELD (the value of the metadata field called META_FIELD). If None, the whole string is used as the answer. If not None, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer. Optional.
    str_replace | Dictionary of strings | Default: None | An input argument. Specifies the character or string you want to replace in the output string. Optional.
  • Example:

    - name: AnswerShaper 
      type: Shaper
      params:
         func: answers_to_strings
         inputs:
           answers: results
           str_replace:
             r: R
         outputs:
           - documents 
    
current_datetime

Returns the current time and date in the format you specify.

  • Input: String

  • Output: The current date and time as a string.

  • Parameters:

    Name | Type | Possible values | Description
    format | String | %H:%M:%S %d/%m/%y. Default: %H:%M:%S %d/%m/%y | An input parameter. Sets the format of the date and time. Use the following symbols to indicate how you want to display the date and time: %d (day), %m (month), %y (year), %H (hour), %M (minute), %S (second). Required.

  • Examples: This example returns the current time and date in the format HH:MM:SS DD/MM/YY:

- name: shaper
  type: Shaper
  params:
    func: current_datetime
    inputs:
      format: "%H:%M:%S %d/%m/%y" # quoted because a plain YAML value can't start with %
    outputs: [string]

The output of this function would look like: 12:30:10 01/01/23.
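
The format symbols listed above match Python's strftime directives, so you can preview a format string in plain Python before putting it into the pipeline YAML. This sketch uses a fixed, made-up timestamp instead of the current time so that the result is reproducible. It only illustrates the formatting and does not call Shaper.

```python
from datetime import datetime

# A fixed example timestamp (an assumption for illustration): 1 January 2023, 12:30:10.
moment = datetime(2023, 1, 1, 12, 30, 10)

time_and_date = moment.strftime("%H:%M:%S %d/%m/%y")
date_only = moment.strftime("%d/%m/%y")

print(time_and_date)  # 12:30:10 01/01/23
print(date_only)      # 01/01/23
```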

This example returns the current date only:

- name: shaper
  type: Shaper
  params:
    func: current_datetime
    inputs:
      format: "%d/%m/%y" # quoted because a plain YAML value can't start with %
    outputs: [string]

documents_to_strings

Extracts the content field of each document you pass to it and puts it in a list of strings. Each item in this list is the content of the content field of one document.

  • Input: String (a single document) or a list of strings (a list of documents)

  • Output: List of strings

  • Parameters:

    Name | Type | Possible values | Description
    documents | List of documents | - | An input parameter. Specifies the list of documents you want to transform into a list of strings. Required.
    pattern | String | Default: None | An input parameter. Contains the regex pattern used for parsing the documents. You can use the following placeholders: $content (the content of the document), $idx (the index of the document in the list), $id (the ID of the document), $META_FIELD (the value of the metadata field called META_FIELD). If None, no parsing is done, and all documents are referenced. Optional.
    str_replace | Dictionary of strings | Default: None | An input parameter. The character or string you want to replace in the output string. Optional.
  • Example:

    - name: DocsToStrings
      type: Shaper
      params:
         func: documents_to_strings
         inputs:
           documents:  
             - documents
         outputs:
           - string 
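
To make the pattern placeholders more concrete, here is a minimal Python sketch of the described behavior. It is not the Shaper implementation: documents are modeled as plain dictionaries, the pattern is treated as a simple template rather than a full regex, metadata placeholders are left out, and $idx is assumed to count from 1.

```python
from string import Template
from typing import Dict, List, Optional


def documents_to_strings(documents: List[Dict[str, str]], pattern: Optional[str] = None) -> List[str]:
    """Return one string per document, optionally formatted with a placeholder pattern."""
    if pattern is None:
        # Without a pattern, each string is just the document's content.
        return [doc["content"] for doc in documents]
    template = Template(pattern)
    return [
        template.substitute(idx=idx, id=doc["id"], content=doc["content"])
        for idx, doc in enumerate(documents, start=1)  # assumption: $idx counts from 1
    ]


docs = [
    {"id": "a1", "content": "Berlin is the capital of Germany."},
    {"id": "b2", "content": "Paris is the capital of France."},
]
plain = documents_to_strings(docs)
numbered = documents_to_strings(docs, pattern="Document[$idx]: $content")
print(plain)
print(numbered)
```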
    
join_documents

Takes a list of documents and changes it into a list containing a single document. The new document contains the contents of all the original documents separated by the specified delimiter. All metadata is dropped.

  • Input: List of documents

  • Output: List containing a single document

  • Parameters:

    Name | Type | Possible values | Description
    documents | List | List of documents | An input parameter. Specifies the list of documents you want to change into a list containing a single document. Required.
    delimiter | String | The symbol you want to use as a delimiter. Default: " " (space) | An input parameter. The character or symbol you want to use to separate the documents. Required.
    pattern | String | Default: None | An input parameter. Specifies the parsing of the documents in the output list. Use regex to define that. You can use the following placeholders: $content (the content of the document), $idx (the index of the document in the list), $id (the ID of the document), $META_FIELD (the value of the metadata field called META_FIELD). If None, no parsing is done. Optional.
    str_replace | Dictionary of strings | string_to_replace: new_string. Default: None | An input parameter. The character or string you want to replace in the final list. Optional.
  • Example: Suppose your pipeline has a PromptNode whose PromptTemplate takes two parameters, for example question and documents. To make sure PromptNode runs the question against all documents, you can merge the documents into one:

    - name: joinDocs
      type: Shaper
      params:
        func: join_documents
        inputs:
          documents: documents # inputs is a dictionary that maps the function's argument to a key
        outputs:
          - documents
    
join_documents_and_scores

Transforms a list of documents with scores in their metadata into a list containing a single document.
The resulting document contains the scores and the contents of all the original documents. All metadata is dropped.

  • Input: A list of documents

  • Output: A list containing a single document

  • Parameters:

    Name | Type | Possible values | Description
    documents | List | List of documents | An input parameter. A list of documents with scores that you want to transform into a single document. Required.
  • Example:

    - name: joinDocsAndScores
      type: Shaper
      params:
        func: join_documents_and_scores
        inputs:
          documents: documents # inputs is a dictionary that maps the function's argument to a key
        outputs:
          - documents
    
join_lists

Joins multiple lists into a single list.

  • Input: List of lists

  • Output: List

  • Parameters:

    Name | Type | Possible values | Description
    lists | List | Lists | An input parameter. The lists you want to merge. Required.
  • Example:

    - name: joinLists
      type: Shaper
      params:
        func: join_lists
        inputs:
          lists: # the names of the lists you want to merge
            - list1
            - list2
        outputs:
          - list
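
In plain Python terms, the merge can be sketched as follows. This is an illustration of the described behavior with made-up list contents, not the Shaper implementation.

```python
from itertools import chain
from typing import Any, List


def join_lists(lists: List[List[Any]]) -> List[Any]:
    """Concatenate the given lists, in order, into one flat list."""
    return list(chain.from_iterable(lists))


list1 = ["a", "b"]
list2 = ["c"]
merged = join_lists([list1, list2])
print(merged)  # ['a', 'b', 'c']
```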
    
join_strings

Takes a list of strings and changes it into a single string. The string contains all the original strings separated by the specified delimiter.

  • Input: List of strings

  • Output: String

  • Parameters:

    Name | Type | Possible values | Description
    strings | List of strings | Names of lists of strings | An input parameter. Contains the names of the lists of strings you want to merge into a single string. Required.
    delimiter | String | The symbol you want to use as a delimiter. Default: " " (space) | An input parameter. Specifies the character or symbol you want to use to separate the strings. Required.
    str_replace | Dictionary of strings | string_to_replace: new_string. Default: None | An input parameter. Specifies the character or string you want to replace in the final string. Optional.
  • Example:

    - name: JoinStrings
      type: Shaper
      params:
        func: join_strings
        inputs:
          strings:
            - first
            - second
            - third
          delimiter: "-"
          str_replace:
            r: R
        outputs:
          - string

    # The expected output of this function is: "fiRst-second-thiRd"
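
The same transformation can be sketched in plain Python to show how delimiter and str_replace interact. This is an illustration of the described behavior, not the Shaper implementation, and it assumes the replacement is applied after the strings are joined.

```python
from typing import Dict, List, Optional


def join_strings(
    strings: List[str],
    delimiter: str = " ",
    str_replace: Optional[Dict[str, str]] = None,
) -> str:
    """Join strings with a delimiter, then apply each replacement to the result."""
    joined = delimiter.join(strings)
    for old, new in (str_replace or {}).items():
        joined = joined.replace(old, new)
    return joined


joined = join_strings(["first", "second", "third"], delimiter="-", str_replace={"r": "R"})
print(joined)  # fiRst-second-thiRd
```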
    
rename

Renames a value without changing it.

  • Input: Any type

  • Output: The same type as input but renamed

  • Parameters:

    Name | Type | Possible values | Description
    value | Any | Any | An input parameter. Specifies the name of the value to be renamed. Required.
  • Example: This example renames query to question.

    - name: shaper
      type: Shaper
      params:
        func: rename
        inputs:
          value: query
        outputs: [question]
    
strings_to_answers

Transforms a list of strings into a list of answer objects.

  • Input: List of strings

  • Output: List of answer objects

  • Parameters:

    Name | Type | Possible values | Description
    strings | List of strings | - | An input parameter. Specifies a list of strings you want to turn into a list of answers. Required.
    prompts | String | Default: None | The prompts used to generate the answers. Optional.
    documents | List of documents | Default: None | The documents based on which the answer is generated. Optional.
    pattern | String | Default: None | The regex pattern used for parsing the answer. You can use the following placeholders: $content (the content of the document), $idx (the index of the document in the list), $id (the ID of the document), $META_FIELD (the value of the metadata field called META_FIELD). If None, the whole string is used as the answer. If not None, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer. Optional.
    reference_pattern | String | Default: None | The regex pattern to use for parsing the document references. If None, no parsing is done, and all documents are references. Optional.
    reference_mode | Literal | index, id, meta. Default: index | The mode for referencing documents. Supported modes are: index (the document references are the one-based index of the document in the list of documents; for example, "this is an answer[1]" references the first document in the list of documents), id (the document references are the document IDs; for example, "this is an answer[123]" references the document with id "123"), meta (the document references are the value of a metadata field of the document; for example, "this is an answer[123]" references the document with the value "123" in the metadata field specified by reference_meta_field). Required.
    reference_meta_field | String | Default: None | The name of the metadata field to use for document references in reference_mode: meta. Optional.
  • Example: This function may be useful if PromptNode is the last node in a pipeline. The output of the PromptNode is a string, while deepset Cloud pipelines expect Answer objects. You can then add a Shaper with the strings_to_answers function at the end of the pipeline, after PromptNode.

    - name: OutputAnswerShaper
      type: Shaper
      params:
        func: strings_to_answers
        inputs:
          strings: results # the results PromptNode returns
        outputs:
          - answers
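
The index reference mode can be illustrated with a short Python sketch. It is a simplified stand-in, not the Shaper implementation: documents are modeled as plain dictionaries, the helper name referenced_documents is hypothetical, and the bracket-matching regex is an assumption chosen to fit the "[1]" example above.

```python
import re
from typing import Dict, List


def referenced_documents(answer: str, documents: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the documents cited in the answer as [N], where N is a one-based index."""
    indices = [int(number) for number in re.findall(r"\[(\d+)\]", answer)]
    # Subtract 1 to convert the one-based reference to a Python list position.
    return [documents[i - 1] for i in indices if 1 <= i <= len(documents)]


docs = [
    {"id": "a1", "content": "Berlin is the capital of Germany."},
    {"id": "b2", "content": "Paris is the capital of France."},
]
cited = referenced_documents("Paris is the capital of France[2]", docs)
print(cited)
```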
    
strings_to_documents

Changes a list of strings into a list of documents. If you pass the metadata in a single dictionary, all documents get the same metadata. If you pass the metadata as a list, the length of this list must be the same as the length of the list of strings, and each document gets its own metadata. You can specify id_hash_keys only once and it gets assigned to all documents.

  • Input: List of strings

  • Output: List of documents

  • Parameters:

    Name | Type | Possible values | Description
    strings | List of strings | - | An input parameter. Contains the list of strings to transform into a list of documents. Required.
    meta | Dictionaries of string and any value | Default: None | An input parameter. Specifies the metadata to attach to the resulting list of documents. If you pass a single dictionary, all documents get the metadata from this dictionary. If you pass a list of metadata, each document gets its own metadata, but the list's length must be the same as the length of the list of strings. Optional.
    id_hash_keys | List of strings | Default: None | An input parameter. Generates the document ID from a custom list of strings that refer to the document's attributes. To make sure there are no duplicate documents in your document store if document texts are the same, you can modify the metadata of a document and then pass ["content", "metadata"] to this field to generate IDs based on the document content and the defined metadata. Optional.
  • Example:

    - name: StringsToDocs 
      type: Shaper
      params:
         func: strings_to_documents
         inputs:
           strings:  
             - [string1, string2, string3]
         outputs:
           - documents
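
The two ways of passing metadata can be sketched in plain Python. This is only an illustration of the rule described above, with documents modeled as dictionaries. It is not the Shaper implementation, and id_hash_keys is left out.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Dict[str, Any]


def strings_to_documents(
    strings: List[str],
    meta: Optional[Union[Meta, List[Meta]]] = None,
) -> List[Dict[str, Any]]:
    """Turn each string into a document, attaching shared or per-document metadata."""
    if meta is None:
        metas: List[Meta] = [{} for _ in strings]
    elif isinstance(meta, dict):
        # A single dictionary: every document gets a copy of the same metadata.
        metas = [dict(meta) for _ in strings]
    else:
        # A list: one metadata dictionary per string, so the lengths must match.
        if len(meta) != len(strings):
            raise ValueError("meta list must have the same length as strings")
        metas = list(meta)
    return [{"content": text, "meta": m} for text, m in zip(strings, metas)]


shared = strings_to_documents(["first", "second"], meta={"source": "faq"})
separate = strings_to_documents(["first", "second"], meta=[{"page": 1}, {"page": 2}])
print(shared)
print(separate)
```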
    
value_to_list

Turns a value into a list. The value is repeated in the list to match the length of the target list. For example, if the target list has five items, the value is repeated in the output list five times.

  • Input: Any

  • Output: List containing the input value as many times as specified.

  • Parameters:

    Name | Type | Possible values | Description
    value | Any | Any | An input parameter. The name of the value you want to turn into a list. Required.
    target_list | List | - | The list whose length sets the length of the output list. The value is repeated once for each item in target_list. For example, target_list: [1, 1, 1, 1] produces a list of four items. Required.
  • Example: If your PromptTemplate has two parameters, question and documents, and you want the question to be processed against each document, use Shaper with the value_to_list function. It creates a list in which the question is repeated as many times as there are documents. PromptNode then pairs the items of the two lists one by one, so each document is processed with its own copy of the question.

    - name: QuestionsShaper
      type: Shaper
      params:
        func: value_to_list
        inputs:
          value: query
          target_list: documents # the question is repeated once per document
        outputs:
          - questions
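
Going by the example in the Parameters section, where a target_list of four items yields four copies of the value, the function's behavior can be sketched in plain Python as follows. This is an illustration with made-up documents, not the Shaper implementation.

```python
from typing import Any, List


def value_to_list(value: Any, target_list: List[Any]) -> List[Any]:
    """Repeat value once for every item in target_list."""
    return [value] * len(target_list)


documents = ["doc one", "doc two", "doc three"]
questions = value_to_list("What is a Shaper?", documents)
print(questions)
```

The resulting list has one copy of the question per document, so the two lists line up item by item.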
    

After performing a function, Shaper passes the new or modified values further down the pipeline.

Parameters

These are the parameters you can specify for Shaper in pipeline YAML:

Parameter | Type | Possible Values | Description
func | String | rename, value_to_list, join_strings, join_documents, join_lists, strings_to_answers, answers_to_strings, strings_to_documents, documents_to_strings | The function you want to use with Shaper. For more information, see the Functions section. Mandatory.
outputs | List of strings | - | The key to store the outputs of the Shaper's function. The length of outputs must match the number of outputs produced by the function you specified for the Shaper. Mandatory.
inputs | Dictionary | - | Maps the function's input keyword arguments to the key-value pairs in the invocation context. For example, the value_to_list function expects two inputs: value and target_list, so inputs for this function could be: {value: query, target_list: documents}. Optional.
params | Dictionary | - | Maps the function's input keyword arguments to fixed values. For example, the value_to_list function expects value and target_list parameters, so params might be {value: A, target_list: [1, 1, 1, 1]}. The node's output would be: ["A", "A", "A", "A"]. Optional.
publish_outputs | Union of Boolean and List of Strings | Default: True | Publishes Shaper's outputs to the pipeline's output. True publishes all outputs. False doesn't publish any output. Mandatory.
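
The difference between inputs and params can be sketched in plain Python. This is a simplified model, not the Shaper implementation: the invocation context is a plain dictionary and run_function is a hypothetical helper. Values named in inputs are looked up in the context by key, while values in params are passed to the function as they are.

```python
from typing import Any, Callable, Dict, List, Optional


def value_to_list(value: Any, target_list: List[Any]) -> List[Any]:
    """Repeat value once for every item in target_list."""
    return [value] * len(target_list)


def run_function(
    func: Callable[..., Any],
    context: Dict[str, Any],
    inputs: Optional[Dict[str, str]] = None,
    params: Optional[Dict[str, Any]] = None,
) -> Any:
    """Call func with arguments resolved from context keys (inputs) and fixed values (params)."""
    kwargs = {name: context[key] for name, key in (inputs or {}).items()}  # looked up by key
    kwargs.update(params or {})  # used as given
    return func(**kwargs)


context = {"query": "What is a Shaper?", "documents": ["d1", "d2"]}
from_inputs = run_function(value_to_list, context, inputs={"value": "query", "target_list": "documents"})
from_params = run_function(value_to_list, context, params={"value": "A", "target_list": [1, 1, 1, 1]})
print(from_inputs)
print(from_params)
```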