Shaper can modify the input or output of nodes. It comes with ready-to-use functions that act on values, for example renaming them or changing their type from a list to a string. You choose the function when you configure the node in your pipeline. Shaper functions are handy when you use PromptNode in a pipeline and need it to produce output in a specific format.

Basic Information

Pipeline Type: Used in query pipelines with a PromptNode
Nodes that can precede it in a pipeline: PromptNode
Nodes that can follow it in a pipeline: PromptNode, Retriever
It can also be used in between two PromptNodes.
Input/output: Differs depending on the function used. Check the function documentation below.
Available Classes: Shaper

Usage

When adding Shaper to your pipeline, specify the function you want it to use in Shaper's parameters. In this example, Shaper renames the value `query` to `question`. The resulting value `question` is then passed down the pipeline:

```
- name: shaper
  type: Shaper
  params:
    func: rename
    inputs:
      value: query
    outputs: [question]
```

For more information about functions, see the Functions section.

Shaper and PromptNode

When used with PromptNode, Shaper acts as a PromptNode helper. Let's recall how PromptNode works:

PromptNode uses PromptTemplate containing the prompt, or instruction, for the large language model.
PromptTemplate contains variables that are substituted with real values when PromptNode runs.

In a pipeline, PromptNode receives these variables from the preceding node. The variable names or shapes the PromptTemplate expects may differ from the ones PromptNode receives. Shaper resolves this mismatch by renaming or reshaping the values before they reach PromptNode.

You can also use Shaper the other way around: if the output of a PromptNode differs from the format the next node in the pipeline expects, Shaper can convert it.

Example

Let's see how to use Shaper between PromptNode and a Retriever. This example is a RAG pipeline with an additional PromptNode that acts as the query spell checker. Here's how it works:

The spell-checking PromptNode takes in the query and corrects it.
The corrected query is sent to the Retriever, which fetches relevant documents from the document store.
The Ranker then takes these documents, ranks them, and sends them to the answer generator PromptNode.

There's one problem with this flow: the output of the spell-checking PromptNode is incompatible with the input of the Retriever. The spell-checking PromptTemplate doesn't use an output_parser, so PromptNode returns a list of strings under the key `results`, while the Retriever expects a single string called `query`. You can fix this by putting a Shaper with the join_strings function between the spell-checking PromptNode and the Retriever:

 
```
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore # The only supported document store in deepset Cloud
    params:
      embedding_dim: 768
      similarity: cosine
  - name: query_spell_check
    type: PromptTemplate
    params:
      prompt: >
        You are a spelling correction system.
        {new_line}You receive a question and correct it.
        {new_line}Output only the corrected question
        {new_line}Question: {query}
        {new_line}Corrected Question:
  - name: SpellCheckPromptNode
    type: PromptNode
    params:
      default_prompt_template: query_spell_check
      max_length: 650 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based results
      model_name_or_path: gpt-3.5-turbo
  - name: StringToQuery # Converts the output from SpellCheckPromptNode into a single query string, which is the input type the Retriever expects
    type: Shaper
    params:
      func: join_strings
      inputs:
        strings: results # The default output key of PromptNode
      outputs:
        - query # The input the Retriever expects
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Embedding model for semantic search
      model_format: sentence_transformers
      top_k: 20 # The number of results to return
  - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the retriever
    type: CNSentenceTransformersRanker
    params:
      model_name_or_path: intfloat/simlm-msmarco-reranker # Fast model optimized for reranking
      top_k: 4 # The number of results to return
      batch_size: 20 # Keep this number equal to or larger than the retriever's top_k so that all documents are processed at once
      model_kwargs: # Additional keyword arguments for the model
        torch_dtype: torch.float16
  - name: qa_template
    type: PromptTemplate
    params:
      output_parser:
        type: AnswerParser
      prompt: >
        You are a technical expert.
        {new_line}You answer questions truthfully based on provided documents.
        {new_line}For each document check whether it is related to the question.
        {new_line}Only use documents that are related to the question to answer it.
        {new_line}Ignore documents that are not related to the question.
        {new_line}If the answer exists in several documents, summarize them.
        {new_line}Only answer based on the documents provided. Don't make things up.
        {new_line}Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3].
        {new_line}The reference must only refer to the number that comes in square brackets after passage.
        {new_line}Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage.
        {new_line}If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
        {new_line}These are the documents:
        {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}
        {new_line}Question: {query}
        {new_line}Answer:
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: gpt-3.5-turbo
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents

pipelines:
  - name: query
    nodes:
      - name: SpellCheckPromptNode
        inputs: [Query]
      - name: StringToQuery
        inputs: [SpellCheckPromptNode]
      - name: EmbeddingRetriever
        inputs: [StringToQuery]
      - name: Reranker
        inputs: [EmbeddingRetriever]
      - name: PromptNode
        inputs: [Reranker]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
```

Have a look at the Functions section to understand what functions are available.

Shaper Functions

Functions follow this format:

```
- name: shaper
  type: Shaper
  params:
    func: function_name
    inputs:
      <input>: <name of the input key> # the name of the output key of the preceding node
      <param_name>: <param_value> # not all functions have input parameters
    outputs: [<name of the output key>] # the key under which Shaper stores the result
                                        # make sure this name is compatible with the input of the next node in the pipeline
```

You can check a node's input and output keys in the node's documentation. Shaper is often used between PromptNodes; in that case, the input and output keys depend on the task each PromptNode performs.

These are the functions you can use with Shaper:

answers_to_strings

Extracts the content field of Answers and returns a list of strings.

Input: List of answer objects
Output: List of strings

Parameters:

Name	Type	Possible values	Description
`answers`	List of answers	The answer key returned by the preceding node.	An input argument. Specifies the answers you want to turn into a list of strings. Required.
`pattern`	String	Default: `None`	An input argument. Specifies the regex pattern used for parsing the answer. You can use the following placeholders: - `$id`: The ID of the answer - `$META_FIELD`: The value of the metadata field called `META_FIELD`. If `None`, the whole string is used as the answer. If not `None`, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer. Optional.
`str_replace`	Dictionary of strings	Default: `None`	An input argument. Specifies the character or string you want to replace in the output string. Optional.

Example:

```
- name: AnswerShaper
  type: Shaper
  params:
    func: answers_to_strings
    inputs:
      answers: results
      str_replace:
        r: R
    outputs:
      - documents
```
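To make the behavior concrete, here is a minimal pure-Python sketch of what answers_to_strings does. It assumes a simple `Answer` dataclass with an `answer` text field as a stand-in for deepset Cloud's Answer objects; both the class and the function are hypothetical illustrations, not Shaper's implementation, and the `pattern` parameter is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Answer:
    # Hypothetical stand-in for an Answer object; only the text field is modeled.
    answer: str


def answers_to_strings(
    answers: List[Answer], str_replace: Optional[Dict[str, str]] = None
) -> List[str]:
    """Return the text of each answer, applying optional string replacements."""
    results = []
    for ans in answers:
        text = ans.answer
        for old, new in (str_replace or {}).items():
            text = text.replace(old, new)
        results.append(text)
    return results


print(answers_to_strings([Answer("berlin"), Answer("rome")], str_replace={"r": "R"}))
# ['beRlin', 'Rome']
```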

current_datetime

Returns the current time and date in the format you specify.

Input: String
Output: The current date and time as a string.

Parameters:

Name	Type	Possible values	Description
`format`	String	Default: `%H:%M:%S %d/%m/%y`	An input parameter. Sets the format of the date and time. Use the following symbols to indicate how you want to display the date and time: - `%d` - day - `%m` - month - `%y` - year (two digits) - `%H` - hour - `%M` - minute - `%S` - second. Optional.

Examples:
This example returns the current time and date in the format HH:MM:SS DD/MM/YY:

```
- name: shaper
  type: Shaper
  params:
    func: current_datetime
    inputs:
      format: "%H:%M:%S %d/%m/%y" # quoted because a plain YAML value can't start with %
    outputs: [string]
```

The output of this function would look like: 12:30:10 01/01/23.

This example returns the current date only:

```
- name: shaper
  type: Shaper
  params:
    func: current_datetime
    inputs:
      format: "%d/%m/%y"
    outputs: [string]
```
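The format symbols above are the same directives Python's `strftime` uses, so, assuming Shaper formats the date that way, you can preview a format string locally. This sketch formats a fixed datetime instead of the current time so the result is reproducible.

```python
from datetime import datetime

# A fixed moment used instead of datetime.now() so the output is predictable.
moment = datetime(2023, 1, 1, 12, 30, 10)

print(moment.strftime("%H:%M:%S %d/%m/%y"))  # 12:30:10 01/01/23
print(moment.strftime("%d/%m/%y"))           # 01/01/23
```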

documents_to_strings

Extracts the content field of each document you pass to it and puts it in a list of strings. Each item in this list is the content of one document.

Input: List of documents
Output: List of strings

Parameters:

Name	Type	Possible values	Description
`documents`	List of documents		An input parameter. Specifies the list of documents you want to transform into a list of strings. Required.
`pattern`	String	Default: `None`	An input parameter. Contains the regex pattern used for parsing the documents. You can use the following placeholders: - `$content`: The content of the document - `$idx`: The index of the document in the list - `$id`: The ID of the document - `$META_FIELD`: The value of the metadata field called `META_FIELD`. If `None`, no parsing is done, and all documents are referenced. Optional.
`str_replace`	Dictionary of strings	Default: `None`	An input parameter. The character or string you want to replace in the output string. Optional.

Example:

```
- name: DocsToStrings
  type: Shaper
  params:
    func: documents_to_strings
    inputs:
      documents:
        - documents
    outputs:
      - string
```
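The following pure-Python sketch shows the idea behind documents_to_strings and its placeholders. It is a hypothetical illustration, not Shaper's code: plain dicts with `id`, `content`, and `meta` keys stand in for documents, the pattern is treated as a `string.Template` with `$content`, `$idx`, `$id`, and metadata placeholders, and `$idx` is assumed to be one-based.

```python
from string import Template
from typing import Any, Dict, List, Optional


def documents_to_strings(
    documents: List[Dict[str, Any]],
    pattern: Optional[str] = None,
    str_replace: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Turn each document dict into one string, optionally formatted with a pattern."""
    results = []
    for idx, doc in enumerate(documents, start=1):  # assumption: $idx is one-based
        if pattern is None:
            text = doc["content"]
        else:
            # Metadata fields fill $META_FIELD placeholders; content, idx, and id take precedence.
            values = {**doc.get("meta", {}), "content": doc["content"], "idx": idx, "id": doc["id"]}
            text = Template(pattern).safe_substitute(values)
        for old, new in (str_replace or {}).items():
            text = text.replace(old, new)
        results.append(text)
    return results


docs = [
    {"id": "a1", "content": "Berlin is the capital of Germany.", "meta": {"source": "wiki"}},
    {"id": "b2", "content": "Paris is the capital of France.", "meta": {"source": "atlas"}},
]
print(documents_to_strings(docs))
print(documents_to_strings(docs, pattern="Document[$idx] ($source): $content"))
```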

join_documents

Takes a list of documents and changes it into a list containing a single document. The content of this document is the content of all the original documents, separated by the specified delimiter. All metadata is dropped.

Input: List of documents
Output: List containing a single document

Parameters:

Name	Type	Possible values	Description
`documents`	List	List of documents	An input parameter. Specifies the list of documents you want to change into a list containing a single document. Required.
`delimiter`	String	The symbol you want to use as a delimiter. Default: `" "` (space)	An input parameter. The character or symbol that separates the contents of the original documents. Optional.
`pattern`	String	Default: `None`	An input parameter. Specifies the parsing of the documents in the output list. Use regex to define that. You can use the following placeholders: - `$content`: The content of the document - `$idx`: The index of the document in the list - `$id`: The ID of the document - `$META_FIELD`: The value of the metadata field called `META_FIELD`. If `None`, no parsing is done. Optional.
`str_replace`	Dictionary of strings	`string_to_replace`: `new_string` Default: `None`	An input parameter. The character or string you want to replace in the final list. Optional.

Example: Suppose your pipeline contains a PromptNode whose PromptTemplate has two parameters, for example, question and documents. To make sure PromptNode runs the question against all documents at once, you can merge the documents into one:
```
- name: joinDocs
  type: Shaper
  params:
    func: join_documents
    inputs:
      documents: documents
    outputs:
      - documents
```
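As a rough illustration of join_documents, the sketch below merges document contents with a delimiter and drops the metadata. It is a hypothetical pure-Python model that uses dicts in place of document objects and leaves out the `pattern` and `str_replace` parameters.

```python
from typing import Any, Dict, List


def join_documents(documents: List[Dict[str, Any]], delimiter: str = " ") -> List[Dict[str, Any]]:
    """Merge the contents of all documents into a single document; metadata is dropped."""
    merged = delimiter.join(doc["content"] for doc in documents)
    return [{"content": merged}]


docs = [
    {"content": "First passage.", "meta": {"page": 1}},
    {"content": "Second passage.", "meta": {"page": 2}},
]
print(join_documents(docs, delimiter="\n"))
```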

join_documents_and_scores

Transforms a list of documents with scores in their metadata into a list containing a single document.
The resulting document contains the scores and the contents of all the original documents. All metadata is dropped.

Input: A list of documents
Output: A list containing a single document
Parameters:

Name	Type	Possible values	Description
`documents`	List	List of documents	An input parameter. A list of documents with scores that you want to transform into a single document. Required.

Example:

```
- name: joinDocsAndScores
  type: Shaper
  params:
    func: join_documents_and_scores
    inputs:
      documents: documents
    outputs:
      - documents
```
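The exact text layout join_documents_and_scores produces isn't described on this page, so the following sketch only illustrates the idea under an assumed layout of one `score: content` line per document. The function and the dict-based documents with a `score` key are hypothetical stand-ins.

```python
from typing import Any, Dict, List


def join_documents_and_scores(documents: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Combine scores and contents of all documents into one document (assumed layout)."""
    lines = [f"{doc['score']:.2f}: {doc['content']}" for doc in documents]
    # Only the combined content survives; the original metadata is dropped.
    return [{"content": "\n".join(lines)}]


docs = [
    {"content": "Relevant passage.", "score": 0.91},
    {"content": "Less relevant passage.", "score": 0.4},
]
print(join_documents_and_scores(docs)[0]["content"])
```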

join_lists

Joins multiple lists into a single list.

Input: List of lists
Output: List
Parameters:

Name	Type	Possible values	Description
`lists`	List	Lists	An input parameter. The lists you want to merge. Required.

Example:

```
- name: joinLists
  type: Shaper
  params:
    func: join_lists
    inputs:
      lists:
        - list1
        - list2
    outputs:
      - list
```
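Conceptually, join_lists is a concatenation that keeps the original order. A hypothetical one-function sketch using the standard library:

```python
from itertools import chain
from typing import Any, List


def join_lists(lists: List[List[Any]]) -> List[Any]:
    """Concatenate several lists into one, preserving order."""
    return list(chain.from_iterable(lists))


print(join_lists([["a", "b"], ["c"], []]))  # ['a', 'b', 'c']
```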

join_strings

Takes a list of strings and changes it into a single string. The string contains all the original strings separated by the specified delimiter.

Input: List of strings
Output: String

Parameters:

Name	Type	Possible values	Description
`strings`	List of strings	Names of lists of strings	An input parameter. Specifies the list of strings you want to merge into a single string. Required.
`delimiter`	String	The symbol you want to use as a delimiter. Default: `" "` (space)	An input parameter. Specifies the character or symbol that separates the joined strings. Optional.
`str_replace`	Dictionary of strings	`string_to_replace`: `new_string` Default: `None`	An input parameter. Specifies the character or string you want to replace in the resulting string. Optional.

Example:

```
- name: JoinStrings
  type: Shaper
  params:
    func: join_strings
    inputs:
      strings:
        - first
        - second
        - third
      delimiter: "-"
      str_replace:
        r: R
    outputs:
      - string
```

The expected output of this function is: "fiRst-second-thiRd"
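The example above can be reproduced with a short pure-Python sketch. The function is a hypothetical model, not Shaper's implementation; it assumes the replacements are applied after joining, which gives the same result here because the delimiter contains no `r`.

```python
from typing import Dict, List, Optional


def join_strings(
    strings: List[str], delimiter: str = " ", str_replace: Optional[Dict[str, str]] = None
) -> str:
    """Join strings with a delimiter, then apply optional replacements."""
    joined = delimiter.join(strings)
    for old, new in (str_replace or {}).items():
        joined = joined.replace(old, new)
    return joined


print(join_strings(["first", "second", "third"], delimiter="-", str_replace={"r": "R"}))
# fiRst-second-thiRd
```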

rename

Renames a value without changing it.

Input: Any type
Output: The same type as input but renamed
Parameters:

Name	Type	Possible values	Description
`value`	Any	Any	An input parameter. Specifies the name of the value to be renamed. Required.

Example: This example renames query to question.

```
- name: shaper
  type: Shaper
  params:
    func: rename
    inputs:
      value: query
    outputs: [question]
```

strings_to_answers

Transforms a list of strings into a list of answer objects.

Input: List of strings
Output: List of answer objects

Parameters:

Name	Type	Possible values	Description
`strings`	List of strings		An input parameter. Specifies a list of strings you want to turn into a list of answers. Required.
`prompts`	String	Default: `None`	The prompts used to generate the answers Optional.
`documents`	List of documents	Default: `None`	The documents based on which the answer is generated. Optional.
`pattern`	String	Default: `None`	The regex pattern used for parsing the answer. You can use the following placeholders: - `$content`: The content of the document - `$idx`: The index of the document in the list - `$id`: The ID of the document - `$META_FIELD`: The value of the metadata field called `META_FIELD`. If `None`, the whole string is used as the answer. If not `None`, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer. Optional.
`reference_pattern`	String	Default: `None`	The regex pattern to use for parsing the document references. If `None`, no parsing is done, and all documents are referenced. Optional.
`reference_mode`	Literal	`index` `id` `meta` Default: `index`	The mode for referencing documents. Supported modes are: - `index`: the document references are the one-based index of the document in the list of documents. Example: "this is an answer[1]" references the first document in the list of documents. - `id`: the document references are the document IDs. Example: "this is an answer[123]" references the document with id "123". - `meta`: the document references are the value of a metadata field of the document. Example: "this is an answer[123]" references the document with the value "123" in the metadata field specified by `reference_meta_field`. Optional.
`reference_meta_field`	String	Default: `None`	The name of the metadata field to use for document references in `reference_mode`: `meta`. Optional.

Example: This function is useful if PromptNode is the last node in a pipeline. PromptNode outputs a list of strings, while deepset Cloud pipelines expect Answer objects. In that case, add a Shaper with the strings_to_answers function at the end of the pipeline, after PromptNode.
```
- name: OutputAnswerShaper
  type: Shaper
  params:
    func: strings_to_answers
    inputs:
      strings: results # the results PromptNode returns
    outputs:
      - answers
```
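To illustrate how `reference_mode: index` links answer text to documents, here is a hypothetical pure-Python sketch, not Shaper's implementation. Dicts stand in for Answer and Document objects, and the `[N]` regex passed in the usage lines is an assumed `reference_pattern`. As the table describes, a `None` pattern references every document.

```python
import re
from typing import Any, Dict, List, Optional


def strings_to_answers(
    strings: List[str],
    documents: Optional[List[Dict[str, Any]]] = None,
    reference_pattern: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Wrap each string in an answer dict, resolving one-based references to document IDs."""
    documents = documents or []
    answers = []
    for text in strings:
        if reference_pattern is None:
            # No parsing: every document counts as referenced.
            referenced_ids = [doc["id"] for doc in documents]
        else:
            referenced_ids = []
            for match in re.finditer(reference_pattern, text):
                position = int(match.group(1))  # one-based index, as in reference_mode: index
                if 1 <= position <= len(documents):
                    referenced_ids.append(documents[position - 1]["id"])
        answers.append({"answer": text, "document_ids": referenced_ids})
    return answers


docs = [{"id": "123"}, {"id": "456"}]
print(strings_to_answers(["Berlin is the capital [2]."], docs, reference_pattern=r"\[(\d+)\]"))
```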

strings_to_documents

Changes a list of strings into a list of documents. If you pass the metadata in a single dictionary, all documents get the same metadata. If you pass the metadata as a list, the length of this list must be the same as the length of the list of strings, and each document gets its own metadata. You can specify id_hash_keys only once and it gets assigned to all documents.

Input: List of strings
Output: List of documents

Parameters:

Name	Type	Possible values	Description
`strings`	List of strings		An input parameter. Contains the list of strings to transform into a list of documents. Required.
`meta`	Dictionary or list of dictionaries	Default: `None`	An input parameter. Specifies the metadata to attach to the resulting documents. If you pass a single dictionary, all documents get the metadata from this dictionary. If you pass a list of dictionaries, each document gets its own metadata, but the list must be as long as the list of strings. Optional.
`id_hash_keys`	List of strings	Default: `None`	An input parameter. Generates the document ID from a custom list of strings that refer to the document's attributes. If several documents have the same text, you can give them different metadata and pass ["content", "meta"] to this field so that their IDs are generated from both the content and the metadata. This way, documents with the same text aren't treated as duplicates in your document store. Optional.

Example:

```
- name: StringsToDocs
  type: Shaper
  params:
    func: strings_to_documents
    inputs:
      strings: results # for example, the list of strings a preceding PromptNode returns
    outputs:
      - documents
```
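The metadata and ID rules described for strings_to_documents can be modeled in a few lines. This is a hypothetical sketch: dicts stand in for documents, and SHA-256 over the selected fields is an assumed stand-in for whatever hashing deepset Cloud actually uses. It shows why adding `meta` to `id_hash_keys` gives same-text documents different IDs.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional, Union


def strings_to_documents(
    strings: List[str],
    meta: Union[Dict[str, Any], List[Dict[str, Any]], None] = None,
    id_hash_keys: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Build document dicts from strings with shared or per-document metadata."""
    if isinstance(meta, list):
        if len(meta) != len(strings):
            raise ValueError("A list of metadata must be as long as the list of strings.")
        metas = meta
    else:
        metas = [meta or {}] * len(strings)  # one dict shared by every document
    keys = id_hash_keys or ["content"]
    documents = []
    for text, doc_meta in zip(strings, metas):
        doc = {"content": text, "meta": dict(doc_meta)}
        # Hypothetical ID scheme: hash the selected fields (SHA-256 is a stand-in).
        payload = json.dumps({key: doc[key] for key in keys}, sort_keys=True)
        doc["id"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        documents.append(doc)
    return documents


same_text = ["same text", "same text"]
per_doc_meta = [{"lang": "en"}, {"lang": "de"}]
print([d["id"][:8] for d in strings_to_documents(same_text, per_doc_meta)])
print([d["id"][:8] for d in strings_to_documents(same_text, per_doc_meta, ["content", "meta"])])
```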

value_to_list

Turns a value into a list. The value is repeated in the list to match the length of the list. For example, if you set the list length to five, the value is repeated in this list five times.

Input: Any
Output: List containing the input value as many times as specified.

Parameters:

Name	Type	Possible values	Description
`value`	Any	Any	An input parameter. The name of the value you want to turn into a list. Required.
`target_list`	List	Any list	An input parameter. A list whose length sets the length of the output list: the value is repeated once for each item in it. Typically, you map it to the documents the value should be processed against. Required.

Example: If your PromptTemplate has two parameters, question and documents, and you want the question to be processed against each document, use Shaper with the value_to_list function. It creates a list in which the question is repeated as many times as there are documents. PromptNode then pairs the items of both lists by position, processing each copy of the question with its corresponding document.
```
- name: QuestionsShaper
  type: Shaper
  params:
    func: value_to_list
    inputs:
      value: query
      target_list: documents # the question is repeated once per document
    outputs:
      - questions
```
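A minimal sketch of the idea, assuming only what the description states (the value is repeated once per item of `target_list`); the function is a hypothetical model and plain strings stand in for documents:

```python
from typing import Any, List


def value_to_list(value: Any, target_list: List[Any]) -> List[Any]:
    """Repeat value once per item in target_list."""
    return [value] * len(target_list)


documents = ["doc A", "doc B", "doc C"]
questions = value_to_list("What is Shaper?", documents)
# Positional pairing: questions[i] goes with documents[i].
print(list(zip(questions, documents)))
```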

After performing a function, Shaper passes the new or modified values further down the pipeline.

Parameters

These are the parameters you can specify for Shaper in pipeline YAML:

Parameter	Type	Possible Values	Description
`func`	String	rename value_to_list join_strings join_documents join_documents_and_scores join_lists strings_to_answers answers_to_strings strings_to_documents documents_to_strings current_datetime	The function you want to use with Shaper. For more information, see the Functions section. Mandatory.
`outputs`	List of strings		The key to store the outputs of the Shaper's function. The length of `outputs` must match the number of outputs produced by the function you specified for the Shaper. Mandatory.
`inputs`	Dictionary		Maps the function's input keyword arguments to the key-value pairs in the invocation context. For example, the `value_to_list` function expects two inputs: `value` and `target_list`, so inputs for this function could be: {`value` : `query`, `target_list` : `documents`}. Optional.
`params`	Dictionary		Maps the function's input keyword arguments to fixed values. For example, the `value_to_list` function expects `value` and `target_list` parameters, so `params` might be {`value` : `A`, `target_list` : `[1, 1, 1, 1]`}. The node's output would be: `["A", "A", "A", "A"]`. Optional.
`publish_outputs`	Boolean or list of strings	Default: `True`	Publishes Shaper's outputs to the pipeline's output. `True` publishes all outputs. `False` doesn't publish any output. Optional.
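The way `inputs`, `params`, and `outputs` fit together can be modeled in a few lines of Python. This is a simplified, hypothetical model of the behavior the table describes, not deepset Cloud's implementation: a plain dict stands in for the invocation context, `inputs` values are looked up in it, `params` values are passed as-is, and each function result is stored under the matching `outputs` key.

```python
from typing import Any, Callable, Dict, List


def run_shaper(
    func: Callable[..., Any],
    context: Dict[str, Any],
    inputs: Dict[str, str],
    params: Dict[str, Any],
    outputs: List[str],
) -> Dict[str, Any]:
    """Hypothetical model of one Shaper step operating on an invocation context."""
    # inputs: function argument name -> key whose value is read from the context
    kwargs = {arg: context[key] for arg, key in inputs.items()}
    # params: function argument name -> fixed value
    kwargs.update(params)
    result = func(**kwargs)
    # A single-output function's result is wrapped so it lines up with the outputs list.
    results = result if len(outputs) > 1 else (result,)
    if len(results) != len(outputs):
        raise ValueError("The number of outputs must match what the function returns.")
    return {**context, **dict(zip(outputs, results))}


def rename(value: Any) -> Any:
    return value


def value_to_list(value: Any, target_list: List[Any]) -> List[Any]:
    return [value] * len(target_list)


context = {"query": "What is Shaper?", "documents": ["doc A", "doc B"]}
context = run_shaper(rename, context, inputs={"value": "query"}, params={}, outputs=["question"])
context = run_shaper(
    value_to_list, context,
    inputs={"value": "question", "target_list": "documents"}, params={}, outputs=["questions"],
)
print(context["questions"])  # ['What is Shaper?', 'What is Shaper?']
```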