Shaper
Shaper is most often used with PromptNode to ensure the input the PromptNode receives and the output it generates match the expected format.
Shaper can modify the output or input of nodes. It comes with ready-to-use functions that act on values, renaming them or changing their type, for example from a list to a string. You can choose the function you want to use when configuring the node in your pipeline. Shaper functions are handy when you want to use PromptNode in a pipeline, and you need it to generate a specific output.
Basic Information
- Pipeline Type: Used in query pipelines with a PromptNode
- Nodes that can precede it in a pipeline: PromptNode
- Nodes that can follow it in a pipeline: PromptNode, Retriever. It can also be used in between two PromptNodes.
- Input/output: Differs depending on the function used. Check the function documentation below.
- Available Classes: Shaper
Usage
When adding Shaper to your pipeline, specify the function you want it to use in Shaper's parameters. In this example, Shaper renames the value query to question. The resulting value, question, is then passed down the pipeline:
- name: shaper
  type: Shaper
  params:
    func: rename
    inputs:
      value: query
    outputs: [question]
For more information about functions, see the Functions section.
Shaper and PromptNode
When used with PromptNode, Shaper acts as a PromptNode helper. Let's recall how PromptNode works:
- PromptNode uses PromptTemplate containing the prompt, or instruction, for the large language model.
- PromptTemplate contains variables that are substituted with real values when PromptNode runs.
In a pipeline, PromptNode receives these variables from the preceding node. It may happen that the variable names or shapes the PromptTemplate expects differ from the ones the PromptNode receives. That's when Shaper comes in and resolves this issue.
You can also use Shaper in reverse situations. If the output of a PromptNode differs from the format the next node in the pipeline expects, Shaper can change it.
See also PromptNode documentation.
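The name mismatch described above can be illustrated in plain Python. This is only a sketch: str.format stands in for PromptTemplate substitution, and the template text and the store_under_new_name helper are hypothetical, not part of any library.

```python
# Stand-in for a PromptTemplate that expects a variable called "question".
template = "Answer briefly. Question: {question}"

# Values produced by the preceding node: the key is "query", not "question".
context = {"query": "What is a Shaper?"}


def store_under_new_name(values: dict, old: str, new: str) -> dict:
    """Return a copy of values in which the value of `old` is also stored under `new`."""
    updated = dict(values)
    updated[new] = updated[old]
    return updated


try:
    template.format(**context)
    mismatch = False
except KeyError:
    mismatch = True  # the template cannot find "question"

# After the rename step, the template finds the variable it expects.
prompt = template.format(**store_under_new_name(context, "query", "question"))
```

Without the rename step the substitution fails, which mirrors the situation a Shaper placed before a PromptNode is meant to resolve.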
Example
Let's see how to use Shaper between PromptNode and a Retriever. This example is a RAG pipeline with an additional PromptNode that acts as the query spell checker. Here's how it works:
- The spell-checking PromptNode takes in the query and corrects it.
- The corrected query is sent to the Retriever, which fetches relevant documents from the document store.
- The Ranker then takes these documents, ranks them, and sends them to the answer generator PromptNode.
There's one problem with this flow: the output of the spell-checking PromptNode is incompatible with the input of the Retriever. In this case, PromptNode's PromptTemplate doesn't use an output_parser, so the PromptNode returns its output as a list of strings under the key results, while the Retriever needs a single query string as input. We can easily fix this by putting a Shaper with the join_strings function between the spell-checking PromptNode and the Retriever:
components:
  - name: DocumentStore
    type: DeepsetCloudDocumentStore # The only supported document store in deepset Cloud
    params:
      embedding_dim: 768
      similarity: cosine
  - name: query_spell_check
    type: PromptTemplate
    params:
      prompt: >
        You are a spelling correction system.
        {new_line}You receive a question and correct it.
        {new_line}Output only the corrected question
        {new_line}Question: {query}
        {new_line}Corrected Question:
  - name: SpellCheckPromptNode
    type: PromptNode
    params:
      default_prompt_template: query_spell_check
      max_length: 650 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based results
      model_name_or_path: gpt-3.5-turbo
  - name: StringToQuery # Converts the output from SpellCheckPromptNode into a single query string, which is the input type the retriever expects
    type: Shaper
    params:
      func: join_strings
      inputs:
        strings: results # The default output from PromptNode
      outputs:
        - query # The input the Retriever expects
  - name: EmbeddingRetriever # Selects the most relevant documents from the document store
    type: EmbeddingRetriever # Uses a Transformer model to encode the document and the query
    params:
      document_store: DocumentStore
      embedding_model: intfloat/e5-base-v2 # Model optimized for semantic search. It has been trained on 215M (question, answer) pairs from diverse sources.
      model_format: sentence_transformers
      top_k: 20 # The number of results to return
  - name: Reranker # Uses a cross-encoder model to rerank the documents returned by the retriever
    type: CNSentenceTransformersRanker
    params:
      model_name_or_path: intfloat/simlm-msmarco-reranker # Fast model optimized for reranking
      top_k: 4 # The number of results to return
      batch_size: 20 # Try to keep this number equal to or larger than the retriever's top_k so all docs are processed at once
      model_kwargs: # Additional keyword arguments for the model
        torch_dtype: torch.float16
  - name: qa_template
    type: PromptTemplate
    params:
      output_parser:
        type: AnswerParser
      prompt: >
        You are a technical expert.
        {new_line}You answer questions truthfully based on provided documents.
        {new_line}For each document check whether it is related to the question.
        {new_line}Only use documents that are related to the question to answer it.
        {new_line}Ignore documents that are not related to the question.
        {new_line}If the answer exists in several documents, summarize them.
        {new_line}Only answer based on the documents provided. Don't make things up.
        {new_line}Always use references in the form [NUMBER OF DOCUMENT] when using information from a document. e.g. [3], for Document[3].
        {new_line}The reference must only refer to the number that comes in square brackets after passage.
        {new_line}Otherwise, do not use brackets in your answer and reference ONLY the number of the passage without mentioning the word passage.
        {new_line}If the documents can't answer the question or you are unsure say: 'The answer can't be found in the text'.
        {new_line}These are the documents:
        {join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]:'+new_line+'$content')}
        {new_line}Question: {query}
        {new_line}Answer:
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: qa_template
      max_length: 400 # The maximum number of tokens the generated answer can have
      model_kwargs: # Specifies additional model settings
        temperature: 0 # Lower temperature works best for fact-based qa
      model_name_or_path: gpt-3.5-turbo
  - name: FileTypeClassifier # Routes files based on their extension to appropriate converters, by default txt, pdf, md, docx, html
    type: FileTypeClassifier
  - name: TextConverter # Converts files into documents
    type: TextConverter
  - name: PDFConverter # Converts PDFs into documents
    type: PDFToTextConverter
  - name: Preprocessor # Splits documents into smaller ones and cleans them up
    type: PreProcessor
    params:
      # With a vector-based retriever, it's good to split your documents into smaller ones
      split_by: word # The unit by which you want to split the documents
      split_length: 250 # The max number of words in a document
      split_overlap: 20 # Enables the sliding window approach
      language: en
      split_respect_sentence_boundary: True # Retains complete sentences in split documents
pipelines:
  - name: query
    nodes:
      - name: SpellCheckPromptNode
        inputs: [Query]
      - name: StringToQuery # The Shaper defined in components
        inputs: [SpellCheckPromptNode]
      - name: EmbeddingRetriever
        inputs: [StringToQuery]
      - name: Reranker
        inputs: [EmbeddingRetriever]
      - name: PromptNode
        inputs: [Reranker]
  - name: indexing
    nodes:
      # Depending on the file type, we use a Text or PDF converter
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextConverter
        inputs: [FileTypeClassifier.output_1] # Ensures that this converter receives txt files
      - name: PDFConverter
        inputs: [FileTypeClassifier.output_2] # Ensures that this converter receives PDFs
      - name: Preprocessor
        inputs: [TextConverter, PDFConverter]
      - name: EmbeddingRetriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [EmbeddingRetriever]
Have a look at the Functions section to understand what functions are available.
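To make the data flow of the query pipeline concrete, here is a minimal plain-Python sketch. All three functions are hypothetical stand-ins written for illustration; they don't call any model or document store, and the misspelled sample query is made up.

```python
def spell_check(query: str) -> dict:
    # Stand-in for SpellCheckPromptNode: returns a list of strings under "results".
    return {"results": [query.replace("captial", "capital")]}


def string_to_query(results: list) -> dict:
    # Stand-in for the StringToQuery Shaper: joins the list with a space.
    return {"query": " ".join(results)}


def retrieve(query: str) -> list:
    # Stand-in for the Retriever: it only accepts a single string.
    if not isinstance(query, str):
        raise TypeError("query must be a string")
    return [f"document matching: {query}"]


checked = spell_check("What is the captial of France?")
docs = retrieve(string_to_query(checked["results"])["query"])
```

Passing checked["results"] straight to retrieve would raise a TypeError in this sketch, which is the kind of incompatibility the Shaper step removes.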
Shaper Functions
Functions follow this format:
- name: shaper
  type: Shaper
  params:
    func: function_name
    inputs:
      <input>: <name of the input key> # this is the name of the output key of the preceding node
      <param_name>: <param_value> # not all functions have input parameters
    outputs: <name of the output key> # this is the name of the key you want to use for the output
    # make sure this key name is compatible with the input of the subsequent node in the pipeline
    <param_name>: <param_value> # not all functions have output parameters
You can check a node's input and output keys in the node's documentation. Shaper is often used between PromptNodes, in which case the input and output keys depend on the task the PromptNode performs.
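One way to picture the input and output keys is as lookups in a shared dictionary of values. The sketch below is an assumption-laden illustration, not Shaper's actual implementation: run_shaper_sketch and the context dictionary are hypothetical names, and only single-output functions are handled.

```python
from typing import Any, Callable, Dict, List


def run_shaper_sketch(
    func: Callable[..., Any],
    inputs: Dict[str, str],
    outputs: List[str],
    context: Dict[str, Any],
) -> Dict[str, Any]:
    """Look up each input key in the context, call func, and store the result."""
    kwargs = {arg: context[key] for arg, key in inputs.items()}
    result = func(**kwargs)
    updated = dict(context)
    updated[outputs[0]] = result  # single-output functions only
    return updated


def rename(value: Any) -> Any:
    # Returns the value unchanged; the new name comes from `outputs`.
    return value


context = run_shaper_sketch(rename, {"value": "query"}, ["question"], {"query": "Who?"})
```

Here inputs says where the function's argument comes from (the key query) and outputs says where the result goes (the key question).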
These are the functions you can use with Shaper:
answers_to_strings
Extracts the content field of Answers and returns a list of strings.
- Input: List of answer objects
- Output: List of strings
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
answers | List of answers | The answer key returned by the preceding node. | An input argument. Specifies the answers you want to turn into a list of strings. Required. |
pattern | String | Default: None | An input argument. Specifies the regex pattern used for parsing the answer. You can use the following placeholders: $id (the ID of the answer) and $META_FIELD (the value of the metadata field called META_FIELD). If None, the whole string is used as the answer. If not None, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer. Optional. |
str_replace | Dictionary of strings | Default: None | An input argument. Specifies the character or string you want to replace in the output string. Optional. |
- Example:
  - name: AnswerShaper
    type: Shaper
    params:
      func: answers_to_strings
      inputs:
        answers: results
        str_replace:
          r: R
      outputs:
        - documents
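The effect of str_replace can be sketched in plain Python. This is an illustration under assumptions: AnswerStub is a hypothetical stand-in for an answer object, its field name answer is made up, and the pattern parameter is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AnswerStub:
    """Hypothetical stand-in for an answer object; only the text is modeled."""
    answer: str


def answers_to_strings_sketch(
    answers: List[AnswerStub], str_replace: Optional[Dict[str, str]] = None
) -> List[str]:
    # Pull the text out of each answer, then apply every replacement to each string.
    strings = [a.answer for a in answers]
    for old, new in (str_replace or {}).items():
        strings = [s.replace(old, new) for s in strings]
    return strings


out = answers_to_strings_sketch([AnswerStub("river"), AnswerStub("sea")], {"r": "R"})
```

With the replacement r: R, every occurrence of r is replaced, so "river" becomes "RiveR" while "sea" is unchanged.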
current_datetime
Returns the current time and date in the format you specify.
- Input: String
- Output: The current date and time as a string.
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
format | String | Default: %H:%M:%S %d/%m/%y | An input parameter. Sets the format of the date and time. Use the following symbols to indicate how you want to display the date and time: %d (day), %m (month), %y (year), %H (hour), %M (minute), %S (second). Required. |
- Examples:
This example returns the current time and date in the format HH:MM:SS DD/MM/YY:
  - name: shaper
    type: Shaper
    params:
      func: current_datetime
      inputs:
        format: "%H:%M:%S %d/%m/%y" # quoted because a YAML value can't start with %
      outputs: [string]
The output of this function would look like: 12:30:10 01/01/23.
This example returns the current date only:
  - name: shaper
    type: Shaper
    params:
      func: current_datetime
      inputs:
        format: "%d/%m/%y"
      outputs: [string]
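The format symbols listed above match the directives of Python's strftime, so you can preview a format string locally. This sketch assumes that correspondence and uses a fixed moment instead of the current time so the result is predictable.

```python
from datetime import datetime

# A fixed moment (1 January 2023, 12:30:10) instead of datetime.now().
fixed = datetime(2023, 1, 1, 12, 30, 10)

time_and_date = fixed.strftime("%H:%M:%S %d/%m/%y")  # time first, then date
date_only = fixed.strftime("%d/%m/%y")               # no time symbols, so date only
```

For this moment, time_and_date is "12:30:10 01/01/23" and date_only is "01/01/23".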
documents_to_strings
Extracts the content field of each document you pass to it and puts it in a list of strings. Each item in this list is the value of the content field of one document.
- Input: A single document or a list of documents
- Output: List of strings
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
documents | List of documents | | An input parameter. Specifies the list of documents you want to transform into a list of strings. Required. |
pattern | String | Default: None | An input parameter. Contains the regex pattern used for parsing the documents. You can use the following placeholders: $content (the content of the document), $idx (the index of the document in the list), $id (the ID of the document), and $META_FIELD (the value of the metadata field called META_FIELD). If None, no parsing is done, and all documents are referenced. Optional. |
str_replace | Dictionary of strings | Default: None | An input parameter. The character or string you want to replace in the output string. Optional. |
- Example:
  - name: DocsToStrings
    type: Shaper
    params:
      func: documents_to_strings
      inputs:
        documents:
          - documents
      outputs:
        - string
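The placeholders can be pictured with Python's string.Template, which uses the same $name syntax. This is a hedged sketch, not the function's implementation: documents are modeled as plain dictionaries, metadata placeholders are omitted, and starting the index at 1 is an assumption.

```python
from string import Template
from typing import Dict, List


def documents_to_strings_sketch(documents: List[Dict[str, str]], pattern: str) -> List[str]:
    """Fill $idx, $id, and $content for each document (indices start at 1 here)."""
    template = Template(pattern)
    return [
        template.substitute(idx=i, id=doc["id"], content=doc["content"])
        for i, doc in enumerate(documents, start=1)
    ]


docs = [{"id": "a1", "content": "Berlin"}, {"id": "b2", "content": "Paris"}]
strings = documents_to_strings_sketch(docs, "Document[$idx] ($id): $content")
```

Each document produces one string, for example "Document[1] (a1): Berlin" for the first one.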
join_documents
Takes a list of documents and changes it into a list containing a single document. The new document contains the contents of all the original documents, separated by the specified delimiter. All metadata is dropped.
- Input: List of documents
- Output: List containing a single document
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
documents | List | List of documents | An input parameter. Specifies the list of documents you want to change into a list containing a single document. Required. |
delimiter | String | The symbol you want to use as a delimiter. Default: " " (space) | An input parameter. The character or symbol you want to use to separate the documents. Required. |
pattern | String | Default: None | An input parameter. Specifies the parsing of the documents in the output list. Use regex to define that. You can use the following placeholders: $content (the content of the document), $idx (the index of the document in the list), $id (the ID of the document), and $META_FIELD (the value of the metadata field called META_FIELD). If None, no parsing is done. Optional. |
str_replace | Dictionary of strings | string_to_replace: new_string. Default: None | An input parameter. The character or string you want to replace in the final list. Optional. |
- Example: If your pipeline has a PromptNode whose PromptTemplate takes two parameters, for example question and documents, you can merge the documents into one to make sure PromptNode runs the question against all of them:
  - name: joinDocs
    type: Shaper
    params:
      func: join_documents
      inputs:
        documents: documents
      outputs:
        - documents
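As a rough plain-Python picture of the merge, the sketch below models documents as dictionaries with content and meta keys. Both the dictionary layout and the join_documents_sketch name are assumptions for illustration; the pattern and str_replace parameters are left out.

```python
from typing import Dict, List


def join_documents_sketch(documents: List[Dict], delimiter: str = " ") -> List[Dict]:
    """Merge the contents into one document; metadata is not carried over."""
    merged = delimiter.join(doc["content"] for doc in documents)
    return [{"content": merged}]


docs = [
    {"content": "Paris is in France.", "meta": {"source": "a.txt"}},
    {"content": "Berlin is in Germany.", "meta": {"source": "b.txt"}},
]
joined = join_documents_sketch(docs, delimiter="\n")
```

The result is still a list, but it holds a single document whose content is the two original texts separated by a newline.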
join_documents_and_scores
Transforms a list of documents with scores in their metadata into a list containing a single document.
The resulting document contains the scores and the contents of all the original documents. All metadata is dropped.
- Input: A list of documents
- Output: A list containing a single document
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
documents | List | List of documents | An input parameter. A list of documents with scores that you want to transform into a single document. Required. |
- Example:
  - name: joinDocsAndScores
    type: Shaper
    params:
      func: join_documents_and_scores
      inputs:
        documents: documents
      outputs:
        - documents
join_lists
Joins multiple lists into a single list.
- Input: List of lists
- Output: List
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
lists | List | Lists | An input parameter. The lists you want to merge. Required. |
- Example:
  - name: joinLists
    type: Shaper
    params:
      func: join_lists
      inputs:
        lists:
          - list1
          - list2
      outputs:
        - list
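Conceptually, joining lists means concatenating them in order. A minimal sketch, with join_lists_sketch as a hypothetical stand-in for the function:

```python
from itertools import chain
from typing import Any, List


def join_lists_sketch(lists: List[List[Any]]) -> List[Any]:
    # Concatenate the lists in the order they were given.
    return list(chain.from_iterable(lists))


merged = join_lists_sketch([["a", "b"], ["c"], []])
```

Empty lists contribute nothing, so merged is ["a", "b", "c"].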
join_strings
Takes a list of strings and changes it into a single string. The string contains all the original strings separated by the specified delimiter.
- Input: List of strings
- Output: String
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
strings | List of strings | Names of lists of strings | An input parameter. Contains the names of the lists of strings you want to merge into a single string. Required. |
delimiter | String | The symbol you want to use as a delimiter. Default: " " (space) | An input parameter. Specifies the character or symbol you want to use to separate the strings. Required. |
str_replace | Dictionary of strings | string_to_replace: new_string. Default: None | An input parameter. Specifies the character or string you want to replace in the final string. Optional. |
- Example:
  - name: JoinStrings
    type: Shaper
    params:
      func: join_strings
      inputs:
        strings:
          - first
          - second
          - third
        delimiter: "-"
        str_replace:
          r: R
      outputs:
        - string
  # The expected output of this function is: "fiRst-second-thiRd"
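The combination of delimiter and str_replace can be checked with a few lines of plain Python. This sketch assumes the strings are joined first and the replacements are applied to the joined result; join_strings_sketch is a hypothetical name.

```python
def join_strings_sketch(strings, delimiter=" ", str_replace=None):
    # Join first, then apply each replacement to the joined string.
    joined = delimiter.join(strings)
    for old, new in (str_replace or {}).items():
        joined = joined.replace(old, new)
    return joined


result = join_strings_sketch(["first", "second", "third"], "-", {"r": "R"})
```

Only "first" and "third" contain an r, so result is "fiRst-second-thiRd".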
rename
Renames a value without changing it.
- Input: Any type
- Output: The same type as input but renamed
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
value | Any | Any | An input parameter. Specifies the name of the value to be renamed. Required. |
- Example: This example renames query to question.
  - name: shaper
    type: Shaper
    params:
      func: rename
      inputs:
        value: query
      outputs: [question]
strings_to_answers
Transforms a list of strings into a list of answer objects.
- Input: List of strings
- Output: List of answer objects
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
strings | List of strings | | An input parameter. Specifies a list of strings you want to turn into a list of answers. Required. |
prompts | String | Default: None | The prompts used to generate the answers. Optional. |
documents | List of documents | Default: None | The documents based on which the answer is generated. Optional. |
pattern | String | Default: None | The regex pattern used for parsing the answer. You can use the following placeholders: $content (the content of the document), $idx (the index of the document in the list), $id (the ID of the document), and $META_FIELD (the value of the metadata field called META_FIELD). If None, the whole string is used as the answer. If not None, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer. Optional. |
reference_pattern | String | Default: None | The regex pattern to use for parsing the document references. If None, no parsing is done, and all documents are referenced. Optional. |
reference_mode | Literal | index, id, meta. Default: index | The mode for referencing documents. Supported modes are: index (the document references are the one-based index of the document in the list of documents; for example, "this is an answer[1]" references the first document in the list), id (the document references are the document IDs; for example, "this is an answer[123]" references the document with the ID "123"), and meta (the document references are the value of a metadata field of the document; for example, "this is an answer[123]" references the document with the value "123" in the metadata field specified by reference_meta_field). Required. |
reference_meta_field | String | Default: None | The name of the metadata field to use for document references in reference_mode: meta. Optional. |
- Example: This function may be useful if PromptNode is the last node in a pipeline. The output of the PromptNode is a string, while deepset Cloud pipelines expect the Answer object. You can then add a Shaper with the strings_to_answers function at the end of the pipeline, after PromptNode.
  - name: OutputAnswerShaper
    type: Shaper
    params:
      func: strings_to_answers
      inputs:
        strings: results # the results PromptNode returns
      outputs:
        - answers
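The index reference mode can be illustrated with a small regex sketch. Everything here is an assumption made for illustration: the referenced_documents helper, the bracket pattern used as reference_pattern, and the dictionaries standing in for documents.

```python
import re
from typing import Dict, List


def referenced_documents(
    answer: str, documents: List[Dict], reference_pattern: str = r"\[(\d+)\]"
) -> List[Dict]:
    """Return the documents cited in the answer, using one-based indices."""
    indices = [int(match) for match in re.findall(reference_pattern, answer)]
    # Ignore references that point outside the list of documents.
    return [documents[i - 1] for i in indices if 1 <= i <= len(documents)]


docs = [{"id": "a1"}, {"id": "b2"}, {"id": "c3"}]
cited = referenced_documents("Paris is the capital [2], as also noted in [3].", docs)
```

Because the indices are one-based, [2] and [3] resolve to the second and third documents in the list.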
strings_to_documents
Changes a list of strings into a list of documents. If you pass the metadata in a single dictionary, all documents get the same metadata. If you pass the metadata as a list, the length of this list must be the same as the length of the list of strings, and each document gets its own metadata. You can specify id_hash_keys only once and it gets assigned to all documents.
- Input: List of strings
- Output: List of documents
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
strings | List of strings | | An input parameter. Contains the list of strings to transform into a list of documents. Required. |
meta | Dictionaries of string and any value | Default: None | An input parameter. Specifies the metadata to attach to the resulting list of documents. If you pass a single dictionary, all documents get the metadata from this dictionary. If you pass a list of metadata, each document gets its own metadata, but the list's length must be the same as the length of the list of strings. Optional. |
id_hash_keys | List of strings | Default: None | An input parameter. Generates the document ID from a custom list of strings that refer to the document's attributes. To make sure there are no duplicate documents in your document store if document texts are the same, you can modify the metadata of a document and then pass ["content", "metadata"] to this field to generate IDs based on the document content and the defined metadata. Optional. |
- Example:
  - name: StringsToDocs
    type: Shaper
    params:
      func: strings_to_documents
      inputs:
        strings:
          - [string1, string2, string3]
      outputs:
        - documents
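The two ways of passing metadata can be sketched as follows. This is an illustration only: documents are modeled as plain dictionaries, id_hash_keys is left out, and strings_to_documents_sketch is a hypothetical name.

```python
from typing import Any, Dict, List, Optional, Union

Meta = Dict[str, Any]


def strings_to_documents_sketch(
    strings: List[str], meta: Optional[Union[Meta, List[Meta]]] = None
) -> List[Dict[str, Any]]:
    if meta is None:
        metas = [{} for _ in strings]
    elif isinstance(meta, dict):
        metas = [dict(meta) for _ in strings]  # same metadata for every document
    else:
        if len(meta) != len(strings):
            raise ValueError("meta list must be as long as the list of strings")
        metas = list(meta)  # one metadata dictionary per document
    return [{"content": s, "meta": m} for s, m in zip(strings, metas)]


shared = strings_to_documents_sketch(["a", "b"], {"lang": "en"})
per_doc = strings_to_documents_sketch(["a", "b"], [{"n": 1}, {"n": 2}])
```

In shared, both documents carry {"lang": "en"}; in per_doc, each document gets the entry at its own position in the list.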
value_to_list
Turns a value into a list. The value is repeated to match the length of the target list. For example, if the target list has five items, the value is repeated in the output list five times.
- Input: Any
- Output: List containing the input value as many times as specified.
- Parameters:
Name | Type | Possible values | Description |
---|---|---|---|
value | Any | Any | An input parameter. The name of the value you want to turn into a list. Required. |
target_list | List | - | A list whose length determines the length of the output list. You can pass the name of an existing list, for example documents, or a fixed list, for example target_list: [1, 1, 1, 1], which produces a list with four items. Required. |
- Example: If your PromptTemplate has two parameters, question and documents, and you want the question to be processed against each document, use Shaper with the value_to_list function. It creates a list in which the question is repeated as many times as there are documents. PromptNode then processes the items of the two lists pairwise.
  - name: QuestionsShaper
    type: Shaper
    params:
      func: value_to_list
      inputs:
        value: query
        target_list: documents
      outputs:
        - questions
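The repetition can be sketched in plain Python. The sketch assumes, as the params example in the Parameters table suggests, that the output has as many items as target_list; value_to_list_sketch and the sample documents are hypothetical.

```python
def value_to_list_sketch(value, target_list):
    """Repeat value once per item in target_list."""
    return [value] * len(target_list)


documents = ["doc one", "doc two", "doc three"]
questions = value_to_list_sketch("What is a Shaper?", documents)

# Each question lines up with exactly one document.
pairs = list(zip(questions, documents))
```

With three documents, questions holds the same question three times, so pairs has one (question, document) entry per document.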
After performing a function, Shaper passes the new or modified values further down the pipeline.
Parameters
These are the parameters you can specify for Shaper in pipeline YAML:
Parameter | Type | Possible Values | Description |
---|---|---|---|
func | String | rename, value_to_list, join_strings, join_documents, join_documents_and_scores, join_lists, strings_to_answers, answers_to_strings, strings_to_documents, documents_to_strings, current_datetime | The function you want to use with Shaper. For more information, see the Functions section. Mandatory. |
outputs | List of strings | | The key to store the outputs of the Shaper's function. The length of outputs must match the number of outputs produced by the function you specified for the Shaper. Mandatory. |
inputs | Dictionary | | Maps the function's input keyword arguments to the key-value pairs in the invocation context. For example, the value_to_list function expects two inputs: value and target_list, so inputs for this function could be: {value: query, target_list: documents}. Optional. |
params | Dictionary | | Maps the function's input keyword arguments to fixed values. For example, the value_to_list function expects value and target_list parameters, so params might be {value: A, target_list: [1, 1, 1, 1]}. The node's output would be: ["A", "A", "A", "A"]. Optional. |
publish_outputs | Union of Boolean and List of Strings | Default: True | Publishes Shaper's outputs to the pipeline's output. True publishes all outputs. False doesn't publish any output. Mandatory. |