Generators Overview

Generators let you use large language models (LLMs) in your applications. Each model provider has a dedicated Generator. There are also ChatGenerators designed for chat-based interactions. Learn when to use each type and how they differ.

Choosing a Generator

Each model provider or model-hosting platform supported by deepset has a dedicated Generator. Choose the one that works with the model provider you want to use. For example, to use a Claude model through Anthropic's API, choose AnthropicGenerator. To use models through Amazon Bedrock, choose AmazonBedrockGenerator.

deepset AI Platform also provides DeepsetAmazonBedrockGenerator and DeepsetAmazonBedrockVisionGenerator that let you use models hosted on Amazon Bedrock through deepset's account.
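
In pipeline YAML, a generator is declared as a component. Here's a minimal sketch using OpenAIGenerator as an example; the component name and model are illustrative, and the type is the component's full Python import path:

components:
  generator:
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      model: gpt-4o # illustrative model name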

For guidance on models, see Large Language Models Overview and Language Models in deepset AI Platform.

Generators and ChatGenerators

Most Generators have a corresponding ChatGenerator.

  • Generators are designed for text-generation tasks, such as in a retrieval-augmented generation (RAG) system, where the user asks a question and receives a one-time answer.
  • ChatGenerators handle multi-turn conversations, maintaining context and consistency throughout the interaction. They also support tool calling, which allows the model to make calls to external tools or functions.

Key Differences

                Generators                      ChatGenerators
Input type      String                          List of ChatMessage objects
Output type     Text                            ChatMessage
Best for        • Single-turn text generation   • Multi-turn chat scenarios
                • RAG-style Q&A                 • Maintaining context across interactions
                                                • Assuming a consistent role
Tool calling    Not supported                   Supported (accepts tools and functions as parameters)
Used with       PromptBuilder                   ChatPromptBuilder
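
In pipeline YAML, this pairing shows up in the connections. Here's a minimal sketch with illustrative component names:

connections:
  # PromptBuilder outputs a single prompt string for a Generator
  - sender: prompt_builder.prompt
    receiver: generator.prompt
  # ChatPromptBuilder outputs a list of ChatMessage objects for a ChatGenerator
  - sender: chat_prompt_builder.prompt
    receiver: chat_generator.messages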

When to Use Each

  • Use a ChatGenerator if:
    • Your application involves multi-turn conversations.
    • The model needs to call external tools or functions.
  • Use a Generator if the model only needs to generate answers without maintaining conversation history.

Streaming

Streaming refers to the process of returning a response in real time, while the model is still generating it. Instead of waiting for the complete answer to be generated before responding, the model sends its output token by token, making the communication feel more fluid and immediate.

All RAG pipelines in deepset AI Platform have streaming enabled by default.

To read more about streaming and how to enable it, see Streaming.

ChatMessage

ChatMessage is a data class in Haystack used by ChatGenerators and ChatPromptBuilder. ChatPromptBuilder sends a list of ChatMessage objects to a ChatGenerator, which returns its reply as a list of ChatMessage objects.

Each message has a role (such as system, user, assistant, or tool) and associated content. The system message is used to set the overall tone and instructions for the conversation, for example: "You are a helpful assistant." The user message is the input from the user, usually a query, but it can also include documents to pass to the model. During the interaction, the LLM generates the next message in the conversation, usually as an assistant.

For details on the message format and its properties, see ChatMessage in Haystack documentation.

When using ChatPromptBuilder, always provide your instructions in the template parameter using the following format:

- _content:
    - content_type: content # supported content types are: text, tool_call, tool_call_result
      # content may contain variables
  _role: role # supported roles are: user, system, assistant, tool

In most cases, you'll write your instructions using the text content type and roles such as user and system. For instance, you might include the model's instructions as a system role ChatMessage, and the user's input as a user role ChatMessage. In the following example, we include retrieved documents within the user message together with the query:

- _content:
    - text: |
        You are a helpful assistant answering the user's questions.
        If the answer is not in the documents, rely on the web_search tool to find information.
        Do not use your own knowledge.
  _role: system
- _content:
    - text: |
        Question: {{ query }}
        Documents:
        {% for document in documents %}
        {{ document.content }}
        {% endfor %}
  _role: user

If the LLM calls a tool, it outputs a message with the content type tool_call. You can use this content type to configure conditional pipeline paths. For example, you can create two routes with ConditionalRouter: one route for when the LLM makes a tool call, and another for when it doesn't.
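
Here's a minimal sketch of such a router. The component and route names are illustrative, the conditions assume the ChatGenerator's replies are connected to the router's replies input, and the exact output_type strings may differ in your pipeline:

components:
  router:
    type: haystack.components.routers.conditional_router.ConditionalRouter
    init_parameters:
      routes:
        # Taken when the model's first reply contains at least one tool call
        - condition: "{{ replies[0].tool_calls | length > 0 }}"
          output: "{{ replies }}"
          output_name: tool_calls
          output_type: List[haystack.dataclasses.chat_message.ChatMessage]
        # Taken when the model answered directly, without calling a tool
        - condition: "{{ replies[0].tool_calls | length == 0 }}"
          output: "{{ replies }}"
          output_name: no_tool_calls
          output_type: List[haystack.dataclasses.chat_message.ChatMessage]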

Generators in a Pipeline

  • Generators receive the prompt from PromptBuilder and return a list of strings. They can easily connect to any component that accepts a list of strings as input.
  • ChatGenerators receive prompts from ChatPromptBuilder in the form of a list of ChatMessage objects and they return a list of ChatMessage objects.
    • If a ChatGenerator produces a tool call, you can send it to a ToolInvoker which executes the tool.
    • If you want a ChatGenerator's output to be the final pipeline output, you can either connect it directly to AnswerBuilder or use an OutputAdapter to convert the ChatGenerator's output into a list of strings and then connect it to DeepsetAnswerBuilder.
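
For example, here's a minimal OutputAdapter sketch for the second option. The component names are illustrative, and the template assumes the ChatGenerator's replies are connected to the adapter's replies input:

components:
  adapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      # Extract the text of each ChatMessage to produce a list of strings
      template: "{{ replies | map(attribute='text') | list }}"
      output_type: List[str]

connections:
  - sender: chat_generator.replies
    receiver: adapter.replies
  - sender: adapter.output
    receiver: answer_builder.replies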

For details, see Common Component Combinations.

Limitations

ChatGenerators don't work in Prompt Explorer. To experiment with prompts, you can use the Configurations feature in the Playground and modify the ChatPromptBuilder's template parameter. For details, see Modify Pipeline Parameters at Query Time.