LlamaStackChatGenerator

Generate text using models available on a Llama Stack server.

Key Features

  • Connects to a Llama Stack Server, which supports multiple inference providers, including Ollama, Together AI, vLLM, and cloud providers.
  • Accepts ChatMessage objects as input and returns generated replies as ChatMessage objects.
  • Supports streaming responses via a callback function.
  • Supports tool calls for agentic workflows.
  • Compatible with any text generation parameters valid for the OpenAI chat completion API.

Configuration

Before using this component, set up a Llama Stack Server with an inference provider and make sure a model is available. For a quick start, see the Llama Stack documentation.

  1. Drag the LlamaStackChatGenerator component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    • Set the model parameter to the name of a model available through your Llama Stack Server's inference provider.
    • Set api_base_url to your Llama Stack API base URL. The default is http://localhost:8321/v1/openai/v1.
  4. Go to the Advanced tab to configure timeout, max_retries, generation_kwargs, and http_client_kwargs.
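
The settings from these steps end up as the component's init_parameters in the pipeline definition. As a rough illustration, the following Python sketch assembles that entry as a plain dictionary. The helper name `llama_stack_component` is hypothetical and not part of any library, and the advanced values passed in the usage lines are placeholders.

```python
from typing import Any, Dict

# Default base URL stated in the configuration steps.
DEFAULT_API_BASE_URL = "http://localhost:8321/v1/openai/v1"

# Component type as it appears in the pipeline YAML on this page.
COMPONENT_TYPE = (
    "haystack_integrations.components.generators.llama_stack"
    ".chat.chat_generator.LlamaStackChatGenerator"
)

# Parameters listed for the Advanced tab.
ADVANCED_PARAMETERS = {"timeout", "max_retries", "generation_kwargs", "http_client_kwargs"}


def llama_stack_component(
    model: str,
    api_base_url: str = DEFAULT_API_BASE_URL,
    **advanced: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: build the component entry as a dictionary."""
    unknown = set(advanced) - ADVANCED_PARAMETERS
    if unknown:
        raise ValueError(f"Unsupported advanced parameters: {sorted(unknown)}")
    init_parameters: Dict[str, Any] = {"model": model, "api_base_url": api_base_url}
    init_parameters.update(advanced)
    return {"type": COMPONENT_TYPE, "init_parameters": init_parameters}


# Placeholder values, chosen only for illustration.
definition = llama_stack_component(
    "ollama/llama3.2:3b",
    timeout=60,
    generation_kwargs={"temperature": 0.2},
)
print(definition["init_parameters"])
```

Rejecting unknown keys early is a design choice of this sketch: a misspelled parameter fails when the entry is built rather than when the pipeline is deployed.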

Connections

LlamaStackChatGenerator accepts a list of ChatMessage objects as input. Connect its messages input to the prompt output of ChatPromptBuilder.

It outputs replies as a list of ChatMessage objects. Connect its replies output through OutputAdapter to DeepsetAnswerBuilder.
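
The connections described here can be written down as sender/receiver pairs. The sketch below uses plain Python with no Haystack dependency and follows the path from the prompt builder to the answer builder. The `follow` helper is hypothetical, and the socket names mirror the ones used in the example pipeline on this page (for instance, OutputAdapter.output and answer_builder.replies).

```python
from typing import List, Tuple

# (sender, receiver) pairs in "component.socket" form.
connections: List[Tuple[str, str]] = [
    ("ChatPromptBuilder.prompt", "LlamaStackChatGenerator.messages"),
    ("LlamaStackChatGenerator.replies", "OutputAdapter.replies"),
    ("OutputAdapter.output", "answer_builder.replies"),
]


def follow(start_component: str) -> List[str]:
    """Return the component names reached by following the connections."""
    path = [start_component]
    current = start_component
    while True:
        # Find the first connection whose sender belongs to the current component.
        next_component = next(
            (
                receiver.split(".")[0]
                for sender, receiver in connections
                if sender.split(".")[0] == current
            ),
            None,
        )
        if next_component is None:
            return path
        path.append(next_component)
        current = next_component


path = follow("ChatPromptBuilder")
print(" -> ".join(path))
```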

Source Code

To check this component's source code, open chat_generator.py in the Haystack Core Integrations repository.

Usage Examples

Basic Configuration

LlamaStackChatGenerator:
  type: haystack_integrations.components.generators.llama_stack.chat.chat_generator.LlamaStackChatGenerator
  init_parameters:
    model: ollama/llama3.2:3b
    api_base_url: http://localhost:8321/v1/openai/v1

This is an example RAG pipeline with LlamaStackChatGenerator and DeepsetAnswerBuilder connected through OutputAdapter:

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20
      fuzziness: 0

  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

  ChatPromptBuilder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template:
        - _content:
            - text: "You are a helpful assistant answering the user's questions based on the provided documents.\nDo not use your own knowledge.\n"
          _role: system
        - _content:
            - text: "Provided documents:\n{% for document in documents %}\nDocument [{{ loop.index }}] :\n{{ document.content }}\n{% endfor %}\n\nQuestion: {{ query }}\n"
          _role: user
      required_variables:
      variables:

  OutputAdapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: '{{ replies[0] }}'
      output_type: List[str]
      custom_filters:
      unsafe: false

  LlamaStackChatGenerator:
    type: haystack_integrations.components.generators.llama_stack.chat.chat_generator.LlamaStackChatGenerator
    init_parameters:
      model: ollama/llama3.2:3b
      api_base_url: http://localhost:8321/v1/openai/v1
      streaming_callback:
      generation_kwargs:
      tools:
      timeout:
      max_retries:
      http_client_kwargs:

connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: meta_field_grouping_ranker.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: answer_builder.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: ChatPromptBuilder.documents
  - sender: OutputAdapter.output
    receiver: answer_builder.replies
  - sender: ChatPromptBuilder.prompt
    receiver: LlamaStackChatGenerator.messages
  - sender: LlamaStackChatGenerator.replies
    receiver: OutputAdapter.replies

inputs:
  query:
    - "bm25_retriever.query"
    - "query_embedder.text"
    - "ranker.query"
    - "answer_builder.query"
    - "ChatPromptBuilder.query"
  filters:
    - "bm25_retriever.filters"
    - "embedding_retriever.filters"

outputs:
  documents: "meta_field_grouping_ranker.documents"
  answers: "answer_builder.answers"

max_runs_per_component: 100

metadata: {}

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| messages | List[ChatMessage] | | A list of ChatMessage instances representing the input messages. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters override the parameters in the pipeline configuration. |
| tools | Optional[Union[List[Tool], Toolset]] | None | A list of tools or a Toolset for which the model can prepare calls. If set, it overrides the tools parameter set during component initialization. |
| tools_strict | Optional[bool] | None | Whether to enable strict schema adherence for tool calls. |
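
A streaming callback is an ordinary callable that is invoked once per chunk. The sketch below collects chunk text into a buffer. It assumes each chunk exposes its text in a `content` attribute, as Haystack's StreamingChunk does, and it uses `SimpleNamespace` objects as stand-ins so that it runs without a server. The class name `ChunkCollector` is hypothetical.

```python
from types import SimpleNamespace
from typing import Any, List


class ChunkCollector:
    """Callable that accumulates streamed text chunk by chunk."""

    def __init__(self) -> None:
        self.parts: List[str] = []

    def __call__(self, chunk: Any) -> None:
        # Assumption: the chunk carries its text in `content`.
        text = getattr(chunk, "content", "") or ""
        self.parts.append(text)

    @property
    def text(self) -> str:
        """Everything received so far, joined into one string."""
        return "".join(self.parts)


collector = ChunkCollector()

# Stand-in chunks; in a real run the collector would be passed
# as the streaming_callback parameter instead of being called by hand.
for piece in ("Llama ", "Stack ", "says hi"):
    collector(SimpleNamespace(content=piece))

print(collector.text)
```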

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| replies | List[ChatMessage] | | A list containing the generated responses as ChatMessage instances. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | | The name of the model to use for chat completion. This depends on the inference provider used for the Llama Stack Server. |
| api_base_url | str | http://localhost:8321/v1/openai/v1 | The Llama Stack API base URL. If not specified, localhost is used with the default port 8321. |
| organization | Optional[str] | None | Your organization ID. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Other parameters to use for the model. These parameters are sent directly to the Llama Stack endpoint. See the Llama Stack API documentation for more details. Supported parameters include: max_tokens, temperature, top_p, stream, safe_prompt, random_seed, response_format. |
| timeout | Optional[int] | None | Timeout for client calls. If not set, it defaults to either the OPENAI_TIMEOUT environment variable or 30 seconds. |
| tools | Optional[Union[List[Tool], Toolset]] | None | A list of tools or a Toolset for which the model can prepare calls. Each tool should have a unique name. |
| tools_strict | bool | False | Whether to enable strict schema adherence for tool calls. If set to True, the model follows exactly the schema provided in the parameters field of the tool definition, but this may increase latency. |
| max_retries | Optional[int] | None | Maximum number of retries to contact the server after an internal error. If not set, it defaults to either the OPENAI_MAX_RETRIES environment variable or five. |
| http_client_kwargs | Optional[Dict[str, Any]] | None | A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient. For more information, see the HTTPX documentation. |
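
The fallback order described for timeout and max_retries (explicit value, then environment variable, then built-in default) can be sketched as follows. This illustrates the documented behavior and is not the component's actual source. The function names are hypothetical, and the `env` argument exists only so the example can run without touching the real environment.

```python
import os
from typing import Mapping, Optional


def resolve_timeout(
    timeout: Optional[float],
    env: Mapping[str, str] = os.environ,
) -> float:
    """Explicit value first, then OPENAI_TIMEOUT, then 30 seconds."""
    if timeout is not None:
        return float(timeout)
    return float(env.get("OPENAI_TIMEOUT", "30"))


def resolve_max_retries(
    max_retries: Optional[int],
    env: Mapping[str, str] = os.environ,
) -> int:
    """Explicit value first, then OPENAI_MAX_RETRIES, then five."""
    if max_retries is not None:
        return max_retries
    return int(env.get("OPENAI_MAX_RETRIES", "5"))


# Usage with explicit stand-in environments.
print(resolve_timeout(None, env={}))
print(resolve_timeout(None, env={"OPENAI_TIMEOUT": "12"}))
print(resolve_max_retries(3, env={"OPENAI_MAX_RETRIES": "9"}))
```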

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| messages | List[ChatMessage] | | A list of ChatMessage instances representing the input messages. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters override the parameters in the pipeline configuration. |
| tools | Optional[Union[List[Tool], Toolset]] | None | A list of tools or a Toolset for which the model can prepare calls. If set, it overrides the tools parameter set during component initialization. |
| tools_strict | Optional[bool] | None | Whether to enable strict schema adherence for tool calls. |
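
Run-time generation_kwargs take precedence over the ones set in the pipeline configuration. A minimal sketch of that precedence, using a plain dictionary merge; the function name is hypothetical and the parameter values are placeholders.

```python
from typing import Any, Dict, Optional


def effective_generation_kwargs(
    init_kwargs: Optional[Dict[str, Any]],
    run_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Merge init-time and run-time kwargs; run-time values win."""
    # In a dict merge, later entries overwrite earlier ones with the same key.
    return {**(init_kwargs or {}), **(run_kwargs or {})}


merged = effective_generation_kwargs(
    {"temperature": 0.7, "max_tokens": 256},  # set in the pipeline configuration
    {"temperature": 0.1},                     # passed at query time
)
print(merged)
```

Keys that appear only in the configuration, such as max_tokens here, are kept; only the overlapping key is replaced.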