LlamaStackChatGenerator
Generate text using models available on a Llama Stack server.
Basic Information
- Type: haystack_integrations.components.generators.llama_stack.chat.chat_generator.LlamaStackChatGenerator
- Components it can connect with:
  - ChatPromptBuilder: LlamaStackChatGenerator receives a rendered prompt from ChatPromptBuilder.
  - DeepsetAnswerBuilder: LlamaStackChatGenerator sends the generated replies to DeepsetAnswerBuilder through OutputAdapter.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| messages | List[ChatMessage] | | A list of ChatMessage instances representing the input messages. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function called when the model receives a new token from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters override the parameters in pipeline configuration. |
| tools | Optional[Union[List[Tool], Toolset]] | None | A list of tools or a Toolset for which the model can prepare calls. If set, it overrides the tools parameter set during component initialization. |
| tools_strict | Optional[bool] | None | Whether to enable strict schema adherence for tool calls. |
Outputs
| Parameter | Type | Description |
|---|---|---|
| replies | List[ChatMessage] | A list containing the generated responses as ChatMessage instances. |
Overview
Use LlamaStackChatGenerator to generate text with models available on a Llama Stack server. Llama Stack supports multiple inference providers, including Ollama, vLLM, Together AI, and various cloud providers.
For a complete list of providers, see the Llama Stack documentation.
You can pass any text generation parameters valid for the OpenAI chat completion API directly to this component using the generation_kwargs parameter.
This component uses the ChatMessage format for structuring both input and output, ensuring coherent and contextually relevant responses in chat-based text generation scenarios.
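For illustration, here is a minimal standalone sketch of the component outside a pipeline. It assumes a Llama Stack server reachable at the default `http://localhost:8321/v1/openai/v1` serving the `ollama/llama3.2:3b` model; substitute whatever model your inference provider exposes.

```python
from haystack.dataclasses import ChatMessage

from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

# Assumes a Llama Stack server running locally on the default port 8321
# with the ollama/llama3.2:3b model available through its inference provider.
generator = LlamaStackChatGenerator(
    model="ollama/llama3.2:3b",
    generation_kwargs={"max_tokens": 512, "temperature": 0.2},  # OpenAI-style parameters
)

result = generator.run(messages=[ChatMessage.from_user("Briefly explain what Llama Stack is.")])
print(result["replies"][0].text)  # replies is a list of ChatMessage instances
```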
Prerequisites
To use this chat generator, set up a Llama Stack server with an inference provider and make sure a model is available. For a quick start on setting up the server with Ollama, see the Llama Stack documentation.
Usage Example
This is an example RAG pipeline with LlamaStackChatGenerator and DeepsetAnswerBuilder connected through OutputAdapter:
components:
bm25_retriever:
type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
hosts:
index: 'Standard-Index-English'
max_chunk_bytes: 104857600
embedding_dim: 768
return_embedding: false
method:
mappings:
settings:
create_index: true
http_auth:
use_ssl:
verify_certs:
timeout:
top_k: 20
fuzziness: 0
query_embedder:
type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
init_parameters:
normalize_embeddings: true
model: intfloat/e5-base-v2
embedding_retriever:
type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
init_parameters:
document_store:
type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
init_parameters:
hosts:
index: 'Standard-Index-English'
max_chunk_bytes: 104857600
embedding_dim: 768
return_embedding: false
method:
mappings:
settings:
create_index: true
http_auth:
use_ssl:
verify_certs:
timeout:
top_k: 20
document_joiner:
type: haystack.components.joiners.document_joiner.DocumentJoiner
init_parameters:
join_mode: concatenate
ranker:
type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
init_parameters:
model: intfloat/simlm-msmarco-reranker
top_k: 8
meta_field_grouping_ranker:
type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
init_parameters:
group_by: file_id
subgroup_by:
sort_docs_by: split_id
answer_builder:
type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
init_parameters:
reference_pattern: acm
ChatPromptBuilder:
type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
init_parameters:
template:
- _content:
- text: "You are a helpful assistant answering the user's questions based on the provided documents.\nDo not use your own knowledge.\n"
_role: system
- _content:
- text: "Provided documents:\n{% for document in documents %}\nDocument [{{ loop.index }}] :\n{{ document.content }}\n{% endfor %}\n\nQuestion: {{ query }}\n"
_role: user
required_variables:
variables:
OutputAdapter:
type: haystack.components.converters.output_adapter.OutputAdapter
init_parameters:
template: '{{ replies[0] }}'
output_type: List[str]
custom_filters:
unsafe: false
LlamaStackChatGenerator:
type: haystack_integrations.components.generators.llama_stack.chat.chat_generator.LlamaStackChatGenerator
init_parameters:
model: ollama/llama3.2:3b
api_base_url: http://localhost:8321/v1/openai/v1
streaming_callback:
generation_kwargs:
tools:
timeout:
max_retries:
http_client_kwargs:
connections:
- sender: bm25_retriever.documents
receiver: document_joiner.documents
- sender: query_embedder.embedding
receiver: embedding_retriever.query_embedding
- sender: embedding_retriever.documents
receiver: document_joiner.documents
- sender: document_joiner.documents
receiver: ranker.documents
- sender: ranker.documents
receiver: meta_field_grouping_ranker.documents
- sender: meta_field_grouping_ranker.documents
receiver: answer_builder.documents
- sender: meta_field_grouping_ranker.documents
receiver: ChatPromptBuilder.documents
- sender: OutputAdapter.output
receiver: answer_builder.replies
- sender: ChatPromptBuilder.prompt
receiver: LlamaStackChatGenerator.messages
- sender: LlamaStackChatGenerator.replies
receiver: OutputAdapter.replies
inputs:
query:
- "bm25_retriever.query"
- "query_embedder.text"
- "ranker.query"
- "answer_builder.query"
- "ChatPromptBuilder.query"
filters:
- "bm25_retriever.filters"
- "embedding_retriever.filters"
outputs:
documents: "meta_field_grouping_ranker.documents"
answers: "answer_builder.answers"
max_runs_per_component: 100
metadata: {}
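To make the YAML above easier to follow, here is a condensed Python sketch of just the prompt-building and generation steps of that pipeline; the retrieval, ranking, and answer-building components are omitted. It assumes the same local Llama Stack server and `ollama/llama3.2:3b` model, and the sample document and query are invented for illustration.

```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage, Document

from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

# The same system and user templates as in the ChatPromptBuilder above.
template = [
    ChatMessage.from_system(
        "You are a helpful assistant answering the user's questions based on the provided documents.\n"
        "Do not use your own knowledge.\n"
    ),
    ChatMessage.from_user(
        "Provided documents:\n{% for document in documents %}\n"
        "Document [{{ loop.index }}] :\n{{ document.content }}\n{% endfor %}\n\n"
        "Question: {{ query }}\n"
    ),
]

pipeline = Pipeline()
pipeline.add_component("ChatPromptBuilder", ChatPromptBuilder(template=template))
pipeline.add_component("LlamaStackChatGenerator", LlamaStackChatGenerator(model="ollama/llama3.2:3b"))
pipeline.connect("ChatPromptBuilder.prompt", "LlamaStackChatGenerator.messages")

result = pipeline.run({
    "ChatPromptBuilder": {
        "documents": [Document(content="Llama Stack exposes an OpenAI-compatible inference API.")],
        "query": "What API does Llama Stack expose?",
    }
})
print(result["LlamaStackChatGenerator"]["replies"][0].text)
```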
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | | The name of the model to use for chat completion. This depends on the inference provider used for the Llama Stack server. |
| api_base_url | str | http://localhost:8321/v1/openai/v1 | The Llama Stack API base URL. If not specified, localhost is used with the default port 8321. |
| organization | Optional[str] | None | Your organization ID. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Other parameters to use for the model. These parameters are sent directly to the Llama Stack endpoint. See the Llama Stack API documentation for more details. Supported parameters include: max_tokens (maximum number of tokens the output text can have), temperature (sampling temperature for creativity control), top_p (nucleus sampling probability mass), stream (whether to stream back partial progress), safe_prompt (whether to inject a safety prompt before all conversations), random_seed (the seed to use for random sampling), response_format (a JSON schema or Pydantic model that enforces the structure of the model's response). |
| timeout | Optional[int] | None | Timeout for client calls. If not set, it defaults to either the OPENAI_TIMEOUT environment variable or 30 seconds. |
| tools | Optional[Union[List[Tool], Toolset]] | None | A list of tools or a Toolset for which the model can prepare calls. Each tool should have a unique name. |
| tools_strict | bool | False | Whether to enable strict schema adherence for tool calls. If set to True, the model follows exactly the schema provided in the parameters field of the tool definition, but this may increase latency. |
| max_retries | Optional[int] | None | Maximum number of retries to contact the server after an internal error. If not set, it defaults to either the OPENAI_MAX_RETRIES environment variable or five. |
| http_client_kwargs | Optional[Dict[str, Any]] | None | A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient. For more information, see the HTTPX documentation. |
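As an illustration of the init parameters, here is a hedged sketch that enables token streaming with Haystack's built-in `print_streaming_chunk` callback; the model name, timeout, and retry values are placeholders, not recommendations.

```python
from haystack.components.generators.utils import print_streaming_chunk

from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

generator = LlamaStackChatGenerator(
    model="ollama/llama3.2:3b",                 # placeholder; depends on your inference provider
    streaming_callback=print_streaming_chunk,   # prints each StreamingChunk as it arrives
    timeout=60,                                 # seconds; overrides the OPENAI_TIMEOUT fallback
    max_retries=3,                              # overrides the OPENAI_MAX_RETRIES fallback
)
```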
Run Method Parameters
These are the parameters you can configure for the component's run() method. You can pass these parameters at query time through the API, in Playground, or when running a job.
| Parameter | Type | Default | Description |
|---|---|---|---|
| messages | List[ChatMessage] | | A list of ChatMessage instances representing the input messages. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters override the parameters in pipeline configuration. |
| tools | Optional[Union[List[Tool], Toolset]] | None | A list of tools or a Toolset for which the model can prepare calls. If set, it overrides the tools parameter set during component initialization. |
| tools_strict | Optional[bool] | None | Whether to enable strict schema adherence for tool calls. |
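To show how the tools run parameter works, here is a sketch with a single hypothetical `get_weather` tool; the tool's name, schema, and function are invented for illustration. When the model decides to use the tool, the returned ChatMessage carries a prepared tool call instead of plain text.

```python
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool

from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

def get_weather(city: str) -> str:
    """Hypothetical helper used only for illustration."""
    return f"Sunny in {city}"

weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=get_weather,
)

generator = LlamaStackChatGenerator(model="ollama/llama3.2:3b")
result = generator.run(
    messages=[ChatMessage.from_user("What is the weather in Berlin?")],
    tools=[weather_tool],
)

reply = result["replies"][0]
if reply.tool_calls:  # the model prepared a call rather than answering directly
    print(reply.tool_calls[0].tool_name, reply.tool_calls[0].arguments)
else:
    print(reply.text)
```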