
HuggingFaceLocalGenerator

Generate text using models from Hugging Face that run locally.

text2text-generation deprecated

The text2text-generation task is deprecated and may be removed in a future release. In transformers v5+, text2text-generation is no longer available as a valid pipeline task. If you currently use task="text2text-generation", replace it with task="text-generation" and ensure the selected model is compatible. To use the older behavior, pin transformers<5.
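As a rough illustration of the migration described above, the following sketch swaps the deprecated task in a dictionary of init parameters. The helper name `migrate_task` and the model name `my-org/my-model` are hypothetical and exist only for this example. The helper does not check whether the model works with text-generation, so you still need to confirm that yourself.

```python
from typing import Any, Dict

DEPRECATED_TASK = "text2text-generation"
REPLACEMENT_TASK = "text-generation"


def migrate_task(init_parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of init_parameters with the deprecated task replaced.

    Hypothetical helper: it only rewrites the task value. Whether the model
    is compatible with text-generation must be verified separately.
    """
    updated = dict(init_parameters)  # shallow copy; the input is not modified
    if updated.get("task") == DEPRECATED_TASK:
        updated["task"] = REPLACEMENT_TASK
    return updated


# "my-org/my-model" is a placeholder, not a real model name.
old_config = {"model": "my-org/my-model", "task": "text2text-generation"}
new_config = migrate_task(old_config)
print(new_config["task"])
```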

Key Features

  • Runs LLMs locally using Hugging Face's transformers pipeline — no external API required.
  • Supports decoder models like GPT and Qwen with the text-generation task.
  • Configurable generation parameters such as max_new_tokens, temperature, and top_p.
  • Supports streaming responses and configurable stop words.
  • Automatically selects the best available device or accepts a custom device setting.

Configuration

Authentication

To access remote models and files on Hugging Face, this component reads a token from the HF_API_TOKEN or HF_TOKEN environment variable. Connect deepset AI Platform with Hugging Face first. For details, see Use Hugging Face Models.
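The default token setting, Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False), can be pictured with the plain-Python sketch below. It assumes the variables are checked in order and that the first one set is used, with strict=False meaning a missing token is not an error. The function `resolve_token` is a hypothetical stand-in for illustration, not part of Haystack.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_token(
    env_vars: Sequence[str] = ("HF_API_TOKEN", "HF_TOKEN"),
    strict: bool = False,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the value of the first environment variable that is set.

    Hypothetical illustration of an env-var secret lookup. `environ`
    defaults to the real process environment and can be replaced by a
    plain dict for testing.
    """
    env = os.environ if environ is None else environ
    for name in env_vars:
        value = env.get(name)
        if value is not None:
            return value
    if strict:
        # With strict=True, a missing token is treated as an error.
        raise ValueError(
            "None of these environment variables are set: " + ", ".join(env_vars)
        )
    # With strict=False, the component simply runs without a token.
    return None
```

With strict set to false, public models can still be loaded when neither variable is set.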

  1. Drag the HuggingFaceLocalGenerator component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab:
    1. Enter the Hugging Face model name or local path, such as Qwen/Qwen3-0.6B.
  4. Go to the Advanced tab to configure the task, device, token, generation kwargs, Hugging Face pipeline kwargs, stop words, and streaming callback.

Connections

HuggingFaceLocalGenerator receives a prompt string, typically from PromptBuilder, and outputs replies, a list of generated strings. Connect its replies output to AnswerBuilder to format the answers.
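Outside Pipeline Builder, the same input and output can be seen by calling the component directly from Python. This is a minimal sketch that assumes Haystack 2.x and transformers are installed and that the model can be downloaded. The helper names `build_init_kwargs` and `generate` are made up for this example.

```python
from typing import Any, Dict, List


def build_init_kwargs() -> Dict[str, Any]:
    """Init parameters matching the values used elsewhere on this page."""
    return {
        "model": "Qwen/Qwen3-0.6B",
        "task": "text-generation",
        "generation_kwargs": {"max_new_tokens": 100, "temperature": 0.9},
    }


def generate(prompt: str) -> List[str]:
    """Run the generator once and return its replies (a list of strings)."""
    # Imported lazily so this file can be loaded without Haystack installed.
    from haystack.components.generators import HuggingFaceLocalGenerator

    generator = HuggingFaceLocalGenerator(**build_init_kwargs())
    generator.warm_up()  # loads the model; may download it on first use
    result = generator.run(prompt=prompt)
    return result["replies"]
```

Calling generate("What is the capital of France?") returns the list that the replies output would pass to AnswerBuilder in a pipeline.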

Usage Example

This query pipeline uses HuggingFaceLocalGenerator for local text generation:

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 10
      fuzziness: 0

  PromptBuilder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |
        Given the following information, answer the question.

        Context:
        {% for document in documents %}
        {{ document.content }}
        {% endfor %}

        Question: {{ query }}
      required_variables:
      variables:

  HuggingFaceLocalGenerator:
    type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator
    init_parameters:
      model: Qwen/Qwen3-0.6B
      task: text-generation
      device:
      token:
        type: env_var
        env_vars:
          - HF_API_TOKEN
          - HF_TOKEN
        strict: false
      generation_kwargs:
        max_new_tokens: 100
        temperature: 0.9
      huggingface_pipeline_kwargs:
      stop_words:
      streaming_callback:

  AnswerBuilder:
    type: haystack.components.builders.answer_builder.AnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:

connections:
  - sender: bm25_retriever.documents
    receiver: PromptBuilder.documents
  - sender: PromptBuilder.prompt
    receiver: HuggingFaceLocalGenerator.prompt
  - sender: HuggingFaceLocalGenerator.replies
    receiver: AnswerBuilder.replies
  - sender: bm25_retriever.documents
    receiver: AnswerBuilder.documents

inputs:
  query:
    - bm25_retriever.query
    - PromptBuilder.query
    - AnswerBuilder.query

outputs:
  answers: AnswerBuilder.answers

max_runs_per_component: 100

metadata: {}
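To make the prompt that reaches HuggingFaceLocalGenerator more concrete, the sketch below approximates in plain Python what the PromptBuilder template in this pipeline produces. It is not Jinja rendering: the real template engine may emit different blank lines around the loop. The function `render_prompt` and the sample documents are hypothetical.

```python
from typing import List


def render_prompt(documents: List[str], query: str) -> str:
    """Approximate the pipeline's prompt template without Jinja.

    Hypothetical helper: each retrieved document's content goes on its own
    line under "Context:", followed by the question.
    """
    lines = [
        "Given the following information, answer the question.",
        "",
        "Context:",
    ]
    lines.extend(documents)
    lines.extend(["", f"Question: {query}"])
    return "\n".join(lines)


# Sample inputs invented for illustration only.
prompt = render_prompt(
    documents=["Paris is the capital of France.", "France is in Europe."],
    query="What is the capital of France?",
)
print(prompt)
```

The resulting string is what the PromptBuilder.prompt to HuggingFaceLocalGenerator.prompt connection carries.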

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | | A string representing the prompt. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function that is called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| replies | List[str] | | A list of strings representing the generated replies. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | Qwen/Qwen3-0.6B | The Hugging Face text generation model name or path. |
| task | Optional[Literal['text-generation', 'text2text-generation']] | text-generation | The task for the Hugging Face pipeline. The default is text-generation, supported by decoder models like GPT and Qwen. The text2text-generation task (encoder-decoder models like T5) is deprecated and not available in transformers v5+. If not specified, the component infers the task from the model name. |
| device | Optional[ComponentDevice] | None | The device for loading the model. If None, automatically selects the default device. If a device or device map is specified in huggingface_pipeline_kwargs, it overrides this parameter. |
| token | Optional[Secret] | Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False) | The token to use as HTTP bearer authorization for remote files. If the token is specified in huggingface_pipeline_kwargs, this parameter is ignored. |
| generation_kwargs | Optional[Dict[str, Any]] | None | A dictionary with keyword arguments to customize text generation: max_length, max_new_tokens, temperature, top_k, top_p. See Hugging Face documentation. |
| huggingface_pipeline_kwargs | Optional[Dict[str, Any]] | None | Dictionary with keyword arguments to initialize the Hugging Face pipeline. These override model, task, device, and token init parameters. See Hugging Face documentation. |
| stop_words | Optional[List[str]] | None | If the model generates a stop word, the generation stops. If you provide this parameter, don't specify stopping_criteria in generation_kwargs. |
| streaming_callback | Optional[StreamingCallbackT] | None | An optional callable for handling streaming responses. |
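When the component is used from Python, a streaming callback can be any callable that accepts one chunk at a time. The sketch below assumes each chunk exposes its text in a `content` attribute. `FakeChunk` is a hypothetical stand-in for the real chunk object so that the example runs without Haystack, and `TokenCollector` is likewise an illustrative name.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streaming chunk; only `content` is modeled."""

    content: str


class TokenCollector:
    """Callable that prints each chunk as it arrives and keeps the full text."""

    def __init__(self) -> None:
        self.pieces: List[str] = []

    def __call__(self, chunk: Any) -> None:
        # Assumption: the chunk carries its text in a `content` attribute.
        self.pieces.append(chunk.content)
        print(chunk.content, end="", flush=True)

    @property
    def text(self) -> str:
        return "".join(self.pieces)


# Simulate a stream with invented chunks instead of a real model.
collector = TokenCollector()
for piece in ["Local ", "models ", "stream."]:
    collector(FakeChunk(content=piece))
```

An instance such as collector could then be passed as streaming_callback, either at init time or in run().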

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | | A string representing the prompt. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function that is called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. |
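Because generation_kwargs can be set both at init time and at query time, it helps to picture how the two might combine. The sketch below assumes that run-time values override init-time values key by key and that unmentioned keys are kept. This is an assumption to confirm against your Haystack version, and `merge_generation_kwargs` is a hypothetical helper rather than a library function.

```python
from typing import Any, Dict, Optional


def merge_generation_kwargs(
    init_kwargs: Optional[Dict[str, Any]],
    run_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Combine init-time and run-time generation kwargs.

    Assumption for illustration: run-time keys win on conflict, and keys
    set only at init time are preserved.
    """
    return {**(init_kwargs or {}), **(run_kwargs or {})}


init_time = {"max_new_tokens": 100, "temperature": 0.9}
query_time = {"max_new_tokens": 20}
effective = merge_generation_kwargs(init_time, query_time)
print(effective)  # {'max_new_tokens': 20, 'temperature': 0.9}
```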