HuggingFaceLocalGenerator
Generate text using models from Hugging Face that run locally.
The text2text-generation task is deprecated and may be removed in a future release. In transformers v5+, text2text-generation is no longer available as a valid pipeline task. If you currently use task="text2text-generation", replace it with task="text-generation" and ensure the selected model is compatible. To use the older behavior, pin transformers<5.
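As a rough sketch of the migration described above, the only change in the component's YAML is the task value. The model shown is the decoder model used elsewhere on this page and stands in for whatever compatible model you choose.

```yaml
HuggingFaceLocalGenerator:
  type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator
  init_parameters:
    # Before (deprecated, not available in transformers v5+):
    # task: text2text-generation
    # After: use the text-generation task with a compatible decoder model.
    model: Qwen/Qwen3-0.6B
    task: text-generation
```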
Key Features
- Runs Hugging Face LLMs locally, without external API calls.
- Supports decoder models like GPT and Qwen with the `text-generation` task.
- Accepts configurable generation parameters such as `max_new_tokens` and `temperature`.
- Supports streaming responses via a callback function.
- Uses a Hugging Face API token for downloading private or gated models.
Configuration
- Drag the `HuggingFaceLocalGenerator` component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- On the General tab:
  - Set the model name or path. Connect the platform to Hugging Face first. For instructions, see Use Hugging Face Models.
  - Set the `task` parameter. The default is `text-generation`, which works with decoder models like GPT and Qwen.
- Go to the Advanced tab to configure `generation_kwargs` (such as `max_new_tokens`, `temperature`), `huggingface_pipeline_kwargs`, `stop_words`, and `device`.
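For orientation, the Advanced tab settings correspond to entries under `init_parameters` in the pipeline YAML. The fragment below is a minimal sketch: the numeric values and the stop word are illustrative assumptions, not recommended settings, and `device` is left out so the default device is selected.

```yaml
init_parameters:
  generation_kwargs:
    max_new_tokens: 200   # illustrative value
    temperature: 0.7      # illustrative value
    top_p: 0.9            # illustrative value
  stop_words:
    - "Question:"         # hypothetical stop word; quoted because it ends with a colon
```

When `stop_words` is set, leave `stopping_criteria` out of `generation_kwargs`.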
Connections
HuggingFaceLocalGenerator accepts a prompt string as input. Connect its prompt input to the prompt output of PromptBuilder.
It outputs replies as a list of strings. Connect its replies output to AnswerBuilder.
Source Code
To check this component's source code, open hugging_face_local.py in the Haystack repository.
Usage Examples
Basic Configuration
HuggingFaceLocalGenerator:
  type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator
  init_parameters:
    model: Qwen/Qwen3-0.6B
    task: text-generation
    token:
      type: env_var
      env_vars:
        - HF_API_TOKEN
        - HF_TOKEN
      strict: false
    generation_kwargs:
      max_new_tokens: 100
      temperature: 0.9
This query pipeline uses HuggingFaceLocalGenerator for local text generation:
components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 10
      fuzziness: 0
  PromptBuilder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |
        Given the following information, answer the question.
        Context:
        {% for document in documents %}
        {{ document.content }}
        {% endfor %}
        Question: {{ query }}
      required_variables:
      variables:
  HuggingFaceLocalGenerator:
    type: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator
    init_parameters:
      model: Qwen/Qwen3-0.6B
      task: text-generation
      device:
      token:
        type: env_var
        env_vars:
          - HF_API_TOKEN
          - HF_TOKEN
        strict: false
      generation_kwargs:
        max_new_tokens: 100
        temperature: 0.9
      huggingface_pipeline_kwargs:
      stop_words:
      streaming_callback:
  AnswerBuilder:
    type: haystack.components.builders.answer_builder.AnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:
connections:
  - sender: bm25_retriever.documents
    receiver: PromptBuilder.documents
  - sender: PromptBuilder.prompt
    receiver: HuggingFaceLocalGenerator.prompt
  - sender: HuggingFaceLocalGenerator.replies
    receiver: AnswerBuilder.replies
  - sender: bm25_retriever.documents
    receiver: AnswerBuilder.documents
inputs:
  query:
    - bm25_retriever.query
    - PromptBuilder.query
    - AnswerBuilder.query
outputs:
  answers: AnswerBuilder.answers
max_runs_per_component: 100
metadata: {}
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | | A string representing the prompt. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function that is called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| replies | List[str] | | A list of strings representing the generated replies. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | Qwen/Qwen3-0.6B | The Hugging Face text generation model name or path. |
| task | Optional[Literal['text-generation', 'text2text-generation']] | text-generation | The task for the Hugging Face pipeline. The default is text-generation, supported by decoder models like GPT and Qwen. The text2text-generation task (encoder-decoder models like T5) is deprecated and not available in transformers v5+. If not specified, the component infers the task from the model name. |
| device | Optional[ComponentDevice] | None | The device for loading the model. If None, automatically selects the default device. If a device or device map is specified in huggingface_pipeline_kwargs, it overrides this parameter. |
| token | Optional[Secret] | Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False) | The token to use as HTTP bearer authorization for remote files. If the token is specified in huggingface_pipeline_kwargs, this parameter is ignored. |
| generation_kwargs | Optional[Dict[str, Any]] | None | A dictionary with keyword arguments to customize text generation, such as max_length, max_new_tokens, temperature, top_k, and top_p. See Hugging Face documentation. |
| huggingface_pipeline_kwargs | Optional[Dict[str, Any]] | None | A dictionary with keyword arguments to initialize the Hugging Face pipeline. These override the model, task, device, and token init parameters. See Hugging Face documentation. |
| stop_words | Optional[List[str]] | None | If the model generates a stop word, the generation stops. If you provide this parameter, don't specify stopping_criteria in generation_kwargs. |
| streaming_callback | Optional[StreamingCallbackT] | None | An optional callable for handling streaming responses. |
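The precedence rules in the table can be sketched as follows. This fragment is illustrative only: the second model name is a stand-in for any other checkpoint, and the dotted path for `streaming_callback` assumes that callables are referenced in YAML by their import path, using Haystack's `print_streaming_chunk` helper as the example.

```yaml
init_parameters:
  model: Qwen/Qwen3-0.6B       # overridden by huggingface_pipeline_kwargs.model below
  task: text-generation
  huggingface_pipeline_kwargs:
    model: Qwen/Qwen3-1.7B     # illustrative; this value takes precedence over model above
  # Assumption: the callback is given as a dotted import path.
  streaming_callback: haystack.components.generators.utils.print_streaming_chunk
```

Keeping a setting in only one place, either the top-level parameter or `huggingface_pipeline_kwargs`, avoids surprises about which value is in effect.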
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | | A string representing the prompt. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function that is called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. |
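As a hedged illustration of a query-time override, the mapping below assumes that run parameters are keyed by the component name used in the pipeline YAML. The exact request format is an assumption here; Modify Pipeline Parameters at Query Time is the authoritative description. The values are illustrative.

```yaml
# Hypothetical shape of query-time parameters, keyed by component name.
HuggingFaceLocalGenerator:
  generation_kwargs:
    max_new_tokens: 50   # illustrative: request a shorter reply for this query
    temperature: 0.2     # illustrative: lower temperature for this query
```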