HuggingFaceAPIGenerator
Generate text using Hugging Face APIs.
Basic Information
- Type: haystack.components.generators.hugging_face_api.HuggingFaceAPIGenerator
- Components it can connect with:
  - PromptBuilder: Receives a prompt from PromptBuilder.
  - AnswerBuilder: Sends generated replies to AnswerBuilder.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | | A string representing the prompt. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function that is called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. |
Outputs
| Parameter | Type | Description |
|---|---|---|
| replies | List[str] | A list of strings representing the generated replies. |
| meta | List[Dict[str, Any]] | A list of dictionaries with metadata associated with each reply, such as token count and finish reason. |
Overview
As of July 2025, the Hugging Face Inference API no longer offers generative models through the text_generation endpoint. Generative models are now only available through providers supporting the chat_completion endpoint. This component might no longer work with the Hugging Face Inference API.
Use the HuggingFaceAPIChatGenerator component instead, which supports the chat_completion endpoint and works with the free Serverless Inference API.
HuggingFaceAPIGenerator generates text using various Hugging Face APIs:
- Paid Inference Endpoints: A private instance of the model deployed by Hugging Face, typically paid per hour.
- Self-hosted Text Generation Inference: A toolkit for efficiently deploying and serving LLMs on-premise through Docker.
This component is designed for text generation, not for chat. If you want to use these LLMs for chat, use HuggingFaceAPIChatGenerator instead.
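As a minimal standalone sketch (assuming you have a deployed Inference Endpoint; the URL is a placeholder you would replace with your own):

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

# Reads the token from HF_API_TOKEN or HF_TOKEN; strict=False means
# no error is raised if neither environment variable is set.
generator = HuggingFaceAPIGenerator(
    api_type="inference_endpoints",
    api_params={"url": "<your-inference-endpoint-url>"},  # placeholder
    token=Secret.from_env_var(["HF_API_TOKEN", "HF_TOKEN"], strict=False),
    generation_kwargs={"max_new_tokens": 500, "temperature": 0.7},
)

result = generator.run(prompt="Summarize the history of the transistor.")
print(result["replies"][0])  # generated text
print(result["meta"][0])     # metadata, such as token count and finish reason
```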
Authentication
When using Inference Endpoints, you must connect Haystack Platform with Hugging Face first. For detailed instructions, see Use Hugging Face Models.
Usage Example
This query pipeline uses HuggingFaceAPIGenerator with a paid Inference Endpoint:
```yaml
components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 10
      fuzziness: 0
  PromptBuilder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |
        Given the following information, answer the question.
        Context:
        {% for document in documents %}
        {{ document.content }}
        {% endfor %}
        Question: {{ query }}
      required_variables:
      variables:
  HuggingFaceAPIGenerator:
    type: haystack.components.generators.hugging_face_api.HuggingFaceAPIGenerator
    init_parameters:
      api_type: inference_endpoints
      api_params:
        url: <your-inference-endpoint-url>
      token:
        type: env_var
        env_vars:
          - HF_API_TOKEN
          - HF_TOKEN
        strict: false
      generation_kwargs:
        max_new_tokens: 500
        temperature: 0.7
      stop_words:
      streaming_callback:
  AnswerBuilder:
    type: haystack.components.builders.answer_builder.AnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:

connections:
  - sender: bm25_retriever.documents
    receiver: PromptBuilder.documents
  - sender: PromptBuilder.prompt
    receiver: HuggingFaceAPIGenerator.prompt
  - sender: HuggingFaceAPIGenerator.replies
    receiver: AnswerBuilder.replies
  - sender: bm25_retriever.documents
    receiver: AnswerBuilder.documents

inputs:
  query:
    - bm25_retriever.query
    - PromptBuilder.query
    - AnswerBuilder.query

outputs:
  answers: AnswerBuilder.answers

max_runs_per_component: 100

metadata: {}
```
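The same flow can be sketched in Python with the Haystack SDK. This sketch omits the retriever and feeds documents directly; the document contents and the endpoint URL are placeholders:

```python
from haystack import Document, Pipeline
from haystack.components.builders import AnswerBuilder, PromptBuilder
from haystack.components.generators import HuggingFaceAPIGenerator

template = """Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{ query }}"""

pipe = Pipeline()
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component(
    "generator",
    HuggingFaceAPIGenerator(
        api_type="inference_endpoints",
        api_params={"url": "<your-inference-endpoint-url>"},  # placeholder
    ),
)
pipe.add_component("answer_builder", AnswerBuilder())
pipe.connect("prompt_builder.prompt", "generator.prompt")
pipe.connect("generator.replies", "answer_builder.replies")

# Placeholder documents standing in for the retriever's output.
docs = [Document(content="Haystack is an open source framework for LLM applications.")]
query = "What is Haystack?"
result = pipe.run({
    "prompt_builder": {"documents": docs, "query": query},
    "answer_builder": {"query": query, "documents": docs},
})
print(result["answer_builder"]["answers"][0].data)
```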
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_type | Union[HFGenerationAPIType, str] | | The type of Hugging Face API to use. Options: text_generation_inference (self-hosted TGI), inference_endpoints (paid endpoints), serverless_inference_api (free API, may not work for generative models). |
| api_params | Dict[str, str] | | A dictionary with: model (required for serverless_inference_api), url (required for inference_endpoints or text_generation_inference), and other parameters like timeout, headers, provider. |
| token | Optional[Secret] | Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False) | The Hugging Face token to use as HTTP bearer authorization. Check your HF token in your account settings. |
| generation_kwargs | Optional[Dict[str, Any]] | None | A dictionary with keyword arguments to customize text generation: max_new_tokens, temperature, top_k, top_p. See Hugging Face documentation. |
| stop_words | Optional[List[str]] | None | An optional list of strings representing the stop words. |
| streaming_callback | Optional[StreamingCallbackT] | None | An optional callable for handling streaming responses. |
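To illustrate how api_type and api_params pair up, here is a hedged sketch; the TGI address assumes a local Docker container, and the endpoint URL is a placeholder:

```python
from haystack.components.generators import HuggingFaceAPIGenerator

# Self-hosted Text Generation Inference: "url" is required.
# Assumes a TGI container is listening locally; adjust the address to your setup.
tgi_generator = HuggingFaceAPIGenerator(
    api_type="text_generation_inference",
    api_params={"url": "http://localhost:8080"},
)

# Paid Inference Endpoints: "url" is required and points to your deployed endpoint.
endpoint_generator = HuggingFaceAPIGenerator(
    api_type="inference_endpoints",
    api_params={"url": "<your-inference-endpoint-url>"},  # placeholder
)
```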
Run Method Parameters
These are the parameters you can configure for the run() method. You can pass these parameters at query time through the API, in Playground, or when running a job.
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | | A string representing the prompt. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function that is called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. |
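For example, streaming_callback and generation_kwargs can be passed at query time. A sketch that prints tokens as they arrive (assuming a reachable endpoint; the URL is a placeholder):

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.dataclasses import StreamingChunk

def print_chunk(chunk: StreamingChunk) -> None:
    # Called once per streamed token; print it without a trailing newline.
    print(chunk.content, end="", flush=True)

generator = HuggingFaceAPIGenerator(
    api_type="inference_endpoints",
    api_params={"url": "<your-inference-endpoint-url>"},  # placeholder
)
result = generator.run(
    prompt="Write one sentence about retrieval-augmented generation.",
    generation_kwargs={"max_new_tokens": 50},
    streaming_callback=print_chunk,
)
```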