HuggingFaceAPIGenerator

Generate text using Hugging Face APIs.

Model Limitations

As of July 2025, the Hugging Face Inference API no longer offers generative models through the text_generation endpoint. Generative models are now only available through providers supporting the chat_completion endpoint. This component might no longer work with the Hugging Face Inference API.

Use the HuggingFaceAPIChatGenerator component instead, which supports the chat_completion endpoint and works with the free Serverless Inference API.

Key Features

  • Text generation using Hugging Face paid Inference Endpoints and self-hosted Text Generation Inference (TGI)
  • Streaming support for real-time token-by-token responses
  • Designed for text generation, not chat (use HuggingFaceAPIChatGenerator for chat)
  • Returns generated text along with metadata such as token count and finish reason
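
To illustrate the streaming feature, here is a minimal sketch of a callback that collects text as it arrives. It uses only the standard library: `Chunk`, `TokenCollector`, and `fake_stream` are hypothetical stand-ins written for this example, not Haystack classes. The sketch assumes the callback receives one object per token and that the object exposes the text in a `content` attribute.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Chunk:
    """Hypothetical stand-in for the chunk object passed to a streaming callback."""

    content: str
    meta: Dict[str, object] = field(default_factory=dict)


class TokenCollector:
    """Callable that stores every streamed piece of text in arrival order."""

    def __init__(self) -> None:
        self.pieces: List[str] = []

    def __call__(self, chunk: Chunk) -> None:
        self.pieces.append(chunk.content)

    @property
    def text(self) -> str:
        # Join the pieces to rebuild the full reply seen so far.
        return "".join(self.pieces)


def fake_stream(tokens: Iterable[str], callback: Callable[[Chunk], None]) -> None:
    """Simulate a generator that invokes the callback once per token."""
    for token in tokens:
        callback(Chunk(content=token))


collector = TokenCollector()
fake_stream(["Hello", ",", " world"], collector)
```

In a real pipeline, the generator calls the callback for you. A callable object like this is one way to keep the partial text available while the reply is still being produced.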

Configuration

  1. Drag the HuggingFaceAPIGenerator component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    1. Select the API type: inference_endpoints or text_generation_inference.
    2. Enter the endpoint URL in the API parameters.
    3. Enter your Hugging Face API token. For details, see Use Hugging Face Models.
  4. Go to the Advanced tab to configure generation parameters, stop words, and streaming.

Connections

HuggingFaceAPIGenerator accepts a text prompt (str) through its prompt input. It outputs replies (a list of strings) and meta (a list of metadata dictionaries).

Connect PromptBuilder's prompt output to this component's prompt input. Connect the replies output to AnswerBuilder.
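
The two outputs are parallel lists: each entry in meta describes the reply at the same position in replies. The sketch below pairs them up in plain Python. The sample values and the key names `finish_reason` and `token_count` are illustrative assumptions, and the real metadata keys may differ.

```python
from typing import Any, Dict, List, Tuple


def pair_replies_with_meta(
    replies: List[str], meta: List[Dict[str, Any]]
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (reply, metadata) pairs, checking that both lists line up."""
    if len(replies) != len(meta):
        raise ValueError("replies and meta must have the same length")
    return list(zip(replies, meta))


# Illustrative output only: the key names in meta are assumptions.
sample_output = {
    "replies": ["An example reply."],
    "meta": [{"finish_reason": "length", "token_count": 4}],
}

pairs = pair_replies_with_meta(sample_output["replies"], sample_output["meta"])
```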

Source Code

To check this component's source code, open hugging_face_api.py in the Haystack repository.

Usage Examples

Basic Configuration

HuggingFaceAPIGenerator:
  type: haystack.components.generators.hugging_face_api.HuggingFaceAPIGenerator
  init_parameters:
    api_type: inference_endpoints
    api_params:
      url: <your-inference-endpoint-url>
    token:
      type: env_var
      env_vars:
        - HF_API_TOKEN
        - HF_TOKEN
      strict: false
    generation_kwargs:
      max_new_tokens: 500
      temperature: 0.7
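
The token settings in this configuration name two environment variables and set strict to false. The sketch below shows one way such a lookup can behave. It assumes the variables are tried in the listed order, the first one that is set wins, and a missing token is an error only when strict is true. `resolve_token` is a hypothetical helper written for this example, not part of Haystack.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_token(
    env_vars: Sequence[str],
    strict: bool,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the value of the first environment variable that is set."""
    for name in env_vars:
        value = environ.get(name)
        if value is not None:
            return value
    if strict:
        # With strict enabled, a missing token is treated as an error.
        raise ValueError(f"None of the environment variables {list(env_vars)} are set")
    # With strict disabled, the component is created without a token.
    return None


# Demo with an explicit mapping instead of the real process environment.
token = resolve_token(
    ["HF_API_TOKEN", "HF_TOKEN"], strict=False, environ={"HF_TOKEN": "hf_example"}
)
no_token = resolve_token(["HF_API_TOKEN", "HF_TOKEN"], strict=False, environ={})
```

With strict set to false, a missing token does not fail at configuration time. Any authentication error surfaces later, when the endpoint rejects the request.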

This query pipeline uses HuggingFaceAPIGenerator with a paid Inference Endpoint:

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 10
      fuzziness: 0

  PromptBuilder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |
        Given the following information, answer the question.

        Context:
        {% for document in documents %}
        {{ document.content }}
        {% endfor %}

        Question: {{ query }}
      required_variables:
      variables:

  HuggingFaceAPIGenerator:
    type: haystack.components.generators.hugging_face_api.HuggingFaceAPIGenerator
    init_parameters:
      api_type: inference_endpoints
      api_params:
        url: <your-inference-endpoint-url>
      token:
        type: env_var
        env_vars:
          - HF_API_TOKEN
          - HF_TOKEN
        strict: false
      generation_kwargs:
        max_new_tokens: 500
        temperature: 0.7
      stop_words:
      streaming_callback:

  AnswerBuilder:
    type: haystack.components.builders.answer_builder.AnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:

connections:
  - sender: bm25_retriever.documents
    receiver: PromptBuilder.documents
  - sender: PromptBuilder.prompt
    receiver: HuggingFaceAPIGenerator.prompt
  - sender: HuggingFaceAPIGenerator.replies
    receiver: AnswerBuilder.replies
  - sender: bm25_retriever.documents
    receiver: AnswerBuilder.documents

inputs:
  query:
    - bm25_retriever.query
    - PromptBuilder.query
    - AnswerBuilder.query

outputs:
  answers: AnswerBuilder.answers

max_runs_per_component: 100

metadata: {}

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | | A string representing the prompt. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function that is called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. |

Outputs

| Parameter | Type | Description |
| --- | --- | --- |
| replies | List[str] | A list of strings representing the generated replies. |
| meta | List[Dict[str, Any]] | A list of dictionaries with metadata associated with each reply, such as token count and finish reason. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_type | Union[HFGenerationAPIType, str] | | The type of Hugging Face API to use. Options: text_generation_inference (self-hosted TGI), inference_endpoints (paid endpoints), serverless_inference_api (free API, may not work for generative models). |
| api_params | Dict[str, str] | | A dictionary with: model (required for serverless_inference_api), url (required for inference_endpoints or text_generation_inference), and other parameters like timeout, headers, provider. |
| token | Optional[Secret] | Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False) | The Hugging Face token to use as HTTP bearer authorization. Check your HF token in your account settings. |
| generation_kwargs | Optional[Dict[str, Any]] | None | A dictionary with keyword arguments to customize text generation: max_new_tokens, temperature, top_k, top_p. See Hugging Face documentation. |
| stop_words | Optional[List[str]] | None | An optional list of strings representing the stop words. |
| streaming_callback | Optional[StreamingCallbackT] | None | An optional callable for handling streaming responses. |
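
The relationship between api_type and api_params can be checked before you deploy a pipeline. The sketch below encodes the rule from the table: model is required for serverless_inference_api, and url is required for inference_endpoints and text_generation_inference. `validate_api_params` and `REQUIRED_KEY` are hypothetical names written for this example and are not part of Haystack. The component's own validation may check more than this.

```python
from typing import Dict, List

# Required api_params key for each api_type, as listed in the table above.
REQUIRED_KEY: Dict[str, str] = {
    "serverless_inference_api": "model",
    "inference_endpoints": "url",
    "text_generation_inference": "url",
}


def validate_api_params(api_type: str, api_params: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the combination looks valid."""
    if api_type not in REQUIRED_KEY:
        return [f"unknown api_type: {api_type!r}"]
    required = REQUIRED_KEY[api_type]
    if not api_params.get(required):
        return [f"api_params must contain {required!r} when api_type is {api_type!r}"]
    return []


ok = validate_api_params("inference_endpoints", {"url": "https://example.invalid"})
missing = validate_api_params("serverless_inference_api", {"url": "https://example.invalid"})
```

Returning a list of messages instead of raising lets a caller report every configuration problem at once.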

Run Method Parameters

You can pass these parameters to the component's run() method at query time: through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | | A string representing the prompt. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function that is called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. |
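
Because generation_kwargs can be set both at init time and at query time, it helps to know how the two combine. The sketch below assumes that values passed at query time override init-time values key by key, and that keys you do not override keep their init-time values. `merge_generation_kwargs` is a hypothetical helper written for this example. Check the component source if you rely on this behavior.

```python
from typing import Any, Dict, Optional


def merge_generation_kwargs(
    init_kwargs: Optional[Dict[str, Any]],
    run_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Combine init-time and query-time kwargs; query-time values win on conflicts."""
    return {**(init_kwargs or {}), **(run_kwargs or {})}


# Init-time values taken from the configuration shown earlier on this page.
init_kwargs = {"max_new_tokens": 500, "temperature": 0.7}

# A query that only lowers the temperature keeps max_new_tokens unchanged.
effective = merge_generation_kwargs(init_kwargs, {"temperature": 0.2})
```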