HuggingFaceAPIGenerator

Generates text using Hugging Face APIs.

Basic Information

Type: haystack.components.generators.hugging_face_api.HuggingFaceAPIGenerator

Inputs

Parameter	Type	Default	Description
prompt	str		A string representing the prompt.
streaming_callback	Optional[StreamingCallbackT]	None	A callback function that is called when a new token is received from the stream.
generation_kwargs	Optional[Dict[str, Any]]	None	Additional keyword arguments for text generation.

Outputs

Parameter	Type	Default	Description
replies	List[str]		A list of strings representing the generated replies.
meta	List[Dict[str, Any]]		A list of dictionaries with the metadata for each reply. It has the same length as `replies`.
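
As a minimal sketch of how the two outputs line up, the snippet below pairs each reply with the metadata entry at the same index. The `result` dictionary is a hand-written stand-in for what `run()` returns, the metadata keys in it (`model`, `finish_reason`) are illustrative assumptions rather than a guaranteed schema, and `pair_replies_with_meta` is a hypothetical helper, not part of the component.

```python
from typing import Any, Dict, List, Tuple


def pair_replies_with_meta(result: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair each generated reply with the metadata entry at the same index."""
    replies = result["replies"]
    meta = result["meta"]
    if len(replies) != len(meta):
        raise ValueError("replies and meta must have the same length")
    return list(zip(replies, meta))


# Hand-written stand-in for the dictionary returned by generator.run(...).
# The metadata keys are illustrative assumptions.
result = {
    "replies": ["NLP is a field of AI that deals with human language."],
    "meta": [{"model": "example-model", "finish_reason": "length"}],
}

for reply, meta in pair_replies_with_meta(result):
    print(reply, meta)
```

Keeping the length check in the helper surfaces a malformed result early instead of silently dropping entries, which is what a bare `zip` would do.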

Overview

Work in Progress

Bear with us while we work on adding pipeline examples and the most common component connections.

Generates text using Hugging Face APIs.

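Use it with the following Hugging Face APIs: self-hosted Text Generation Inference (TGI), Inference Endpoints, and the Serverless Inference API.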

Note: As of July 2025, the Hugging Face Inference API no longer offers generative models through the text_generation endpoint. Generative models are now only available through providers supporting the chat_completion endpoint. As a result, this component might no longer work with the Hugging Face Inference API. Use the HuggingFaceAPIChatGenerator component, which supports the chat_completion endpoint.

With self-hosted text generation inference

from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)

With the free serverless inference API

Be aware that this example might not work, because the Hugging Face Inference API no longer offers models that support the text_generation endpoint. For generative models, use the HuggingFaceAPIChatGenerator, which works through the chat_completion endpoint.

from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
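
Rather than hard-coding the key, the token can come from the environment. The sketch below mimics, in plain Python, the lookup order of the default listed for `token` in the Init Parameters table (`HF_API_TOKEN` first, then `HF_TOKEN`). `first_env_token` is a hypothetical helper written for illustration and only approximates what the `Secret` class does.

```python
import os
from typing import Optional, Sequence


def first_env_token(names: Sequence[str] = ("HF_API_TOKEN", "HF_TOKEN")) -> Optional[str]:
    """Return the value of the first environment variable that is set, else None."""
    for name in names:
        value = os.environ.get(name)
        if value is not None:
            return value
    return None


token = first_env_token()
print("token found" if token is not None else "no token set")

# With Haystack installed, pass a Secret rather than a raw string:
# from haystack.utils import Secret
# token = Secret.from_env_var(["HF_API_TOKEN", "HF_TOKEN"], strict=False)
```

Reading the token from the environment keeps credentials out of source files and pipeline definitions.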

Usage Example

components:
  HuggingFaceAPIGenerator:
    type: haystack.components.generators.hugging_face_api.HuggingFaceAPIGenerator
    init_parameters:
      api_type: text_generation_inference
      api_params:
        url: http://localhost:8080

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
api_type	Union[HFGenerationAPIType, str]		The type of Hugging Face API to use. Available types: - `text_generation_inference`: See TGI. - `inference_endpoints`: See Inference Endpoints. - `serverless_inference_api`: See Serverless Inference API. This might no longer work due to changes in the models offered in the Hugging Face Inference API. Please use the `HuggingFaceAPIChatGenerator` component instead.
api_params	Dict[str, str]		A dictionary with the following keys: - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`. - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or `TEXT_GENERATION_INFERENCE`. - Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider` etc.
token	Optional[Secret]	Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False)	The Hugging Face token to use as HTTP bearer authorization. Check your HF token in your account settings.
generation_kwargs	Optional[Dict[str, Any]]	None	A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`, `temperature`, `top_k`, `top_p`. For details, see the Hugging Face documentation.
stop_words	Optional[List[str]]	None	An optional list of strings representing the stop words.
streaming_callback	Optional[StreamingCallbackT]	None	An optional callable for handling streaming responses.
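
The `api_params` requirements in the table above can be sketched as a small pre-flight check that runs before the component is created. `REQUIRED_KEY` and `check_api_params` are hypothetical names introduced here for illustration; the component performs its own validation, and this sketch does not reproduce it exactly.

```python
from typing import Dict

# Required `api_params` key for each API type, as listed in the table above.
REQUIRED_KEY = {
    "serverless_inference_api": "model",
    "inference_endpoints": "url",
    "text_generation_inference": "url",
}


def check_api_params(api_type: str, api_params: Dict[str, str]) -> None:
    """Raise ValueError if the key required for `api_type` is missing or empty."""
    if api_type not in REQUIRED_KEY:
        raise ValueError(f"Unknown api_type: {api_type!r}")
    key = REQUIRED_KEY[api_type]
    if not api_params.get(key):
        raise ValueError(f"api_type {api_type!r} requires api_params[{key!r}]")


check_api_params("text_generation_inference", {"url": "http://localhost:8080"})
print("api_params look valid")
```

A check like this is mainly useful when `api_type` and `api_params` come from user-supplied configuration and you want a clear error before any network call is attempted.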

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
prompt	str		A string representing the prompt.
streaming_callback	Optional[StreamingCallbackT]	None	A callback function that is called when a new token is received from the stream.
generation_kwargs	Optional[Dict[str, Any]]	None	Additional keyword arguments for text generation.
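
The run-time parameters can be illustrated with a small streaming callback and a `generation_kwargs` override. This is a sketch under stated assumptions: `TokenCollector` is a hypothetical class, each streamed chunk is assumed to expose its text as `chunk.content`, and the demonstration feeds it stand-in chunks so it runs without an endpoint.

```python
from types import SimpleNamespace
from typing import Any, List


class TokenCollector:
    """Callable that prints each streamed chunk and keeps its text."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def __call__(self, chunk: Any) -> None:
        # Assumption: the chunk exposes its text as `chunk.content`.
        self.tokens.append(chunk.content)
        print(chunk.content, end="", flush=True)

    @property
    def text(self) -> str:
        """Return everything received so far as one string."""
        return "".join(self.tokens)


# Run-time overrides; the keys come from the generation_kwargs examples on this page.
generation_kwargs = {"max_new_tokens": 100, "temperature": 0.7}

# Offline demonstration with stand-in chunks instead of a live stream.
collector = TokenCollector()
for piece in ["Natural ", "Language ", "Processing"]:
    collector(SimpleNamespace(content=piece))
print()

# With a live endpoint, the same objects would be passed to run():
# result = generator.run(
#     prompt="What's Natural Language Processing?",
#     streaming_callback=collector,
#     generation_kwargs=generation_kwargs,
# )
```

Using a callable object instead of a bare function keeps the streamed text available after the run finishes, for example for logging.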
