HuggingFaceAPIChatGenerator

Complete chats using Hugging Face APIs.

Basic Information

Type: haystack.components.generators.chat.hugging_face_api.HuggingFaceAPIChatGenerator
Components it can connect with:
- ChatPromptBuilder: HuggingFaceAPIChatGenerator receives a rendered prompt from ChatPromptBuilder.
- DeepsetAnswerBuilder: HuggingFaceAPIChatGenerator sends the generated replies to DeepsetAnswerBuilder through OutputAdapter (see Usage Example below).

Inputs

Parameter	Type	Default	Description
messages	List[ChatMessage]		A list of ChatMessage objects representing the input messages.
generation_kwargs	Optional[Dict[str, Any]]	None	Additional keyword arguments for text generation.
tools	Optional[Union[List[Tool], Toolset]]	None	A list of `Tool` objects or a `Toolset` instance for which the model can prepare calls.
streaming_callback	Optional[StreamingCallbackT]	None	An optional callable for handling streaming responses.

Outputs

Parameter	Type	Default	Description
replies	List[ChatMessage]		A list containing the generated responses as ChatMessage objects.

Overview

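Use HuggingFaceAPIChatGenerator to generate text with one of the supported Hugging Face APIs: the Serverless Inference API (Inference Providers), Inference Endpoints, or Text Generation Inference (TGI).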

The component supports multimodal inputs, so you can send both text and images to Vision Language Models (VLMs) through Hugging Face APIs. The implementation follows the Hugging Face VLM API format and remains fully backward compatible with text-only messages. Supported models include Qwen/Qwen2.5-VL-7B-Instruct and other VLMs available through Hugging Face.
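As a rough sketch, a generator configured for a vision language model could look like the following. It reuses the model ID named above together with the serverless settings used elsewhere on this page. Whether this model is served for your account is an assumption to verify, and how images reach the component inside the ChatMessage objects depends on the rest of your pipeline and is not shown here.

```yaml
components:
  HuggingFaceAPIChatGenerator:
    type: haystack.components.generators.chat.hugging_face_api.HuggingFaceAPIChatGenerator
    init_parameters:
      api_type: serverless_inference_api
      api_params:
        model: Qwen/Qwen2.5-VL-7B-Instruct  # VLM named in the overview; availability is an assumption
      generation_kwargs:
        max_tokens: 256  # illustrative value, not a recommendation
```

Text-only messages work with the same configuration, so an existing pipeline does not need a separate generator for them.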

Authorization

You need a Hugging Face API token to use this component with the Serverless Inference API or Inference Endpoints. Connect deepset to your Hugging Face account on the Integrations page.

Add Workspace-Level Integration

1. Click your profile icon and choose Settings.
2. Go to Workspace > Integrations.
3. Find the provider you want to connect and click Connect next to it.
4. Enter the API key and any other required details.
5. Click Connect. You can use this integration in pipelines and indexes in the current workspace.

Add Organization-Level Integration

1. Click your profile icon and choose Settings.
2. Go to Organization > Integrations.
3. Find the provider you want to connect and click Connect next to it.
4. Enter the API key and any other required details.
5. Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.

Finish Reason Behavior

The finish_reason field behavior has been updated to ensure consistent values regardless of streaming mode. The updated mapping is:

length → length
eos_token → stop
stop_sequence → stop
If tool calls are present → tool_calls

Usage Example

Initializing the Component

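components:
  HuggingFaceAPIChatGenerator:
    type: haystack.components.generators.chat.hugging_face_api.HuggingFaceAPIChatGenerator
    init_parameters:
      api_type: serverless_inference_api  # api_type and api_params have no defaults
      api_params:
        model: HuggingFaceH4/zephyr-7b-beta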

Using the Component in a Pipeline

This is an example RAG pipeline with HuggingFaceAPIChatGenerator and DeepsetAnswerBuilder. HuggingFaceAPIChatGenerator is configured to use the Serverless Inference API:

components:
  bm25_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
      fuzziness: 0

  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

  ChatPromptBuilder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template:
      - _content:
        - text: "You are a helpful assistant answering the user's questions based on the provided documents.\nIf the answer is not in the documents, say that the documents do not contain the answer.\nDo not use your own knowledge.\n"
        _role: system
      - _content:
        - text: "Provided documents:\n{% for document in documents %}\nDocument [{{ loop.index }}] :\n{{ document.content }}\n{% endfor %}\n\nQuestion: {{ query }}\n"
        _role: user
      required_variables:
      variables:
  OutputAdapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: '{{ replies[0] }}'
      output_type: List[str]
      custom_filters:
      unsafe: false

  HuggingFaceAPIChatGenerator:
    type: haystack.components.generators.chat.hugging_face_api.HuggingFaceAPIChatGenerator
    init_parameters:
      api_type: serverless_inference_api
      api_params:
        model: HuggingFaceH4/zephyr-7b-beta
      token:
        type: env_var
        env_vars:
        - HF_API_TOKEN
        - HF_TOKEN
        strict: false
      generation_kwargs:
      stop_words:
      streaming_callback:
      tools:

connections:  # Defines how the components are connected
- sender: bm25_retriever.documents
  receiver: document_joiner.documents
- sender: query_embedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: embedding_retriever.documents
  receiver: document_joiner.documents
- sender: document_joiner.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: meta_field_grouping_ranker.documents
- sender: meta_field_grouping_ranker.documents
  receiver: answer_builder.documents
- sender: meta_field_grouping_ranker.documents
  receiver: ChatPromptBuilder.documents
- sender: OutputAdapter.output
  receiver: answer_builder.replies
- sender: ChatPromptBuilder.prompt
  receiver: HuggingFaceAPIChatGenerator.messages
- sender: HuggingFaceAPIChatGenerator.replies
  receiver: OutputAdapter.replies

inputs:  # Define the inputs for your pipeline
  query:  # These components will receive the query as input
  - "bm25_retriever.query"
  - "query_embedder.text"
  - "ranker.query"
  - "answer_builder.query"
  - "ChatPromptBuilder.query"
  filters:  # These components will receive a potential query filter as input
  - "bm25_retriever.filters"
  - "embedding_retriever.filters"

outputs:  # Defines the output of your pipeline
  documents: "meta_field_grouping_ranker.documents"  # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers"  # The output of the pipeline is the generated answers

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
api_type	Union[HFGenerationAPIType, str]		The type of Hugging Face API to use. Available types: `text_generation_inference` (see TGI), `inference_endpoints` (see Inference Endpoints), and `serverless_inference_api` (see Serverless Inference API - Inference Providers).
api_params	Dict[str, str]		A dictionary with the following keys: `model` (Hugging Face model ID; required when `api_type` is `serverless_inference_api`), `provider` (provider name; recommended when `api_type` is `serverless_inference_api`), `url` (URL of the inference endpoint; required when `api_type` is `inference_endpoints` or `text_generation_inference`), and other parameters specific to the chosen API type, such as `timeout` or `headers`.
token	Optional[Secret]	Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False)	The Hugging Face token to use as HTTP bearer authorization. Check your HF token in your account settings.
generation_kwargs	Optional[Dict[str, Any]]	None	A dictionary with keyword arguments to customize text generation. Some examples: `max_tokens`, `temperature`, `top_p`. For details, see Hugging Face chat_completion documentation.
stop_words	Optional[List[str]]	None	An optional list of strings representing the stop words.
streaming_callback	Optional[StreamingCallbackT]	None	An optional callable for handling streaming responses.
tools	Optional[Union[List[Tool], Toolset]]	None	A list of tools or a Toolset for which the model can prepare calls. The chosen model should support tool/function calling, according to the model card. Support for tools in the Hugging Face API and TGI is not yet fully refined, and you may experience unexpected behavior.
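
To illustrate how these parameters fit together, here is a minimal sketch of the component configured for Inference Endpoints instead of the Serverless Inference API. The endpoint URL and the stop word are hypothetical placeholders, and the `generation_kwargs` values are illustrative rather than recommended settings.

```yaml
components:
  HuggingFaceAPIChatGenerator:
    type: haystack.components.generators.chat.hugging_face_api.HuggingFaceAPIChatGenerator
    init_parameters:
      api_type: inference_endpoints
      api_params:
        url: https://your-endpoint.example.com  # hypothetical placeholder; replace with your endpoint URL
      token:
        type: env_var
        env_vars:
        - HF_API_TOKEN
        - HF_TOKEN
        strict: false
      generation_kwargs:
        max_tokens: 512   # illustrative values only
        temperature: 0.2
        top_p: 0.9
      stop_words:
      - "Question:"       # hypothetical stop word
```

With `inference_endpoints` or `text_generation_inference`, `url` takes the place of `model` in `api_params`, as described in the table above.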

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
messages	List[ChatMessage]		A list of ChatMessage objects representing the input messages.
generation_kwargs	Optional[Dict[str, Any]]	None	Additional keyword arguments for text generation.
tools	Optional[Union[List[Tool], Toolset]]	None	A list of `Tool` objects or a `Toolset` instance for which the model can prepare calls.
streaming_callback	Optional[StreamingCallbackT]	None	An optional callable for handling streaming responses.
