HuggingFaceLocalChatGenerator

Generate chat responses using models from Hugging Face that run locally.

Basic Information

Type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator
Components it can connect with:
- ChatPromptBuilder: HuggingFaceLocalChatGenerator receives a rendered prompt from ChatPromptBuilder.
- DeepsetAnswerBuilder: HuggingFaceLocalChatGenerator sends the generated replies to DeepsetAnswerBuilder through OutputAdapter (see the Usage Example section below).

Inputs

Parameter	Type	Default	Description
messages	List[ChatMessage]		A list of ChatMessage objects representing the input messages.
generation_kwargs	Optional[Dict[str, Any]]	None	Additional keyword arguments for text generation.
streaming_callback	Optional[StreamingCallbackT]	None	An optional callable for handling streaming responses.
tools	Optional[Union[List[Tool], Toolset]]	None	A list of `Tool` objects or a `Toolset` instance for which the model can prepare calls. If set, it overrides the `tools` parameter provided during initialization.

Outputs

Parameter	Type	Default	Description
replies	List[ChatMessage]		A list containing the generated responses as ChatMessage instances.

Overview

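Use HuggingFaceLocalChatGenerator with chat-based models, such as HuggingFaceH4/zephyr-7b-beta or meta-llama/Llama-2-7b-chat-hf. Running LLMs locally may require powerful hardware, depending on the model and its parameter count. For example, a model with 7 billion parameters loaded in 16-bit precision needs roughly 14 GB of memory for the weights alone.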

Authorization

For remote file authorization, you need a Hugging Face API token. Connect deepset to your Hugging Face account on the Integrations page.

Add Workspace-Level Integration

1. Click your profile icon and choose Settings.
2. Go to Workspace > Integrations.
3. Find the provider you want to connect and click Connect next to it.
4. Enter the API key and any other required details.
5. Click Connect. You can use this integration in pipelines and indexes in the current workspace.

Add Organization-Level Integration

1. Click your profile icon and choose Settings.
2. Go to Organization > Integrations.
3. Find the provider you want to connect and click Connect next to it.
4. Enter the API key and any other required details.
5. Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.

Usage Example

Initializing the Component

components:
  HuggingFaceLocalChatGenerator:
    type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator
    init_parameters:
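
When `init_parameters` is left empty, the component uses its defaults. As a minimal sketch, the same component with a few parameters set explicitly could look like the following. The parameter names come from the Init Parameters table on this page; the `temperature` and `do_sample` values are illustrative assumptions, not recommendations.

```yaml
components:
  HuggingFaceLocalChatGenerator:
    type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator
    init_parameters:
      model: HuggingFaceH4/zephyr-7b-beta  # the documented default, written out explicitly
      generation_kwargs:
        max_new_tokens: 512  # matches the documented default
        do_sample: true      # assumption: sampling enabled so that temperature has an effect
        temperature: 0.7     # illustrative value only
```

Any parameter you leave out keeps the default listed in the Init Parameters table.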

Using the Component in a Pipeline

This is an example RAG pipeline with HuggingFaceLocalChatGenerator and DeepsetAnswerBuilder connected through OutputAdapter:

components:
  bm25_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
      fuzziness: 0

  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

  ChatPromptBuilder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template:
      - _content:
        - text: "You are a helpful assistant answering the user's questions based on the provided documents.\nIf the answer is not in the documents, rely on the web_search tool to find information.\nDo not use your own knowledge.\n"
        _role: system
      - _content:
        - text: "Provided documents:\n{% for document in documents %}\nDocument [{{ loop.index }}] :\n{{ document.content }}\n{% endfor %}\n\nQuestion: {{ query }}\n"
        _role: user
      required_variables:
      variables:
  OutputAdapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: '{{ replies[0] }}'
      output_type: List[str]
      custom_filters:
      unsafe: false

  HuggingFaceLocalChatGenerator:
    type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator
    init_parameters:
      model: HuggingFaceH4/zephyr-7b-beta
      task:
      device:
      token:
        type: env_var
        env_vars:
        - HF_API_TOKEN
        - HF_TOKEN
        strict: false
      chat_template:
      generation_kwargs:
      huggingface_pipeline_kwargs:
      stop_words:
      streaming_callback:
      tools:
      tool_parsing_function:
      async_executor:

connections:  # Defines how the components are connected
- sender: bm25_retriever.documents
  receiver: document_joiner.documents
- sender: query_embedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: embedding_retriever.documents
  receiver: document_joiner.documents
- sender: document_joiner.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: meta_field_grouping_ranker.documents
- sender: meta_field_grouping_ranker.documents
  receiver: answer_builder.documents
- sender: meta_field_grouping_ranker.documents
  receiver: ChatPromptBuilder.documents
- sender: OutputAdapter.output
  receiver: answer_builder.replies
- sender: ChatPromptBuilder.prompt
  receiver: HuggingFaceLocalChatGenerator.messages
- sender: HuggingFaceLocalChatGenerator.replies
  receiver: OutputAdapter.replies

inputs:  # Define the inputs for your pipeline
  query:  # These components will receive the query as input
  - "bm25_retriever.query"
  - "query_embedder.text"
  - "ranker.query"
  - "answer_builder.query"
  - "ChatPromptBuilder.query"
  filters:  # These components will receive a potential query filter as input
  - "bm25_retriever.filters"
  - "embedding_retriever.filters"

outputs:  # Defines the output of your pipeline
  documents: "meta_field_grouping_ranker.documents"  # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers"  # The output of the pipeline is the generated answers

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
model	str	HuggingFaceH4/zephyr-7b-beta	The Hugging Face text generation model name or path, for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`. The model must be a chat model supporting the ChatML messaging format. If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
task	Optional[Literal['text-generation', 'text2text-generation']]	None	The task for the Hugging Face pipeline. Possible options: `text-generation` (supported by decoder models, like GPT) and `text2text-generation` (supported by encoder-decoder models, like T5). If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. If not specified, the component calls the Hugging Face API to infer the task from the model name.
device	Optional[ComponentDevice]	None	The device for loading the model. If `None`, automatically selects the default device. If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
token	Optional[Secret]	Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False)	The token to use as HTTP bearer authorization for remote files. If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
chat_template	Optional[str]	None	Specifies an optional Jinja template for formatting chat messages. Most high-quality chat models have their own templates, but for models without this feature or if you prefer a custom template, use this parameter.
generation_kwargs	Optional[Dict[str, Any]]	None	A dictionary with keyword arguments to customize text generation. Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`. For more information, see the Hugging Face documentation on customizing text generation and on `GenerationConfig`. The only `generation_kwargs` value set by default is `max_new_tokens`, which is set to 512 tokens.
huggingface_pipeline_kwargs	Optional[Dict[str, Any]]	None	A dictionary with keyword arguments to initialize the Hugging Face pipeline for text generation. These keyword arguments provide fine-grained control over the Hugging Face pipeline. In case of duplication, these kwargs override the `model`, `task`, `device`, and `token` init parameters. For the available kwargs, see the Hugging Face documentation. In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization.
stop_words	Optional[List[str]]	None	A list of stop words. If the model generates a stop word, the generation stops. If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`. For some chat models, the output includes both the new text and the original prompt. In these cases, make sure your prompt has no stop words.
streaming_callback	Optional[StreamingCallbackT]	None	An optional callable for handling streaming responses.
tools	Optional[Union[List[Tool], Toolset]]	None	A list of tools or a Toolset for which the model can prepare calls. This parameter can accept either a list of `Tool` objects or a `Toolset` instance.
tool_parsing_function	Optional[Callable[[str], Optional[List[ToolCall]]]]	None	A callable that takes a string and returns a list of ToolCall objects or None. If `None`, the component uses `default_tool_parser`, which extracts tool calls using a predefined pattern.
async_executor	Optional[ThreadPoolExecutor]	None	An optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor is initialized and used.
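
To illustrate the precedence rule for `huggingface_pipeline_kwargs`, here is a hedged sketch in which the model is set in both places. Going by the parameter descriptions in the table, the value under `huggingface_pipeline_kwargs` is the one that takes effect. The `device_map: auto` entry is an assumption about your environment: it is a Hugging Face pipeline argument that typically requires the `accelerate` package, so check that it fits your setup before using it.

```yaml
components:
  HuggingFaceLocalChatGenerator:
    type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator
    init_parameters:
      model: HuggingFaceH4/zephyr-7b-beta  # ignored, because model is also set in huggingface_pipeline_kwargs
      huggingface_pipeline_kwargs:
        model: mistralai/Mistral-7B-Instruct-v0.2  # this value takes effect
        device_map: auto  # assumption: overrides the device init parameter
```

To avoid confusion, set each value in only one place: either as a top-level init parameter or inside `huggingface_pipeline_kwargs`.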

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
messages	List[ChatMessage]		A list of ChatMessage objects representing the input messages.
generation_kwargs	Optional[Dict[str, Any]]	None	Additional keyword arguments for text generation.
streaming_callback	Optional[StreamingCallbackT]	None	An optional callable for handling streaming responses.
tools	Optional[Union[List[Tool], Toolset]]	None	A list of `Tool` objects or a `Toolset` instance for which the model can prepare calls. If set, it overrides the `tools` parameter provided during initialization.
