AzureOpenAIResponsesChatGenerator

Generate text using OpenAI's Responses API on Azure with support for reasoning models.

Basic Information

  • Type: haystack.components.generators.chat.azure_responses.AzureOpenAIResponsesChatGenerator
  • Components it can connect with:
    • ChatPromptBuilder: AzureOpenAIResponsesChatGenerator receives a rendered prompt from ChatPromptBuilder.
    • DeepsetAnswerBuilder: AzureOpenAIResponsesChatGenerator sends the generated replies to DeepsetAnswerBuilder through OutputAdapter (see the Usage Example below).

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| messages | List[ChatMessage] |  | A list of ChatMessage instances representing the input messages. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters override the parameters in the pipeline configuration. For supported parameters, see the OpenAI documentation. |
| tools | Optional[Union[List[Tool], Toolset, List[dict]]] | None | A list of Tool objects, a Toolset, or OpenAI/MCP tool definitions that the model can use. Each tool must have a unique name. If set, it overrides the tools parameter set during component initialization. Note: You cannot pass OpenAI/MCP tools and Haystack tools together. |
| tools_strict | Optional[bool] | None | Whether to enable strict schema adherence for tool calls. If set to True, the model follows exactly the schema provided in the parameters field of the tool definition, but this may increase latency. If set, it overrides the tools_strict parameter set during component initialization. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| replies | List[ChatMessage] |  | The responses from the model. |

Overview

AzureOpenAIResponsesChatGenerator uses OpenAI's Responses API through Azure OpenAI services. It supports gpt-5 and o-series models (reasoning models like o1, o3-mini) deployed on Azure. The default model is gpt-5-mini.

The Responses API is designed for models that can reason. It supports features like reasoning summaries, multi-turn conversations with previous response IDs, and structured outputs. Use AzureOpenAIResponsesChatGenerator to get access to these capabilities through Azure's infrastructure.

You can customize how the text is generated by passing parameters to the OpenAI API. Use the generation_kwargs argument when you configure the component in a pipeline. Any parameter that works with openai.Responses.create will work here too.

For details on OpenAI API parameters, see OpenAI documentation.
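
For example, here is a minimal sketch of setting sampling parameters through generation_kwargs in the component's configuration. The values are illustrative, not recommendations:

AzureOpenAIResponsesChatGenerator:
  type: haystack.components.generators.chat.azure_responses.AzureOpenAIResponsesChatGenerator
  init_parameters:
    azure_deployment: gpt-5-mini
    generation_kwargs:
      temperature: 0.2 # Illustrative: lower values make the output more focused and deterministic
      top_p: 0.9 # Illustrative: consider only tokens within the top 90% probability mass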

Authentication

To use this component, connect deepset with Azure OpenAI first. For details, see Use Azure OpenAI Models.

Reasoning Support

One of the key features of the Responses API is support for reasoning models. You can configure reasoning behavior using the reasoning parameter in generation_kwargs.

The reasoning parameter accepts:

  • effort: Specifies the level of reasoning effort for the model. Possible values are: "low", "medium", or "high".
  • summary: Specifies whether to generate reasoning summaries. You can set it to "auto", or use "generate_summary": True/False instead.

Info: OpenAI does not return the actual reasoning tokens, but you can view the summary if it's enabled. For more details, see the OpenAI Reasoning documentation.
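
For example, here is a minimal generation_kwargs sketch requesting low reasoning effort with an automatic summary, matching the configuration used in the Usage Example below:

generation_kwargs:
  reasoning:
    effort: low # Possible values: low, medium, high
    summary: auto # Let the API decide how to summarize the reasoning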

Multi-turn Conversations

The Responses API supports multi-turn conversations using previous_response_id. You can pass the response ID from a previous turn to maintain conversation context.
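
For example, you could pass the ID returned by an earlier response through generation_kwargs. The ID below is a hypothetical placeholder:

generation_kwargs:
  previous_response_id: resp_1234567890 # Hypothetical ID returned by a previous Responses API call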

Structured Output

AzureOpenAIResponsesChatGenerator supports structured output generation through the text_format and text parameters in generation_kwargs:

  • text_format: Pass a Pydantic model to define the structure.
  • text: Pass a JSON schema directly (see the sketch after the list below).

Model Compatibility and Limitations

  • Both Pydantic models and JSON schemas are supported for the latest models, starting from GPT-4o.
  • If both text_format and text are provided, text_format takes precedence and the JSON schema passed to text is ignored.
  • Streaming is not supported when using structured outputs.
  • Older models support only basic JSON mode through {"type": "json_object"}. For details, see the OpenAI JSON mode documentation.
  • For complete information, check the Azure OpenAI Structured Outputs documentation.
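
As an illustration, here is a minimal sketch of passing a JSON schema through text in generation_kwargs, assuming the component forwards this dictionary to the Responses API's text.format parameter. The schema itself is hypothetical:

generation_kwargs:
  text:
    format:
      type: json_schema
      name: answer_with_sources # Hypothetical schema name
      strict: true
      schema:
        type: object
        properties:
          answer:
            type: string
          sources:
            type: array
            items:
              type: string
        required:
          - answer
          - sources
        additionalProperties: false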

Usage Example

This is a RAG pipeline where AzureOpenAIResponsesChatGenerator sends the generated replies to DeepsetAnswerBuilder through OutputAdapter:


components:
  bm25_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
      fuzziness: 0

  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

  ChatPromptBuilder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template:
        - _content:
            - text: "You are a helpful assistant answering the user's questions based on the provided documents.\nIf the answer is not in the documents, rely on the web_search tool to find information.\nDo not use your own knowledge.\n"
          _role: system
        - _content:
            - text: "Provided documents:\n{% for document in documents %}\nDocument [{{ loop.index }}] :\n{{ document.content }}\n{% endfor %}\n\nQuestion: {{ query }}\n"
          _role: user
      required_variables:
      variables:

  OutputAdapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: '{{ replies[0] }}'
      output_type: List[str]
      custom_filters:
      unsafe: false

  AzureOpenAIResponsesChatGenerator:
    type: haystack.components.generators.chat.azure_responses.AzureOpenAIResponsesChatGenerator
    init_parameters:
      azure_endpoint:
      azure_deployment: gpt-5-mini
      api_key:
        type: env_var
        env_vars:
          - AZURE_OPENAI_API_KEY
        strict: false
      organization:
      streaming_callback:
      timeout:
      max_retries:
      generation_kwargs:
        reasoning:
          effort: low
          summary: auto
      tools:
      tools_strict: false
      http_client_kwargs:

connections: # Defines how the components are connected
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: meta_field_grouping_ranker.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: answer_builder.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: ChatPromptBuilder.documents
  - sender: OutputAdapter.output
    receiver: answer_builder.replies
  - sender: ChatPromptBuilder.prompt
    receiver: AzureOpenAIResponsesChatGenerator.messages
  - sender: AzureOpenAIResponsesChatGenerator.replies
    receiver: OutputAdapter.replies

inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
    - "bm25_retriever.query"
    - "query_embedder.text"
    - "ranker.query"
    - "answer_builder.query"
    - "ChatPromptBuilder.query"
  filters: # These components will receive a potential query filter as input
    - "bm25_retriever.filters"
    - "embedding_retriever.filters"

outputs: # Defines the output of your pipeline
  documents: "meta_field_grouping_ranker.documents" # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers" # The output of the pipeline is the generated answers

max_runs_per_component: 100

metadata: {}
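
In this example, OutputAdapter renders the template '{{ replies[0] }}' to extract the first reply produced by AzureOpenAIResponsesChatGenerator and passes it to DeepsetAnswerBuilder's replies input, where it is combined with the ranked documents to build the final answer.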

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| azure_endpoint | Optional[str] | None | The endpoint of the deployed model, for example "https://example-resource.azure.openai.com/". Can be set with the AZURE_OPENAI_ENDPOINT env var. |
| azure_deployment | str | gpt-5-mini | The deployment of the model, usually the model name. |
| api_key | Optional[Union[Secret, Callable]] | Secret.from_env_var('AZURE_OPENAI_API_KEY', strict=False) | The API key to use for authentication. Can be a Secret object containing the API key, a Secret object containing the Azure Active Directory token, or a function that returns an Azure Active Directory token. |
| organization | Optional[str] | None | Your organization ID. For help, see Setting up your organization. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function called when a new token is received from the stream. It accepts StreamingChunk as an argument. |
| timeout | Optional[float] | None | Timeout for OpenAI client calls. If not set, it defaults to the OPENAI_TIMEOUT environment variable or 30 seconds. |
| max_retries | Optional[int] | None | Maximum number of retries to contact OpenAI after an internal error. If not set, it defaults to the OPENAI_MAX_RETRIES environment variable or 5. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Other parameters to use for the model. These parameters are sent directly to the OpenAI endpoint. For details, see the OpenAI documentation. Supported parameters are listed after this table. |
| tools | Optional[Union[List[Tool], Toolset, List[dict]]] | None | A list of Tool objects, a Toolset, or OpenAI/MCP tool definitions for which the model can prepare calls. Note: You cannot pass OpenAI/MCP tools and Haystack tools together. |
| tools_strict | bool | False | Whether to enable strict schema adherence for tool calls. If set to True, the model follows exactly the schema provided in the parameters field of the tool definition, but this may increase latency. In the Responses API, tool calls are strict by default. |
| http_client_kwargs | Optional[Dict[str, Any]] | None | A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient. For more information, see the HTTPX documentation. |

Some of the parameters supported in generation_kwargs:

  • temperature: The sampling temperature to use. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic.
  • top_p: Nucleus sampling, an alternative to sampling with temperature, where the model considers only the tokens within a top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
  • previous_response_id: The ID of the previous response. Use this to create multi-turn conversations.
  • text_format: A Pydantic model that enforces the structure of the model's response. If provided, the output is always validated against this format (unless the model returns a tool call). For details, see the OpenAI Structured Outputs documentation.
  • text: A JSON schema that enforces the structure of the model's response. If provided, the output is always validated against this format (unless the model returns a tool call). Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o. If both are provided, text_format takes precedence and the JSON schema passed to text is ignored. This component doesn't currently support streaming for structured outputs. Older models support only a basic version of structured outputs through {"type": "json_object"}. For detailed information on JSON mode, see the OpenAI Structured Outputs documentation.
  • reasoning: A dictionary of parameters for reasoning, for example: summary (the summary of the reasoning), effort (the level of effort to put into the reasoning; can be low, medium, or high), and generate_summary (whether to generate a summary of the reasoning). Note: OpenAI does not return the reasoning tokens, but you can view the summary if it's enabled. For details, see the OpenAI Reasoning documentation.

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| messages | List[ChatMessage] |  | A list of ChatMessage instances representing the input messages. |
| streaming_callback | Optional[StreamingCallbackT] | None | A callback function called when a new token is received from the stream. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters override the parameters in the pipeline configuration. For supported parameters, see the OpenAI documentation. |
| tools | Optional[Union[List[Tool], Toolset, List[dict]]] | None | A list of Tool objects, a Toolset, or OpenAI/MCP tool definitions for which the model can prepare calls. If set, it overrides the tools parameter in the pipeline configuration. Note: You cannot pass OpenAI/MCP tools and Haystack tools together. |
| tools_strict | Optional[bool] | None | Whether to enable strict schema adherence for tool calls. If set to True, the model follows exactly the schema provided in the parameters field of the tool definition, but this may increase latency. If set, it overrides the tools_strict parameter in the pipeline configuration. |