DeepsetAzureOpenAIVisionGenerator

Generate text using the text and image capabilities of OpenAI's LLMs through Azure services.

Basic Information

  • Pipeline type: Query
  • Type: deepset_cloud_custom_nodes.generators.azure_openai_vision.DeepsetAzureOpenAIVisionGenerator
  • Components it can connect with (a minimal connection sketch follows this list):
    • PromptBuilder: Receives the prompt from PromptBuilder.
    • DeepsetPDFDocumentToBase64Image: Receives images from DeepsetPDFDocumentToBase64Image, which extracts them from PDF files.
    • AnswerBuilder: Sends the generated replies to AnswerBuilder, which uses them to build GeneratedAnswer objects.
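
Here's a minimal sketch of these connections in pipeline YAML. The component names (prompt_builder, pdf_to_image, vision_generator, answer_builder) are placeholders; the socket names match the full configuration in the Usage Example below.

```yaml
connections:
  - sender: prompt_builder.prompt         # prompt from PromptBuilder
    receiver: vision_generator.prompt
  - sender: pdf_to_image.base64_images    # images extracted from PDF files
    receiver: vision_generator.images
  - sender: vision_generator.replies      # generated replies for the AnswerBuilder
    receiver: answer_builder.replies
```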

Inputs

Required Inputs

| Name | Type | Description |
| --- | --- | --- |
| prompt | String | The prompt with instructions for the model. |
| images | List of Base64Image | The base64-encoded image data. These images are sent to OpenAI as visual input for text generation. |

Optional Inputs

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| generation_kwargs | Dictionary of string and any | None | Additional keyword arguments you want to pass to the generator. These parameters override the init parameters. For more details on the parameters you can use, see OpenAI documentation. |
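
For illustration, a generation_kwargs value that overrides the init parameters could look like the sketch below (shown in YAML for consistency with the rest of this page; the values are placeholders, and the keys follow OpenAI's chat completion parameters).

```yaml
generation_kwargs:
  max_tokens: 650   # cap the length of the generated answer
  temperature: 0    # deterministic, low-creativity output
  top_p: 1          # nucleus sampling probability
```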

Outputs

| Name | Type | Description |
| --- | --- | --- |
| replies | List of strings | Generated responses. |
| meta | List of dictionaries | Metadata for each response. |

Overview

DeepsetAzureOpenAIVisionGenerator works with vision-capable models from OpenAI's GPT-4 family, such as GPT-4o and GPT-4 Turbo with Vision, hosted on Azure. These models can understand images, making it possible to describe them, analyze details, and answer questions about their content. For details and limitations, check OpenAI's Vision documentation.

Authentication

To work with Azure components, you will need an Azure OpenAI API key, as well as an Azure OpenAI endpoint. You can learn more about them in Azure documentation.
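
If you define the component in pipeline YAML, the API key is read from the AZURE_OPENAI_API_KEY environment variable by default. The snippet below is a minimal sketch of how to reference that variable explicitly using Haystack's environment-variable secret format; the component name and endpoint are placeholders you replace with your own values.

```yaml
vision_generator:
  type: deepset_cloud_custom_nodes.generators.azure_openai_vision.DeepsetAzureOpenAIVisionGenerator
  init_parameters:
    azure_endpoint: https://example-resource.azure.openai.com/  # placeholder endpoint
    azure_deployment: gpt-4o
    api_key:                    # resolved from the environment at runtime
      type: env_var
      env_vars:
        - AZURE_OPENAI_API_KEY
      strict: false
```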

Usage Example

Here's an example of a query pipeline with DeepsetAzureOpenAIVisionGenerator. It's preceded by [DeepsetFileDownloader](doc:deepsetfiledownloader) ("image_downloader"), which downloads the documents returned by previous components, such as a Ranker or DocumentJoiner, and sends the downloaded files to DeepsetPDFDocumentToBase64Image ("pdf_to_image"), which converts them into Base64Image objects that DeepsetAzureOpenAIVisionGenerator can take in. The generator also receives the prompt from PromptBuilder and sends the generated replies to DeepsetAnswerBuilder.

Full YAML configuration

```yaml
components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 1024
          similarity: cosine
      top_k: 20
  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: BAAI/bge-m3
      tokenizer_kwargs:
        model_max_length: 1024
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 1024
          similarity: cosine
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: BAAI/bge-reranker-v2-m3
      top_k: 8
      model_kwargs:
        torch_dtype: torch.float16
      tokenizer_kwargs:
        model_max_length: 1024
      meta_fields_to_embed:
        - file_name
  image_downloader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
    init_parameters:
      file_extensions:
        - .pdf
  pdf_to_image:
    type: deepset_cloud_custom_nodes.converters.pdf_to_image.DeepsetPDFDocumentToBase64Image
    init_parameters:
      detail: high
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        Answer the questions briefly and precisely using the images and text passages provided.
        Only use images and text passages that are related to the question to answer it.
        In your answer, only refer to images and text passages that are relevant in answering the query.
        Only use references in the form [NUMBER OF IMAGE] if you are using information from an image.
        Or [NUMBER OF DOCUMENT] if you are using information from a document.

        These are the documents:
        {% for document in documents %}
        Document[ {{ loop.index }} ]:
        File Name: {{ document.meta['file_name'] }}
        Text only version of image number {{ loop.index }} that is also provided.
        {{ document.content }}
        {% endfor %}
        Question: {{ question }}
        Answer: 
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
  TopKDocuments:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      top_k: 8
  DeepsetAzureOpenAIVisionGenerator:
    type: deepset_cloud_custom_nodes.generators.azure_openai_vision.DeepsetAzureOpenAIVisionGenerator
    init_parameters:
      azure_endpoint: <endpoint>
      api_version: '2023-05-15'
      azure_deployment: gpt-4o
      generation_kwargs:
        max_tokens: 650
        temperature: 0
        seed: 0
connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: image_downloader.documents
    receiver: pdf_to_image.documents
  - sender: prompt_builder.prompt
    receiver: answer_builder.prompt
  - sender: ranker.documents
    receiver: prompt_builder.documents
  - sender: ranker.documents
    receiver: TopKDocuments.documents
  - sender: TopKDocuments.documents
    receiver: image_downloader.documents
  - sender: ranker.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: DeepsetAzureOpenAIVisionGenerator.prompt
  - sender: pdf_to_image.base64_images
    receiver: DeepsetAzureOpenAIVisionGenerator.images
  - sender: DeepsetAzureOpenAIVisionGenerator.replies
    receiver: answer_builder.replies
max_loops_allowed: 100
metadata: {}
inputs:
  query:
    - bm25_retriever.query
    - query_embedder.text
    - ranker.query
    - prompt_builder.question
    - answer_builder.query
  filters:
    - embedding_retriever.filters
    - bm25_retriever.filters
outputs:
  answers: answer_builder.answers
  documents: ranker.documents
```

Init Parameters

| Parameter | Type | Possible values | Description |
| --- | --- | --- | --- |
| azure_endpoint | String | Default: None | The endpoint of the deployed model, for example https://example-resource.azure.openai.com/. Optional. |
| api_version | String | Default: 2023-05-15 | The version of the API to use. Optional. |
| azure_deployment | String | Default: gpt-4o | The deployment of the model, usually the model name. Optional. |
| api_key | Secret | Default: Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False) | The API key to use for authentication. By default, loaded from the environment variable AZURE_OPENAI_API_KEY. Optional. |
| azure_ad_token | Secret | Default: Secret.from_env_var("AZURE_OPENAI_AD_TOKEN", strict=False) | Azure Active Directory token. By default, loaded from the environment variable AZURE_OPENAI_AD_TOKEN. Optional. |
| organization | String | Default: None | Your organization ID. Read more about organization setup in OpenAI documentation. Optional. |
| streaming_callback | Callable | Default: None | A callback function called when a new token is received from the stream. It accepts StreamingChunk as an argument. Optional. |
| system_prompt | String | Default: None | The system prompt for text generation. If not provided, the default system prompt is used. Optional. |
| timeout | Float | Default: 30 (read from the OPENAI_TIMEOUT environment variable if set) | Timeout for the AzureOpenAI client. Optional. |
| max_retries | Integer | Default: 5 (read from the OPENAI_MAX_RETRIES environment variable if set) | Maximum number of retries if an internal AzureOpenAI error occurs. Optional. |
| generation_kwargs | Dictionary with string keys and any type as values | Default: None | Additional parameters for model generation: max_tokens (maximum tokens in the output), temperature (sampling temperature), top_p (nucleus sampling probability), n (number of completions per prompt), stop (sequences where generation stops), presence_penalty and frequency_penalty (penalties that discourage repetition), logit_bias (per-token logit bias). Optional. |
| default_headers | Dictionary of string keys and string values | Default: None | Default headers for the AzureOpenAI client. Optional. |