DeepsetAzureOpenAIVisionGenerator

Generate text using the text and image capabilities of OpenAI's LLMs through Azure services.

Basic Information

  • Pipeline type: Query
  • Type: deepset_cloud_custom_nodes.generators.azure_openai_vision.DeepsetAzureOpenAIVisionGenerator
  • Components it can connect with (a minimal connection sketch follows this list):
    • PromptBuilder: Receives the prompt from PromptBuilder.
    • DeepsetPDFDocumentToBase64Image: Receives images from DeepsetPDFDocumentToBase64Image, which extracts them from PDF files.
    • AnswerBuilder: Sends the generated replies to AnswerBuilder, which uses them to build GeneratedAnswer objects.
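
Here's a minimal sketch of these connections in pipeline YAML. The component names (prompt_builder, pdf_to_image, vision_generator, answer_builder) are placeholders; the socket names match the full configuration in the Usage Example below.

```yaml
connections:
  - sender: prompt_builder.prompt         # prompt from PromptBuilder
    receiver: vision_generator.prompt
  - sender: pdf_to_image.base64_images    # images extracted from PDF files
    receiver: vision_generator.images
  - sender: vision_generator.replies      # generated replies for the AnswerBuilder
    receiver: answer_builder.replies
```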

Inputs

Required Inputs

| Name | Type | Description |
| --- | --- | --- |
| prompt | String | The prompt with instructions for the model. |
| images | List of Base64Image | The base64-encoded image data. These images are sent to OpenAI as visual input for text generation. |

Optional Inputs

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| generation_kwargs | Dictionary of string and any | None | Additional keyword arguments you want to pass to the generator. These parameters override the init parameters. For more details on the parameters you can use, see OpenAI documentation. |
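
For illustration, a generation_kwargs value that overrides the init parameters could look like the sketch below (shown in YAML for consistency with the rest of this page; the values are placeholders, and the keys follow OpenAI's chat completion parameters).

```yaml
generation_kwargs:
  max_tokens: 650   # cap the length of the generated answer
  temperature: 0    # deterministic, low-creativity output
  top_p: 1          # nucleus sampling probability
```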

Outputs

| Name | Type | Description |
| --- | --- | --- |
| replies | List of strings | Generated responses. |
| meta | List of dictionaries | Metadata for each response. |

Overview

DeepsetAzureOpenAIVisionGenerator works with vision-capable models from OpenAI's GPT-4 family, such as GPT-4o and GPT-4 Turbo with Vision, hosted on Azure. These models can understand images, making it possible to describe them, analyze details, and answer questions about their content. For details and limitations, check OpenAI's Vision documentation.

Authentication

To work with Azure components, you will need an Azure OpenAI API key, as well as an Azure OpenAI endpoint. You can learn more about them in Azure documentation.
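
If you define the component in pipeline YAML, the API key is read from the AZURE_OPENAI_API_KEY environment variable by default. The snippet below is a minimal sketch of how to reference that variable explicitly using Haystack's environment-variable secret format; the component name and endpoint are placeholders you replace with your own values.

```yaml
vision_generator:
  type: deepset_cloud_custom_nodes.generators.azure_openai_vision.DeepsetAzureOpenAIVisionGenerator
  init_parameters:
    azure_endpoint: https://example-resource.azure.openai.com/  # placeholder endpoint
    azure_deployment: gpt-4o
    api_key:                    # resolved from the environment at runtime
      type: env_var
      env_vars:
        - AZURE_OPENAI_API_KEY
      strict: false
```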

Usage Example

Here's an example of a query pipeline with DeepsetAzureOpenAIVisionGenerator. It's preceded by [DeepsetFileDownloader](doc:deepsetfiledownloader) ("image_downloader"), which downloads the documents returned by previous components, such as a Ranker or DocumentJoiner, and sends the downloaded files to DeepsetPDFDocumentToBase64Image ("pdf_to_image"), which converts them into Base64Image objects that DeepsetAzureOpenAIVisionGenerator can take in. The generator also receives the prompt from PromptBuilder and sends the generated replies to DeepsetAnswerBuilder.

Full YAML configuration

```yaml
components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 1024
          similarity: cosine
      top_k: 20
  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: BAAI/bge-m3
      tokenizer_kwargs:
        model_max_length: 1024
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 1024
          similarity: cosine
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: BAAI/bge-reranker-v2-m3
      top_k: 8
      model_kwargs:
        torch_dtype: torch.float16
      tokenizer_kwargs:
        model_max_length: 1024
      meta_fields_to_embed:
        - file_name
  image_downloader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
    init_parameters:
      file_extensions:
        - .pdf
  pdf_to_image:
    type: deepset_cloud_custom_nodes.converters.pdf_to_image.DeepsetPDFDocumentToBase64Image
    init_parameters:
      detail: high
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        Answer the questions briefly and precisely using the images and text passages provided.
        Only use images and text passages that are related to the question to answer it.
        In your answer, only refer to images and text passages that are relevant in answering the query.
        Only use references in the form [NUMBER OF IMAGE] if you are using information from an image.
        Or [NUMBER OF DOCUMENT] if you are using information from a document.

        These are the documents:
        {% for document in documents %}
        Document[ {{ loop.index }} ]:
        File Name: {{ document.meta['file_name'] }}
        Text only version of image number {{ loop.index }} that is also provided.
        {{ document.content }}
        {% endfor %}
        Question: {{ question }}
        Answer: 
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
  TopKDocuments:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      top_k: 8
  DeepsetAzureOpenAIVisionGenerator:
    type: deepset_cloud_custom_nodes.generators.azure_openai_vision.DeepsetAzureOpenAIVisionGenerator
    init_parameters:
      azure_endpoint: <endpoint>
      api_version: '2023-05-15'
      azure_deployment: gpt-4o
      generation_kwargs:
        max_tokens: 650
        temperature: 0
        seed: 0
connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: image_downloader.documents
    receiver: pdf_to_image.documents
  - sender: prompt_builder.prompt
    receiver: answer_builder.prompt
  - sender: ranker.documents
    receiver: prompt_builder.documents
  - sender: ranker.documents
    receiver: TopKDocuments.documents
  - sender: TopKDocuments.documents
    receiver: image_downloader.documents
  - sender: ranker.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: DeepsetAzureOpenAIVisionGenerator.prompt
  - sender: pdf_to_image.base64_images
    receiver: DeepsetAzureOpenAIVisionGenerator.images
  - sender: DeepsetAzureOpenAIVisionGenerator.replies
    receiver: answer_builder.replies
max_loops_allowed: 100
metadata: {}
inputs:
  query:
    - bm25_retriever.query
    - query_embedder.text
    - ranker.query
    - prompt_builder.question
    - answer_builder.query
  filters:
    - embedding_retriever.filters
    - bm25_retriever.filters
outputs:
  answers: answer_builder.answers
  documents: ranker.documents
```

Init Parameters

| Parameter | Type | Possible values | Description |
| --- | --- | --- | --- |
| azure_endpoint | String | Default: None | The endpoint of the deployed model, for example https://example-resource.azure.openai.com/. Optional. |
| api_version | String | Default: 2023-05-15 | The version of the API to use. Optional. |
| azure_deployment | String | Default: gpt-4o | The deployment of the model, usually the model name. Optional. |
| api_key | Secret | Default: Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False) | The API key to use for authentication. By default, loaded from the environment variable AZURE_OPENAI_API_KEY. Optional. |
| azure_ad_token | Secret | Default: Secret.from_env_var("AZURE_OPENAI_AD_TOKEN", strict=False) | Azure Active Directory token. By default, loaded from the environment variable AZURE_OPENAI_AD_TOKEN. Optional. |
| organization | String | Default: None | Your organization ID. Read more about organization setup in OpenAI documentation. Optional. |
| streaming_callback | Callable | Default: None | A callback function called when a new token is received from the stream. It accepts StreamingChunk as an argument. Optional. |
| system_prompt | String | Default: None | The system prompt for text generation. If not provided, the default system prompt is used. Optional. |
| timeout | Float | Default: 30 (read from the OPENAI_TIMEOUT environment variable if set) | Timeout for the AzureOpenAI client. Optional. |
| max_retries | Integer | Default: 5 (read from the OPENAI_MAX_RETRIES environment variable if set) | Maximum number of retries if an internal AzureOpenAI error occurs. Optional. |
| generation_kwargs | Dictionary with string keys and any type as values | Default: None | Additional parameters for model generation: max_tokens (maximum tokens in the output), temperature (sampling temperature), top_p (nucleus sampling probability), n (number of completions per prompt), stop (sequences where generation stops), presence_penalty and frequency_penalty (penalties that discourage repetition), logit_bias (per-token logit bias). Optional. |
| default_headers | Dictionary of string keys and string values | Default: None | Default headers for the AzureOpenAI client. Optional. |