
DeepsetOpenAIVisionGenerator

Generate text using the text and image capabilities of OpenAI's LLMs.

Basic Information

  • Type: deepset_cloud_custom_nodes.generators.openai_vision.DeepsetOpenAIVisionGenerator
  • Components it can connect with (see the connection sketch after this list):
    • PromptBuilder: Receives the prompt from PromptBuilder.
    • DeepsetPDFDocumentToBase64Image: Receives images from DeepsetPDFDocumentToBase64Image, which extracts them from PDF files.
    • DeepsetAnswerBuilder: Sends the generated replies to DeepsetAnswerBuilder, which uses them to build GeneratedAnswer objects.
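For orientation, here's a minimal sketch of how these connections appear in a pipeline's connections section. The component names prompt_builder, pdf_to_image, and answer_builder are taken from the usage example later on this page; your pipeline may use different names:

connections:
  # PromptBuilder supplies the text prompt
  - sender: prompt_builder.prompt
    receiver: DeepsetOpenAIVisionGenerator.prompt
  # DeepsetPDFDocumentToBase64Image supplies the images extracted from PDFs
  - sender: pdf_to_image.base64_images
    receiver: DeepsetOpenAIVisionGenerator.images
  # The generated replies are turned into GeneratedAnswer objects
  - sender: DeepsetOpenAIVisionGenerator.replies
    receiver: answer_builder.replies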

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | | The prompt with instructions for the model. |
| images | List[Base64Image] | | A list of Base64Image objects that represent the image content of the message. The base64-encoded images are passed to OpenAI to be used as images for text generation. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters can override the parameters set in the pipeline configuration. For details on the parameters the OpenAI API supports, refer to the OpenAI documentation. |
| streaming_callback | Optional[Callable[[StreamingChunk], None]] | None | A callback function called when a new token is received from the stream. For more information, see Enable Streaming. |

Outputs

| Parameter | Type | Description |
| --- | --- | --- |
| replies | List[str] | A list of strings containing the generated responses. |
| meta | List[Dict[str, Any]] | A list of dictionaries containing the metadata for each response. |

Overview

DeepsetOpenAIVisionGenerator works with OpenAI's GPT family of models with vision capabilities. These models can understand images, making it possible to describe them, analyze details, and answer questions based on image content. For details and limitations, check OpenAI's Vision documentation.

Authentication

To work with OpenAI models, you need an OpenAI API key.

Once you have the API key, connect deepset with OpenAI:

Add Workspace-Level Integration

  1. Click your profile icon and choose Settings.
  2. Go to Workspace > Integrations.
  3. Find the provider you want to connect and click Connect next to it.
  4. Enter the API key and any other required details.
  5. Click Connect. You can use this integration in pipelines and indexes in the current workspace.

Add Organization-Level Integration

  1. Click your profile icon and choose Settings.
  2. Go to Organization > Integrations.
  3. Find the provider you want to connect and click Connect next to it.
  4. Enter the API key and any other required details.
  5. Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.
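In pipeline YAML, the component reads the key through its api_key init parameter. The usage example later on this page takes it from the OPENAI_API_KEY environment variable; here's a minimal sketch of that configuration:

DeepsetOpenAIVisionGenerator:
  type: deepset_cloud_custom_nodes.generators.openai_vision.DeepsetOpenAIVisionGenerator
  init_parameters:
    # Read the key from the OPENAI_API_KEY environment variable
    api_key:
      type: env_var
      env_vars:
        - OPENAI_API_KEY
      strict: false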

Usage Example

Initializing the Component

components:
  DeepsetOpenAIVisionGenerator:
    type: deepset_cloud_custom_nodes.generators.openai_vision.DeepsetOpenAIVisionGenerator
    init_parameters:

Using the Component in a Pipeline

Here's an example of a query pipeline with DeepsetOpenAIVisionGenerator. It's preceded by DeepsetFileDownloader ("image_downloader"), which downloads the documents returned by previous components, such as a Ranker or DocumentJoiner. It then sends the downloaded files to DeepsetPDFDocumentToBase64Image ("pdf_to_image"), which converts them into Base64Image objects that DeepsetOpenAIVisionGenerator can take in. The Generator also receives the prompt from the PromptBuilder. It then sends the generated replies to DeepsetAnswerBuilder.

The generator in a pipeline in Pipeline Builder

And here's the YAML configuration:

components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 1024
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: BAAI/bge-m3
      tokenizer_kwargs:
        model_max_length: 1024
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 1024
          similarity: cosine
          index: ''
          max_chunk_bytes: 104857600
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          timeout:
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: BAAI/bge-reranker-v2-m3
      top_k: 8
      model_kwargs:
        torch_dtype: torch.float16
      tokenizer_kwargs:
        model_max_length: 1024
      meta_fields_to_embed:
        - file_name
  image_downloader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
    init_parameters:
      file_extensions:
        - .pdf
  pdf_to_image:
    type: deepset_cloud_custom_nodes.converters.pdf_to_image.DeepsetPDFDocumentToBase64Image
    init_parameters:
      detail: high
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        Answer the questions briefly and precisely using the images and text passages provided.
        Only use images and text passages that are related to the question to answer it.
        In your answer, only refer to images and text passages that are relevant in answering the query.
        Only use references in the form [NUMBER OF IMAGE] if you are using information from an image.
        Or [NUMBER OF DOCUMENT] if you are using information from a document.

        These are the documents:
        {% for document in documents %}
        Document[ {{ loop.index }} ]:
        File Name: {{ document.meta['file_name'] }}
        Text only version of image number {{ loop.index }} that is also provided.
        {{ document.content }}
        {% endfor %}
        Question: {{ question }}
        Answer:
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
  TopKDocuments:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      top_k: 8
  DeepsetOpenAIVisionGenerator:
    type: deepset_cloud_custom_nodes.generators.openai_vision.DeepsetOpenAIVisionGenerator
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: false
      model: gpt-4o
      streaming_callback:
      api_base_url:
      organization:
      system_prompt:
      generation_kwargs:
      timeout:
      max_retries:

connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: image_downloader.documents
    receiver: pdf_to_image.documents
  - sender: prompt_builder.prompt
    receiver: answer_builder.prompt
  - sender: ranker.documents
    receiver: prompt_builder.documents
  - sender: ranker.documents
    receiver: TopKDocuments.documents
  - sender: TopKDocuments.documents
    receiver: image_downloader.documents
  - sender: ranker.documents
    receiver: answer_builder.documents
  - sender: DeepsetOpenAIVisionGenerator.replies
    receiver: answer_builder.replies
  - sender: prompt_builder.prompt
    receiver: DeepsetOpenAIVisionGenerator.prompt
  - sender: pdf_to_image.base64_images
    receiver: DeepsetOpenAIVisionGenerator.images

metadata: {}

inputs:
  query:
    - bm25_retriever.query
    - query_embedder.text
    - ranker.query
    - prompt_builder.question
    - answer_builder.query
  filters:
    - embedding_retriever.filters
    - bm25_retriever.filters

outputs:
  answers: answer_builder.answers
  documents: ranker.documents

max_runs_per_component: 100

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | Secret | Secret.from_env_var('OPENAI_API_KEY') | The OpenAI API key. |
| model | str | gpt-4o | The name of the model to use. By default, it uses the gpt-4o model. |
| streaming_callback | Optional[Callable[[StreamingChunk], None]] | None | A callback function called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument. For details, see Enable Streaming. |
| api_base_url | Optional[str] | None | An optional base URL. |
| organization | Optional[str] | None | Your organization ID. See OpenAI's production best practices. |
| system_prompt | Optional[str] | None | The system prompt to use for text generation. If not provided, the system prompt is omitted and the model's default system prompt is used. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Other parameters to use for the model. These parameters are sent directly to the OpenAI endpoint. See the OpenAI documentation for more details. Some of the supported parameters are listed below this table. |
| timeout | Optional[float] | None | Timeout for OpenAI client calls. If not set, it's inferred from the OPENAI_TIMEOUT environment variable or defaults to 30 seconds. |
| max_retries | Optional[int] | None | Maximum number of retries to contact OpenAI after an internal error. If not set, it's inferred from the OPENAI_MAX_RETRIES environment variable or defaults to 5. |

Some of the parameters supported in generation_kwargs:

  • max_tokens: The maximum number of tokens the output text can have.
  • temperature: The sampling temperature to use. Higher values mean the model takes more risks. Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
  • n: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, it generates two completions for each of the three prompts, ending up with 6 completions in total.
  • stop: One or more sequences after which the LLM should stop generating tokens.
  • presence_penalty: The penalty to apply if a token is already present. Higher values make the model less likely to repeat the same token in the text.
  • frequency_penalty: The penalty to apply if a token has already been generated in the text. Higher values make the model less likely to repeat the same token in the text.
  • logit_bias: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the values are the bias to add to that token.
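For example, here's a minimal sketch of passing a couple of these parameters through generation_kwargs when configuring the component; the values shown are illustrative, not recommendations:

DeepsetOpenAIVisionGenerator:
  type: deepset_cloud_custom_nodes.generators.openai_vision.DeepsetOpenAIVisionGenerator
  init_parameters:
    model: gpt-4o
    generation_kwargs:
      # Cap the length of the generated answer and make sampling deterministic
      max_tokens: 650
      temperature: 0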

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | | The prompt with instructions for the model. |
| images | List[Base64Image] | | A list of Base64Image objects that represent the image content of the message. The base64-encoded images are passed to OpenAI to be used as images for text generation. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters can override the parameters set in the pipeline configuration. For details on the parameters the OpenAI API supports, refer to the OpenAI documentation. |
| streaming_callback | Optional[Callable[[StreamingChunk], None]] | None | A callback function called when a new token is received from the stream. For more information, see Enable Streaming. |
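As an illustration only, a query-time override of generation_kwargs is keyed by the component's name in the pipeline (DeepsetOpenAIVisionGenerator in the usage example above). The exact request format is described in Modify Pipeline Parameters at Query Time; the sketch below shows only the assumed params structure, in YAML for readability:

params:
  DeepsetOpenAIVisionGenerator:
    # Overrides the generation_kwargs set in the pipeline configuration for this query
    generation_kwargs:
      temperature: 0
      max_tokens: 400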