DeepsetAzureOpenAIVisionGenerator
Generate text using text and image capabilities of OpenAI's LLMs through Azure services.
Basic Information
- Pipeline type: Query
- Type: deepset_cloud_custom_nodes.generators.azure_openai_vision.DeepsetAzureOpenAIVisionGenerator
- Components it can connect with:
- PromptBuilder: Receives the prompt from PromptBuilder.
- DeepsetPDFDocumentToBase64Image: Receives images from DeepsetPDFDocumentToBase64Image, which extracts them from PDF files.
- AnswerBuilder: Sends the generated replies to AnswerBuilder, which uses them to build GeneratedAnswer objects.
Inputs
Required Inputs
Name | Type | Description |
---|---|---|
prompt | String | The prompt with instructions for the model. |
images | List of Base64Image | The base64-encoded images. They are sent to the model together with the prompt and used as visual input for text generation. |
Optional Inputs
Name | Type | Default | Description |
---|---|---|---|
generation_kwargs | Dictionary with string keys and any type as values | None | Additional keyword arguments to pass to the generator. These override the matching init parameters. For details on the parameters you can use, see OpenAI documentation. |
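The override behavior described for generation_kwargs can be pictured as a dictionary merge in which run-time values win. The snippet below is only a sketch of that precedence rule in plain Python. The helper name merge_generation_kwargs is hypothetical and not part of the component's API, and the component's real implementation may differ.

```python
from typing import Any, Dict, Optional


def merge_generation_kwargs(
    init_kwargs: Optional[Dict[str, Any]],
    runtime_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: combine init-time and run-time generation kwargs."""
    # Keys unpacked later overwrite keys unpacked earlier,
    # so run-time values take precedence over init-time values.
    return {**(init_kwargs or {}), **(runtime_kwargs or {})}


# Values taken from the init_parameters of the usage example on this page.
init_kwargs = {"max_tokens": 650, "temperature": 0, "seed": 0}
# A run-time override for a single request (assumed value).
runtime_kwargs = {"max_tokens": 200}

merged = merge_generation_kwargs(init_kwargs, runtime_kwargs)
print(merged)
```

Only max_tokens changes in the merged result; temperature and seed keep their init-time values.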
Outputs
Name | Type | Description |
---|---|---|
replies | List of strings | Generated responses. |
meta | List of dictionaries | Metadata for each response. |
Overview
DeepsetAzureOpenAIVisionGenerator works with GPT-4 and GPT-3.5 turbo families of models hosted on Azure. These models can understand images, making it possible to describe them, analyze details, and answer questions based on images. For details and limitations, check OpenAI's Vision documentation.
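To make the combination of text and image input more concrete, the sketch below builds the kind of message content that OpenAI's chat API accepts for vision requests: one text part followed by one image part per base64-encoded image. It illustrates the general request shape, not the component's internal code. The function name build_vision_content, the image/jpeg MIME type, and the placeholder bytes are assumptions made for this example.

```python
import base64
from typing import Any, Dict, List


def build_vision_content(
    prompt: str, images_b64: List[str], detail: str = "high"
) -> List[Dict[str, Any]]:
    """Hypothetical helper: build one user message's content parts."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": prompt}]
    for data in images_b64:
        content.append(
            {
                "type": "image_url",
                # Base64 images are passed as data URLs; the JPEG MIME type is assumed here.
                "image_url": {"url": f"data:image/jpeg;base64,{data}", "detail": detail},
            }
        )
    return content


# Placeholder bytes standing in for a real image file.
fake_image_bytes = b"\xff\xd8\xff\xe0 placeholder, not a real JPEG"
encoded = base64.b64encode(fake_image_bytes).decode("ascii")

content = build_vision_content("Describe the image.", [encoded])
print(content[0]["type"], content[1]["type"])
```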
Authentication
To work with Azure components, you will need an Azure OpenAI API key, as well as an Azure OpenAI endpoint. You can learn more about them in Azure documentation.
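As a rough illustration of how credentials might be picked up, the sketch below reads the two environment variables named in the Init Parameters table, AZURE_OPENAI_API_KEY and AZURE_OPENAI_AD_TOKEN. The helper resolve_azure_credentials is hypothetical, and the rule that at least one of the two must be set is an assumption based on both secrets being declared with strict=False.

```python
import os
from typing import Dict, Mapping


def resolve_azure_credentials(env: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical helper: collect whichever Azure credentials are available."""
    api_key = env.get("AZURE_OPENAI_API_KEY")
    ad_token = env.get("AZURE_OPENAI_AD_TOKEN")
    # Assumption: one of the two credentials is enough, but at least one is required.
    if not api_key and not ad_token:
        raise ValueError(
            "Set AZURE_OPENAI_API_KEY or AZURE_OPENAI_AD_TOKEN before running the pipeline."
        )
    credentials: Dict[str, str] = {}
    if api_key:
        credentials["api_key"] = api_key
    if ad_token:
        credentials["azure_ad_token"] = ad_token
    return credentials


# In a real environment you would pass os.environ; a plain dict keeps the example self-contained.
example_env = {"AZURE_OPENAI_API_KEY": "dummy-key"}
credentials = resolve_azure_credentials(example_env)
print(sorted(credentials))
```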
Usage Example
Here's an example of a query pipeline with DeepsetAzureOpenAIVisionGenerator. The Generator is preceded by DeepsetFileDownloader ("image_downloader"), which downloads the files for the documents returned by previous components, such as a Ranker or DocumentJoiner. DeepsetFileDownloader sends the downloaded files to DeepsetPDFDocumentToBase64Image ("pdf_to_image"), which converts them into Base64Image objects that DeepsetAzureOpenAIVisionGenerator can take in. The Generator also receives the prompt from PromptBuilder and sends the generated replies to DeepsetAnswerBuilder.
Full YAML configuration
components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 1024
          similarity: cosine
      top_k: 20
  query_embedder:
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: BAAI/bge-m3
      tokenizer_kwargs:
        model_max_length: 1024
  embedding_retriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 1024
          similarity: cosine
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: BAAI/bge-reranker-v2-m3
      top_k: 8
      model_kwargs:
        torch_dtype: torch.float16
      tokenizer_kwargs:
        model_max_length: 1024
      meta_fields_to_embed:
        - file_name
  image_downloader:
    type: deepset_cloud_custom_nodes.augmenters.deepset_file_downloader.DeepsetFileDownloader
    init_parameters:
      file_extensions:
        - .pdf
  pdf_to_image:
    type: deepset_cloud_custom_nodes.converters.pdf_to_image.DeepsetPDFDocumentToBase64Image
    init_parameters:
      detail: high
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        Answer the questions briefly and precisely using the images and text passages provided.
        Only use images and text passages that are related to the question to answer it.
        In your answer, only refer to images and text passages that are relevant in answering the query.
        Only use references in the form [NUMBER OF IMAGE] if you are using information from an image.
        Or [NUMBER OF DOCUMENT] if you are using information from a document.
        These are the documents:
        {% for document in documents %}
        Document[ {{ loop.index }} ]:
        File Name: {{ document.meta['file_name'] }}
        Text only version of image number {{ loop.index }} that is also provided.
        {{ document.content }}
        {% endfor %}
        Question: {{ question }}
        Answer:
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
  TopKDocuments:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      top_k: 8
  DeepsetAzureOpenAIVisionGenerator:
    type: deepset_cloud_custom_nodes.generators.azure_openai_vision.DeepsetAzureOpenAIVisionGenerator
    init_parameters:
      azure_endpoint: <endpoint>
      api_version: '2023-05-15'
      azure_deployment: gpt-4o
      generation_kwargs:
        max_tokens: 650
        temperature: 0
        seed: 0
connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: image_downloader.documents
    receiver: pdf_to_image.documents
  - sender: prompt_builder.prompt
    receiver: answer_builder.prompt
  - sender: ranker.documents
    receiver: prompt_builder.documents
  - sender: ranker.documents
    receiver: TopKDocuments.documents
  - sender: TopKDocuments.documents
    receiver: image_downloader.documents
  - sender: ranker.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: DeepsetAzureOpenAIVisionGenerator.prompt
  - sender: pdf_to_image.base64_images
    receiver: DeepsetAzureOpenAIVisionGenerator.images
  - sender: DeepsetAzureOpenAIVisionGenerator.replies
    receiver: answer_builder.replies
max_loops_allowed: 100
metadata: {}
inputs:
  query:
    - bm25_retriever.query
    - query_embedder.text
    - ranker.query
    - prompt_builder.question
    - answer_builder.query
  filters:
    - embedding_retriever.filters
    - bm25_retriever.filters
outputs:
  answers: answer_builder.answers
  documents: ranker.documents
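The connections in this configuration form a directed graph. The sketch below copies the sender and receiver component names from the configuration and uses Python's standard graphlib module to list one valid dependency order. It is only a way to check that the Generator runs after both prompt_builder and pdf_to_image. It does not reproduce how the pipeline engine actually schedules components.

```python
from graphlib import TopologicalSorter

# (sender component, receiver component) pairs copied from the connections list.
connections = [
    ("bm25_retriever", "document_joiner"),
    ("query_embedder", "embedding_retriever"),
    ("embedding_retriever", "document_joiner"),
    ("document_joiner", "ranker"),
    ("image_downloader", "pdf_to_image"),
    ("prompt_builder", "answer_builder"),
    ("ranker", "prompt_builder"),
    ("ranker", "TopKDocuments"),
    ("TopKDocuments", "image_downloader"),
    ("ranker", "answer_builder"),
    ("prompt_builder", "DeepsetAzureOpenAIVisionGenerator"),
    ("pdf_to_image", "DeepsetAzureOpenAIVisionGenerator"),
    ("DeepsetAzureOpenAIVisionGenerator", "answer_builder"),
]

sorter = TopologicalSorter()
for sender, receiver in connections:
    # Each receiver depends on its sender, so the sender is added as a predecessor.
    sorter.add(receiver, sender)

order = list(sorter.static_order())
print(order)
```

Whatever order graphlib picks among independent components, answer_builder comes last because every other component feeds into it directly or indirectly.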
Init Parameters
Parameter | Type | Possible values | Description |
---|---|---|---|
azure_endpoint | String | Default: None | The endpoint of the deployed model, for example https://example-resource.azure.openai.com/. Optional. |
api_version | String | Default: 2023-05-15 | The version of the API to use. Optional. |
azure_deployment | String | Default: gpt-4o | The deployment of the model, usually the model name. Optional. |
api_key | Secret | Default: Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False) | The API key to use for authentication. By default, loaded from the environment variable AZURE_OPENAI_API_KEY. Optional. |
azure_ad_token | Secret | Default: Secret.from_env_var("AZURE_OPENAI_AD_TOKEN", strict=False) | Azure Active Directory token. By default, loaded from the environment variable AZURE_OPENAI_AD_TOKEN. Optional. |
organization | String | Default: None | Your organization ID. Read more about organization setup in OpenAI documentation. Optional. |
streaming_callback | StreamingChunk | Default: None | A callback function called when a new token is received from the stream. Accepts StreamingChunk as an argument. Optional. |
system_prompt | String | Default: None | The system prompt for text generation. If not provided, the default system prompt is used. Optional. |
timeout | Float | Default: value of the OPENAI_TIMEOUT environment variable, or 30 if the variable is not set | Timeout, in seconds, for the AzureOpenAI client. Optional. |
max_retries | Integer | Default: value of the OPENAI_MAX_RETRIES environment variable, or 5 if the variable is not set | Maximum number of retries for the AzureOpenAI client if an internal error occurs. Optional. |
generation_kwargs | Dictionary with string keys and any type as values | Default: None | Additional parameters for model generation. Optional. Parameters include: max_tokens (maximum number of tokens in the output); temperature (sampling temperature that controls creativity); top_p (nucleus sampling probability); n (number of completions per prompt); stop (sequences at which generation stops); presence_penalty and frequency_penalty (penalties that discourage repetition); logit_bias (logit bias per token). |
default_headers | Dictionary of string keys and string values | Default: None | Default headers for the AzureOpenAI client. Optional. |
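The defaults for timeout and max_retries depend on environment variables, which is easy to misread in table form. The sketch below shows one way such a fallback chain can work: an explicit value wins, then the environment variable, then the built-in default. The helper resolve_client_limits is hypothetical and written only to illustrate the table rows above.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_client_limits(
    timeout: Optional[float],
    max_retries: Optional[int],
    env: Mapping[str, str],
) -> Tuple[float, int]:
    """Hypothetical helper: explicit value, then environment variable, then default."""
    if timeout is None:
        # Falls back to 30 seconds when OPENAI_TIMEOUT is not set.
        timeout = float(env.get("OPENAI_TIMEOUT", 30.0))
    if max_retries is None:
        # Falls back to 5 retries when OPENAI_MAX_RETRIES is not set.
        max_retries = int(env.get("OPENAI_MAX_RETRIES", 5))
    return timeout, max_retries


# In a real environment you would pass os.environ; plain dicts keep the example self-contained.
print(resolve_client_limits(None, None, {}))
print(resolve_client_limits(None, None, {"OPENAI_TIMEOUT": "12.5", "OPENAI_MAX_RETRIES": "2"}))
print(resolve_client_limits(60.0, 1, {"OPENAI_TIMEOUT": "12.5"}))
```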