VertexAIImageCaptioner

Generate captions for images using the Google Vertex AI imagetext model.

Key Features

  • Generates descriptive captions for images using the Vertex AI imagetext model
  • Accepts images as ByteStream input
  • Authenticates using Google Cloud Application Default Credentials (ADC)

Configuration

  1. Drag the VertexAIImageCaptioner component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    1. Enter your GCP project ID. Create a secret with the key GCP_PROJECT_ID. For detailed instructions, see Create Secrets.
    2. Optionally, enter the location. If not set, the component uses us-central1.
    3. The default model is imagetext.
  4. Go to the Advanced tab to configure additional model keyword arguments.

Connections

VertexAIImageCaptioner accepts an image as a ByteStream object through its image input. It outputs generated captions as captions (a list of strings).

Connect an image source to the image input. Connect the captions output to the replies input of AnswerBuilder.
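Outside Pipeline Builder, the same input and output can be exercised directly from Python. The following is a minimal sketch, not the component's official usage: the helper names load_image and caption_image are hypothetical, and it assumes the haystack-ai package is installed (the import is deferred so the module loads without it).

```python
from pathlib import Path
from typing import Any, List


def load_image(path: str) -> Any:
    """Read an image file into a ByteStream, the type the `image` input expects."""
    # Assumption: haystack-ai is installed; imported lazily so this module
    # can be loaded even where it is not.
    from haystack.dataclasses import ByteStream

    return ByteStream(data=Path(path).read_bytes())


def caption_image(captioner: Any, image: Any) -> List[str]:
    """Send `image` to the component and return its `captions` output."""
    result = captioner.run(image=image)
    return list(result["captions"])
```

With the Vertex AI integration installed and Application Default Credentials configured, passing a VertexAIImageCaptioner instance and the result of load_image to caption_image should return the list of captions. Any object with a compatible run() method can stand in for the component during offline testing.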

Source Code

To check this component's source code, open captioner.py in the Haystack Core Integrations repository.

Usage Examples

Basic Configuration

  VertexAIImageCaptioner:
    type: haystack_integrations.components.generators.google_vertex.captioner.VertexAIImageCaptioner
    init_parameters:
      model: imagetext

This pipeline uses VertexAIImageCaptioner to generate captions for images:

components:
  VertexAIImageCaptioner:
    type: haystack_integrations.components.generators.google_vertex.captioner.VertexAIImageCaptioner
    init_parameters:
      project_id:
      model: imagetext
      location:

  AnswerBuilder:
    type: haystack.components.builders.answer_builder.AnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:

connections:
  - sender: VertexAIImageCaptioner.captions
    receiver: AnswerBuilder.replies

inputs:
  image:
    - VertexAIImageCaptioner.image
  query:
    - AnswerBuilder.query

outputs:
  answers: AnswerBuilder.answers

max_runs_per_component: 100

metadata: {}
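For readers who work with Haystack in code, the YAML above corresponds roughly to the following Python sketch. It assumes the haystack-ai and google-vertex-haystack packages are installed and that credentials are available; the function names build_pipeline and build_run_inputs are hypothetical, and max_runs_per_component and metadata are left at their defaults.

```python
from typing import Any, Dict


def build_pipeline() -> Any:
    """Wire the two components the same way as the YAML `connections` section."""
    # Assumption: both packages are installed. Imports are kept inside the
    # function so this module loads without them.
    from haystack import Pipeline
    from haystack.components.builders.answer_builder import AnswerBuilder
    from haystack_integrations.components.generators.google_vertex.captioner import (
        VertexAIImageCaptioner,
    )

    pipeline = Pipeline()
    pipeline.add_component("VertexAIImageCaptioner", VertexAIImageCaptioner(model="imagetext"))
    pipeline.add_component("AnswerBuilder", AnswerBuilder())
    # captions (a list of strings) feeds the replies input of AnswerBuilder.
    pipeline.connect("VertexAIImageCaptioner.captions", "AnswerBuilder.replies")
    return pipeline


def build_run_inputs(image: Any, query: str) -> Dict[str, Dict[str, Any]]:
    """Mirror the YAML `inputs` section: image goes to the captioner, query to AnswerBuilder."""
    return {
        "VertexAIImageCaptioner": {"image": image},
        "AnswerBuilder": {"query": query},
    }
```

Calling build_pipeline().run(build_run_inputs(image, query)) with a ByteStream image should return a dictionary whose AnswerBuilder entry holds the answers output.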

Parameters

Inputs

Parameter | Type | Description
image | ByteStream | The image to generate captions for.

Outputs

Parameter | Type | Description
captions | List[str] | A list of captions generated by the model.

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter | Type | Default | Description
project_id | Optional[str] | None | ID of the GCP project to use. By default, it is set during Google Cloud authentication.
model | str | imagetext | Name of the model to use.
location | Optional[str] | None | The default location to use when making API calls. If not set, uses us-central1.
kwargs | Any | | Additional keyword arguments to pass to the model. See the ImageTextModel.get_captions() documentation.
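As an illustration of how the init parameters and extra keyword arguments fit together, here is a hedged Python sketch. The helper names captioner_init_params and build_captioner are hypothetical. The number_of_results and language arguments are assumed from the Vertex AI SDK's ImageTextModel.get_captions() signature; check that documentation before relying on them.

```python
from typing import Any, Dict, Optional


def captioner_init_params(
    project_id: Optional[str] = None,
    location: Optional[str] = None,
    **model_kwargs: Any,
) -> Dict[str, Any]:
    """Collect the documented init parameters plus any extra model keyword arguments."""
    params: Dict[str, Any] = {
        "model": "imagetext",
        "project_id": project_id,
        "location": location,
    }
    # Extra keyword arguments are passed through to the model unchanged.
    params.update(model_kwargs)
    return params


def build_captioner(**init_params: Any) -> Any:
    """Instantiate the component from the collected parameters."""
    # Assumption: google-vertex-haystack is installed; imported lazily.
    from haystack_integrations.components.generators.google_vertex.captioner import (
        VertexAIImageCaptioner,
    )

    return VertexAIImageCaptioner(**init_params)


# Assumed keyword arguments (not confirmed by this page): request three English captions.
params = captioner_init_params(number_of_results=3, language="en")
```

Passing params to build_captioner(**params) would then create a component that forwards the two extra arguments to the model on each run.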

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter | Type | Description
image | ByteStream | The image to generate captions for.