GoogleGenAIDocumentEmbedder

Computes document embeddings using Google AI models.

Basic Information

  • Type: haystack_integrations.components.embedders.google_genai.document_embedder.GoogleGenAIDocumentEmbedder

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents to embed. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents with embeddings. |
| meta | Dict[str, Any] | | Information about the usage of the model. |

Overview

Work in Progress

Bear with us while we work on adding pipeline examples and the most common component connections.

Authentication examples

1. Gemini Developer API (API Key Authentication)

```python
from haystack_integrations.components.embedders.google_genai import GoogleGenAIDocumentEmbedder

# Export the GOOGLE_API_KEY or GEMINI_API_KEY environment variable before running
document_embedder = GoogleGenAIDocumentEmbedder(model="text-embedding-004")
```
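
The api_key default, Secret.from_env_var(['GOOGLE_API_KEY', 'GEMINI_API_KEY'], strict=False), reads the key from the environment. The sketch below illustrates the lookup this implies, assuming the variables are checked in the listed order and that strict=False means a missing key yields None instead of an error. `resolve_api_key` is a hypothetical helper written for illustration and is not part of Haystack.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    names: Sequence[str] = ("GOOGLE_API_KEY", "GEMINI_API_KEY"),
) -> Optional[str]:
    """Return the value of the first variable in `names` that is set, else None."""
    for name in names:
        value = env.get(name)
        if value is not None:
            return value
    # Assumed to mirror strict=False: a missing key is not an error at this point.
    return None
```

Under this assumption, GOOGLE_API_KEY takes precedence when both variables are set.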

2. Vertex AI (Application Default Credentials)

```python
from haystack_integrations.components.embedders.google_genai import GoogleGenAIDocumentEmbedder

# Using Application Default Credentials (requires gcloud auth setup)
document_embedder = GoogleGenAIDocumentEmbedder(
    api="vertex",
    vertex_ai_project="my-project",
    vertex_ai_location="us-central1",
    model="text-embedding-004",
)
```

3. Vertex AI (API Key Authentication)

```python
from haystack_integrations.components.embedders.google_genai import GoogleGenAIDocumentEmbedder

# Export the GOOGLE_API_KEY or GEMINI_API_KEY environment variable before running
document_embedder = GoogleGenAIDocumentEmbedder(
    api="vertex",
    model="text-embedding-004",
)
```
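
The three examples above differ only in the api, vertex_ai_project, and vertex_ai_location arguments. The following sketch summarizes which authentication mode each combination corresponds to, as read from those examples. `auth_mode` and its return labels are hypothetical and only illustrate the decision; the component's actual client setup may apply further checks.

```python
from typing import Optional


def auth_mode(
    api: str = "gemini",
    vertex_ai_project: Optional[str] = None,
    vertex_ai_location: Optional[str] = None,
) -> str:
    """Classify the authentication mode implied by the init parameters."""
    if api == "gemini":
        # Example 1: Gemini Developer API, key taken from the environment.
        return "gemini-api-key"
    if api == "vertex":
        if vertex_ai_project and vertex_ai_location:
            # Example 2: project and location given, so Application Default Credentials.
            return "vertex-adc"
        # Example 3: no project or location, so an API key is expected.
        return "vertex-api-key"
    raise ValueError(f"Unknown api: {api!r}; expected 'gemini' or 'vertex'")
```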

Usage Example

```yaml
components:
  GoogleGenAIDocumentEmbedder:
    type: haystack_integrations.components.embedders.google_genai.document_embedder.GoogleGenAIDocumentEmbedder
    init_parameters:
```
Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | Secret | Secret.from_env_var(['GOOGLE_API_KEY', 'GEMINI_API_KEY'], strict=False) | Google API key. By default, it is read from the GOOGLE_API_KEY or GEMINI_API_KEY environment variable. Not needed if using Vertex AI with Application Default Credentials. Go to https://aistudio.google.com/app/apikey for a Gemini API key. Go to https://cloud.google.com/vertex-ai/generative-ai/docs/start/api-keys for a Vertex AI API key. |
| api | Literal['gemini', 'vertex'] | gemini | Which API to use. Either "gemini" for the Gemini Developer API or "vertex" for Vertex AI. |
| vertex_ai_project | Optional[str] | None | Google Cloud project ID for Vertex AI. Required when using Vertex AI with Application Default Credentials. |
| vertex_ai_location | Optional[str] | None | Google Cloud location for Vertex AI (e.g., "us-central1", "europe-west1"). Required when using Vertex AI with Application Default Credentials. |
| model | str | text-embedding-004 | The name of the model to use for calculating embeddings. The default model is text-embedding-004. |
| prefix | str | `""` | A string to add at the beginning of each text. |
| suffix | str | `""` | A string to add at the end of each text. |
| batch_size | int | 32 | Number of documents to embed at once. |
| progress_bar | bool | True | If True, shows a progress bar when running. |
| meta_fields_to_embed | Optional[List[str]] | None | List of metadata fields to embed along with the document text. |
| embedding_separator | str | `\n` | Separator used to concatenate the metadata fields to the document text. |
| config | Optional[Dict[str, Any]] | None | A dictionary of keyword arguments used to build the embedding content configuration (types.EmbedContentConfig). If not specified, it defaults to {"task_type": "SEMANTIC_SIMILARITY"}. For more information, see the Google AI Task types. |
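
The prefix, suffix, meta_fields_to_embed, and embedding_separator parameters together determine the text that is sent for embedding. The sketch below shows one plausible way they combine, assuming the selected metadata values come first, are joined to the document text with the separator, and the result is wrapped in the prefix and suffix. `prepare_text` is a hypothetical stand-in, not the component's own method, and the sample values are made up.

```python
from typing import Any, Dict, List, Optional


def prepare_text(
    content: str,
    meta: Dict[str, Any],
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> str:
    """Build the string to embed from a document's content and metadata."""
    fields = meta_fields_to_embed or []
    # Keep only the requested fields that are present and not None.
    meta_values = [str(meta[key]) for key in fields if meta.get(key) is not None]
    return prefix + embedding_separator.join(meta_values + [content]) + suffix


text = prepare_text(
    content="Embeddings map text to vectors.",
    meta={"title": "Intro", "page": 3},
    meta_fields_to_embed=["title"],
    prefix="passage: ",
)
print(repr(text))
```

With these inputs, only the title field is prepended; page is ignored because it is not listed in meta_fields_to_embed.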

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents to embed. |
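
When run() receives a list of documents, the batch_size init parameter (default 32) sets how many are embedded at once. A minimal sketch of the slicing this implies, assuming each batch corresponds to one request to the embedding API. `iter_batches` is a hypothetical helper, and plain strings stand in for Document objects.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


docs = [f"doc-{i}" for i in range(70)]  # placeholders for Document objects
sizes = [len(batch) for batch in iter_batches(docs)]
print(sizes)
```

Here 70 documents with the default batch size are split into two full batches and one final batch with the remainder.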