
VertexAIDocumentEmbedder

Embed text using Vertex AI Embeddings API.

Basic Information

  • Type: haystack_integrations.components.embedders.google_vertex.document_embedder.VertexAIDocumentEmbedder

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents to embed. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents with embeddings, returned in a dictionary under the key documents. |

Overview

Work in Progress

Bear with us while we add pipeline examples and the most common component connections.

Embed text using Vertex AI Embeddings API.

See available models in the official Google documentation.

Usage example:

from haystack import Document
from haystack_integrations.components.embedders.google_vertex import VertexAIDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = VertexAIDocumentEmbedder(model="text-embedding-005")

result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [-0.044606007635593414, 0.02857724390923977, -0.03549133986234665, ...]

Usage Example

components:
  VertexAIDocumentEmbedder:
    type: haystack_integrations.components.embedders.google_vertex.document_embedder.VertexAIDocumentEmbedder
    init_parameters:
      model: text-embedding-005

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | Literal['text-embedding-004', 'text-embedding-005', 'textembedding-gecko-multilingual@001', 'text-multilingual-embedding-002', 'text-embedding-large-exp-03-07'] | | Name of the model to use. |
| task_type | Literal['RETRIEVAL_DOCUMENT', 'RETRIEVAL_QUERY', 'SEMANTIC_SIMILARITY', 'CLASSIFICATION', 'CLUSTERING', 'QUESTION_ANSWERING', 'FACT_VERIFICATION', 'CODE_RETRIEVAL_QUERY'] | RETRIEVAL_DOCUMENT | The type of task for which the embeddings are generated. For more information, see the official Google documentation. |
| gcp_region_name | Optional[Secret] | Secret.from_env_var('GCP_DEFAULT_REGION', strict=False) | The default location to use when making API calls. If not set, us-central1 is used. |
| gcp_project_id | Optional[Secret] | Secret.from_env_var('GCP_PROJECT_ID', strict=False) | ID of the GCP project to use. By default, it is set during Google Cloud authentication. |
| batch_size | int | 32 | The number of documents to process in a single batch. |
| max_tokens_total | int | 20000 | The maximum number of tokens to process in total. |
| time_sleep | int | 30 | The time to sleep between retries, in seconds. |
| retries | int | 3 | The number of retries in case of failure. |
| progress_bar | bool | True | Whether to display a progress bar during processing. |
| truncate_dim | Optional[int] | None | The dimension to truncate the embeddings to, if specified. |
| meta_fields_to_embed | Optional[List[str]] | None | A list of metadata fields to embed together with the document content. |
| embedding_separator | str | \n | The separator used to join the metadata fields and the document content before embedding. |
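
To make meta_fields_to_embed, embedding_separator, and batch_size more concrete, here is a minimal sketch of how text could be prepared and grouped before it is sent for embedding. It assumes the component follows the usual Haystack embedder convention of joining metadata values and content with the separator. The Doc class and the prepare_text and batched helpers are hypothetical stand-ins written for this illustration, not part of the integration. The sketch does not call the Vertex AI API, and it ignores max_tokens_total.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for haystack.Document (content plus metadata)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def prepare_text(doc: Doc, meta_fields_to_embed: Optional[List[str]] = None,
                 embedding_separator: str = "\n") -> str:
    """Join the selected metadata values and the content into one string."""
    fields = meta_fields_to_embed or []
    # Skip fields that are missing or None on this document.
    meta_values = [str(doc.meta[key]) for key in fields if doc.meta.get(key) is not None]
    return embedding_separator.join(meta_values + [doc.content])


def batched(texts: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


docs = [
    Doc(content="I love pizza!", meta={"title": "Food"}),
    Doc(content="Rome is in Italy."),
    Doc(content="Basil grows in summer.", meta={"title": "Herbs"}),
]
texts = [prepare_text(d, meta_fields_to_embed=["title"]) for d in docs]
batches = list(batched(texts, batch_size=2))
print(repr(texts[0]))            # 'Food\nI love pizza!'
print([len(b) for b in batches])  # [2, 1]
```

In this sketch, a document without the requested metadata field is embedded from its content alone, and three texts with a batch size of 2 are split into one batch of two and one batch of one.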

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents to embed. |