VertexAIDocumentEmbedder
Embed documents using the Vertex AI Embeddings API.
Basic Information
- Type: haystack_integrations.components.embedders.google_vertex.document_embedder.VertexAIDocumentEmbedder
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to embed. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents with embeddings. |
Overview
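VertexAIDocumentEmbedder computes an embedding for each document using the Vertex AI Embeddings API and stores it in the document's embedding field.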
See available models in the official Google documentation.
Usage example:
```python
from haystack import Document
from haystack_integrations.components.embedders.google_vertex import VertexAIDocumentEmbedder

doc = Document(content="I love pizza!")
document_embedder = VertexAIDocumentEmbedder(model="text-embedding-005")

result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [-0.044606007635593414, 0.02857724390923977, -0.03549133986234665, ...]
```
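As a further illustration, the sketch below passes a few more of the init parameters documented on this page (task_type, batch_size, meta_fields_to_embed, progress_bar). The function name `embed_with_metadata` and the `title` metadata key are hypothetical, chosen only for this example. It has not been run against a live project: calling it assumes the integration package is installed and Google Cloud authentication is already configured.

```python
from typing import Any, List, Tuple


def embed_with_metadata(texts_and_titles: List[Tuple[str, str]]) -> List[Any]:
    """Embed (text, title) pairs, including the title in the embedded text.

    Hypothetical helper for illustration. Imports are local so that this
    module loads even when the integration package is not installed.
    """
    from haystack import Document
    from haystack_integrations.components.embedders.google_vertex import (
        VertexAIDocumentEmbedder,
    )

    # "title" is an assumed metadata key, not something the component requires.
    docs = [
        Document(content=text, meta={"title": title})
        for text, title in texts_and_titles
    ]

    embedder = VertexAIDocumentEmbedder(
        model="text-embedding-005",
        task_type="RETRIEVAL_DOCUMENT",   # default, shown for clarity
        batch_size=32,                    # default, shown for clarity
        meta_fields_to_embed=["title"],   # embed the title along with the content
        progress_bar=False,
    )

    # run() returns a dictionary; the embedded documents are under "documents".
    result = embedder.run(documents=docs)
    return result["documents"]
```

Each returned document should carry its vector in the embedding field, as in the example above.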
Usage Example
```yaml
components:
  VertexAIDocumentEmbedder:
    type: integrations.google_vertex.src.haystack_integrations.components.embedders.google_vertex.document_embedder.VertexAIDocumentEmbedder
    init_parameters:
```
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | Literal['text-embedding-004', 'text-embedding-005', 'textembedding-gecko-multilingual@001', 'text-multilingual-embedding-002', 'text-embedding-large-exp-03-07'] | | Name of the model to use. |
| task_type | Literal['RETRIEVAL_DOCUMENT', 'RETRIEVAL_QUERY', 'SEMANTIC_SIMILARITY', 'CLASSIFICATION', 'CLUSTERING', 'QUESTION_ANSWERING', 'FACT_VERIFICATION', 'CODE_RETRIEVAL_QUERY'] | RETRIEVAL_DOCUMENT | The type of task for which the embeddings are being generated. For more information see the official Google documentation. |
| gcp_region_name | Optional[Secret] | Secret.from_env_var('GCP_DEFAULT_REGION', strict=False) | The default location to use when making API calls. If not set, us-central1 is used. |
| gcp_project_id | Optional[Secret] | Secret.from_env_var('GCP_PROJECT_ID', strict=False) | ID of the GCP project to use. By default, it is set during Google Cloud authentication. |
| batch_size | int | 32 | The number of documents to process in a single batch. |
| max_tokens_total | int | 20000 | The maximum number of tokens to process in total. |
| time_sleep | int | 30 | The time to sleep between retries in seconds. |
| retries | int | 3 | The number of retries in case of failure. |
| progress_bar | bool | True | Whether to display a progress bar during processing. |
| truncate_dim | Optional[int] | None | The dimension to truncate the embeddings to, if specified. |
| meta_fields_to_embed | Optional[List[str]] | None | A list of metadata fields to include in the embeddings. |
| embedding_separator | str | \n | The separator used to join the metadata fields listed in meta_fields_to_embed with the document content before embedding. |
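To make the interplay of meta_fields_to_embed, embedding_separator, and batch_size more concrete, here is a minimal pure-Python sketch. It is not the component's source code. It assumes the convention used by other Haystack embedders, where the selected metadata values are joined with the document content using the separator, and the resulting texts are sent in groups of batch_size. The `Doc` class and the `prepare_texts` and `batched` helpers are hypothetical stand-ins so the example runs without Haystack or Google Cloud access.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for haystack.Document (content and meta only)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def prepare_texts(
    docs: List[Doc],
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
) -> List[str]:
    """Join selected metadata values and content into one string per document."""
    fields = meta_fields_to_embed or []
    texts = []
    for doc in docs:
        # Skip fields that are missing or None for this document.
        meta_values = [str(doc.meta[k]) for k in fields if doc.meta.get(k) is not None]
        texts.append(embedding_separator.join(meta_values + [doc.content]))
    return texts


def batched(items: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


docs = [
    Doc("I love pizza!", {"title": "Food"}),
    Doc("Rome is the capital of Italy.", {"title": "Geography"}),
    Doc("No metadata here."),
]

texts = prepare_texts(docs, meta_fields_to_embed=["title"])
batches = list(batched(texts, batch_size=2))

print(texts[0])      # the title and the content, separated by a newline
print(len(batches))  # three texts with batch_size=2 give two batches
```

Under these assumptions, a document without the requested metadata field is embedded from its content alone, and the last batch may be smaller than batch_size.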
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents to embed. |