CloudSQLAuthProxy
Use this component to connect to a Cloud SQL for PostgreSQL database and query your data using deepset Cloud pipelines.
Basic Information
- Pipeline type: Used in indexing and query pipelines for connecting to a remote database.
- Type: deepset_cloud_custom_nodes.auth.cloud_sql_auth.CloudSQLAuthProxy
- Components it can connect with:
  - CloudSQLAuthProxy doesn't connect to any other components. Add it directly to your pipeline to connect deepset Cloud with your GCP-hosted Cloud SQL for PostgreSQL database.
Inputs
This component doesn't require any inputs because it doesn't connect to any other component in the pipeline. Its sole task is to connect to a remote database.
Outputs
Name | Type | Description |
---|---|---|
connection_name | String | The instance connection name for connecting to the Cloud SQL database instance. This is the name you specified in the instance_connection_name init parameter. |
status | String | The status of the connection. If everything is working correctly, the status is OK. |
Overview
deepset Cloud can connect to your Cloud SQL for PostgreSQL database with the pgvector extension.
Cloud SQL for PostgreSQL is a fully managed relational database service from Google Cloud. It streamlines database management by handling backups, updates, and scaling so you can focus on your applications. For details, see Google Cloud documentation.
The pgvector extension enables vector similarity search for Postgres. For details, see the pgvector GitHub repository.
To let deepset Cloud pipelines query data in your Cloud SQL for PostgreSQL database, connect deepset Cloud to the database using the CloudSQLAuthProxy component. When included in your pipeline, CloudSQLAuthProxy creates a secure connection to your Google Cloud SQL database. You only need to add it to the pipeline. It doesn't need connections to other components because its sole role is to establish the database connection.
CloudSQLAuthProxy downloads the Cloud SQL Proxy binary and saves it to a local file. It then runs the binary to open a server port for connecting to the database. The default binary is for Linux. You can specify both the URL the binary is downloaded from and the name of the file it's saved to. Once the binary is in place, the component uses the credentials you provide to initialize the connection. By default, it reads the credentials from the CLOUD_SQL_CREDENTIALS environment variable.
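The steps described above can be sketched roughly as follows. This is an illustration only, not the component's actual implementation: the function names `download_proxy` and `build_proxy_command` are hypothetical, the default URL is an assumption based on Google's download URL pattern and the version used in the usage example, and the `--port` and `--json-credentials` flags are assumed from the Cloud SQL Proxy v2 command line.

```python
import os
import stat
import urllib.request

# Assumed default: Linux AMD64 build, following Google's download URL pattern.
# The URL and version the component really uses may differ.
DEFAULT_PROXY_URL = (
    "https://storage.googleapis.com/cloud-sql-connectors/"
    "cloud-sql-proxy/v2.13.0/cloud-sql-proxy.linux.amd64"
)


def download_proxy(url=None, output_file="cloud-sql-proxy"):
    """Download the proxy binary to a local file and make it executable."""
    urllib.request.urlretrieve(url or DEFAULT_PROXY_URL, output_file)
    mode = os.stat(output_file).st_mode
    os.chmod(output_file, mode | stat.S_IXUSR)
    return output_file


def build_proxy_command(binary, instance_connection_name, port=5432):
    """Build the command that starts the proxy on a local port.

    Credentials are read from the CLOUD_SQL_CREDENTIALS environment
    variable, mirroring the component's documented default.
    """
    credentials = os.environ["CLOUD_SQL_CREDENTIALS"]
    return [
        os.path.abspath(binary),
        instance_connection_name,
        "--port", str(port),                 # assumed proxy v2 flag
        "--json-credentials", credentials,   # assumed proxy v2 flag
    ]
```

In this sketch, the resulting list could be passed to `subprocess.Popen` to start the proxy. The component handles all of this for you, so you don't need to run anything like it yourself.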
Limitations
The following features don't work in this setup because it's an external database and deepset Cloud doesn't have access to this information:
- Pipeline indexing status. The pipeline will show as partially indexed.
- The number of indexed documents. They'll show as skipped.
- Automatic index creation or deletion when deploying or undeploying pipelines. You'll need to manage your index in Cloud SQL. For guidance, refer to Cloud SQL documentation.
Usage Example
This is an example of an indexing pipeline with CloudSQLAuthProxy at the beginning of the pipeline. It's not listed in the connections section because it doesn't need to be connected to other components.
components:
  cloud_sql:
    type: deepset_cloud_custom_nodes.auth.cloud_sql_auth.CloudSQLAuthProxy
    init_parameters:
      url: "https://storage.googleapis.com/cloud-sql-connectors/cloud-sql-proxy/v2.13.0/cloud-sql-proxy.darwin.arm64"
      instance_connection_name: "careful-time-421813:us-central1:myinstance"
      # json_credentials is stored in the CLOUD_SQL_CREDENTIALS environment variable, which is read by default, so it's not specified here
  file_classifier:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf
        - text/markdown
        - text/html
        - application/vnd.openxmlformats-officedocument.wordprocessingml.document
        - application/vnd.openxmlformats-officedocument.presentationml.presentation
        - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  ....
connections:
  - sender: file_classifier.text/plain
    receiver: text_converter.sources
  ...
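The instance_connection_name value in the example follows Google Cloud's project:region:instance format. As a minimal sketch, assuming that common three-part form, you could check a value locally before adding it to your pipeline. The helper `parse_instance_connection_name` is hypothetical and not part of deepset Cloud.

```python
from typing import NamedTuple


class InstanceConnectionName(NamedTuple):
    project: str
    region: str
    instance: str


def parse_instance_connection_name(value: str) -> InstanceConnectionName:
    """Split a 'project:region:instance' string into its three parts.

    Assumes the common three-part form; raises ValueError otherwise.
    """
    parts = value.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected 'project:region:instance', got {value!r}"
        )
    return InstanceConnectionName(*parts)


name = parse_instance_connection_name(
    "careful-time-421813:us-central1:myinstance"
)
print(name.project, name.region, name.instance)
```

A check like this catches values with missing parts, such as the instance name alone, before the pipeline is deployed.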
Init Parameters
Parameter | Type | Possible values | Description |
---|---|---|---|
instance_connection_name | String | | The instance connection name for connecting to the Cloud SQL database instance. Required. |
json_credentials | Secret | Default: Secret.from_env_var("CLOUD_SQL_CREDENTIALS") | The JSON credentials for authenticating with Cloud SQL. By default, they're read from the CLOUD_SQL_CREDENTIALS environment variable. Required. |
url | String | Default: None | CloudSQLAuthProxy downloads the Cloud SQL Proxy binary. This parameter specifies the URL from which the binary should be downloaded. The binary is system-specific. If not specified, a default URL for the Linux AMD64 version is used. Optional. |
output_file | String | Default: "cloud-sql-proxy" | The name of the local file where the downloaded proxy binary is saved. Required. |
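Because json_credentials is read from the CLOUD_SQL_CREDENTIALS environment variable by default, it can help to confirm that the value is well-formed JSON before you store it. The sketch below assumes the credentials are a Google Cloud service account key, which normally contains the keys listed in `REQUIRED_KEYS`. The helper `check_credentials` is hypothetical and not part of the component.

```python
import json
import os

# Keys normally present in a Google Cloud service account key file.
# Assumption: the credentials used here are a service account key.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_credentials(env_var="CLOUD_SQL_CREDENTIALS"):
    """Return a list of problems found in the credentials; empty if none."""
    raw = os.environ.get(env_var)
    if not raw:
        return [f"{env_var} is not set or is empty"]
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"{env_var} is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return [f"{env_var} must contain a JSON object"]
    missing = sorted(REQUIRED_KEYS - data.keys())
    return [f"missing key: {key}" for key in missing]


if __name__ == "__main__":
    for problem in check_credentials():
        print(problem)
```

An empty list means the value parsed and contained the expected keys. It doesn't prove that the key is still valid or has access to the instance.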