Basic Information

Type: deepset_cloud_custom_nodes.auth.cloud_sql_auth.CloudSQLAuthProxy
Components it can connect with:
- CloudSQLAuthProxy doesn't connect to any other components. Add it directly to your pipeline to connect deepset AI Platform with your GCP-hosted Cloud SQL for PostgreSQL database.

Inputs

This component doesn't require any inputs because it doesn't connect to any other component in the pipeline. Its sole task is to connect to a remote database.

Outputs

Name	Type	Description
`connection_name`	String	The instance connection name for connecting to the Cloud SQL database instance. This is the name you specified in the `instance_connection_name` init parameter.
`status`	String	The status of the connection. If everything is working correctly, the status is `OK`.

Overview

deepset AI Platform can connect to your Cloud SQL for PostgreSQL database that uses the pgvector extension.

Cloud SQL for PostgreSQL is a fully managed relational database service from Google Cloud. It streamlines database management by handling backups, updates, and scaling so you can focus on your applications. For details, see Google Cloud documentation.

The pgvector extension enables vector similarity search for Postgres. For details, see the pgvector GitHub repository. In deepset AI Platform, it's represented by the PgvectorDocumentStore, through which your pipelines access your data.

To let deepset pipelines query data in your Cloud SQL for PostgreSQL database, add the CloudSQLAuthProxy component to your pipeline. The component creates a secure connection between deepset AI Platform and your Google Cloud SQL database. You don't need to connect it to any other components, as its sole role is to establish the database connection.

You then use a Pgvector retriever that accesses the PgvectorDocumentStore and fetches the relevant documents.
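As a rough illustration of this setup, a minimal query pipeline might look like the sketch below. It assumes the keyword retriever from the Haystack pgvector integration and reuses the document store settings from the usage example in this document. The component name `retriever`, the instance connection name, and the `top_k` value are placeholders, and the `inputs` and `outputs` sections are an assumption about how your query pipeline is wired.

```yaml
components:
  cloud_sql:
    type: deepset_cloud_custom_nodes.auth.cloud_sql_auth.CloudSQLAuthProxy
    init_parameters:
      # Placeholder in the form project:region:instance
      instance_connection_name: "my-project:us-central1:my-instance"
  retriever:
    # Assumed retriever choice; an embedding retriever would also need a query embedder
    type: haystack_integrations.components.retrievers.pgvector.keyword_retriever.PgvectorKeywordRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.pgvector.document_store.PgvectorDocumentStore
        init_parameters:
          table_name: deepset_test
          embedding_dimension: 768
          vector_function: cosine_similarity
          search_strategy: hnsw
      top_k: 10
# CloudSQLAuthProxy is not connected to anything, so no connections are listed
connections: []
inputs:
  query:
    - retriever.query
outputs:
  documents: retriever.documents
```

Note that `cloud_sql` appears only under `components`. The retriever reaches the database through its document store, not through a pipeline connection.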

CloudSQLAuthProxy downloads the Cloud SQL Auth Proxy binary and saves it to a local file. It then runs the binary to open a local port through which the database can be reached. The default binary is for Linux AMD64. You can specify both the binary's URL and the filename for storage. Afterward, the component uses the credentials you provide to initialize the connection. By default, it reads the credentials from the CLOUD_SQL_CREDENTIALS environment variable.

Limitations

The following features don't work in this setup because it's an external database and deepset AI Platform doesn't have access to this information:

- Pipeline indexing status. The pipeline will show as partially indexed.
- The number of indexed documents. They'll show as skipped.
- Automatic index creation or deletion when deploying or undeploying pipelines. You'll need to manage your index in Cloud SQL. For guidance, refer to Cloud SQL documentation.

Usage Example

This is an example of an index with CloudSQLAuthProxy at the beginning of the pipeline. It's not listed in the connections section because it doesn't need to be connected to other components.

components:
  cloud_sql:
    type: deepset_cloud_custom_nodes.auth.cloud_sql_auth.CloudSQLAuthProxy
    init_parameters:
      url: "https://storage.googleapis.com/cloud-sql-connectors/cloud-sql-proxy/v2.13.0/cloud-sql-proxy.darwin.arm64"
      instance_connection_name: "careful-time-421813:us-central1:myinstance"
      # json_credentials is read from the CLOUD_SQL_CREDENTIALS env variable by default, so it isn't specified here
      
  FileTypeRouter:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - text/markdown
  TextFileToDocument:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8
  MarkdownToDocument:
    type: haystack.components.converters.markdown.MarkdownToDocument
    init_parameters:
      table_to_single_line: false
      progress_bar: true
  DocumentJoiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      weights: null
      top_k: null
      sort_by_score: true
  DocumentSplitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 200
      split_overlap: 0
      split_threshold: 0
      splitting_function: null
  DocumentWriter:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.pgvector.document_store.PgvectorDocumentStore
        init_parameters:
          table_name: deepset_test
          embedding_dimension: 768
          vector_function: cosine_similarity
          recreate_table: true
          search_strategy: hnsw
      policy: NONE
connections:
  - sender: FileTypeRouter.text/plain
    receiver: TextFileToDocument.sources
  - sender: FileTypeRouter.text/markdown
    receiver: MarkdownToDocument.sources
  - sender: TextFileToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: MarkdownToDocument.documents
    receiver: DocumentJoiner.documents
  - sender: DocumentJoiner.documents
    receiver: DocumentSplitter.documents
  - sender: DocumentSplitter.documents
    receiver: DocumentWriter.documents
max_loops_allowed: 100
metadata: {}
inputs:
  files:
    - FileTypeRouter.sources

Init Parameters

Parameter	Type	Possible values	Description
`instance_connection_name`	String		The instance connection name for connecting to the Cloud SQL database instance. Required.
`json_credentials`	Secret	Default: `Secret.from_env_var("CLOUD_SQL_CREDENTIALS")`	The JSON credentials for authenticating with Cloud SQL. By default, they're read from the `CLOUD_SQL_CREDENTIALS` environment variable. Required.
`url`	String	Default: `None`	CloudSQLAuthProxy downloads the Cloud SQL Proxy binary. This parameter specifies the URL from which the binary should be downloaded. The binary is system-specific. If not specified, a default URL for the Linux AMD64 version is used. Optional.
`port`	Integer	Default: `5432`	The port to check for an existing Cloud SQL Proxy instance. If this port is in use, it indicates that a proxy is already running. Typical database ports are 3306 for MySQL, 5432 for PostgreSQL, and 1433 for SQL Server. Required.
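
As an illustration of these parameters, a component definition that sets each of them explicitly might look like the following sketch. The `json_credentials` block assumes Haystack's standard serialized form of an environment-variable secret, the URL assumes the Linux AMD64 build of proxy version 2.13.0 (the version used in the usage example), and the instance connection name is a placeholder.

```yaml
components:
  cloud_sql:
    type: deepset_cloud_custom_nodes.auth.cloud_sql_auth.CloudSQLAuthProxy
    init_parameters:
      # Placeholder in the form project:region:instance
      instance_connection_name: "my-project:us-central1:my-instance"
      # Assumed serialized form of Secret.from_env_var("CLOUD_SQL_CREDENTIALS"),
      # which is also the default, so this block can usually be omitted
      json_credentials:
        type: env_var
        env_vars:
          - CLOUD_SQL_CREDENTIALS
        strict: true
      # Assumed URL for the Linux AMD64 build; pick the build that matches your system
      url: "https://storage.googleapis.com/cloud-sql-connectors/cloud-sql-proxy/v2.13.0/cloud-sql-proxy.linux.amd64"
      # PostgreSQL default port
      port: 5432
```

In most cases, setting only `instance_connection_name` and relying on the defaults for the remaining parameters should be enough.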

Run Method Parameters

This component has no runtime parameters.