Connect Your S3 Bucket

deepset Cloud integrates seamlessly with Amazon S3 (Simple Storage Service), an object storage service by Amazon Web Services. Store your data in an S3 bucket and connect it to deepset Cloud to use the files in your deepset Cloud pipelines.

About This Task

You use the API to connect deepset Cloud to your Amazon Web Services S3 bucket. There are three steps you must complete:

  1. Give deepset Cloud an AWS role that can access your S3 bucket.
  2. Send an API call to create the S3 credentials for a deepset Cloud workspace.
  3. Send an API call to create a deepset Cloud workspace you want to connect to your S3 bucket. The pipelines in this workspace will use the files you store in S3.

When you connect your workspace to an S3 bucket, deepset Cloud transfers any files you upload in this workspace to the associated S3 bucket. You can still view these files within deepset Cloud, but they're actually stored in the S3 bucket.

📘

Security First

You can't connect an S3 bucket to an existing workspace. This is because we want your files to be secure. When you create a dedicated workspace connected to your private cloud, you can be certain all your files are stored in your private cloud, not in deepset Cloud. This wouldn't be the case if you connected your private cloud to an existing workspace.

Prerequisites

  • You can either use an existing AWS S3 bucket or let CloudFormation create a new one.
  • Have the name for your bucket at hand. You'll need it for CloudFormation to create a role for deepset Cloud, and in the request to create S3 credentials in deepset Cloud.
  • Have your deepset Cloud organization ID at hand. Use the Read Users Me [private] API endpoint to obtain it (see the sketch after this list). You'll need it when creating a stack in CloudFormation to grant deepset Cloud access to your bucket.
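
If you're not sure where to find your organization ID, you can call the endpoint directly. This is a minimal sketch; it assumes the Read Users Me endpoint is served at /api/v1/me, that it returns the organization ID in its response body, and that you authenticate with a Bearer API key:

    curl --request GET \
         --url https://api.cloud.deepset.ai/api/v1/me \
         --header 'accept: application/json' \
         --header 'authorization: Bearer <api_key>'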

Connect Your S3 Bucket to deepset Cloud

  1. Add a role for deepset Cloud to authorize it to work with your files:
    1. In AWS, open CloudFormation and click Create stack.
    2. Select Template is ready.
    3. As the template source, choose Amazon S3 URL, paste this URL: https://deepsetcloud-cloudformation-templates.s3.eu-central-1.amazonaws.com/BringYourOwnCloud/AWS/FileStore/S3Bucket/deepsetCloud-BringYourOwnCloud-AWS-FileStore-S3Bucket_cloudformation.yaml, and click Next.
    4. Give your stack a name.
    5. In the Parameters section:
      1. Choose whether to use an existing bucket or have CloudFormation create a new one.
      2. Set the bucket name.
      3. Set your deepset Cloud organization ID as the ExternalId.
      4. Set the name for the IAM Role to create.
    6. For all other options, leave the default settings. Continue to the last step.
    7. On the last step, acknowledge the AWS statement in the Capabilities section and click Submit. Wait until the status of your stack changes to CREATE_COMPLETE.
    8. Open the stack and go to the Outputs tab.
    9. Copy the value of the RoleARN key as you'll need it when sending a request to create the credentials.
      (Screenshot: the CloudFormation stack for the deepset Cloud role, open on the Outputs tab, with the RoleARN value highlighted.)
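      The Role ARN you copy follows the standard AWS format, for example (the account ID and role name here are placeholders):
      arn:aws:iam::123456789012:role/deepsetCloud-S3-Access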
  2. Add the S3 credentials using the Add S3 Credentials [private] endpoint. Here's a sample request you can use as a starting point:
    curl --request POST \
         --url https://api.cloud.deepset.ai/api/v1/infrastructure/file_stores \
         --header 'accept: application/json' \
         --header 'content-type: application/json' \
         --data '
    {
      "assume_role": {
        "role_arn": "<roleARN>",
        "role_session_name": "<session_name>" 
      },
      "bucket_name": "<name_of_your_bucket>"
    }
    '
    

role_session_name is any name you want to give to this session.

  3. Wait for the response. The response contains the ID of the credentials for S3. You'll need it to create the workspace connected to S3.
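    Here's a hypothetical response shape; the exact fields may differ, but look for the credentials ID:
    {
      "file_store_credentials_id": "<credentials_id>"
    }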
  4. Create the workspace that will use the files stored in S3. Use the Create Workspace [private] endpoint and specify the file_store_credentials. You can ignore the document_store_credentials (it's used for OpenSearch).
    Here's the code you can use to start with:
    curl --request POST \
         --url https://api.cloud.deepset.ai/api/v1/workspaces \
         --header 'accept: application/json' \
         --header 'content-type: application/json' \
         --data '
    {
      "file_store_credentials": {
        "file_store_credentials_id": "<credentials_id_from_the_response>"
      },
      "name": "<workspace_name>"
    }
    '
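    To confirm the workspace was created, you can list your workspaces. This is a sketch; it assumes the endpoint also supports GET and Bearer authentication:
    curl --request GET \
         --url https://api.cloud.deepset.ai/api/v1/workspaces \
         --header 'accept: application/json' \
         --header 'authorization: Bearer <api_key>'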
    


What To Do Next

You connected your newly created workspace to your S3 bucket. All the files you upload to this workspace will be stored in your S3 bucket, and all the pipelines you create in this workspace will use the files from S3.
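
To check the connection end to end, you can upload a test file to the new workspace and confirm it appears in your S3 bucket. This is a minimal sketch; it assumes the upload endpoint accepts a multipart upload at /api/v1/workspaces/{workspace_name}/files with a files form field and Bearer authentication:

    # The endpoint path and form field name are assumptions; check the Upload Files API reference.
    curl --request POST \
         --url https://api.cloud.deepset.ai/api/v1/workspaces/<workspace_name>/files \
         --header 'accept: application/json' \
         --header 'authorization: Bearer <api_key>' \
         --form 'files=@test.txt'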

Updating Credentials

If the credentials for any of the S3 buckets connected to deepset Cloud change, you need to:

  1. Delete the workspace that uses the credentials. (You can also do this from the deepset Cloud interface.)
  2. Delete the credentials using the Delete S3 Credentials endpoint (see the sketch after this list).
  3. Add new credentials.
  4. Create a new workspace. You can then copy your pipelines over to this workspace.
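
Here's a minimal sketch of the first two steps. The endpoint paths are assumptions based on the REST conventions of the endpoints used earlier on this page:

    # Delete the workspace that uses the credentials (path is an assumption)
    curl --request DELETE \
         --url https://api.cloud.deepset.ai/api/v1/workspaces/<workspace_name> \
         --header 'authorization: Bearer <api_key>'

    # Delete the S3 credentials by their ID (path is an assumption)
    curl --request DELETE \
         --url https://api.cloud.deepset.ai/api/v1/infrastructure/file_stores/<file_store_credentials_id> \
         --header 'authorization: Bearer <api_key>'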

Deleting Credentials

To delete the credentials used by a workspace, first delete the workspace and then delete the credentials. If you delete the credentials without deleting the workspace, the workspace becomes unusable anyway.

Backup

deepset Cloud doesn't create any backups of your pipelines' data to prevent the data from spreading into deepset's storage. We highly recommend that you configure regular or continuous backups for your S3 buckets, create runbooks, and regularly practice index recovery.
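
A straightforward starting point is S3 versioning, which keeps previous versions of overwritten and deleted objects. Here's a sketch using the AWS CLI (the bucket name is a placeholder); for full backups, consider AWS Backup or cross-region replication in addition:

    # Enable versioning on the bucket that backs your deepset Cloud workspace
    aws s3api put-bucket-versioning \
        --bucket <name_of_your_bucket> \
        --versioning-configuration Status=Enabled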