Upload Files with the CLI

Upload files and folders, including metadata, from the command line.

About This Task

Uploading through the CLI is asynchronous. It's the recommended method if you have many files to upload or want to upload files with metadata. To learn more, see Synchronous and asynchronous upload and Working with Metadata.

Sessions

Asynchronous upload uses sessions to upload your files to deepset Cloud. A session stores the ingestion status of the files: the number of finished and failed files. Each session has an ID, so you can check its details at any time.

A session starts when you initiate the upload. With the SDK, a session opens when you call the upload method or run the upload command, and it closes when the upload finishes. A session expires after 24 hours, and you can have at most 10 open sessions at a time.

When you use the SDK, you don't have to manage sessions, because the SDK opens and closes them for you. You only need them if you want to check the status of your past and current uploads.

Folder Structure

You don't need to follow any specific folder structure. If your folder contains files with the same name, all of them are uploaded by default. To change this, use the --write-mode option to keep all the files, overwrite existing files, or fail the upload.
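Before uploading, it can help to see which file names occur more than once, so you can choose a write mode deliberately. The following is a minimal bash sketch using only standard tools (find, sort, uniq). The function name find_duplicate_names is a hypothetical helper, not part of the SDK, and the sketch assumes that files with the same name in different subfolders count as duplicates when you upload with --recursive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the deepset Cloud SDK): print every file name
# that appears more than once anywhere under a folder, including subfolders.
find_duplicate_names() {
  local dir="$1"
  # Take the base name of each regular file, sort the names,
  # and keep only the ones that occur at least twice.
  find "$dir" -type f -exec basename {} \; | sort | uniq -d
}

# Usage: find_duplicate_names ./hotel_reviews
```

If this prints any names, decide whether the default behavior is what you want or pass a different value to --write-mode, for example --write-mode OVERWRITE as in the examples on this page.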

File Extensions

Make sure your files have lowercase extensions, for example, my_file.pdf instead of my_file.PDF. The SDK skips files with uppercase extensions.
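If some of your files have uppercase extensions, you can rename them before uploading. Below is a minimal bash sketch, assuming a POSIX-like shell environment such as Linux, macOS, or Git Bash on Windows. The function lowercase_extensions is a hypothetical helper, not an SDK command. It changes only the extension, skips hidden files, and leaves a file alone if a different file already has the lowercase name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the deepset Cloud SDK): rename files so their
# extension is lowercase, for example report.PDF -> report.pdf.
lowercase_extensions() {
  local dir="$1" path base ext lower target
  # Regular, non-hidden files whose name contains a dot.
  find "$dir" -type f -name '*.*' ! -name '.*' | while IFS= read -r path; do
    base="${path%.*}"   # everything before the last dot
    ext="${path##*.}"   # everything after the last dot
    lower=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    [ "$ext" = "$lower" ] && continue   # extension is already lowercase
    target="$base.$lower"
    # On case-sensitive file systems, don't overwrite a different existing file.
    if [ -e "$target" ] && ! [ "$path" -ef "$target" ]; then
      echo "skipped: $path ($target already exists)" >&2
      continue
    fi
    # Rename in two steps so case-only renames also work on
    # case-insensitive file systems (the default on macOS and Windows).
    mv "$path" "$path.tmp-rename" && mv "$path.tmp-rename" "$target"
  done
}

# Usage: lowercase_extensions ./hotel_reviews
```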


Prerequisites

  1. Install the SDK
  2. Generate an API key to connect to a deepset Cloud workspace.

Upload Files

  1. Log in by passing your API key and the name of the deepset Cloud workspace where you want to upload the files. You can also skip this step and pass the API key and workspace name directly to the upload command.
    # This command prompts you for your deepset Cloud API key and workspace name
    deepset-cloud login
    
    # Alternatively, run the CLI as a Python module (it prompts for the same details)
    python3 -m deepset_cloud_sdk.cli login
    
  2. Run the following command to upload your files, specifying any options you want:
    deepset-cloud upload <folder_path>
    
    # Alternatively, run the CLI as a Python module
    python3 -m deepset_cloud_sdk.cli upload <folder_path>
    

File Types

By default, the SDK uploads only TXT and PDF files. To upload other file types, pass each file extension with its own --use-type option, for example:
deepset-cloud upload --use-type .xml --use-type .pdf --use-type .md
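To avoid typing one --use-type option per extension, you can generate the options from the files in your folder. This is a minimal bash sketch; use_type_flags is a hypothetical helper, not part of the SDK. It scans subfolders too, which matches uploading with --recursive, and it prints extensions exactly as they appear, so fix any uppercase extensions first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the deepset Cloud SDK): print one
# "--use-type .<ext>" pair per distinct file extension found under a folder.
use_type_flags() {
  local dir="$1" ext
  # List regular, non-hidden files that have an extension, strip everything up to
  # the last dot, de-duplicate, and print one option per extension.
  find "$dir" -type f -name '*.*' ! -name '.*' \
    | sed 's/.*\.//' \
    | sort -u \
    | while IFS= read -r ext; do
        printf -- '--use-type .%s\n' "$ext"
      done
}
```

For example: deepset-cloud upload ./hotel_reviews --recursive $(use_type_flags ./hotel_reviews). The unquoted $(...) relies on word splitting, so this assumes your extensions contain no spaces.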

While the upload is in progress, you first see the upload status, which shows the files being uploaded to an S3 bucket, and then the ingestion status, which shows the files being transferred to deepset Cloud. Both steps must succeed for your files to be uploaded to deepset Cloud.

Examples

  • Upload all TXT and PDF files from the hotel_reviews folder, including its subfolders, and overwrite any duplicate files:
    deepset-cloud upload ./hotel_reviews --recursive --write-mode OVERWRITE
    
    python3 -m deepset_cloud_sdk.cli upload C:\Users\User1\Downloads\hotel_reviews --recursive --write-mode OVERWRITE
    
  • Upload Markdown files, passing the API key and workspace name in the command:
    deepset-cloud upload ./hotel_reviews --api-key api_123 --workspace-name my_workspace --use-type .md
    
    python3 -m deepset_cloud_sdk.cli upload C:\Users\User1\Downloads\hotel_reviews --api-key api_123 --workspace-name my_workspace --use-type .md
    
  • Upload JSON, XLSX, and XML files:
    deepset-cloud upload ./hotel_reviews --use-type .json --use-type .xlsx --use-type .xml
    
    python3 -m deepset_cloud_sdk.cli upload C:\Users\User1\Downloads\hotel_reviews --use-type .json --use-type .xlsx --use-type .xml
    

Related Links