Tutorial: Uploading Files with Python Methods
Use a Python SDK to quickly upload large numbers of files. This method supports uploading metadata and uses the open source deepset Cloud SDK package.
- Level: Intermediate
- Time to complete: 10 minutes
- Prerequisites:
- You must be an Admin to complete this tutorial.
- You need a basic knowledge of Python.
- The workspace where you want to upload the files must already exist in deepset Cloud. In this tutorial, it's called hotel_reviews.
- Goal: After completing this tutorial, you will have uploaded a set of hotel reviews with metadata to a deepset Cloud workspace using a Python script. You can replace this dataset with your custom one.
Prepare Your Files
This tutorial uses a set of hotel reviews with some metadata in them. You can also use your own files; just make sure they have lowercase extensions, for example myfile.txt instead of myfile.TXT.
- Download the hotel reviews dataset.
- Extract the files to a folder called hotel_reviews in your Documents folder. This can take a couple of minutes.
Result: You have 5,956 files in the \Documents\hotel_reviews folder: 2,978 TXT files and 2,978 JSON files. Each TXT file is accompanied by a .meta.json file that contains its metadata.
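Before uploading, you may want to check that the folder matches the result above. The following is a minimal sketch that uses only the Python standard library. It assumes each metadata file is named after its TXT file plus a .meta.json suffix (for example, review_1.txt.meta.json, a hypothetical name); `check_dataset` is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path


def check_dataset(folder: Path) -> dict:
    """Summarize a folder of TXT files and their .meta.json companions.

    Assumption: each metadata file is named <file name>.meta.json,
    for example review_1.txt -> review_1.txt.meta.json.
    """
    # All files in the folder and its subfolders.
    files = [p for p in folder.rglob("*") if p.is_file()]
    meta_files = [p for p in files if p.name.endswith(".meta.json")]
    txt_files = [p for p in files if p.suffix == ".txt"]
    # Extensions must be lowercase (myfile.txt, not myfile.TXT).
    uppercase_ext = [p for p in files if p.suffix != p.suffix.lower()]
    # TXT files without a metadata file next to them.
    missing_meta = [
        p for p in txt_files if not p.with_name(p.name + ".meta.json").exists()
    ]
    return {
        "total": len(files),
        "txt": len(txt_files),
        "meta": len(meta_files),
        "uppercase_ext": uppercase_ext,
        "missing_meta": missing_meta,
    }
```

If the folder matches the result above, `check_dataset(Path.home() / "Documents" / "hotel_reviews")` should report 5,956 files in total, 2,978 TXT files, and 2,978 metadata files, with empty `uppercase_ext` and `missing_meta` lists.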
Install the SDK
- Open the command line and run:
pip install deepset-cloud-sdk
- Wait until the installation finishes with a success message.
Result: You have installed the deepset Cloud SDK. It also comes with a command line interface, which you'll use later to list the uploaded files.
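To double-check the installation from Python, you can ask the standard library for the installed package version. This is a minimal sketch; it assumes the distribution name is deepset-cloud-sdk, the same name used in the pip command above, and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "deepset-cloud-sdk") -> Optional[str]:
    """Return the installed version of a package, or None if it isn't installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "deepset-cloud-sdk is not installed")
```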
Obtain the API Key
- Log in to deepset Cloud.
- Click your name in the top right corner and select Connections.
- Under API Keys, click Add new key.
- Select the expiration date for your key and click Generate key.
- Copy the key and save it in a text file.
- Click Done.
Result: You have an API key saved in a file. You can now use it to upload your files.
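Instead of pasting the key directly into a script, you can keep it in an environment variable and read it at runtime. This is a minimal sketch; DEEPSET_CLOUD_API_KEY is a hypothetical variable name chosen for this example, not one the SDK is documented here to read.

```python
import os


def get_api_key(var_name: str = "DEEPSET_CLOUD_API_KEY") -> str:
    """Read the deepset Cloud API key from an environment variable.

    DEEPSET_CLOUD_API_KEY is a hypothetical name used for this example.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key.")
    return key
```

Set the variable before running Python, for example with `export DEEPSET_CLOUD_API_KEY=<your key>` on macOS or Linux, or `set DEEPSET_CLOUD_API_KEY=<your key>` in the Windows command prompt. You can then pass `api_key=get_api_key()` to the upload function.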
Upload Files
- Write a script that uploads the files from a specified path:
- In the folder that contains the hotel_reviews folder (your Documents folder), create a Python script called hotel_reviews_upload.py.
- Copy the following code into the script and replace the placeholders with your folder path and API key. The comments explain each parameter:
from pathlib import Path
from deepset_cloud_sdk.workflows.sync_client.files import upload

# Upload all files from the given path.
upload(
    paths=[Path("<your_path_to_the_hotel_reviews_folder>")],  # path to the hotel_reviews folder on your computer
    api_key="<your_deepsetCloud_api_key>",  # the API key you generated in deepset Cloud
    workspace_name="hotel_reviews",  # the workspace where you want to upload the files
    blocking=True,  # wait until the files are listed in deepset Cloud;
                    # this may take a couple of minutes
    timeout_s=300,  # how long to wait, in seconds, when blocking=True
    show_progress=True,  # show a progress bar
    recursive=True,  # also upload files from all subfolders
)
- Save the script and run it:
python hotel_reviews_upload.py
- Wait until the upload finishes successfully. All 5,956 files are uploaded, but only half of them, 2,978, are listed in deepset Cloud, because the metadata files are not shown there as separate files.
Result: You have uploaded all your files, including the ones from the subfolders. Let's now see if they're showing up in deepset Cloud.
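If you'd rather not hard-code the folder path in hotel_reviews_upload.py, you can collect the arguments in one place and fail early when the folder doesn't exist. This is a minimal sketch: `build_upload_kwargs` is a hypothetical helper, not part of the SDK, and it returns the same parameter values the script above passes to `upload`.

```python
from pathlib import Path


def build_upload_kwargs(folder: Path, api_key: str, workspace_name: str = "hotel_reviews") -> dict:
    """Collect the upload arguments used in the script, checking the folder first.

    build_upload_kwargs is a hypothetical helper, not part of the SDK.
    """
    folder = folder.expanduser()
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} is not a folder")
    return {
        "paths": [folder],
        "api_key": api_key,
        "workspace_name": workspace_name,
        "blocking": True,
        "timeout_s": 300,
        "show_progress": True,
        "recursive": True,
    }
```

In the upload script, you would then call `upload(**build_upload_kwargs(Path.home() / "Documents" / "hotel_reviews", api_key="<your_deepsetCloud_api_key>"))`.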
Verify the Upload
- In the command line, list the uploaded files by running:
deepset-cloud list-files
If the deepset-cloud command isn't recognized, run the CLI through Python instead:
python -m deepset_cloud_sdk.cli list-files
You should see a list of files with each file's ID, URL, name, size, metadata, and creation date.
With this many files, it's easier to verify the upload in the deepset Cloud UI.
- In deepset Cloud:
- Switch to the hotel_reviews workspace where you uploaded the files and check the workspace statistics. You should see "3K" above files.
- In the navigation, click Files and check if the files are showing on the Files page.
- Now, let's check if the metadata was uploaded:
- In the navigation, click Files.
- Click any file that has metadata and choose View Metadata. You should see the file's metadata.
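To know what View Metadata should show, you can load the local metadata for the same file and compare. This is a minimal sketch; it assumes the metadata file is named after the TXT file plus a .meta.json suffix, and `local_metadata` is a hypothetical helper, not part of the SDK.

```python
import json
from pathlib import Path


def local_metadata(txt_file: Path) -> dict:
    """Load the metadata stored next to a TXT file.

    Assumption: the metadata file is named <file name>.meta.json,
    for example review_1.txt.meta.json.
    """
    meta_path = txt_file.with_name(txt_file.name + ".meta.json")
    with meta_path.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `print(local_metadata(Path.home() / "Documents" / "hotel_reviews" / "<file_name>.txt"))` prints the key-value pairs you should see in deepset Cloud for that file.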