- Level: Beginner
- Time to complete: 10 minutes
- You must be an Admin to complete this tutorial.
- The workspace where you want to upload the files must already be created in deepset Cloud. In this tutorial, we call the workspace hotel_reviews.
- Goal: After completing this tutorial, you will have uploaded a set of hotel reviews with metadata to a deepset Cloud workspace. You can replace this dataset with your custom one.
This tutorial uses a set of hotel reviews with some metadata in them. You can also use your own files; just make sure they're in the TXT or PDF format.
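If you bring your own files, each one needs a companion metadata file for its metadata to be uploaded. Here's a minimal sketch of how you could generate one. It assumes the metadata file sits next to the text file and is named after it with .meta.json appended (for example, my_review.txt.meta.json). The function name `write_metadata` and the metadata keys are hypothetical.

```python
import json
from pathlib import Path


def write_metadata(text_file: Path, metadata: dict) -> Path:
    """Write a companion metadata file next to text_file and return its path."""
    # Assumption: metadata for "my_review.txt" is stored as "my_review.txt.meta.json".
    meta_path = text_file.with_name(text_file.name + ".meta.json")
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return meta_path
```

For example, `write_metadata(Path("my_review.txt"), {"city": "Berlin"})` creates my_review.txt.meta.json in the same folder.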
- Download the hotel reviews dataset.
- Extract the files to a folder called hotel_reviews in your Documents folder. This can take a couple of minutes.
Result: You have 5,956 files in the \Documents\hotel_reviews folder: 2,978 TXT files and 2,978 JSON files. Each TXT file is accompanied by a .meta.json file containing its metadata.
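To confirm the extraction worked before you upload anything, you can count the files with a short script. This is a sketch and not part of the SDK. The function name `check_dataset` is hypothetical, and the pairing check assumes each metadata file is named after its text file with .meta.json appended.

```python
from pathlib import Path


def check_dataset(folder: Path) -> dict:
    """Count TXT and .meta.json files and list TXT files that lack a companion."""
    txt_files = sorted(folder.rglob("*.txt"))
    meta_files = sorted(folder.rglob("*.meta.json"))
    # Assumption: metadata for "review.txt" lives next to it as "review.txt.meta.json".
    missing = [
        path.name
        for path in txt_files
        if not path.with_name(path.name + ".meta.json").exists()
    ]
    return {"txt": len(txt_files), "meta": len(meta_files), "missing_meta": missing}
```

Calling `check_dataset(Path.home() / "Documents" / "hotel_reviews")` should report 2,978 TXT files and 2,978 metadata files if the dataset extracted completely.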
- Open the command line and run:
pip install deepset-cloud-sdk
- Wait until the installation finishes with a success message.
Result: You have installed the deepset Cloud SDK. It comes with a command line interface that we'll use to upload the files.
Log in to deepset Cloud.
Click your initials in the top right corner and select Connections.
Under API Keys, click Add new key.
Select the expiration date for your key and click Generate key.
Copy the key and save it to a notepad.
Result: You have an API key saved in a file. You can now use it to upload your files.
- Open the command line and run the following command to log in to deepset Cloud:
python -m deepset_cloud_sdk.cli login
- When prompted, paste your API key.
- Type the name of the deepset Cloud workspace where you want to upload the files. This creates a .env file with the information you just provided. The SDK reads this file when it uploads files.
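- Run one of the following commands to upload the files. Both forms upload everything in the hotel_reviews folder, including its subfolders (--recursive), and overwrite any files with the same name that already exist in the workspace (--write-mode OVERWRITE). Use the second form if the deepset-cloud command isn't available in your shell: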
deepset-cloud upload <path_to_hotel_reviews_folder> --recursive --write-mode OVERWRITE
python -m deepset_cloud_sdk.cli upload <path_to_hotel_reviews_folder> --recursive --write-mode OVERWRITE
- Wait until the upload finishes successfully and a summary message appears.
5,956 files are uploaded, and half of them (2,978) are listed in deepset Cloud. This is because the metadata files are not shown in deepset Cloud.
Result: You have uploaded all your files, including the ones from the subfolders. Let's now see if they're showing up in deepset Cloud.
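If you upload datasets regularly, you may prefer to wrap the upload command in a script. The sketch below only assembles and runs the same CLI call used in this tutorial, with the same flags. The function names `build_upload_command` and `run_upload` are hypothetical, and the script assumes the deepset-cloud command is on your PATH and that you have already logged in.

```python
import shlex
import subprocess
from pathlib import Path


def build_upload_command(folder: Path, write_mode: str = "OVERWRITE") -> list[str]:
    """Return the CLI arguments for a recursive upload of folder."""
    return [
        "deepset-cloud", "upload", str(folder),
        "--recursive", "--write-mode", write_mode,
    ]


def run_upload(folder: Path) -> int:
    """Run the upload command and return its exit code (0 means success)."""
    command = build_upload_command(folder)
    print("Running:", shlex.join(command))
    return subprocess.run(command, check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems when the folder path contains spaces.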
In the command line, list the uploaded files by running:
python -m deepset_cloud_sdk.cli list-files
You should see a list of files with file ID, URL, name, size, metadata, and the date when it was created.
With this many files, it's easier to verify the upload in the deepset Cloud UI. Click the name of the workspace to switch to the workspace where you uploaded the files and choose Files in the navigation. You should see all the uploaded files on the Files page.
Now, let's check if the metadata was uploaded.
One way to do this is to open a random file and then click View Metadata on the file preview.
Metadata shows up as search filters, so let's check if that's the case. You need a pipeline to run a search, so if you don't have one in this workspace, let's quickly create one:
Go to Pipelines > Create Pipeline.
Type QuestionAnswering_en-test as the pipeline name and choose to start from scratch.
Copy the following pipeline and paste it into the code editor, replacing the current contents:
Save the pipeline.
In the top right corner of the editor, click Deploy.
In the navigation, click Pipelines. Your pipeline should be listed there as deploying. Wait until its status changes to Indexed.
When the pipeline is indexed, click Playground.
Select your pipeline. All the metadata is now available as search criteria.
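To know which filters to expect, you can collect the metadata keys from your local files and compare them with the search criteria in Playground. This is a sketch, not an SDK feature. The function name `count_metadata_keys` is hypothetical, and it assumes each .meta.json file contains a single flat JSON object.

```python
import json
from collections import Counter
from pathlib import Path


def count_metadata_keys(folder: Path) -> Counter:
    """Count how many metadata files under folder contain each top-level key."""
    counts = Counter()
    for meta_path in folder.rglob("*.meta.json"):
        with meta_path.open(encoding="utf-8") as handle:
            metadata = json.load(handle)
        # Assumption: each metadata file holds one flat JSON object.
        if isinstance(metadata, dict):
            counts.update(metadata.keys())
    return counts
```

Each key returned by `count_metadata_keys(Path.home() / "Documents" / "hotel_reviews")` should correspond to a filter in Playground. The count tells you how many files carry that key.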