Upload Files

Upload your files to deepset Cloud. These files are then turned into documents and indexed when you deploy your pipeline. The files must be in .txt or PDF format.

📘

You must be an Admin to perform this task.

To upload a large number of files, use the REST API endpoint or the Python SDK. Both methods also let you add metadata to your files, which you can then use as search filters. To find out more, see Add Search Filters.

Upload from the UI

  1. In deepset Cloud, go to Data > Files > Upload File.
  2. Drag your files and folders to deepset Cloud. You can upload PDF and TXT files.
  3. Click Upload. Your files are now listed on the Files page.

🚧

Not recommended for a large number of files

If you have more than a few hundred files to upload, we recommend using the Python SDK or REST API. Both are faster and more stable than uploading from the UI.

Upload with the Python SDK

This method is best if you have a large number of files to upload. It also makes it possible to add metadata to your files.

You can use Notebooks in deepset Cloud to run the code. You need to Generate an API Key first and upload the files to the Notebooks server.

Here's the code that you can use to upload files with the SDK:

# The first three lines are the imports the script needs
import os
from haystack.utils import DeepsetCloud
from pathlib import Path

# Set the API key and API endpoint:
os.environ["DEEPSET_CLOUD_API_KEY"] = "<YOUR_API_KEY>"
os.environ["DEEPSET_CLOUD_API_ENDPOINT"] = "https://api.cloud.deepset.ai/api/v1"

file_client = DeepsetCloud.get_file_client(api_key=os.environ["DEEPSET_CLOUD_API_KEY"],  
                                           workspace="<WORKSPACE_NAME>")
# Specify the paths to your files here:
file_paths = [
  # Raw strings (r"...") keep the backslashes in Windows paths from being read as escape sequences
  Path(r"C:\Users\OneDrive\Documents\file1.txt"),
  Path(r"C:\Users\OneDrive\Documents\file2.pdf")
]
# To add metadata to your files, specify them here as a dictionary.
# The number of metadata dictionaries must be the same as the number of files you're uploading.
metas = [{"key1": "value1"}, {"key1": "value1", "key2": "value2"}]
# Here you're uploading the files to deepset Cloud together with their metadata
my_files = file_client.upload_files(file_paths=file_paths, metas=metas)
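
When you upload many files, writing the path list and the matching metadata list by hand is error-prone. The sketch below shows one possible way to build both lists from a folder, keeping only TXT and PDF files so that the two lists always have the same length. The helper name `collect_files` and the `source_file` metadata key are hypothetical and are not part of the SDK.

```python
from pathlib import Path

# File types this page lists as supported.
SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def collect_files(folder, shared_meta=None):
    """Return (file_paths, metas) for all TXT and PDF files under `folder`.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    shared_meta = shared_meta or {}
    file_paths = sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
    # One metadata dictionary per file, in the same order as file_paths.
    metas = [{**shared_meta, "source_file": path.name} for path in file_paths]
    return file_paths, metas
```

You could then pass both lists to the upload call shown above, for example `file_paths, metas = collect_files("my_docs", {"department": "legal"})` followed by `file_client.upload_files(file_paths=file_paths, metas=metas)`. The folder name and the `department` key are placeholders.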

Upload Files Using Notebooks in deepset Cloud

If you're using Notebooks in deepset Cloud, you must first upload all the files to the notebook server:

  1. In deepset Cloud, click Notebooks and select a CPU server.
  2. When the server is created, click Go to JupyterLab. The Notebook opens in a separate tab.
  3. On the top toolbar, click Upload Files.
  4. Choose the file to upload. Once it's uploaded, you should see it in the toolbar.

Now that your files are ready, you can use the Python code above to upload them to deepset Cloud. Make sure you update the file paths to point to the Notebooks directory.
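
Before running the upload from a notebook, it can help to confirm that the paths really point at the files on the notebook server. The following is a minimal sketch that assumes the uploaded files are in the notebook's working directory. The helper name `split_existing` and the file names are hypothetical.

```python
from pathlib import Path


def split_existing(file_paths):
    """Split paths into (existing, missing) lists. Hypothetical helper."""
    existing = [Path(p) for p in file_paths if Path(p).is_file()]
    missing = [Path(p) for p in file_paths if not Path(p).is_file()]
    return existing, missing


# Assumption: the uploaded files are in the notebook's working directory.
# The file names are placeholders.
candidates = [Path.cwd() / "file1.txt", Path.cwd() / "file2.pdf"]
existing, missing = split_existing(candidates)
for path in missing:
    print(f"Not found, skipping: {path}")
```

If you also pass metadata, remove the dictionaries that belong to missing files so that both lists stay the same length.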

Upload with the REST API

Here are the requests that you can send to upload your files. You need to Generate an API Key first. For more information, see the upload file endpoint documentation.

# This is an example request to send when you're uploading a file:

curl --request POST \
     --url https://api.cloud.deepset.ai/api/v1/workspaces/<YOUR_WORKSPACE_NAME>/files \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <YOUR_API_KEY>' \
     --header 'content-type: multipart/form-data' \
     --form 'meta={"key1":"value1", "key2":"value2"}' \
     --form file=@<YOUR_FILE.PDF>
     
# This is an example request if you're creating the file during upload:
curl --request POST \
     --url 'https://api.cloud.deepset.ai/api/v1/workspaces/<YOUR_WORKSPACE_NAME>/files?file_name=myFile.txt' \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <YOUR_API_KEY>' \
     --header 'content-type: multipart/form-data' \
     --form 'meta={"key1":"value1", "key2":"value2"}' \
     --form 'text=This is the file text'
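
If you prefer to send the file upload request from Python, the sketch below mirrors the first curl example. It uses the third-party `requests` library, which this page does not mention, and it assumes the file part of the form is named `file`. The function names `build_upload_request` and `upload_file` are hypothetical.

```python
import json
from urllib.parse import quote

API_ENDPOINT = "https://api.cloud.deepset.ai/api/v1"


def build_upload_request(workspace, api_key, meta):
    """Return (url, headers, form_fields) mirroring the curl example."""
    url = f"{API_ENDPOINT}/workspaces/{quote(workspace, safe='')}/files"
    headers = {
        "accept": "application/json",
        "authorization": f"Bearer {api_key}",
    }
    # The meta field is sent as a JSON string, as in the curl request.
    form_fields = {"meta": json.dumps(meta)}
    return url, headers, form_fields


def upload_file(path, workspace, api_key, meta):
    """Send one file as multipart/form-data and return the response."""
    import requests  # third-party: pip install requests

    url, headers, form_fields = build_upload_request(workspace, api_key, meta)
    with open(path, "rb") as handle:
        # Assumption: the file part is named "file".
        response = requests.post(
            url, headers=headers, data=form_fields, files={"file": handle}
        )
    response.raise_for_status()
    return response
```

The content-type header is left out on purpose: `requests` sets it, including the multipart boundary, when `files` is passed.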
