Upload Files
Upload your files to deepset Cloud. These files are then turned into documents and indexed when you deploy your pipeline. The files must be in TXT or PDF format.
You must be an Admin to perform this task.
Synchronous and Asynchronous Upload
There are two ways in which you can upload your files: synchronous and asynchronous.
Synchronous means the upload happens immediately and you get direct feedback. You can use the UI, the API endpoint, or an SDK method to upload files synchronously. This method is relatively slow and isn't recommended for large numbers of files.
The asynchronous method uses sessions. You create a session and pass a list of files to upload in this session. A session expires after 24 hours. The suggested limit on the number of files to upload in one session is 10 000 files.
Each session has an ID, and you can check its status at any time. This method is faster than the synchronous one, but it can take some time until the files are listed in deepset Cloud after they're uploaded. This means if you have a deployed pipeline, you may need to wait longer for it to run on the newly uploaded files.
Asynchronous upload is available through API endpoints only. We recommend it if you have a large number of files to upload.
Choosing the Best Method
If you... | ...choose |
---|---|
Just have a few files to upload and don't need to add metadata to them | Synchronous upload from the UI |
Have more files to upload, want to add metadata to your files, need direct feedback about your upload, and don't mind using a slower method | Synchronous upload with a Python method or a REST API endpoint |
Need to upload fast, want to add metadata to your files, have a lot of files to upload, and don't mind waiting a while until your files are indexed | Asynchronous upload |
Metadata
You can add metadata to your files. These metadata act as search filters at query time. To learn more, see Add Search Filters.
Upload Asynchronously
Use REST API endpoints to create a session and upload your files. When you open a session, you must upload all your files at once. You can't add files to an existing session. To upload more files, you must create a new session. You can upload up to 10 000 files in one session.
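If you have more than 10 000 files, you need several sessions. The following is a minimal sketch that splits a list of file names into one batch per session. When metadata is used, it places each metadata file directly before its file and keeps each pair in the same batch. This page doesn't say whether metadata files count toward the 10 000 limit, so the sketch assumes they do. The function name `plan_sessions` is hypothetical.

```python
from typing import List, Sequence

# Per-session limit stated in this guide. Counting metadata files toward it
# is an assumption made here to stay on the safe side.
MAX_FILES_PER_SESSION = 10_000


def plan_sessions(file_names: Sequence[str], with_metadata: bool = False,
                  limit: int = MAX_FILES_PER_SESSION) -> List[List[str]]:
    """Split file names into batches, one batch per upload session.

    If with_metadata is True, each file is preceded by its <name>.meta.json
    file, and the pair always stays in the same batch.
    """
    per_file = 2 if with_metadata else 1
    if limit < per_file:
        raise ValueError("limit is too small to fit a single file")
    files_per_batch = limit // per_file
    batches: List[List[str]] = []
    for start in range(0, len(file_names), files_per_batch):
        batch: List[str] = []
        for name in file_names[start:start + files_per_batch]:
            if with_metadata:
                batch.append(f"{name}.meta.json")  # metadata goes before the file
            batch.append(name)
        batches.append(batch)
    return batches
```

Each returned batch is the `file_names` list for one Create Upload Session request.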
Preparing File Metadata
To add metadata to your files, create one metadata file for each file you upload. The metadata file must be a JSON file named after the file whose metadata it contains, followed by the extension .meta.json.
For example, if you're uploading a file called example.txt, the metadata file should be called example.txt.meta.json. If you're uploading a file called example.pdf, the metadata file should be example.pdf.meta.json.
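To apply the naming rule above in code, you can use something like the following minimal sketch. It writes a metadata file next to a data file. The page doesn't describe the JSON structure, so the sketch assumes a flat object of key-value pairs, like the `meta` values in the other examples on this page. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def metadata_path(file_path: Path) -> Path:
    """Return the metadata file path: example.pdf -> example.pdf.meta.json."""
    return file_path.with_name(file_path.name + ".meta.json")


def write_metadata(file_path: Path, meta: Dict[str, Any]) -> Path:
    """Write meta as JSON next to file_path and return the metadata file's path.

    Assumes a flat JSON object of key-value pairs is the expected format.
    """
    target = metadata_path(file_path)
    target.write_text(json.dumps(meta), encoding="utf-8")
    return target
```

For example, `write_metadata(Path("example.txt"), {"key1": "value1"})` creates `example.txt.meta.json` in the same folder.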
To upload files in a session:
- Generate an API Key. You need this to connect to deepset Cloud.
- Use the Create Upload Session API endpoint and pass your files in the request.
If you're adding metadata, list each metadata file before the file it describes.
Here's an example request:
curl --request POST \
--url https://api.cloud.deepset.ai/api/v1/workspaces/<YOUR_WORKSPACE_NAME>/upload_sessions \
--header 'accept: application/json' \
--header 'authorization: Bearer <YOUR_API_KEY>' \
--header 'content-type: application/json' \
--data '
{
"file_names": [
"file1.txt.meta.json",
"file1.txt",
"file2.pdf.meta.json",
"file2.pdf"
]
}
'
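You can send the same request from Python using only the standard library. The following sketch builds the POST request shown in the curl example, with the same URL, headers, and JSON body, and sends it. This page doesn't document the response, so the sketch returns the decoded JSON as is. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.cloud.deepset.ai/api/v1"


def build_session_request(workspace: str, api_key: str,
                          file_names: List[str]) -> urllib.request.Request:
    """Build the Create Upload Session request from the curl example (not sent yet)."""
    body = json.dumps({"file_names": file_names}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/workspaces/{urllib.parse.quote(workspace, safe='')}/upload_sessions",
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
    )


def create_upload_session(workspace: str, api_key: str,
                          file_names: List[str]) -> Dict[str, Any]:
    """Send the request and return the decoded JSON response."""
    request = build_session_request(workspace, api_key, file_names)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting the request into a builder and a sender lets you inspect the request before anything is sent.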
You can check the status of your session with the Get Session Status API endpoint.
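Because files appear in deepset Cloud only after some delay, you may want to poll the session status. This page gives neither the Get Session Status URL nor its response fields. The following sketch therefore takes the fetch call and the "finished" check from the caller and handles only the polling loop. The default timeout matches the 24-hour session lifetime. All names here are hypothetical.

```python
import time
from typing import Any, Callable, Dict


def wait_for_session(fetch_status: Callable[[], Dict[str, Any]],
                     is_finished: Callable[[Dict[str, Any]], bool],
                     interval_s: float = 30.0,
                     timeout_s: float = 24 * 60 * 60,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll fetch_status() until is_finished(status) is True or timeout_s elapses.

    fetch_status would typically call the Get Session Status endpoint;
    is_finished inspects its response. Both are supplied by the caller.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if is_finished(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("upload session did not finish before the timeout")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.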
Upload Synchronously
Choose the best option for you:
- If you just have a few files to upload and don't need to add metadata, upload from the UI.
- If you have a lot of files to upload or want to add metadata to them, upload with a Python method or a REST API endpoint.
Upload from the UI
- In deepset Cloud, go to Data > Files > Upload Files.
- Drag your files and folders to deepset Cloud. You can upload PDF and TXT files.
- Click Upload. Your files are now listed on the Files page.
Not recommended for a large number of files
If you have more than a few hundred files to upload, we recommend using the Python SDK or REST API. It's faster and more stable.
Upload with the Python SDK
This method is best if you have many files to upload. It also makes it possible to add metadata to your files.
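You can use Notebooks in deepset Cloud to run the code. Before you run it, Generate an API Key. If you run the code in Notebooks, you must also upload your files to the Notebooks server first (see Upload Files Using Notebooks in deepset Cloud).
Here's the code that you can use to upload files through the SDK: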
# These are all the imports you need
import os
from pathlib import Path
from haystack.utils import DeepsetCloud
# Set the API key and API endpoint:
os.environ["DEEPSET_CLOUD_API_KEY"] = "<YOUR_API_KEY>"
os.environ["DEEPSET_CLOUD_API_ENDPOINT"] = "https://api.cloud.deepset.ai/api/v1"
# Create a client for the files in your workspace:
file_client = DeepsetCloud.get_file_client(api_key=os.environ["DEEPSET_CLOUD_API_KEY"],
                                           workspace="<WORKSPACE_NAME>")
# Specify the paths to your files here.
# Raw strings (r"...") stop Python from reading Windows backslashes as escape sequences:
file_paths = [
    Path(r"C:\Users\OneDrive\Documents\file1.txt"),
    Path(r"C:\Users\OneDrive\Documents\file2.pdf")
]
# To add metadata to your files, specify it here as a list of dictionaries.
# Provide one dictionary per file, in the same order as file_paths:
metas = [{"key1": "value1"}, {"key1": "value2", "key2": "value2"}]
# Here you're uploading the files to deepset Cloud together with their metadata
my_files = file_client.upload_files(file_paths=file_paths, metas=metas)
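If you already keep metadata in `<file>.meta.json` files, as described for asynchronous upload, you can build `metas` from them. The following minimal sketch always returns one dictionary per file, so the two lists stay the same length. It falls back to an empty dictionary when a file has no metadata file. Whether `upload_files` accepts an empty dictionary this way is an assumption. `load_metas` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Sequence


def load_metas(file_paths: Sequence[Path]) -> List[Dict[str, Any]]:
    """Return one metadata dict per file, in the same order as file_paths.

    Reads <file>.meta.json when it exists; otherwise uses an empty dict
    (assumed to mean "no metadata" for upload_files).
    """
    metas: List[Dict[str, Any]] = []
    for path in file_paths:
        meta_file = path.with_name(path.name + ".meta.json")
        if meta_file.exists():
            metas.append(json.loads(meta_file.read_text(encoding="utf-8")))
        else:
            metas.append({})
    return metas
```

With this helper, the upload call becomes `file_client.upload_files(file_paths=file_paths, metas=load_metas(file_paths))`.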
Upload Files Using Notebooks in deepset Cloud
If you're using Notebooks in deepset Cloud, you must first upload all the files to the notebook server:
- In deepset Cloud, click Notebooks and select a CPU server.
- When the server is created, click Go to JupyterLab. The Notebook opens in a separate tab.
- On the top toolbar, click Upload Files.
- Choose the files to upload. Once they're uploaded, you should see them in the file browser.
Now that your files are ready, you can use the Python code above to upload them to deepset Cloud. Make sure you update the file paths so that they point to the files on the Notebooks server.
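Instead of typing each path by hand, you can collect the files from the directory where you uploaded them. The following sketch gathers every TXT and PDF file under a directory, the only formats this page says deepset Cloud accepts. It skips `.meta.json` files because they end in `.json`. The directory you pass in is a placeholder for your own Notebooks folder, and `collect_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# File types deepset Cloud accepts, according to this page.
ALLOWED_SUFFIXES = {".txt", ".pdf"}


def collect_files(directory: Path) -> List[Path]:
    """Return the TXT and PDF files under directory (recursively), sorted by path.

    Metadata files (*.meta.json) have the suffix .json, so they're excluded.
    """
    return sorted(
        path for path in directory.rglob("*")
        if path.is_file() and path.suffix.lower() in ALLOWED_SUFFIXES
    )
```

You could then set `file_paths = collect_files(Path("<YOUR_NOTEBOOKS_DIRECTORY>"))` in the SDK example.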
Upload with the REST API
Here's the request that you can send to upload your files. For more information, you can also see the upload file endpoint documentation. You need to Generate an API Key first.
# This is an example request to send when you're uploading a file:
curl --request POST \
--url https://api.cloud.deepset.ai/api/v1/workspaces/<YOUR_WORKSPACE_NAME>/files \
--header 'accept: application/json' \
--header 'authorization: Bearer <YOUR_API_KEY>' \
--header 'content-type: multipart/form-data' \
--form 'meta={"key1":"value1", "key2":"value2"}' \
--form 'file=@<YOUR_FILE.PDF>'
# This is an example request if you're creating the file during upload:
curl --request POST \
--url 'https://api.cloud.deepset.ai/api/v1/workspaces/<YOUR_WORKSPACE_NAME>/files?file_name=myFile.txt' \
--header 'accept: application/json' \
--header 'authorization: Bearer <YOUR_API_KEY>' \
--header 'content-type: multipart/form-data' \
--form 'meta={"key1":"value1", "key2":"value2"}' \
--form 'text=This is the file text'
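If you'd rather call the files endpoint from Python without extra dependencies, the following sketch mirrors both curl examples using only the standard library. It encodes the `meta`, `file`, and `text` form fields as multipart/form-data. For created files, it passes `file_name` as a query parameter. The field names come from the curl examples. The JSON response shape isn't documented here, and all function names are hypothetical.

```python
import json
import mimetypes
import urllib.parse
import urllib.request
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

API_BASE = "https://api.cloud.deepset.ai/api/v1"


def encode_multipart(fields: Dict[str, str], file_field: Optional[str] = None,
                     file_path: Optional[Path] = None) -> Tuple[bytes, str]:
    """Encode text fields and, optionally, one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts: List[bytes] = []
    for name, value in fields.items():
        parts.append((f"--{boundary}\r\n"
                      f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                      f"{value}\r\n").encode("utf-8"))
    if file_field is not None and file_path is not None:
        mime = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{file_field}"; '
                  f'filename="{file_path.name}"\r\n'
                  f"Content-Type: {mime}\r\n\r\n")
        parts.append(header.encode("utf-8") + file_path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def files_request(workspace: str, api_key: str, body: bytes, content_type: str,
                  file_name: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to the files endpoint; file_name becomes a query parameter."""
    url = f"{API_BASE}/workspaces/{urllib.parse.quote(workspace, safe='')}/files"
    if file_name is not None:
        url += "?" + urllib.parse.urlencode({"file_name": file_name})
    return urllib.request.Request(url, data=body, method="POST", headers={
        "accept": "application/json",
        "authorization": f"Bearer {api_key}",
        "content-type": content_type,
    })


def _send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def upload_file(workspace: str, api_key: str, file_path: Path,
                meta: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Upload an existing file, like the first curl example."""
    fields = {"meta": json.dumps(meta)} if meta else {}
    body, content_type = encode_multipart(fields, "file", file_path)
    return _send(files_request(workspace, api_key, body, content_type))


def upload_text(workspace: str, api_key: str, file_name: str, text: str,
                meta: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Create a file from text during upload, like the second curl example."""
    fields = {"meta": json.dumps(meta)} if meta else {}
    fields["text"] = text
    body, content_type = encode_multipart(fields)
    return _send(files_request(workspace, api_key, body, content_type, file_name))
```

The sketch sets the Content-Type header together with its boundary. A bare `multipart/form-data` value would leave the server without the boundary it needs to split the parts.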