Add Metadata to Your Files

You can attach metadata to the files you upload through the REST API or the deepset Cloud SDK. You can then use this metadata as search filters or to boost ranking and retrieval.

About This Task

You upload metadata along with the actual files. You can use an API endpoint or the Python SDK to do this. When uploading a file, include a corresponding metadata file with the meta.json extension. For example, if you have a file named myfile.pdf, its metadata would be stored in a separate file called myfile.pdf.meta.json. This approach ensures the metadata is associated with the correct file.

To add metadata to files that already exist in a deepset Cloud workspace, use the Update File Meta endpoint. With the SDK, you can upload metadata only together with the corresponding files.

Metadata Format

Metadata is always a dictionary in this format: {"meta_key1": "value", "meta_key2": "value2"}. The value for a given key must be of the same type across all files. For example, if one file has a key called "category" with a string value, as in {"category":"news"}, then the values of "category" in all other metadata files must also be strings. The upload fails if another metadata file gives "category" a value of a different type, such as an integer.
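To catch type mismatches before uploading, you could run a local check along these lines. This is a minimal sketch, not part of deepset Cloud: the function name find_type_conflicts is hypothetical, and it compares Python types, which is only an assumption about how deepset Cloud itself compares value types.

```python
def find_type_conflicts(metadata_by_file: dict) -> list:
    """Return (key, file, expected_type, actual_type) for every value whose
    type differs from the first type seen for that key."""
    first_seen = {}  # key -> type name of the first value seen for that key
    conflicts = []
    for file_name, metadata in metadata_by_file.items():
        for key, value in metadata.items():
            type_name = type(value).__name__
            expected = first_seen.setdefault(key, type_name)
            if type_name != expected:
                conflicts.append((key, file_name, expected, type_name))
    return conflicts


# "category" is a string in the first file but an integer in the second
files = {
    "a.txt": {"category": "news"},
    "b.txt": {"category": 7},
}
print(find_type_conflicts(files))  # [('category', 'b.txt', 'str', 'int')]
```

An empty list means that every key has one consistent type across the files you checked.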

You can add metadata to your files when uploading with the deepset Cloud SDK or the REST API.

Format When Uploading with SDK

When uploading with the SDK, include one metadata file for each file you're uploading. The name of the metadata file should match the original file's name, including its extension, followed by .meta.json. Format the metadata inside the metadata file like this: {"meta_key1": "value", "meta_key2": "value2"}.

Example metadata file

This file contains reviews for the Hotel Park Royal Palace in Vienna. The file is called austria_trend_hotel_park_royal_palace_vienna_0.txt.

Positives:
location and how clean it is

Negatives:
lack of variety in food any thing you ask for will charge you 5 euro the WiFi is not covering all my room dogs are allowed

This is the accompanying metadata file. It's called austria_trend_hotel_park_royal_palace_vienna_0.txt.meta.json.

{"Hotel_Address": "Schlossallee 8 14 Penzing 1140 Vienna Austria", "Review_Date": "2016-02-22", "Average_Score": 8.8, "Hotel_Name": "Austria Trend Hotel Park Royal Palace Vienna", "Reviewer_Score": 9.6}
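If you generate metadata files in a script, the naming convention above can be applied as in this sketch. It uses only the Python standard library. The helper name write_meta_file is hypothetical and not part of the deepset Cloud SDK.

```python
import json
import tempfile
from pathlib import Path


def write_meta_file(file_path: Path, metadata: dict) -> Path:
    """Write metadata next to file_path as <file name>.meta.json and return the new path."""
    # Append ".meta.json" to the full file name, keeping the original extension
    meta_path = file_path.with_name(file_path.name + ".meta.json")
    meta_path.write_text(json.dumps(metadata), encoding="utf-8")
    return meta_path


# Usage: create a sample file and its metadata file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    review = Path(tmp) / "austria_trend_hotel_park_royal_palace_vienna_0.txt"
    review.write_text("Positives:\nlocation and how clean it is\n", encoding="utf-8")
    meta = write_meta_file(review, {"Review_Date": "2016-02-22", "Average_Score": 8.8})
    print(meta.name)  # austria_trend_hotel_park_royal_palace_vienna_0.txt.meta.json
```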

Format When Uploading with REST API

If you're uploading files through the API, pass metadata for each file in a separate dictionary formatted like this: meta={"key1":"value1", "key2":"value2"}. For each file you're uploading, provide a corresponding metadata dictionary. The order matters: the first dictionary you pass will be linked to the first file, the second dictionary to the second file, and so on.
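Because the pairing depends on order, it can help to build the file and metadata lists together and confirm that they have the same length. The following is a minimal sketch of that idea. The function pair_files_with_meta is hypothetical. It only prepares the values and does not call the API.

```python
import json


def pair_files_with_meta(file_names: list, metadata_list: list) -> list:
    """Pair each file with the metadata dictionary at the same position."""
    if len(file_names) != len(metadata_list):
        raise ValueError("Provide exactly one metadata dictionary per file")
    # json.dumps produces the string to send as the value of the meta field
    return [(name, json.dumps(meta)) for name, meta in zip(file_names, metadata_list)]


pairs = pair_files_with_meta(
    ["report_2009.pdf", "report_2010.pdf"],
    [{"year": "2009"}, {"year": "2010"}],
)
for name, meta in pairs:
    print(name, meta)
```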

Allowed Values

In your metadata dictionaries, you can pass the following:

  • Numerical data
  • Dates
  • Keyword fields

Metadata limitations:

  • One level of nesting works best.
  • The size limit for a single value in the key:value pair is 32 766 bytes, which is roughly 32 766 characters.
  • The total size of metadata (all keys and values) is 250 000 bytes maximum, roughly 250 000 characters.
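The limits are given in bytes, so text with non-ASCII characters reaches them in fewer characters. The sketch below checks a metadata dictionary against both limits before upload. It assumes sizes are measured on UTF-8 encoded keys and values, with non-string values serialized as JSON. The exact accounting deepset Cloud uses is not stated here, so treat the result as an estimate. The function names are hypothetical.

```python
import json

MAX_VALUE_BYTES = 32_766   # limit for a single value
MAX_TOTAL_BYTES = 250_000  # limit for all keys and values together


def utf8_size(value) -> int:
    """Size in bytes of a value encoded as UTF-8 (non-strings are serialized as JSON)."""
    text = value if isinstance(value, str) else json.dumps(value)
    return len(text.encode("utf-8"))


def check_metadata_size(metadata: dict) -> list:
    """Return a list of human-readable problems; an empty list means within limits."""
    problems = []
    total = 0
    for key, value in metadata.items():
        value_size = utf8_size(value)
        total += utf8_size(key) + value_size
        if value_size > MAX_VALUE_BYTES:
            problems.append(f"value for '{key}' is {value_size} bytes")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total metadata size is {total} bytes")
    return problems


# 20,000 characters, but each "é" takes 2 bytes in UTF-8
print(check_metadata_size({"title": "é" * 20_000}))  # ["value for 'title' is 40000 bytes"]
```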

Example

Here are simple examples of metadata:

metadata = [{"type": "article", "source": "New York Times", "title": "Leaders Look into the Future", "year published": "2020"}]

{
  "type": "book",
  "topic": "business",
  "author": "Ben Horowitz",
  "book_titles": [
    "The Hard Thing About Hard Things",
    "What You Do Is Who You Are"
  ]
}

Add Metadata with the Python SDK

The easiest way is to upload your metadata files when uploading your PDF or TXT files. For detailed instructions, see Tutorial: Uploading Files with CLI or Tutorial: Uploading Files with Python Methods.

Add Metadata with REST API

Here's the code you can use to add metadata to files when uploading them through the REST API:

# The last --form field defines the metadata for your file.
curl --request POST \
     --url 'https://api.cloud.deepset.ai/api/v1/workspaces/<YOUR_WORKSPACE>/files?write_mode=OVERWRITE' \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <YOUR_API_KEY>' \
     --header 'content-type: multipart/form-data' \
     --form 'file=@<YOUR_FILE.PDF>' \
     --form 'meta={"year":"2009", "type":"financial report"}'

You can also add or update metadata of files that are already uploaded to deepset Cloud, using the Update File Meta endpoint. You need the ID of the file whose metadata you want to change. (You can check it with the list files endpoint.)

# The --data argument carries the metadata.
curl --request PUT \
     --url 'https://api.cloud.deepset.ai/api/v1/workspaces/<YOUR_WORKSPACE>/files/<FILE_ID>/meta' \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <YOUR_API_KEY>' \
     --header 'content-type: application/json' \
     --data '
{
     "year": "2009",
     "type": "financial report"
}
'
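
If you prefer Python to curl, the same PUT request can be assembled with the standard library, as in this sketch. It mirrors the URL, headers, and body of the curl command above. The function name build_update_meta_request and the workspace, file ID, and API key values are placeholders for illustration. The request is only built here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.cloud.deepset.ai/api/v1"


def build_update_meta_request(workspace: str, file_id: str, api_key: str, metadata: dict):
    """Build (but do not send) a PUT request that updates a file's metadata."""
    url = f"{API_URL}/workspaces/{workspace}/files/{file_id}/meta"
    return urllib.request.Request(
        url,
        data=json.dumps(metadata).encode("utf-8"),  # JSON body, as in --data
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
        method="PUT",
    )


# Placeholder values; replace them with your workspace, file ID, and API key
request = build_update_meta_request(
    "my-workspace", "my-file-id", "my-api-key",
    {"year": "2009", "type": "financial report"},
)
print(request.get_method(), request.full_url)
# To send the request, pass it to urllib.request.urlopen(request)
```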