Add Search Filters Through Metadata
Add filters to narrow down the scope of your search. You do this by attaching metadata to the files you upload to deepset Cloud. This metadata then acts as a set of filters at query time.
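To picture what a filter does, here is a minimal client-side sketch in Python. It is not how deepset Cloud applies filters internally. It only assumes that each file carries a metadata dictionary and that a filter keeps the files whose metadata matches every key:value pair in the filter. The `files` list and the `matches_filter` helper are hypothetical.

```python
from typing import Any


def matches_filter(meta: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Return True if every filter key is present in meta with the same value."""
    return all(key in meta and meta[key] == value for key, value in filters.items())


# Hypothetical files, each with the metadata attached at upload time.
files = [
    {"name": "report_2008.pdf", "meta": {"year": "2008"}},
    {"name": "report_2009.pdf", "meta": {"year": "2009"}},
    {"name": "report_2010.pdf", "meta": {"year": "2010"}},
]

# Filtering on the year 2009 keeps only the matching report.
selected = [f["name"] for f in files if matches_filter(f["meta"], {"year": "2009"})]
print(selected)  # ['report_2009.pdf']
```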
Metadata as Filters
For example, if you have a set of financial reports covering a decade, you may want to search only through the reports from a given year, so you'll want to select that year in your search. To do this, add metadata to your files, for example: {"year":"2009"}.
Here's what it will look like on the Search page:
Metadata Format
If you're uploading files through the API, pass the metadata as a dictionary in this format: meta={"key1":"value1", "key2":"value2"}. Pass one dictionary per file, so if you're uploading ten files, pass ten metadata dictionaries. The first dictionary attaches metadata to the first file you're passing, the second one to the second file, and so on.
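Because dictionaries are matched to files by position, it helps to keep both in lists of the same length. Here is a minimal sketch that pairs each file with its metadata and serializes the metadata as JSON. The `pair_files_with_meta` helper and the file names are hypothetical, and how you then send each pair depends on the endpoint or tool you use.

```python
import json

# Hypothetical file names and their metadata, listed in the same order.
file_names = ["report_2009.pdf", "report_2010.pdf"]
metadata = [
    {"year": "2009", "type": "financial report"},
    {"year": "2010", "type": "financial report"},
]


def pair_files_with_meta(files: list[str], metas: list[dict]) -> dict[str, str]:
    """Map each file name to its metadata, serialized as JSON, by position."""
    if len(files) != len(metas):
        raise ValueError(f"{len(files)} files but {len(metas)} metadata dictionaries")
    # zip pairs the first file with the first dictionary, the second with the second, ...
    return {name: json.dumps(meta) for name, meta in zip(files, metas)}


for name, meta_json in pair_files_with_meta(file_names, metadata).items():
    print(name, meta_json)
```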
The value assigned to a particular key must be of the same type in every file. For example, if the first file has a key called "category" whose value is a string, as in meta={"category":"news"}, then the value of "category" must also be a string in the metadata of all other files. If a second file's metadata contains the key "category" but its value is an integer, the upload fails.
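You can catch type mismatches before uploading with a quick local check. This is a minimal sketch with a hypothetical `find_type_conflicts` helper. It compares Python type names, so it is stricter than the server may be: for example, it treats an `int` and a `float` under the same key as a conflict.

```python
from typing import Any


def find_type_conflicts(metas: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Return the keys whose values have more than one type across the dictionaries."""
    seen: dict[str, set[str]] = {}
    for meta in metas:
        for key, value in meta.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return {key: types for key, types in seen.items() if len(types) > 1}


metas = [
    {"category": "news"},
    {"category": 7},  # an integer where the first file uses a string
]
conflicts = find_type_conflicts(metas)
print({key: sorted(types) for key, types in conflicts.items()})  # {'category': ['int', 'str']}
```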
Add one metadata file for each of your files. A metadata file's name is the full name of the file whose metadata it contains, including that file's extension, followed by .meta.json. For example, the metadata for myfile.txt goes in myfile.txt.meta.json. The metadata in the metadata file should have this format: {"meta_key1": "value", "meta_key2": "value2"}.
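Here is a minimal sketch that writes a metadata file following this naming convention. The `meta_path_for` and `write_meta_file` helpers are hypothetical, and the demo runs in a temporary folder so it leaves nothing behind.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def meta_path_for(file_path: Path) -> Path:
    """Return the metadata file path: the full file name plus ".meta.json"."""
    # myfile.pdf -> myfile.pdf.meta.json (the original extension is kept)
    return file_path.with_name(file_path.name + ".meta.json")


def write_meta_file(file_path: Path, meta: dict[str, Any]) -> Path:
    """Write meta as JSON next to file_path and return the metadata file's path."""
    meta_path = meta_path_for(file_path)
    meta_path.write_text(json.dumps(meta), encoding="utf-8")
    return meta_path


if __name__ == "__main__":
    # Demo in a temporary folder so nothing is left behind.
    with tempfile.TemporaryDirectory() as folder:
        review = Path(folder) / "austria_trend_hotel_park_royal_palace_vienna_0.txt"
        meta_path = write_meta_file(review, {"Review_Date": "2016-02-22", "Reviewer_Score": 9.6})
        print(meta_path.name)  # austria_trend_hotel_park_royal_palace_vienna_0.txt.meta.json
        print(json.loads(meta_path.read_text(encoding="utf-8")))
```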
Example metadata file
This is a file containing reviews for the Hotel Park Royal Palace in Vienna. The file is called austria_trend_hotel_park_royal_palace_vienna_0.txt.
Positives:
location and how clean it is
Negatives:
lack of variety in food any thing you ask for will charge you 5 euro the WiFi is not covering all my room dogs are allowed
This is the accompanying metadata file. It's called austria_trend_hotel_park_royal_palace_vienna_0.txt.meta.json.
{"Hotel_Address": "Schlossallee 8 14 Penzing 1140 Vienna Austria", "Review_Date": "2016-02-22", "Average_Score": 8.8, "Hotel_Name": "Austria Trend Hotel Park Royal Palace Vienna", "Reviewer_Score": 9.6}
In your metadata dictionaries, you can pass the following types of values:
- Numerical data
- Dates
- Keyword fields
Metadata limitations:
- One level of nesting works best.
- The size limit for a single value in the key:value pair is 32 766 bytes, which is roughly 32 766 characters of plain ASCII text (characters outside ASCII take more than one byte each).
- The total size of metadata (all keys and values) is 250 000 bytes maximum, roughly 250 000 ASCII characters.
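If you want to catch oversized metadata before uploading, you can check the limits locally. This is a minimal sketch with a hypothetical `check_meta_limits` helper. It assumes sizes are measured as UTF-8 bytes and that non-string values count as their JSON text; deepset Cloud enforces the actual limits on its side.

```python
import json
from typing import Any

MAX_VALUE_BYTES = 32_766   # size limit for a single value
MAX_TOTAL_BYTES = 250_000  # size limit for all keys and values together


def check_meta_limits(meta: dict[str, Any]) -> list[str]:
    """Return the limits a metadata dictionary exceeds; an empty list means none."""
    problems = []
    total = 0
    for key, value in meta.items():
        # Assumption: strings count as UTF-8 bytes, other values as their JSON text.
        text = value if isinstance(value, str) else json.dumps(value)
        value_bytes = len(text.encode("utf-8"))
        total += len(key.encode("utf-8")) + value_bytes
        if value_bytes > MAX_VALUE_BYTES:
            problems.append(f"value of {key!r} is {value_bytes} bytes")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"metadata totals {total} bytes")
    return problems


print(check_meta_limits({"year": "2009", "Average_Score": 8.8}))  # []
print(check_meta_limits({"summary": "x" * 40_000}))  # ["value of 'summary' is 40000 bytes"]
print(len("é".encode("utf-8")))  # 2: one character, two bytes
```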
How to Add Metadata
You upload metadata along with the actual files, using either an API endpoint or the Python SDK. To attach metadata to a PDF or TXT file, include a corresponding metadata file with the .meta.json extension. For example, if you have a file named myfile.pdf, store its metadata in a separate file called myfile.pdf.meta.json. This naming convention ensures the metadata is associated with the correct file.
To add metadata to files that already exist in a deepset Cloud workspace, use the Update File Meta endpoint.
To learn more about using metadata as filters, see Filtering Logic.
Example
Here are simple examples of metadata:
metadata = [{"type": "article", "source": "New York Times", "title": "Leaders Look into the Future", "year published": "2020"}]
{
"type": "book",
"topic": "business",
"author": "Ben Horowitz",
"book_titles": [
"The Hard Thing About Hard Things",
"What You Do Is Who You Are"
]
}
Add Filters with the Python SDK
The easiest way to add metadata is to upload your metadata files together with your PDF or TXT files. For detailed instructions, see Tutorial: Uploading Files with CLI or Tutorial: Uploading Files with Python Methods.
Add Filters with REST API
Here's the code you can use to add metadata to files when uploading them through the REST API:
# The meta form field defines the metadata for your file.
curl --request POST \
--url 'https://api.cloud.deepset.ai/api/v1/workspaces/<YOUR_WORKSPACE>/files?write_mode=OVERWRITE' \
--header 'accept: application/json' \
--header 'authorization: Bearer <YOUR_API_KEY>' \
--header 'content-type: multipart/form-data' \
--form 'file=@<YOUR_FILE.PDF>' \
--form 'meta={"year":"2009", "type":"financial report"}'
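If you'd rather make the same upload from Python without extra dependencies, here is a minimal standard-library sketch that mirrors the curl call: the file goes in the `file` form field and the JSON-encoded metadata in the `meta` form field. The `build_upload_request` helper, the `my-workspace` and `my-api-key` values, and the placeholder file bytes are hypothetical. The function only builds the request; you send it with `urllib.request.urlopen(request)` once you fill in real values (for example, reading the file with `Path("report_2009.pdf").read_bytes()`).

```python
import json
import urllib.request
import uuid
from typing import Any

API_URL = "https://api.cloud.deepset.ai/api/v1"


def build_upload_request(
    workspace: str, api_key: str, file_name: str, content: bytes, meta: dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload with a "meta" form field."""
    boundary = uuid.uuid4().hex
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + content + b"\r\n"
    # The metadata travels as a JSON string in its own form field, like --form 'meta=...'.
    meta_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="meta"\r\n\r\n'
        f"{json.dumps(meta)}\r\n"
    ).encode("utf-8")
    closing = f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/workspaces/{workspace}/files?write_mode=OVERWRITE",
        data=file_part + meta_part + closing,
        method="POST",
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {api_key}",
            "content-type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Hypothetical placeholders; the request is built but not sent.
request = build_upload_request(
    "my-workspace", "my-api-key", "report_2009.pdf", b"%PDF-1.4 ...",
    {"year": "2009", "type": "financial report"},
)
print(request.get_method(), request.full_url)
```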
For a step-by-step walkthrough of the code, see:
You can also add or update the metadata of files already uploaded to deepset Cloud using the Update File Meta endpoint. You need the ID of the file whose metadata you want to change. (You can find it using the List Files endpoint.)
# Pass the new metadata as a JSON object in the request body.
curl --request PUT \
--url 'https://api.cloud.deepset.ai/api/v1/workspaces/<YOUR_WORKSPACE>/files/<FILE_ID>/meta' \
--header 'accept: application/json' \
--header 'authorization: Bearer <YOUR_API_KEY>' \
--header 'content-type: application/json' \
--data '
{
"year": "2009",
"type": "financial report"
}
'
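The same update can be sketched in Python with the standard library. This minimal sketch builds the PUT request that the curl call sends, with the metadata as a JSON body. The `build_update_meta_request` helper and the `my-workspace`, `my-api-key`, and `my-file-id` values are hypothetical placeholders; send the request with `urllib.request.urlopen(request)` once you fill in real values.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://api.cloud.deepset.ai/api/v1"


def build_update_meta_request(
    workspace: str, api_key: str, file_id: str, meta: dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets the metadata of an existing file."""
    return urllib.request.Request(
        f"{API_URL}/workspaces/{workspace}/files/{file_id}/meta",
        data=json.dumps(meta).encode("utf-8"),  # the new metadata as a JSON body
        method="PUT",
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
    )


# Hypothetical placeholders; the request is built but not sent.
request = build_update_meta_request(
    "my-workspace", "my-api-key", "my-file-id", {"year": "2009", "type": "financial report"}
)
print(request.get_method(), request.full_url)
```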