Filtering Logic
Use filters in search or in your evaluation sets to narrow down the results. A filter combines comparison operators and logical operators. This page explains how they work.
Filters are evaluated against the metadata you add to your files. Metadata are dictionaries in the key:value format. To learn more about how to add them, see Upload files.
Logical and Comparison Operators
A filter contains at least one comparison of a metadata field name with a certain value. You can combine multiple comparisons using logical operators that can be nested to achieve complex filters.
Comparisons
A comparison consists of:
- A field name
- A comparison operator:
  - $eq (equal to; the implicit operator for single values)
  - $in (the implicit operator for multiple values)
  - $gt (greater than)
  - $gte (greater than or equal to)
  - $lt (less than)
  - $lte (less than or equal to)
- A value
A simple filter expression could look like this: {"field_name": "field_value"}, for example: {"type": "article"}. Note that this filter uses the implicit comparison operator $eq.
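To make the comparison rules concrete, here is a minimal sketch of how a single comparison could be checked against one file's metadata. The helper `matches_comparison` and the sample metadata are hypothetical and only illustrate the semantics described above; they are not part of any documented API.

```python
import operator

# Hypothetical mapping from the operators listed above to Python functions.
_OPERATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda actual, allowed: actual in allowed,
}


def matches_comparison(metadata, field, condition):
    """Return True if metadata[field] satisfies the condition."""
    if field not in metadata:
        return False
    # Implicit operators: a list means $in, any other bare value means $eq.
    if isinstance(condition, list):
        condition = {"$in": condition}
    elif not isinstance(condition, dict):
        condition = {"$eq": condition}
    actual = metadata[field]
    # Several operators on one field (for example, a range) must all hold.
    return all(_OPERATORS[op](actual, value) for op, value in condition.items())


doc = {"type": "article", "rating": 4}
print(matches_comparison(doc, "type", "article"))           # implicit $eq -> True
print(matches_comparison(doc, "type", {"$eq": "article"}))  # explicit $eq -> True
print(matches_comparison(doc, "rating", {"$gte": 5}))       # False
```

The implicit and explicit forms give the same result, which is why {"type": "article"} and {"type": {"$eq": "article"}} are interchangeable.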
Logical Operators
You can combine comparisons with logical operators:
- $and (the implicit logical operator)
- $or
- $not
A simple filter expression with multiple comparisons could look like this: {"field_name": "field_value", "field_name_2": "field_value_2"}, for example: {"type": "article", "year": "2022"}. Note that this expression uses the implicit comparison operator $eq and the implicit logical operator $and.
Examples
Here are some operator combinations in action:
Description | Example |
---|---|
Explicit comparison operators, implicit logical operators | metadata={ "type": {"$eq": "article"}, "date": {"$gte": "2015-01-15", "$lt": "2022-01-17"} } |
Explicit logical operators, implicit and explicit comparison operators | metadata={ "$and" : {"type":"article", "rating": {"$gte": 3}}} |
Implicit and explicit logical operators, explicit comparison operators | metadata={"type":"article", "$or": { "genre": {"$in": ["economy", "politics"]}, "publisher": {"$eq": "nytimes"} }} |
The $in comparison operator | metadata={"genre": {"$in": ["economy", "politics"]}} |
If you don't specify the logical operator, $and is used as the default operator. If you don't specify the comparison operator, $eq is used if the value is a single value and $in is used if the value is a list of values.
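The default rules can be sketched as a small rewrite step that turns every implicit operator into its explicit form. The function `expand_implicit` is a hypothetical helper written for illustration, assuming the defaults behave exactly as described in the paragraph above.

```python
LOGICAL_OPERATORS = {"$and", "$or", "$not"}


def _expand(node):
    """Make comparison operators explicit inside a dict (or list of dicts)."""
    if isinstance(node, list):
        return [_expand(item) for item in node]
    expanded = {}
    for key, value in node.items():
        if key in LOGICAL_OPERATORS:
            expanded[key] = _expand(value)       # recurse into nested logic
        elif isinstance(value, dict):
            expanded[key] = value                # operator is already explicit
        elif isinstance(value, list):
            expanded[key] = {"$in": value}       # list of values -> $in
        else:
            expanded[key] = {"$eq": value}       # single value -> $eq
    return expanded


def expand_implicit(filters):
    """Return an equivalent filter with all default operators spelled out."""
    expanded = _expand(filters)
    # A top level that is not a single logical operator is an implicit $and.
    if len(expanded) == 1 and next(iter(expanded)) in LOGICAL_OPERATORS:
        return expanded
    return {"$and": expanded}


print(expand_implicit({"type": "article", "genre": ["economy", "politics"]}))
```

For the input above, the result wraps both comparisons in "$and", with "type" using "$eq" and "genre" using "$in".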
Here are examples of how you can combine these filters:
filters = {
    "$and": {
        "type": {"$eq": "article"},
        "date": {"$gte": "2015-01-15", "$lt": "2021-01-17"},
        "rating": {"$gte": 3},
        "$or": {
            "genre": {"$in": ["economy", "politics"]},
            "publisher": {"$eq": "nytimes"}
        }
    }
}
# And an example with three layers of logical operators:
filters = {
    "$and": {
        "type": {"$eq": "article"},
        "date": {"$gte": "2015-01-15", "$lt": "2021-01-17"},
        "rating": {"$gte": 3},
        "$or": {
            "$not": {"genre": {"$in": ["economy", "politics"]}},
            "publisher": {"$eq": "nytimes"}
        }
    }
}
# You can also rely on the default operators, as in the expression below.
# To filter by dates using the API endpoints, you must use explicit operators,
# so the date filter from the example above is left out here.
filters = {
    "type": "article",
    "rating": {"$gte": 3},
    "$or": {
        "genre": ["economy", "politics"],
        "publisher": "nytimes"
    }
}
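To see how nested filters like these resolve, the sketch below evaluates a filter against a single metadata dictionary. It is an illustrative model only: the `matches` function is hypothetical, and it assumes that "$not" negates the conjunction of its children and that dates are stored as ISO 8601 strings, as in the examples above.

```python
import operator

COMPARISONS = {
    "$eq": operator.eq, "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
    "$in": lambda actual, allowed: actual in allowed,
}


def _compare(actual, condition):
    """Apply explicit or implicit comparison operators to one value."""
    if isinstance(condition, list):
        condition = {"$in": condition}
    elif not isinstance(condition, dict):
        condition = {"$eq": condition}
    return all(COMPARISONS[op](actual, value) for op, value in condition.items())


def _clauses(node, metadata):
    """Evaluate every entry of a dict (or list of dicts) to a boolean."""
    if isinstance(node, list):
        return [all(_clauses(item, metadata)) for item in node]
    results = []
    for key, value in node.items():
        if key == "$and":
            results.append(all(_clauses(value, metadata)))
        elif key == "$or":
            results.append(any(_clauses(value, metadata)))
        elif key == "$not":  # assumption: negates "all children match"
            results.append(not all(_clauses(value, metadata)))
        else:
            results.append(key in metadata and _compare(metadata[key], value))
    return results


def matches(filters, metadata):
    """Return True if the metadata satisfies the filter (implicit top-level $and)."""
    return all(_clauses(filters, metadata))


filters = {
    "$and": {
        "type": {"$eq": "article"},
        "date": {"$gte": "2015-01-15", "$lt": "2021-01-17"},
        "rating": {"$gte": 3},
        "$or": {
            "$not": {"genre": {"$in": ["economy", "politics"]}},
            "publisher": {"$eq": "nytimes"},
        },
    }
}
sports = {"type": "article", "date": "2018-06-01", "rating": 4,
          "genre": "sports", "publisher": "guardian"}
economy = dict(sports, genre="economy")
print(matches(filters, sports))   # True: the genre is outside the excluded list
print(matches(filters, economy))  # False: excluded genre and not from nytimes
```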
Filtering by Dates
To filter by dates using the API endpoints, you must use explicit operators.
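As an illustration, a date range can be written with explicit operators by generating the boundary strings from `datetime.date` objects. The helper `date_range_filter` is hypothetical, and the sketch assumes dates are stored as ISO 8601 strings (YYYY-MM-DD), as in the examples on this page.

```python
from datetime import date


def date_range_filter(field, start, end):
    """Build an explicit range comparison: start <= field < end."""
    return {field: {"$gte": start.isoformat(), "$lt": end.isoformat()}}


filters = {
    "type": {"$eq": "article"},
    **date_range_filter("date", date(2015, 1, 15), date(2022, 1, 17)),
}
print(filters)
```

The resulting "date" entry is {"$gte": "2015-01-15", "$lt": "2022-01-17"}, with both operators stated explicitly.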
Logical Operators on the Same Level
Dictionary keys must be unique, which means you can't use the same logical operator twice on the same level. Because of that, this filter is not valid:
{
    "$or": {
        "$and": {
            "Type": "News Paper",
            "Date": {"$lt": "2019-01-14"},
        },
        "$and": {  # repeated key in dictionary
            "Type": "Blog post",
            "Date": {"$gte": "2019-01-14"}
        }
    }
}
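The problem is easy to miss because Python does not raise an error for a repeated key in a dictionary literal; it silently keeps the last value. The sketch below shows that behavior and one possible way to catch repeated keys when a filter arrives as JSON text. The `reject_duplicate_keys` hook is a hypothetical helper; `object_pairs_hook` is a standard parameter of `json.loads`.

```python
import json

# A repeated key in a dict literal: only the last "$and" survives.
filters = {
    "$or": {
        "$and": {"Type": "News Paper", "Date": {"$lt": "2019-01-14"}},
        "$and": {"Type": "Blog post", "Date": {"$gte": "2019-01-14"}},
    }
}
print(filters["$or"])  # only the "Blog post" clause remains


def reject_duplicate_keys(pairs):
    """Hook for json.loads that raises instead of dropping a repeated key."""
    keys = [key for key, _ in pairs]
    duplicates = {key for key in keys if keys.count(key) > 1}
    if duplicates:
        raise ValueError(f"repeated keys: {sorted(duplicates)}")
    return dict(pairs)


raw = '{"$or": {"$and": {"Type": "News Paper"}, "$and": {"Type": "Blog post"}}}'
try:
    json.loads(raw, object_pairs_hook=reject_duplicate_keys)
except ValueError as error:
    print(error)  # reports the repeated "$and" key
```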
To get around this, logical operators can take a list of dictionaries as values. This is what the filter above looks like after using the workaround:
{
    "$or": [
        {
            "$and": {
                "Type": "News Paper",
                "Date": {"$lt": "2019-01-14"}
            }
        },
        {
            "$and": {
                "Type": "Blog post",
                "Date": {"$gte": "2019-01-14"}
            }
        }
    ]
}
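If you assemble filters in code, building the list form directly avoids the repeated-key problem altogether. The `combine` function below is a hypothetical convenience helper, shown only as a sketch of that approach.

```python
def combine(logical_operator, *clauses):
    """Join clauses under one logical operator using the list form."""
    if logical_operator not in ("$and", "$or"):
        raise ValueError(f"unsupported logical operator: {logical_operator}")
    return {logical_operator: list(clauses)}


newspapers = {"$and": {"Type": "News Paper", "Date": {"$lt": "2019-01-14"}}}
blog_posts = {"$and": {"Type": "Blog post", "Date": {"$gte": "2019-01-14"}}}

# Both "$and" clauses are kept because they are list items, not dictionary keys.
filters = combine("$or", newspapers, blog_posts)
print(filters)
```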