Filtering Logic

Use filters to narrow down search results, either at query time or in your evaluation sets. A filter combines comparison operators and logical operators applied to metadata fields. This page explains how these operators work.

Filters operate on the metadata you add to your files. Metadata are dictionaries in the key:value format. To learn how to add them, see Upload files.

Logical and Comparison Operators

A filter contains at least one comparison of a metadata field name with a certain value. You can combine multiple comparisons using logical operators that can be nested to achieve complex filters.

Comparisons

A comparison consists of:

  • A field name
  • A comparison operator:
    • $eq (equal to; the implicit operator for single values)
    • $in (matches any value in a list; the implicit operator for a list of values)
    • $gt (greater than)
    • $gte (greater than or equal to)
    • $lt (less than)
    • $lte (less than or equal to)
  • A value

A simple filter expression could look like this: {"field_name": "field_value"}, for example: {"type": "article"}. Note that this filter uses the implicit comparison operator $eq.

Logical Operators

You can combine comparisons with logical operators:

  • $and (the implicit logical operator)
  • $or
  • $not

A simple filter expression with multiple comparisons could look like this: {"field_name": "field_value", "field_name_2": "field_value_2"}, for example: {"type": "article", "year": "2022"}. Note that this expression uses the implicit comparison operator $eq and the implicit logical operator $and.

Examples

Here are some operator combinations in action:

  • Explicit comparison operators, implicit logical operators:
    metadata={"type": {"$eq": "article"}, "date": {"$gte": "2015-01-15", "$lt": "2022-01-17"}}
  • Explicit logical operators, implicit and explicit comparison operators:
    metadata={"$and": {"type": "article", "rating": {"$gte": 3}}}
  • Implicit and explicit logical operators, explicit comparison operators:
    metadata={"type": "article", "$or": {"genre": {"$in": ["economy", "politics"]}, "publisher": {"$eq": "nytimes"}}}
  • The $in comparison operator:
    metadata={"genre": {"$in": ["economy", "politics"]}}

If you don't specify the logical operator, $and is used as the default operator. If you don't specify the comparison operator, $eq is used if the value is a single value and $in is used if the value is a list of values.
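The defaulting rules above can be sketched as a small normalizer that rewrites a filter so that every operator is explicit. This is only an illustration of the rules as stated on this page: the names `expand_conditions` and `expand_filter` are hypothetical and not part of any API.

```python
LOGICAL_OPERATORS = {"$and", "$or", "$not"}


def expand_conditions(conditions):
    """Return a copy of `conditions` with every implicit $eq / $in spelled out."""
    if isinstance(conditions, list):
        # List form of a logical operator: expand each dictionary in turn.
        return [expand_conditions(item) for item in conditions]
    expanded = {}
    for key, value in conditions.items():
        if key in LOGICAL_OPERATORS:
            expanded[key] = expand_conditions(value)
        elif isinstance(value, dict):
            # Comparison operators are already explicit, e.g. {"$gte": 3}.
            expanded[key] = dict(value)
        elif isinstance(value, list):
            # A list of values implies $in.
            expanded[key] = {"$in": list(value)}
        else:
            # A single value implies $eq.
            expanded[key] = {"$eq": value}
    return expanded


def expand_filter(filters):
    """Expand implicit comparisons and wrap a bare top level in $and."""
    expanded = expand_conditions(filters)
    if len(expanded) == 1 and set(expanded) <= LOGICAL_OPERATORS:
        return expanded  # already rooted at a single logical operator
    return {"$and": expanded}


implicit = {
    "type": "article",
    "rating": {"$gte": 3},
    "$or": {"genre": ["economy", "politics"], "publisher": "nytimes"},
}
explicit = expand_filter(implicit)
```

Here `explicit` is rooted at `$and`, `"type"` becomes `{"$eq": "article"}`, and `"genre"` becomes `{"$in": ["economy", "politics"]}`, while the already explicit `rating` comparison is left unchanged.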

Here are examples of how you can combine these filters:

filters = {
    "$and": {
        "type": {"$eq": "article"},
        "date": {"$gte": "2015-01-15", "$lt": "2021-01-17"},
        "rating": {"$gte": 3},
        "$or": {
            "genre": {"$in": ["economy", "politics"]},
            "publisher": {"$eq": "nytimes"}
        }
    }
}

# And an example with three layers of logical operators:
filters = {
    "$and": {
        "type": {"$eq": "article"},
        "date": {"$gte": "2015-01-15", "$lt": "2021-01-17"},
        "rating": {"$gte": 3},
        "$or": {
            "$not": {"genre": {"$in": ["economy", "politics"]}},
            "publisher": {"$eq": "nytimes"}
        }
    }
}
# You can also rely on the default operators, as in the expression below.
# To filter by dates using the API endpoints, you must use explicit operators,
# so this version of the example above omits the date filter.
filters = {
    "type": "article",
    "rating": {"$gte": 3},
    "$or": {
        "genre": ["economy", "politics"],
        "publisher": "nytimes"
    }
}
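To make the meaning of the nested examples above concrete, here is a minimal in-memory sketch that checks one file's metadata against a filter. It is not how the search backend evaluates filters. The names `matches`, `compare`, and `COMPARISONS` are hypothetical, and the sketch makes two assumptions that this page does not confirm: a missing metadata field never matches a comparison, and `$not` negates its conditions combined with `$and`.

```python
import operator

COMPARISONS = {
    "$eq": operator.eq,
    "$in": lambda actual, allowed: actual in allowed,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def compare(actual, expected):
    """Evaluate one field against an implicit or explicit comparison."""
    if actual is None:
        return False  # assumption: a missing field never matches
    if isinstance(expected, list):
        expected = {"$in": expected}  # implicit $in
    elif not isinstance(expected, dict):
        expected = {"$eq": expected}  # implicit $eq
    # Several operators on one field (a date range, say) must all hold.
    return all(COMPARISONS[op](actual, operand) for op, operand in expected.items())


def matches(metadata, conditions, combine=all):
    """Return True if `metadata` satisfies `conditions` (illustrative only)."""
    if isinstance(conditions, list):
        # List form: each item is its own group, implicitly combined with $and.
        return combine(matches(metadata, item) for item in conditions)
    results = []
    for key, value in conditions.items():
        if key == "$and":
            results.append(matches(metadata, value, all))
        elif key == "$or":
            results.append(matches(metadata, value, any))
        elif key == "$not":
            # Assumption: $not negates its conditions combined with $and.
            results.append(not matches(metadata, value, all))
        else:
            results.append(compare(metadata.get(key), value))
    return combine(results)


filters = {
    "$and": {
        "type": {"$eq": "article"},
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        "date": {"$gte": "2015-01-15", "$lt": "2021-01-17"},
        "rating": {"$gte": 3},
        "$or": {
            "$not": {"genre": {"$in": ["economy", "politics"]}},
            "publisher": {"$eq": "nytimes"},
        },
    }
}

sports = {"type": "article", "date": "2018-06-01", "rating": 4,
          "genre": "sports", "publisher": "guardian"}
economy = dict(sports, genre="economy")

print(matches(sports, filters))   # True: the genre is outside the excluded list
print(matches(economy, filters))  # False: excluded genre and not from nytimes
```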

Filtering by Dates

To filter by dates using the API endpoints, you must use explicit operators.
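As an illustration, a date range with explicit operators could be built as follows. The helper `date_range_filter` is hypothetical, and the sketch assumes that dates are stored as YYYY-MM-DD strings, as in the examples on this page.

```python
from datetime import date


def date_range_filter(field, start, end):
    """Build an explicit filter for start <= field < end (hypothetical helper)."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    # date.isoformat() yields YYYY-MM-DD, the format used in the examples.
    return {field: {"$gte": start.isoformat(), "$lt": end.isoformat()}}


filters = {
    "type": {"$eq": "article"},
    **date_range_filter("date", date(2015, 1, 15), date(2022, 1, 17)),
}
```

The resulting `filters` dictionary spells out `$gte` and `$lt` for the `date` field instead of relying on an implicit operator.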

Logical Operators on the Same Level

Dictionary keys must be unique, which means you can't use the same logical operator twice on the same level. Because of that, this filter is not valid:

{
    "$or": {
        "$and": {
            "Type": "News Paper",
            "Date": {"$lt": "2019-01-14"}
        },
        "$and": {      # repeated key in dictionary
            "Type": "Blog post",
            "Date": {"$gte": "2019-01-14"}
        }
    }
}

To get around this, logical operators can take a list of dictionaries as values. This is what the filter above looks like after using the workaround:

{
    "$or": [
        {
            "$and": {
                "Type": "News Paper",
                "Date": {"$lt": "2019-01-14"}
            }
        },
        {
            "$and": {
                "Type": "Blog post",
                "Date": {"$gte": "2019-01-14"}
            }
        }
    ]
}
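A quick check in plain Python shows why the repeated key is a problem and how the list form avoids it. The helper `any_of` is hypothetical and only wraps its arguments in the list form described above.

```python
import json

# A dictionary literal with a repeated key silently keeps only the last value,
# so the "News Paper" branch is lost without any error.
broken = {
    "$or": {
        "$and": {"Type": "News Paper", "Date": {"$lt": "2019-01-14"}},
        "$and": {"Type": "Blog post", "Date": {"$gte": "2019-01-14"}},
    }
}
print(len(broken["$or"]))              # 1
print(broken["$or"]["$and"]["Type"])   # Blog post

# Parsing JSON with Python's json module behaves the same way.
print(json.loads('{"$and": 1, "$and": 2}'))  # {'$and': 2}


def any_of(*groups):
    """Hypothetical helper: combine condition groups with $or in list form."""
    return {"$or": list(groups)}


fixed = any_of(
    {"$and": {"Type": "News Paper", "Date": {"$lt": "2019-01-14"}}},
    {"$and": {"Type": "Blog post", "Date": {"$gte": "2019-01-14"}}},
)
print(len(fixed["$or"]))  # 2: both branches survive
```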