Filter Syntax

Filter by your files' metadata to refine search and retrieval so that the returned results match the filter conditions.

How Filters Work

Filters operate on metadata attached to your files and documents. Metadata is stored as a dictionary of key:value pairs. To learn how to add metadata, see Add Metadata to Your Files.

Filters match documents based on specific criteria. A filter contains the name of a metadata field, a comparison operator, and a value. You can apply filters alone or combine them using logical operators.

Filter Structure

Filters are defined as nested dictionaries that use explicit operators. They can be of two types: comparison or logic.

Comparison

Comparison filters contain at least one comparison of a metadata field name with a certain value. You can combine multiple comparisons using logical operators that can be nested to achieve complex filters.

Each comparison filter must contain the following keys:

  • field: The name of the metadata field. It must be prefixed with "meta.". For example, to filter on a metadata field called "type", use field: meta.type.
  • operator: A comparison operator. It must be one of the following:
    • ==: Equal, checks if the document field matches the specified value.
    • !=: Not equal, checks if the document field doesn't match the specified value.
    • >: Greater than, compares numerical or date values.
    • >=: Greater than or equal, compares numerical or date values.
    • <: Less than, compares numerical or date values.
    • <=: Less than or equal, compares numerical or date values.
    • in: In, checks if the field's value is one of the values in the specified list.
    • not in: Not in, checks if the field's value is not one of the values in the specified list.
  • value: The metadata field value. When used with in and not in operators, the value can be a list.
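
To make the three keys concrete, here is a minimal sketch of how a single comparison filter could be evaluated against one document's metadata dictionary. It is an illustration only, not the implementation used by Haystack or deepset Cloud: the helper name `matches_comparison` is hypothetical, and treating a missing field as "no match" is an assumption.

```python
import operator

# Map each comparison operator from the list above to a Python function.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda field_value, values: field_value in values,
    "not in": lambda field_value, values: field_value not in values,
}


def matches_comparison(condition, meta):
    """Return True if the metadata dict satisfies one comparison filter.

    Hypothetical helper for illustration only.
    """
    field = condition["field"]
    # Drop the mandatory "meta." prefix to get the metadata key.
    key = field[len("meta."):] if field.startswith("meta.") else field
    if key not in meta:
        # Assumption: a document without the field never matches.
        return False
    compare = COMPARATORS[condition["operator"]]
    return compare(meta[key], condition["value"])


meta = {"type": "article", "rating": 4, "genre": "politics"}
print(matches_comparison({"field": "meta.type", "operator": "==", "value": "article"}, meta))  # True
print(matches_comparison({"field": "meta.rating", "operator": "<", "value": 3}, meta))  # False
print(matches_comparison({"field": "meta.genre", "operator": "in", "value": ["economy", "politics"]}, meta))  # True
```

Note that the sketch compares values with plain Python semantics. How a real document store compares dates or mixed types depends on the store.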

Logic

Logic filters combine multiple conditions to form complex queries. They must contain the following keys:

  • operator: A logical operator. It must be one of the following (note that logical operators are written in uppercase):
    • NOT: The specified condition must not be met.
    • AND: All conditions must be met.
    • OR: At least one of the conditions must be met.
  • conditions: A list of dictionaries that define the conditions that must be met. Each dictionary can be either a comparison filter or a logic filter.
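
Because conditions can themselves be logic filters, evaluating a filter is naturally recursive. The sketch below shows one way this nesting could be walked. It is not the library's implementation: the function names are hypothetical, the leaf evaluator passed in as `matches_comparison` is a stand-in that only understands `==`, and treating NOT as "not all conditions are met" is an assumption.

```python
def matches(filter_, meta, matches_comparison):
    """Recursively evaluate a logic or comparison filter against a metadata dict."""
    if "conditions" not in filter_:
        # No "conditions" key: this is a comparison filter (a leaf).
        return matches_comparison(filter_, meta)

    results = (matches(c, meta, matches_comparison) for c in filter_["conditions"])
    logical_operator = filter_["operator"]
    if logical_operator == "AND":
        return all(results)
    if logical_operator == "OR":
        return any(results)
    if logical_operator == "NOT":
        # Assumption: NOT negates the conjunction of its conditions.
        return not all(results)
    raise ValueError(f"Unknown logical operator: {logical_operator!r}")


def equals_only(condition, meta):
    """Hypothetical leaf evaluator that supports only "==", to keep the demo short."""
    key = condition["field"][len("meta."):]
    return meta.get(key) == condition["value"]


example = {
    "operator": "OR",
    "conditions": [
        {"field": "meta.genre", "operator": "==", "value": "economy"},
        {
            "operator": "NOT",
            "conditions": [{"field": "meta.type", "operator": "==", "value": "blog"}],
        },
    ],
}
print(matches(example, {"genre": "sports", "type": "article"}, equals_only))  # True
print(matches(example, {"genre": "sports", "type": "blog"}, equals_only))  # False
```

Passing the leaf evaluator as a parameter keeps the logic handling separate from the comparison handling, so either part can be swapped out independently.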

Examples

Here's a simple comparison filter that checks if the value of a document's metadata field "type" is "article":

filters = {"field": "meta.type", "operator": "==", "value": "article"}

router:
      type: haystack.components.routers.metadata_router.MetadataRouter
      rules:
        en: # Router's rules use filters:
            field: meta.type
            operator: ==
            value: article

Here's a more complex filter combining both comparison and logic filters. For a document to be selected, all of the following conditions must be true, as they're combined using the AND operator:

  1. The document's type must be "article".
  2. The document's date must be on or after January 1, 2015.
  3. The document's date must be before January 1, 2021.
  4. The document's rating must be 3 or higher.
  5. The document must either belong to the genre "economy" or "politics", or be published by "The New York Times".

filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.date", "operator": ">=", "value": "2015-01-01"},
        {"field": "meta.date", "operator": "<", "value": "2021-01-01"},
        {"field": "meta.rating", "operator": ">=", "value": 3},
        {
            "operator": "OR",
            "conditions": [
                {"field": "meta.genre", "operator": "in", "value": ["economy", "politics"]},
                {"field": "meta.publisher", "operator": "==", "value": "nytimes"},
            ],
        },
    ],
}

retriever:
      type: haystack.components.retrievers.filter_retriever.FilterRetriever
      init_parameters:
        document_store:
          init_parameters:
            use_ssl: True
            verify_certs: False
            http_auth:
              - "${OPENSEARCH_USER}"
              - "${OPENSEARCH_PASSWORD}"
          type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        filters:
          operator: "AND"
          conditions:
          - field: meta.type
            operator: ==
            value: "article"
          - field: meta.date
            operator: ">="
            value: 1420066800
          - field: meta.date
            operator: <
            value: 1609455600
          - field: meta.rating
            operator: ">="
            value: 3
          - operator: "OR"
            conditions:
            - field: meta.genre
              operator: "in"
              value: ["economy", "politics"]
            - field: meta.publisher
              operator: ==
              value: "nytimes"
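
The YAML example gives the date bounds as Unix timestamps, while the Python example uses ISO date strings. The two timestamps correspond to midnight on January 1, 2015 and January 1, 2021 at UTC+1. That time zone is inferred from the numbers, not stated anywhere, and which date format you need depends on how dates are stored in your metadata. As a rough sketch, with `to_timestamp` being a hypothetical helper, you could derive such values like this:

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+1 offset, inferred from the timestamps in the YAML example.
UTC_PLUS_ONE = timezone(timedelta(hours=1))


def to_timestamp(date_string, tz=UTC_PLUS_ONE):
    """Convert a YYYY-MM-DD date to the Unix timestamp of midnight in the given time zone."""
    midnight = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=tz)
    return int(midnight.timestamp())


print(to_timestamp("2015-01-01"))  # 1420066800
print(to_timestamp("2021-01-01"))  # 1609455600
```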

Components Using Filters

For details and usage examples, see the component's documentation page.

Legacy Filters

The syntax of filters in deepset Cloud v1.0 didn't require explicit keys for comparison filters and used operators preceded by $ (the dollar sign). The new syntax is more explicit and easier to understand.

We will continue to support the old syntax for now, but we encourage you to gradually migrate to the new filtering syntax.

Here's the mapping of operators in both versions:

v1.0    v2.0
$eq     ==
$ne     !=
$gt     >
$gte    >=
$lt     <
$lte    <=
$in     in
$nin    not in

Comparison filters in v1.0 didn't explicitly indicate the field name, value, or operator. This is how the same filter expression looks in both versions:

legacy_filter = {
        "$and": {
            "type": {"$eq": "article"},
            "date": {"$gte": "2015-01-01", "$lt": "2021-01-01"},
            "rating": {"$gte": 3},
            "$or": {"genre": {"$in": ["economy", "politics"]}, "publisher": {"$eq": "nytimes"}},
        }
    }

current_filter = {
        "operator": "AND",
        "conditions": [
            {"field": "meta.type", "operator": "==", "value": "article"},
            {"field": "meta.date", "operator": ">=", "value": "2015-01-01"},
            {"field": "meta.date", "operator": "<", "value": "2021-01-01"},
            {"field": "meta.rating", "operator": ">=", "value": 3},
            {
                "operator": "OR",
                "conditions": [
                    {"field": "meta.genre", "operator": "in", "value": ["economy", "politics"]},
                    {"field": "meta.publisher", "operator": "==", "value": "nytimes"},
                ],
            },
        ],
    }
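
If you have many legacy filters to migrate, the operator mapping above can be applied mechanically. The following is a minimal sketch of such a converter, not an official migration tool: the function names are hypothetical, it handles only the legacy shapes shown in the example above (`$and` and `$or` with dictionary values, and fields with explicit `$` operators), and it assumes that several top-level entries are implicitly combined with AND.

```python
# Operator mapping from the table above.
LEGACY_COMPARISON = {
    "$eq": "==", "$ne": "!=", "$gt": ">", "$gte": ">=",
    "$lt": "<", "$lte": "<=", "$in": "in", "$nin": "not in",
}
# Only the logical operators that appear in the example above.
LEGACY_LOGIC = {"$and": "AND", "$or": "OR"}


def convert_conditions(legacy):
    """Convert a legacy dict of conditions into a list of new-style conditions."""
    conditions = []
    for key, value in legacy.items():
        if key in LEGACY_LOGIC:
            # Logical operator: convert the nested conditions recursively.
            conditions.append(
                {"operator": LEGACY_LOGIC[key], "conditions": convert_conditions(value)}
            )
        else:
            # Field name: emit one comparison per legacy operator on that field.
            for legacy_operator, operand in value.items():
                conditions.append(
                    {
                        "field": f"meta.{key}",
                        "operator": LEGACY_COMPARISON[legacy_operator],
                        "value": operand,
                    }
                )
    return conditions


def convert_legacy_filter(legacy):
    """Convert a legacy filter into the new syntax (hypothetical helper)."""
    conditions = convert_conditions(legacy)
    if len(conditions) == 1:
        return conditions[0]
    # Assumption: multiple top-level entries are implicitly ANDed.
    return {"operator": "AND", "conditions": conditions}


legacy = {
    "$and": {
        "type": {"$eq": "article"},
        "date": {"$gte": "2015-01-01", "$lt": "2021-01-01"},
    }
}
print(convert_legacy_filter(legacy))
```

Review the converted filters before using them, especially if your legacy filters use shapes this sketch doesn't cover.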