Filter Syntax
Use filters on your files' metadata to refine search and retrieval, so that the returned results match the filter conditions.
How Filters Work
Filters operate on metadata attached to your files and documents. Metadata is stored as a dictionary of key: value pairs. To learn how to add metadata, see Add Metadata to Your Files.
Filters match documents based on specific criteria. A filter contains the name of the metadata field, a comparison operator, and a value. You can apply filters alone or combine them using logical operators.
Filter Structure
Filters are defined as nested dictionaries that use explicit operators. They can be of two types: comparison or logic.
Comparison
Comparison filters contain at least one comparison of a metadata field name with a certain value. You can combine multiple comparisons using logical operators that can be nested to achieve complex filters.
Each comparison filter must contain the following keys:
- field: The name of the metadata field. It must be prefixed with "meta.". For example, to filter on a metadata field called "type", use field: meta.type.
- operator: A comparison operator. It must be one of the following:
  - ==: Equal. Checks if the document field matches the specified value.
  - !=: Not equal. Checks if the document field doesn't match the specified value.
  - >: Greater than. Compares numerical or date values.
  - >=: Greater than or equal. Compares numerical or date values.
  - <: Less than. Compares numerical or date values.
  - <=: Less than or equal. Compares numerical or date values.
  - in: In. Checks if the field's value is one of the specified values.
  - not in: Not in. Checks if the field's value is none of the specified values.
- value: The metadata field value. With the in and not in operators, the value can be a list.
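The three required keys can be checked mechanically. The following is a minimal sketch, not part of any library: the helper name check_comparison and its error messages are made up for illustration, and it only enforces the rules stated above (required keys, the "meta." prefix, and the list of comparison operators).

```python
# Comparison operators listed on this page.
COMPARISON_OPERATORS = {"==", "!=", ">", ">=", "<", "<=", "in", "not in"}
REQUIRED_KEYS = ("field", "operator", "value")


def check_comparison(condition: dict) -> list:
    """Return a list of problems found in a comparison filter (empty if none).

    Hypothetical helper for illustration only.
    """
    missing = [key for key in REQUIRED_KEYS if key not in condition]
    if missing:
        # Without all three keys, the remaining checks make no sense.
        return [f"missing keys: {missing}"]

    problems = []
    if not str(condition["field"]).startswith("meta."):
        problems.append("field must be prefixed with 'meta.'")
    if condition["operator"] not in COMPARISON_OPERATORS:
        problems.append(f"unknown operator: {condition['operator']!r}")
    return problems


# A well-formed filter yields no problems.
ok = check_comparison({"field": "meta.type", "operator": "==", "value": "article"})
# A missing prefix and an unsupported operator are both reported.
bad = check_comparison({"field": "type", "operator": "=", "value": "article"})
```

A check like this can be run before a filter is sent to a pipeline, so that a missing prefix or a misspelled operator is caught early.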
Logic
Logic filters combine multiple conditions to form complex queries. They must contain the following keys:
- operator: A logical operator. It must be one of the following (note that logical operators are capitalized):
  - NOT: The specified condition must not be met.
  - AND: All conditions must be met.
  - OR: At least one of the conditions must be met.
- conditions: A list of dictionaries that define the conditions to be met. Each dictionary can be either a comparison filter or a logic filter.
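To make the nesting concrete, here is a minimal sketch of how such a filter could be evaluated against a single metadata dictionary. It is an illustration only and not how any document store implements filtering. The function name matches is hypothetical, and two behaviors are assumptions rather than documented rules: a field missing from the metadata never matches, and NOT with several conditions is read as "not all of them hold".

```python
import operator as op

# Comparison operators from this page mapped to Python callables.
COMPARATORS = {
    "==": op.eq,
    "!=": op.ne,
    ">": op.gt,
    ">=": op.ge,
    "<": op.lt,
    "<=": op.le,
    "in": lambda field_value, allowed: field_value in allowed,
    "not in": lambda field_value, allowed: field_value not in allowed,
}


def matches(meta: dict, flt: dict) -> bool:
    """Return True if the metadata dict satisfies the filter (illustrative only)."""
    if "conditions" in flt:
        # Logic filter: evaluate every nested condition recursively.
        results = [matches(meta, condition) for condition in flt["conditions"]]
        if flt["operator"] == "AND":
            return all(results)
        if flt["operator"] == "OR":
            return any(results)
        if flt["operator"] == "NOT":
            # Assumption: with several conditions, NOT means "not all of them hold".
            return not all(results)
        raise ValueError(f"unknown logical operator: {flt['operator']!r}")

    # Comparison filter: strip the "meta." prefix to get the metadata key.
    name = flt["field"]
    key = name[len("meta."):] if name.startswith("meta.") else name
    if key not in meta:
        # Assumption: a missing field never matches.
        return False
    return COMPARATORS[flt["operator"]](meta[key], flt["value"])


sample_meta = {"type": "article", "rating": 4, "genre": "economy"}
sample_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {
            "operator": "OR",
            "conditions": [
                {"field": "meta.genre", "operator": "in", "value": ["economy", "politics"]},
                {"field": "meta.rating", "operator": "<", "value": 2},
            ],
        },
    ],
}
is_match = matches(sample_meta, sample_filter)
```

Because a logic filter's conditions can themselves be logic filters, the recursion handles any nesting depth.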
Examples
Here's a simple comparison filter that checks if the value of a document's metadata field "type" is "article":
In a pipeline YAML:
router:
  type: haystack.components.routers.metadata_router.MetadataRouter
  rules:
    en: # Router's rules use filters:
      field: meta.type
      operator: "=="
      value: article

In Python:
filters = {"field": "meta.type", "operator": "==", "value": "article"}
Here's a more complex filter combining both comparison and logic filters. For a document to be selected, all of the following conditions must be true, as they're combined using the AND operator:
- The document's type must be "article".
- The document's date must be on or after January 1, 2015.
- The document's date must be before January 1, 2021.
- The document's rating must be 3 or higher.
- The document must either belong to the genre "economy" or "politics", or be published by "The New York Times".
In a pipeline YAML:
retriever:
  type: haystack.components.retrievers.filter_retriever.FilterRetriever
  init_parameters:
    document_store:
      init_parameters:
        use_ssl: True
        verify_certs: False
        http_auth:
          - "${OPENSEARCH_USER}"
          - "${OPENSEARCH_PASSWORD}"
      type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
    filters:
      operator: "AND"
      conditions:
        - field: meta.type
          operator: "=="
          value: "article"
        - field: meta.date
          operator: ">="
          value: 1420066800 # Unix timestamp
        - field: meta.date
          operator: "<"
          value: 1609455600 # Unix timestamp
        - field: meta.rating
          operator: ">="
          value: 3
        - operator: "OR"
          conditions:
            - field: meta.genre
              operator: "in"
              value: ["economy", "politics"]
            - field: meta.publisher
              operator: "=="
              value: "nytimes"

In Python:
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.date", "operator": ">=", "value": "2015-01-01"},
        {"field": "meta.date", "operator": "<", "value": "2021-01-01"},
        {"field": "meta.rating", "operator": ">=", "value": 3},
        {
            "operator": "OR",
            "conditions": [
                {"field": "meta.genre", "operator": "in", "value": ["economy", "politics"]},
                {"field": "meta.publisher", "operator": "==", "value": "nytimes"},
            ],
        },
    ],
}
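Writing deeply nested dictionaries by hand is error-prone, so it can help to assemble them with small builder functions. The sketch below is one possible approach. The helpers comparison, all_of, and any_of are hypothetical names introduced here for illustration and are not provided by any library. They produce the same dictionary as the complex example in this section.

```python
def comparison(field: str, operator: str, value) -> dict:
    """Build a comparison filter, adding the required "meta." prefix."""
    return {"field": f"meta.{field}", "operator": operator, "value": value}


def all_of(*conditions: dict) -> dict:
    """Combine conditions with the AND logical operator."""
    return {"operator": "AND", "conditions": list(conditions)}


def any_of(*conditions: dict) -> dict:
    """Combine conditions with the OR logical operator."""
    return {"operator": "OR", "conditions": list(conditions)}


# Articles dated from 2015 through 2020, rated 3 or higher,
# in the listed genres or from the given publisher.
built_filters = all_of(
    comparison("type", "==", "article"),
    comparison("date", ">=", "2015-01-01"),
    comparison("date", "<", "2021-01-01"),
    comparison("rating", ">=", 3),
    any_of(
        comparison("genre", "in", ["economy", "politics"]),
        comparison("publisher", "==", "nytimes"),
    ),
)
```

The result is a plain dictionary, so it can be passed anywhere a filter written by hand would be accepted.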
Components Using Filters
- MetadataRouter: You can use filters in the rules parameter.
- FilterRetriever
- OpenSearchBM25Retriever
- OpenSearchEmbeddingRetriever
For details and usage examples, see each component's documentation page.
Legacy Filters
The filter syntax in deepset Cloud v1.0 didn't require explicit keys for comparison filters and used operators preceded by $ (the dollar sign). The new syntax is more explicit and easier to understand.
We will continue to support the old syntax for now, but we encourage you to gradually migrate to the new filtering syntax.
Here's the mapping of operators in both versions:
v1.0 | v2.0 |
---|---|
$eq | == |
$ne | != |
$gt | > |
$gte | >= |
$lt | < |
$lte | <= |
$in | in |
$nin | not in |
Comparison filters in v1.0 didn't explicitly indicate the field name, value, or operator. This is how the same filter expression looks in both versions:
legacy_filter = {
    "$and": {
        "type": {"$eq": "article"},
        "date": {"$gte": "2015-01-01", "$lt": "2021-01-01"},
        "rating": {"$gte": 3},
        "$or": {"genre": {"$in": ["economy", "politics"]}, "publisher": {"$eq": "nytimes"}},
    }
}

current_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.date", "operator": ">=", "value": "2015-01-01"},
        {"field": "meta.date", "operator": "<", "value": "2021-01-01"},
        {"field": "meta.rating", "operator": ">=", "value": 3},
        {
            "operator": "OR",
            "conditions": [
                {"field": "meta.genre", "operator": "in", "value": ["economy", "politics"]},
                {"field": "meta.publisher", "operator": "==", "value": "nytimes"},
            ],
        },
    ],
}
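When migrating many filters, the operator mapping above can be applied programmatically. The following is a rough sketch under stated assumptions, not an official migration tool. The function name convert_legacy is hypothetical. The mapping of $and and $or to AND and OR is inferred from the example on this page rather than from the table. Only the explicit dictionary form shown here is handled, so any shorthand forms the legacy syntax may allow are out of scope.

```python
# Comparison operator mapping taken from the table on this page.
LEGACY_COMPARISON = {
    "$eq": "==",
    "$ne": "!=",
    "$gt": ">",
    "$gte": ">=",
    "$lt": "<",
    "$lte": "<=",
    "$in": "in",
    "$nin": "not in",
}
# Assumption: logical operators map as in the example on this page.
LEGACY_LOGIC = {"$and": "AND", "$or": "OR"}


def convert_legacy(legacy: dict) -> list:
    """Convert a legacy filter dict into a list of new-style conditions."""
    conditions = []
    for key, value in legacy.items():
        if key in LEGACY_LOGIC:
            # Logical operator: convert its nested entries recursively.
            conditions.append(
                {"operator": LEGACY_LOGIC[key], "conditions": convert_legacy(value)}
            )
        else:
            # Metadata field: each legacy operator becomes its own comparison.
            for legacy_op, operand in value.items():
                conditions.append(
                    {
                        "field": f"meta.{key}",
                        "operator": LEGACY_COMPARISON[legacy_op],
                        "value": operand,
                    }
                )
    return conditions


old_style = {"$and": {"type": {"$eq": "article"}, "rating": {"$gte": 3}}}
# The top-level dict holds one logical operator, so take the first result.
new_style = convert_legacy(old_style)[0]
```

Note that a legacy entry with two operators on one field, such as the date range in the example, becomes two separate comparison filters in the new syntax.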