Filter expressions
Filter expressions are a crucial concept in Sajari. They are used as part of the search query to narrow the results, and they are also used to boost specific results that match a filter expression or to apply certain boosts only if a filter condition matches.
The most basic filter expression consists of 3 parts.
Field/Parameter
Operator
Value
The expression q~'star' matches if:
search query (parameter = q)
contains (operator = ~)
the word (value = star)
Fields refer to the schema fields available on the record, for example title, description, brand, category, or price.
Parameters refer to the parameters passed via the search query. This could be the query itself (q), applied filters (like brand or price), or personalisation information like location, gender, or membership.
Whether a field, a parameter, or both are available in a filter expression depends on the context. Conditions, which specify whether a certain boost is applied, can only act on query parameters. Boost filters, on the other hand, can refer to both fields on the record and parameters passed into the query.
Filter expressions can use a powerful set of operators. The operators available depend on the type of the field or parameter. For example, Contains (~) can operate on text fields, whereas Greater Than (>) can only operate on numeric fields.
When using advanced editing, all values must be enclosed in single quotation marks. For example, "field boost must be greater than 10" is written as boost>'10'.
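As a rough illustration of the three-part structure and the quoting rule, the following Python sketch assembles an expression string. The helper name build_filter is hypothetical and is not part of any Sajari SDK. How quotes inside a value should be escaped is not covered above, so the sketch does not attempt it.

```python
def build_filter(field: str, operator: str, value) -> str:
    """Join field, operator and value, wrapping the value in single quotes.

    Hypothetical helper for illustration only; no escaping is performed.
    """
    return f"{field}{operator}'{value}'"


print(build_filter("boost", ">", 10))   # boost>'10'
print(build_filter("dir1", "=", "blog"))  # dir1='blog'
```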
Below is a full list of available operators.
| Operator | Description | Example |
| --- | --- | --- |
| Equal To (=) | Field is equal to a value (numeric or string) | dir1='blog' |
| Not Equal To (!=) | Field is not equal to a value (numeric or string) | dir1!='blog' |
| Greater Than (>) | Field is greater than a numeric value | boost>'10' |
| Greater Than Or Equal To (>=) | Field is greater than or equal to a numeric value | boost>='10' |
| Less Than (<) | Field is less than a given numeric value | boost<'50' |
| Less Than Or Equal To (<=) | Field is less than or equal to a given numeric value | boost<='50' |
| Begins With (^) | Field begins with a string | dir1^'bl' |
| Ends With ($) | Field ends with a string | dir1$'og' |
| Contains (~) | Field contains a string | dir1~'blog' |
| Does Not Contain (!~) | Field does not contain a string | dir1!~'blog' |
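To make the operator semantics concrete, here is a small Python model that evaluates a single field/operator/value triple against a record held as a dictionary. It is only an illustrative sketch, not how Sajari evaluates filters: the names OPERATORS and matches are hypothetical, and treating Contains as a plain substring test is an assumption.

```python
import operator


def _numeric(op):
    # Values are written as quoted strings, so convert both sides to numbers.
    return lambda field_value, value: op(float(field_value), float(value))


def _equal(field_value, value):
    # Compare numerically when both sides parse as numbers, else as strings.
    try:
        return float(field_value) == float(value)
    except (TypeError, ValueError):
        return str(field_value) == str(value)


# Hypothetical model of the operator list (assumption: ~ is a substring test).
OPERATORS = {
    "=": _equal,
    "!=": lambda f, v: not _equal(f, v),
    ">": _numeric(operator.gt),
    ">=": _numeric(operator.ge),
    "<": _numeric(operator.lt),
    "<=": _numeric(operator.le),
    "^": lambda f, v: str(f).startswith(v),
    "$": lambda f, v: str(f).endswith(v),
    "~": lambda f, v: v in str(f),
    "!~": lambda f, v: v not in str(f),
}


def matches(record: dict, field: str, op: str, value: str) -> bool:
    """Evaluate one basic filter expression against a record."""
    return OPERATORS[op](record[field], value)


record = {"dir1": "blog", "boost": 25}
print(matches(record, "dir1", "^", "bl"))    # True
print(matches(record, "boost", "<=", "50"))  # True
print(matches(record, "boost", ">", "50"))   # False
```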
The Contains (~) and Does Not Contain (!~) operators can also be used to filter values in an array. The following example shows a filter that returns all records with the colour red stored in a color array field.
| Field | Array | Value | Filter |
| --- | --- | --- | --- |
| color | Yes | red, blue, white | color ~ ['red'] |
It's also possible to build more complex filters by combining field filter expressions with AND/OR operators and brackets.
| Operator | Description | Example |
| --- | --- | --- |
| AND | Both expressions must match | dir1='blog' AND domain='www.search.io' |
| OR | One expression must match | dir1='blog' OR domain='blog.search.io' |
For example, to match pages with language set to en on www.search.io or any page within the en.search.io domain:
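One way such a combined expression might be assembled is sketched below in Python. The field name lang is an assumption (the actual schema field holding the language is not named here), and the helpers all_of and any_of are hypothetical. Brackets are added around every group so that the grouping is explicit.

```python
def all_of(*parts: str) -> str:
    """Combine expressions with AND inside brackets (hypothetical helper)."""
    return "(" + " AND ".join(parts) + ")"


def any_of(*parts: str) -> str:
    """Combine expressions with OR inside brackets (hypothetical helper)."""
    return "(" + " OR ".join(parts) + ")"


# 'lang' is an assumed field name used only for illustration.
expression = any_of(
    all_of("lang='en'", "domain='www.search.io'"),
    "domain='en.search.io'",
)
print(expression)
# ((lang='en' AND domain='www.search.io') OR domain='en.search.io')
```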
Some filters are difficult to express in boolean logic. For these cases, filter functions are available that create the filter for you. They can also be used as part of larger boolean expressions.
Returns TRUE if field is NULL.
Returns TRUE if field is NOT NULL.
The NOT IN/IN function is shorthand for multiple OR conditions.
Returns TRUE if the field value on a record is equal to value1, value2 or value3, or any other additional value defined within the list.
Returns TRUE if the field value on a record is not equal to value1, value2 or value3, or any other additional value defined within the list.
The NOT IN/IN function works only on single value fields.
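To show what the shorthand stands for, the Python sketch below writes out the long-form expressions using only the = and != operators with AND/OR. The helper names are hypothetical, and the sketch deliberately does not reproduce the IN syntax itself. Note that the long form of a "not in" check joins != comparisons with AND, because the field must differ from every listed value.

```python
def in_as_or(field: str, values: list) -> str:
    """Long form of an 'in list' check: equality tests joined with OR."""
    return " OR ".join(f"{field}='{v}'" for v in values)


def not_in_as_and(field: str, values: list) -> str:
    """Long form of a 'not in list' check: inequality tests joined with AND."""
    return " AND ".join(f"{field}!='{v}'" for v in values)


print(in_as_or("brand", ["nike", "adidas"]))
# brand='nike' OR brand='adidas'
print(not_in_as_and("brand", ["nike", "adidas"]))
# brand!='nike' AND brand!='adidas'
```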
Returns TRUE if the input geopoint lat_var, lng_var is within the haversine radius (in kilometres) of the latitude, longitude geopoint on the record.
Note: there is also a geo_boost step that can be used to boost results based on their geo distance, as opposed to filtering as described above.
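For readers unfamiliar with the haversine formula, the Python sketch below computes the great-circle distance between two geopoints and compares it with a radius in kilometres. It illustrates the general technique only: the function names and the Earth radius constant are assumptions, and nothing here reflects how Sajari implements the filter internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(record_lat, record_lng, lat_var, lng_var, radius_km) -> bool:
    """True if the input point lies within radius_km of the record's point."""
    return haversine_km(record_lat, record_lng, lat_var, lng_var) <= radius_km


# Sydney (record) and Melbourne (input point) are roughly 700 km apart.
print(within_radius(-33.8688, 151.2093, -37.8136, 144.9631, 1000))  # True
print(within_radius(-33.8688, 151.2093, -37.8136, 144.9631, 500))   # False
```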
Returns TRUE if the timestamp field is equal to or greater than the current time (UTC) minus the defined duration.
A duration string is a signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns" (nanoseconds), "us" or "µs" (microsecond), "ms" (millisecond), "s" (second), "m" (minute), "h" (hour).
For example, to filter records with a timestamp field within the last 24 hours, use SINCE_NOW(field, '24h'). This is useful for filtering records by recency, such as by published date.
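The comparison that SINCE_NOW performs can be modelled in a few lines of Python. This is a sketch of the stated rule (timestamp is equal to or greater than the current UTC time minus the duration), not the actual implementation. The function name since_now and its now parameter are hypothetical, and the duration is passed as a timedelta instead of parsing a duration string such as '24h'.

```python
from datetime import datetime, timedelta, timezone


def since_now(timestamp: datetime, duration: timedelta, now=None) -> bool:
    """True if timestamp >= (current UTC time - duration).

    'now' can be supplied to make the check reproducible (assumed parameter).
    """
    now = now or datetime.now(timezone.utc)
    return timestamp >= now - duration


fixed_now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
published_recently = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)
published_earlier = datetime(2023, 12, 31, 12, 0, tzinfo=timezone.utc)

print(since_now(published_recently, timedelta(hours=24), fixed_now))  # True
print(since_now(published_earlier, timedelta(hours=24), fixed_now))   # False
```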
Returns TRUE for records where repeated fields have an offset matching the expression.
For example, suppose a record has 3 variants of a product with varying price, color and size. These variants could be stored in three array-based fields, and the ARRAY_MATCH() filter function could be used to evaluate each offset in these fields collectively, as if they were singular values. To illustrate this, suppose we had the following product indexed:
| Field | Array | Value |
| --- | --- | --- |
| title | No | Air Jordan shoes |
| color | Yes | red, red, white |
| size | Yes | 13, 14, 13 |
| price | Yes | 122.00, 122.00, 130.00 |
If we were to search using the filter function ARRAY_MATCH(size = '14' AND color = 'white'), this would look at each offset in the size and color arrays respectively and evaluate the values as if they were singular. In this case none of the offsets match the filter, so FALSE would be returned for this function when evaluating this record.
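The offset-by-offset evaluation can be modelled in Python as follows. This is an illustrative sketch using the example product above. The function name array_match and the use of a Python predicate in place of a filter expression are assumptions made for the example.

```python
def array_match(record: dict, fields: list, predicate) -> bool:
    """True if any single offset across the given array fields satisfies predicate."""
    columns = [record[name] for name in fields]
    # zip(*columns) yields one tuple per offset, e.g. (13, 'red') for offset 0.
    return any(predicate(dict(zip(fields, values))) for values in zip(*columns))


product = {
    "title": "Air Jordan shoes",
    "color": ["red", "red", "white"],
    "size": [13, 14, 13],
    "price": [122.00, 122.00, 130.00],
}

# No single variant is both size 14 and white.
print(array_match(product, ["size", "color"],
                  lambda v: v["size"] == 14 and v["color"] == "white"))  # False

# The third variant is size 13 and white.
print(array_match(product, ["size", "color"],
                  lambda v: v["size"] == 13 and v["color"] == "white"))  # True

# Checking each array independently would match, which is why offsets matter.
print(14 in product["size"] and "white" in product["color"])  # True
```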
This filter function is a powerful way to handle variants. Some common examples are price variations (e.g. volume-based price breaks) and automotive parts (where a part can match many makes and models).