Filter expressions
Filter expressions are a crucial concept in Sajari. They are used as part of the search query to narrow the results, and they are also used to boost specific results that match a filter expression or to execute certain boosts only if a filter condition matches.
Structure of a filter expression
The most basic filter expression consists of three parts:
Field/Parameter
Operator
Value
Example
The expression q~'star' matches if the search query (parameter = q) contains (operator = ~) the word (value = star).
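The three parts can be assembled programmatically. The following is a minimal sketch, not part of any Sajari SDK: build_filter is a hypothetical helper, and it assumes the single-quoted value convention used for advanced editing.

```python
def build_filter(name: str, operator: str, value: object) -> str:
    """Join a field/parameter, an operator and a value into one expression."""
    # The value is wrapped in single quotes. Escaping of quotes that appear
    # inside the value is not handled in this sketch.
    return f"{name}{operator}'{value}'"


print(build_filter("q", "~", "star"))  # q~'star'
```

Keeping the three parts as separate arguments makes it harder to forget the quotes around the value when expressions are generated from user input.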
Fields and parameters
Fields refer to the schema fields available on the record, for example title, description, brand, category, or price.
Parameters refer to the parameters passed via the search query. This could be the query itself (q), applied filters (like brand or price), or personalisation information like location, gender, or membership.
Whether fields, parameters, or both are available in a filter expression depends on the context. Conditions, which specify whether a certain boost is applied, can only act on query parameters. Boost filters, on the other hand, can refer to both fields on the record and parameters passed into the query.
Operators
Filter expressions can use a powerful set of operators. The operators available depend on the type of the field or parameter. For example, Contains (~) can operate on text fields, whereas Greater Than (>) can only operate on numeric fields.
When using advanced editing, all values must be enclosed in single quotation marks. For example, "field boost must be greater than 10" is written as boost>'10'.
Below is a full list of available operators.
Operator | Description | Example |
---|---|---|
Equal To (=) | Field is equal to a value (numeric or string) | |
Not Equal To | Field is not equal to a value (numeric or string) | |
Greater Than (>) | Field is greater than a numeric value | boost>'10' |
Greater Than Or Equal To | Field is greater than or equal to a numeric value | |
Less Than | Field is less than a given numeric value | |
Less Than Or Equal To | Field is less than or equal to a given numeric value | |
Begins With | Field begins with a string | |
Ends With | Field ends with a string | |
Contains (~) | Field contains a string | |
Does Not Contain (!~) | Field does not contain a string | |
Filtering arrays
The Contains (~) and Does Not Contain (!~) operators can be used to filter values in an array. The following example shows a filter that returns all records with the colour red stored in a color array field.
Field | Array | Values | Example |
---|---|---|---|
color | Yes | red, blue, white | color~'red' |
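To make the array behaviour concrete, here is a small Python model of the two operators. It is only an illustration of the semantics described above, not the engine's implementation, and it assumes that an array element must equal the value exactly (the source does not say whether partial matches on elements count).

```python
def contains(field_value, needle: str) -> bool:
    """Model of Contains (~): substring test on text, membership test on arrays."""
    if isinstance(field_value, (list, tuple)):
        # Assumption: an array matches when one element equals the value.
        return needle in field_value
    return needle in str(field_value)


def does_not_contain(field_value, needle: str) -> bool:
    """Model of Does Not Contain (!~): the negation of contains()."""
    return not contains(field_value, needle)


record = {"title": "Red running shoes", "color": ["red", "blue", "white"]}
print(contains(record["color"], "red"))            # True
print(does_not_contain(record["color"], "green"))  # True
```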
Combining expressions
It's also possible to build more complex filters by combining field filter expressions with AND/OR operators and brackets.
Operator | Description | Example |
---|---|---|
AND | Both expressions must match | |
OR | At least one expression must match | |
For example, to match pages with language set to en on www.search.io or any page within the en.search.io domain:
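A combined filter of this shape can be sketched in Python as follows. The helpers all_of and any_of are hypothetical, and the field names domain and lang are assumptions made for illustration; substitute the fields defined in your own schema.

```python
def all_of(*expressions: str) -> str:
    """Combine expressions so that every one of them must match."""
    return "(" + " AND ".join(expressions) + ")"


def any_of(*expressions: str) -> str:
    """Combine expressions so that at least one of them must match."""
    return "(" + " OR ".join(expressions) + ")"


# "domain" and "lang" are assumed field names, not taken from a real schema.
expression = any_of(
    all_of("domain='www.search.io'", "lang='en'"),
    "domain='en.search.io'",
)
print(expression)
# ((domain='www.search.io' AND lang='en') OR domain='en.search.io')
```

Always emitting brackets around each group avoids relying on operator precedence between AND and OR.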
Filter functions
Some filters are difficult to express in boolean logic. For these cases, filter functions create the filter for you. They can also be used as part of larger boolean expressions.
Checking for existence or non-existence
Returns TRUE if field is NULL.
Returns TRUE if field is NOT NULL.
Filtering with multiple values
The NOT IN/IN function is shorthand for multiple OR conditions.
Returns TRUE if the field value on a record is equal to value1, value2 or value3, or any other additional values defined within a list.
Returns TRUE if the field on a record is not equal to value1, value2 or value3, or any other additional values defined within a list.
The NOT IN/IN function works only on single value fields.
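The shorthand can be illustrated with a short Python sketch. It models the behaviour described above rather than the engine itself; the helper names and the example values for the brand field are hypothetical.

```python
def expand_in(field: str, values: list) -> str:
    """Write out the chain of OR conditions that an IN list stands for."""
    return " OR ".join(f"{field}='{value}'" for value in values)


def is_in(field_value, values) -> bool:
    """TRUE when the single-value field equals any listed value."""
    return any(field_value == value for value in values)


def is_not_in(field_value, values) -> bool:
    """TRUE when the single-value field equals none of the listed values."""
    return not is_in(field_value, values)


print(expand_in("brand", ["nike", "adidas", "puma"]))
# brand='nike' OR brand='adidas' OR brand='puma'
```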
Filtering based on geo distance
Returns TRUE if the input geopoint lat_var, lng_var is within the haversine radius (in kilometres) of the latitude, longitude geopoint on the record.
Note: there is also a geo_boost step that can be used to boost results based on their geo distance, as opposed to filtering as described above.
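For readers unfamiliar with the haversine distance, the check can be sketched in Python as below. This is the standard haversine formula, not the engine's code; the function names and the mean Earth radius constant are choices made for this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(lat_var: float, lng_var: float,
                  latitude: float, longitude: float, radius_km: float) -> bool:
    """TRUE when the input point lies within radius_km of the record's point."""
    return haversine_km(lat_var, lng_var, latitude, longitude) <= radius_km


# Sydney to Melbourne is roughly 700 km, so a 100 km radius does not match.
print(within_radius(-33.8688, 151.2093, -37.8136, 144.9631, 100))  # False
```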
Time based filtering
Returns TRUE if the timestamp field is equal to or greater than the current time (UTC) minus the defined duration.
A duration string is a signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns" (nanoseconds), "us" or "µs" (microsecond), "ms" (millisecond), "s" (second), "m" (minute), "h" (hour).
For example, to keep only records whose timestamp field falls within the last 24 hours, use SINCE_NOW(field, '24h'). This is useful for filtering records by recency, such as by published date.
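The duration format and the comparison can be modelled in Python as follows. This is a sketch of the behaviour described above, not the engine's parser; parse_duration and since_now are hypothetical names, and nanosecond values are rounded because Python's timedelta has microsecond resolution.

```python
import re
from datetime import datetime, timedelta, timezone

_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
                 "s": 1.0, "m": 60.0, "h": 3600.0}
# One "number + unit" part, e.g. "1.5h" or "45m".
_PART = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '300ms', '-1.5h' or '2h45m' into a timedelta."""
    sign = -1 if text.startswith("-") else 1
    body = text[1:] if text[:1] in ("+", "-") else text
    if not body:
        raise ValueError(f"empty duration: {text!r}")
    pos, seconds = 0, 0.0
    while pos < len(body):
        part = _PART.match(body, pos)
        if part is None:
            raise ValueError(f"invalid duration: {text!r}")
        seconds += float(part.group(1)) * _UNIT_SECONDS[part.group(2)]
        pos = part.end()
    return timedelta(seconds=sign * seconds)


def since_now(timestamp: datetime, duration: str, now: datetime = None) -> bool:
    """TRUE when timestamp >= current UTC time minus the duration."""
    now = now or datetime.now(timezone.utc)
    return timestamp >= now - parse_duration(duration)


published = datetime.now(timezone.utc) - timedelta(hours=3)
print(since_now(published, "24h"))  # True
```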
Handling sub variants
Returns TRUE for records where repeated fields have an offset matching the expression.
For example, suppose a record has three variants of a product with varying price, color and size. These variants could be stored in three array-based fields, and the ARRAY_MATCH() filter function could be used to evaluate each offset in these fields collectively, as if the values at that offset were singular. To illustrate this, assume the following product is indexed:
Field | Array | Values |
---|---|---|
title | No | Air jordan shoes |
color | Yes | red, red, white |
size | Yes | 13, 14, 13 |
price | Yes | 122.00, 122.00, 130.00 |
If we were to search using the filter function ARRAY_MATCH(size = '14' AND color = 'white'), this would look at each offset in the size and color arrays and evaluate the values at that offset as if they were singular. In this case none of the offsets matches the filter: the only variant with size 14 is red, and the only white variant has size 13. The function therefore returns FALSE when evaluating this record.
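The offset-by-offset evaluation can be modelled with a few lines of Python. This sketch only illustrates the semantics described above; array_match here is a hypothetical helper that takes a Python predicate instead of a filter expression.

```python
def array_match(record: dict, fields: list, predicate) -> bool:
    """TRUE when the values at any single offset satisfy the predicate."""
    columns = [record[name] for name in fields]
    # zip(*columns) walks the arrays in lockstep, one variant per offset.
    return any(predicate(dict(zip(fields, values))) for values in zip(*columns))


product = {
    "title": "Air jordan shoes",
    "color": ["red", "red", "white"],
    "size": [13, 14, 13],
    "price": [122.00, 122.00, 130.00],
}

# Size 14 and white never occur at the same offset, so this does not match.
print(array_match(product, ["size", "color"],
                  lambda v: v["size"] == 14 and v["color"] == "white"))  # False
# Offset 2 holds size 13 in white, so this matches.
print(array_match(product, ["size", "color"],
                  lambda v: v["size"] == 13 and v["color"] == "white"))  # True
```

Filtering each array independently would behave differently: both "14" and "white" occur somewhere in the record, so two separate Contains filters joined with AND would match even though no single variant is a white size 14.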
This filter function is a powerful way to handle variants. Some common examples are price variations (e.g. volume-based price breaks) and automotive parts (a part can match many makes and models).