Variable manipulation
Introduction
The string-eval
step can perform string based evaluations to transform data. It enables numeric calculations, casting to different data types, splits, and more.
The format will feel familiar to those with database and/or JavaScript experience.
An example use case may look like the following:
This example takes an input variable radius_km
, evaluates the expression, and then writes the output variable to the radius_miles
parameter.
The condition ensures that the step only runs if the radius_km
variable exists.
Alternatively, you can also write the output back to the same variable, which is useful if you want to augment the existing parameters. For example:
Variables are always assumed to be strings, so you need to convert them using INT()
or FLOAT()
casting to use them in mathematical expressions. See Casting below.
Referencing fields in record pipeline
When using the string-eval step in the record pipeline, you can read input or write output directly onto the field using "fields." prefix before the fieldname.
To transform a record using string-eval step before indexing, see the following example:
The above step checks if the incoming record contains the 'title' field. The expression removes all trailing and leading whitespaces from the 'title' field. The output from the expression replaces the 'title' field.
Operators
The expression supports AND
, OR
and XOR
operators as well as parentheses. The logic also supports the IF()
function much like you would see in databases and/or Excel (see Conditional functions below).
Functions
This step supports a range of functions that can be used in the expression. These functions fall into several groups:
Casting
Convert a string into different data types. This is common for numeric operations.
Int attempts to convert the item to an integer. If the type is a float it will be rounded to the nearest integer.
Float attempts to convert the item into a float.
String converts the item to a string.
Bool attempts to convert the item to a Boolean. A zero will convert to FALSE
, other numeric values will convert to TRUE
. The string versions of TRUE
and FALSE
will convert as expected.
Time attempts to convert the item
to a time. The default time layout is 2006-01-02 15:04:05
.
Duration attempts to convert the item
to a duration. item
should be a duration string, which is a signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns" (nanosecond), "us" or "µs" (microsecond), "ms" (millisecond), "s" (second), "m" (minute), "h" (hour).
Conditional
This takes a bool value (which could be the result of an expression), and returns item_if_true
OR item_if_false
for true/false values respectively.
Arrays
As previously shown above you can create and join lists using the bracket notation, e.g. [a]+[b]
produces a list [a, b]
String functions
String functions allow for strings to be transformed, joined, quoted, trimmed and more.
Join concatenates the elements
of its first argument to create a single string. The separator
string defaults to ""
if not set, and is placed between elements in the resulting string.
Split the string by separator, returning a list of strings.
Quote a string, or list of strings. The returned string(s) uses escape sequences (\t, \n, \xFF, \u0100) for control characters and non-printable characters.
Length returns the number of runes (UTF-8) if the item is a string
, or the number of elements if the item is an array.
Trim returns a slice of the string
with all leading and trailing Unicode code points representing whitespace.
LTrim returns a slice of the string
with all leading Unicode code points representing whitespace.
RTrim returns a slice of the string
with all trailing Unicode code points representing whitespace.
Right extracts a number of characters num_of_chars
from string
(starting from right)
Left extracts a number of characters num_of_chars
from string
(starting from left)
Substring extracts a number of characters (num_of_chars
) from a string
(starting from start_offset
). If num_of_chars
is not provided, the rest of the string
starting from start_offset
is returned.
Upper converts string
to upper-case.
Lower converts a string to lower-case.
Numeric functions
Numeric functions can be used to perform numeric transforms on inbound parameters.
Power takes a base
number and raises it to the power of the exponent
.
Abs returns the absolute value of number
.
Exp returns e
raised to the power of number
. The constant e
(2.718281...), is the base of natural logarithms.
Ceil returns the smallest integer value that is larger than or equal to number
.
Floor returns the largest integer value that is smaller than or equal to number
.
Sqrt returns the square root of number
.
Ln returns the natural logarithm of number
.
Min returns the smaller of number1
and number2
.
Max returns the larger of number1
and number2
.
Time functions
Time functions can be used to measure time.
Returns the current time. (in UTC timezone)
Returns the time elapsed (duration) since time
.
Returns the length of time (duration) until time
.
Geo functions
Calculates the great circle distance in kilometers between points (lat1,lng1)
and (lat2,lng2)
.
For example, to calculate the distance between Sydney (-33.865143, 151.209900) and Melbourne (-37.840935, 144.946457) and store the output in variable distance
:
If the coordinates are stored in variables then they need to be cast to a FLOAT
:
Examples
String concatenation
Much like JavaScript, strings can be concatenated using the +
symbol. For example:
The above function would take two inputs and concatenate them with a space in between.
You can also use the JOIN()
function for this:
The above function concatenates all variables (firstName,middleName,lastName) separated by a space. The variables are closed in brackets indicating that it is a list of elements that are being joined.
Combining filter expressions
You can also combine filters using variables that may not exist in the request. Take the below example, if the bathFilter
and priceFilter
are NULL
and the bedsFilter = “beds = 3”
, then the output will be filterExpr = “beds = 3”
Much like JavaScript, if you don’t convert strings, operators may not do what you are expecting, e.g. a+b = 56
if a = “5”
and b = “6”
are both string
values, you might instead want INT(a)+INT(b) = 11
.
Calculating square root of a number
For example, you may want to calculate the square root of an input parameter (string):
In the above example, the input my_number
must exist for the step to execute. If it does, it is converted from a string to a float, then the square root is calculated and written back into the variable. You can also write the output back to the same variable if needed.
Advanced examples
Combining string-eval
with other steps such as regular expressions allows highly complex logic to be compiled quickly. Below are some examples:
Filtering using multiple nested variants
A common situation in ecommerce is to filter on multiple variant fields. This can be done using the ARRAY_MATCH(expression)
filter syntax where expression
is a filter expression targeting multiple variant fields to ensure they all match the expression.
Consider a JSON record:
Now assume that the end-user makes the following search request:
In this case, let's assume we have transformed the query into the following using either NLP or regular expression steps:
Now, we want to take all of these filter expressions and combine these to get this param:
Using the above, we want to target the sub-variants of each record to ensure they match the filters. Below is one way to do this:
The above step uses string-eval
to collapse the filters into a single ARRAY_MATCH()
filter expression. The output from the above step will be:
Now, we want to use this param to filter results. We can do this using the add-filter
step.
The output from the above step will be:
Notes:
If none of
makeFilter
,yearFilter
ormodelFilter
exist, the string-eval step would not work as expected, so we add thecondition: makeFilter OR yearFilter OR modelFilter
to the step to ensure at least one of these filter expressions existsAssuming at least one filter exists, the
string-eval
step joins the filters, wraps them in anARRAY_MATCH()
function and binds the output to thecomboFilterExpr
variableThe
add-filter
step usescondition: comboFilterExpr
to ensure the filter expression exists, if so it is added to the query
Selecting best-matched variant from multiple arrays
This post-step is used for choosing the best variant match within a record where product variants are stored in multiple arrays. Typical use case would be displaying the correct image based on a colour match in the colour array, e.g. a record contains colour array [“blue”, “green”, “red”], and another array with the image URLs that correspond to each colour based on index position. If a user searches “green” we want to return the green image in index position “1” so we display the correct image to the user.
This must be used in the Post-step as the index position must be written back to each record returned in the search response.
Split values before records are indexed
The string-eval step can also be used in the record pipelines. A common use case for string-eval step in record pipelines is to split values before storing values on a repeated field using delimiter or separator.
Let's assume that you want to index a webpage that contains 'features' and 'category' fields which include multiple values, e.g.,
If this page was indexed, the value of features will be stored as one value, e.g. "Gray, Copper-free, Reduced noise", which is not ideal if you want to use these fields in facets.
To split out each feature and category item and store it as an array with items, you can use a string-eval step. See the example below:
The above step takes the 'features' value from the incoming record, splits all values using the separator ", ", and stores items as a list in the 'features' field. You can use the same step to split the values of the 'category' field using the delimiter "|". The prefix "fields." before the fieldname is required when you want to reference the field values rather than the variables.
Note that the output from the step is being added to the field directly rather than assigning it to a variable. To split the value from the 'features' field into a variable, use:
The output from the above step will be:
Last updated