Variable manipulation

Introduction

The string-eval step can perform string based evaluations to transform data. It enables numeric calculations, casting to different data types, splits, and more.

The format will feel familiar to those with database and/or JavaScript experience.

An example use case may look like the following:

- id: string-eval
  params:
    param:
      bind: radius_miles
    expression:
      constant: 'FLOAT(radius_km)/1.6'
  condition: radius_km

This example takes an input variable radius_km, evaluates the expression, and then writes the output variable to the radius_miles parameter.

The condition ensures that the step only runs if the radius_km variable exists.

Alternatively, you can also write the output back to the same variable, which is useful if you want to augment the existing parameters. For example:

- id: string-eval
  params:
    param:
      bind: radius
    expression:
      constant: 'FLOAT(radius)/1.6'
  condition: radius

Variables are always assumed to be strings, so you need to convert them using INT() or FLOAT() casting to use them in mathematical expressions. See Casting below.

Referencing fields in record pipeline

When using the string-eval step in the record pipeline, you can read input or write output directly onto the field using "fields." prefix before the fieldname.

To transform a record using string-eval step before indexing, see the following example:

- id: string-eval
  params:
    field:
      constant: title
    expression:
      constant: 'TRIM(title)'
  condition: fields.title

The above step checks if the incoming record contains the 'title' field. The expression removes all trailing and leading whitespaces from the 'title' field. The output from the expression replaces the 'title' field.

Operators

The expression supports AND, OR and XOR operators as well as parentheses. The logic also supports the IF() function much like you would see in databases and/or Excel (see Conditional functions below).

Functions

This step supports a range of functions that can be used in the expression. These functions fall into several groups:

Casting

Convert a string into different data types. This is common for numeric operations.

INT(item)

Int attempts to convert the item to an integer. If the type is a float it will be rounded to the nearest integer.

FLOAT(item)

Float attempts to convert the item into a float.

STRING(item)

String converts the item to a string.

BOOL(item)

Bool attempts to convert the item to a Boolean. A zero will convert to FALSE, other numeric values will convert to TRUE. The string versions of TRUE and FALSE will convert as expected.

TIME(item)

Time attempts to convert the item to a time. The default time layout is 2006-01-02 15:04:05.

DURATION(item)

Duration attempts to convert the item to a duration. item should be a duration string, which is a signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns" (nanosecond), "us" or "µs" (microsecond), "ms" (millisecond), "s" (second), "m" (minute), "h" (hour).

Conditional

IF(bool, item_if_true, item_if_false)

This takes a bool value (which could be the result of an expression), and returns item_if_true OR item_if_false for true/false values respectively.

Arrays

As previously shown above you can create and join lists using the bracket notation, e.g. [a]+[b] produces a list [a, b]

String functions

String functions allow for strings to be transformed, joined, quoted, trimmed and more.

JOIN(elements, separator)

Join concatenates the elements of its first argument to create a single string. The separator string defaults to "" if not set, and is placed between elements in the resulting string.

SPLIT(string, separator)

Split the string by separator, returning a list of strings.

QUOTE(item)

Quote a string, or list of strings. The returned string(s) uses escape sequences (\t, \n, \xFF, \u0100) for control characters and non-printable characters.

LENGTH(item)

Length returns the number of runes (UTF-8) if the item is a string, or the number of elements if the item is an array.

TRIM(string)

Trim returns a slice of the string with all leading and trailing Unicode code points representing whitespace.

LTRIM(string)

LTrim returns a slice of the string with all leading Unicode code points representing whitespace.

RTRIM(string)

RTrim returns a slice of the string with all trailing Unicode code points representing whitespace.

RIGHT(string, num_of_chars)

Right extracts a number of characters num_of_chars from string (starting from right)

LEFT(string, num_of_chars)

Left extracts a number of characters num_of_chars from string (starting from left)

SUBSTRING(string, start_offset, [num_of_chars])

Substring extracts a number of characters (num_of_chars) from a string (starting from start_offset). If num_of_chars is not provided, the rest of the string starting from start_offset is returned.

UPPER(string)

Upper converts string to upper-case.

LOWER(string)

Lower converts a string to lower-case.

Numeric functions

Numeric functions can be used to perform numeric transforms on inbound parameters.

POWER(base, exponent)

Power takes a base number and raises it to the power of the exponent.

ABS(number)

Abs returns the absolute value of number.

EXP(number)

Exp returns e raised to the power of number. The constant e (2.718281...), is the base of natural logarithms.

CEIL(number)

Ceil returns the smallest integer value that is larger than or equal to number.

FLOOR(number)

Floor returns the largest integer value that is smaller than or equal to number.

SQRT(number)

Sqrt returns the square root of number.

LN(number)

Ln returns the natural logarithm of number.

MIN(number1,number2)

Min returns the smaller of number1 and number2.

MAX(number1,number2)

Max returns the larger of number1 and number2.

Time functions

Time functions can be used to measure time.

NOW()

Returns the current time. (in UTC timezone)

SINCE(time)

Returns the time elapsed (duration) since time.

UNTIL(time)

Returns the length of time (duration) until time.

Geo functions

HAVERSINE(lat1,lng1,lat2,lng2)

Calculates the great circle distance in kilometers between points (lat1,lng1) and (lat2,lng2).

For example, to calculate the distance between Sydney (-33.865143, 151.209900) and Melbourne (-37.840935, 144.946457) and store the output in variable distance:

- id: string-eval
  params:
    param:
      bind: distance
    expression:
      constant: HAVERSINE(-33.865143,151.209900,-37.840935,144.946457)

If the coordinates are stored in variables then they need to be cast to a FLOAT:

- id: string-eval
  params:
    param:
      bind: distance
    expression:
      constant: HAVERSINE(FLOAT(sydneyLat),FLOAT(sydneyLng),FLOAT(melbourneLat),FLOAT(melbourneLng))

Examples

String concatenation

Much like JavaScript, strings can be concatenated using the + symbol. For example:

- id: string-eval
  params:
    param:
      bind: concat_output
    expression:
      constant: firstName + ' ' + lastName

The above function would take two inputs and concatenate them with a space in between.

You can also use the JOIN() function for this:

- id: string-eval
  params:
    param:
      bind: concat_output
    expression:
      constant: JOIN([firstName, middleName, lastName], " ")

The above function concatenates all variables (firstName,middleName,lastName) separated by a space. The variables are closed in brackets indicating that it is a list of elements that are being joined.

Combining filter expressions

You can also combine filters using variables that may not exist in the request. Take the below example, if the bathFilter and priceFilter are NULL and the bedsFilter = “beds = 3”, then the output will be filterExpr = “beds = 3”

- id: string-eval
  params:
    param:
      bind: filterExpr
    expression:
      constant: JOIN([bedsFilter, bathFilter, priceFilter], " AND ")

Much like JavaScript, if you don’t convert strings, operators may not do what you are expecting, e.g. a+b = 56 if a = “5” and b = “6” are both string values, you might instead want INT(a)+INT(b) = 11.

Calculating square root of a number

For example, you may want to calculate the square root of an input parameter (string):

- id: string-eval
  params:
    param:
      bind: my_number
    expression:
      constant: SQRT(FLOAT(my_number))
  condition: my_number

In the above example, the input my_number must exist for the step to execute. If it does, it is converted from a string to a float, then the square root is calculated and written back into the variable. You can also write the output back to the same variable if needed.

Advanced examples

Combining string-eval with other steps such as regular expressions allows highly complex logic to be compiled quickly. Below are some examples:

Filtering using multiple nested variants

A common situation in ecommerce is to filter on multiple variant fields. This can be done using the ARRAY_MATCH(expression) filter syntax where expression is a filter expression targeting multiple variant fields to ensure they all match the expression.

Consider a JSON record:

{
  "id" = "123",
  "product_name": "BC679 Premium Ceramic Disc Brake Pads"
  "year": ["2010,2006,2010"],
  "make": ["ford,toyota,ford"],
  "model": ["falcon,corolla,bronco"],
  ...
}

Now assume that the end-user makes the following search request:

{
    "q": "2005 ford bronco brake pads"
}

In this case, let's assume we have transformed the query into the following using either NLP or regular expression steps:

{
  "q": "brake pads",
  "yearFilter": "year = 2005",
  "makeFilter": "make = 'ford'",
  "modelFilter": "model = 'bronco'"
}

Now, we want to take all of these filter expressions and combine these to get this param:

"comboFilterExpr": "ARRAY_MATCH(make = 'ford' AND year = 2005 AND model = 'bronco')"

Using the above, we want to target the sub-variants of each record to ensure they match the filters. Below is one way to do this:

- id: string-eval
  params:
    expression:
      constant: STRING("ARRAY_MATCH(" + JOIN([makeFilter, yearFilter, modelFilter], " AND ") + ")")
    param:
      bind: comboFilterExpr
  condition: makeFilter OR yearFilter OR modelFilter

The above step uses string-eval to collapse the filters into a single ARRAY_MATCH() filter expression. The output from the above step will be:

"comboFilterExpr": "ARRAY_MATCH(make = 'ford' AND year = 2005 AND model = 'bronco')",

Now, we want to use this param to filter results. We can do this using the add-filter step.

- id: add-filter
  params:
    filter:
      bind: comboFilterExpr
  condition: comboFilterExpr

The output from the above step will be:

"filter": "ARRAY_MATCH(make = 'ford' AND year = 2005 AND model = 'bronco')"

Notes:

  • If none of makeFilter, yearFilter or modelFilter exist, the string-eval step would not work as expected, so we add the condition: makeFilter OR yearFilter OR modelFilter to the step to ensure at least one of these filter expressions exists

  • Assuming at least one filter exists, the string-eval step joins the filters, wraps them in an ARRAY_MATCH() function and binds the output to the comboFilterExpr variable

  • The add-filter step uses condition: comboFilterExpr to ensure the filter expression exists, if so it is added to the query

Selecting best-matched variant from multiple arrays

This post-step is used for choosing the best variant match within a record where product variants are stored in multiple arrays. Typical use case would be displaying the correct image based on a colour match in the colour array, e.g. a record contains colour array [“blue”, “green”, “red”], and another array with the image URLs that correspond to each colour based on index position. If a user searches “green” we want to return the green image in index position “1” so we display the correct image to the user.

This must be used in the Post-step as the index position must be written back to each record returned in the search response.

# this step is optional but useful to have to see which index position is being matched on
- id: string-eval
  params:
    expression:
      constant: IBM(fields.variant_colours,values.q)
    field:
      constant: colour_output
  condition: q != ''
  
# this step will check if there is an index match
# if matched then check the variant_images array at the matched index position
# take the value and write it to a new "image" field
# if no match return the first image in the image array
- id: string-eval
  params:
    expression:
      constant: IF((INT(IBM(fields.variant_colours,values.q)) >= 0), fields.variant_images[IBM(fields.variant_colours,values.q)],fields.variant_images[0])
    field:
      constant: image

Split values before records are indexed

The string-eval step can also be used in the record pipelines. A common use case for string-eval step in record pipelines is to split values before storing values on a repeated field using delimiter or separator.

Let's assume that you want to index a webpage that contains 'features' and 'category' fields which include multiple values, e.g.,

{
  "id" = "123",
  "title": "BC679 Premium Ceramic Disc Brake Pads"
  "features": "Gray, Copper-free, Reduced noise",
  "category": "Automotive|Cars|Brakes",
  ...
}

If this page was indexed, the value of features will be stored as one value, e.g. "Gray, Copper-free, Reduced noise", which is not ideal if you want to use these fields in facets.

To split out each feature and category item and store it as an array with items, you can use a string-eval step. See the example below:

- id: string-eval
  params:
    expression:
      constant: SPLIT(fields.features,', ')
    field:
      constant: features

The above step takes the 'features' value from the incoming record, splits all values using the separator ", ", and stores items as a list in the 'features' field. You can use the same step to split the values of the 'category' field using the delimiter "|". The prefix "fields." before the fieldname is required when you want to reference the field values rather than the variables.

Note that the output from the step is being added to the field directly rather than assigning it to a variable. To split the value from the 'features' field into a variable, use:

- id: string-eval
  params:
    expression:
      constant: SPLIT(fields.features,', ')
    param:
      bind: features

The output from the above step will be:

"features": ["Gray","Copper-free","Reduced noise"]

Last updated