Query pipelines

Query pipelines define the query execution and results ranking strategies used when searching the records in your collection. Steps in a query pipeline can be used for:

  • Query understanding - query rewrites, spelling, NLP, …

  • Filtering results - based on any attribute in the index, for example location or customer-specific results.

  • Changing the relevance logic - dynamically boost different aspects based on the search query, parameters or data models

  • Constructing the engine query - as opposed to the input query, the engine query is what is actually executed, and it can be extremely complex.

Commonly used steps

The following steps are the most commonly used ones. You will find many of them in the pipelines generated by Search.io.

Result settings

These steps configure how the results are returned (i.e. which page of results, which specific fields are returned for each result, and requesting other data that can be aggregated from the result set). All come with reasonable defaults, so little extra work is needed in most cases.

Filters, fields and pagination

- id: set-filter
- id: set-fields
- id: pagination

These steps let you pass variables with the search request to set filters, choose which fields are returned for each result, and customize pagination, for example the number of results returned on each page.

Count aggregates

- id: count-aggregate
  params:
    fields:
      bind: count

Use aggregates to implement facets for categories, pricing or similar fields. The step above lets the query pass a count variable that names the fields whose distinct values are counted in the result set.

For example, to get a count of the unique values in the field color for results matching the query, include a count variable with the query:

{
  "q": "your query",
  // ...
  "count": "color"
}

The resulting JSON response contains a list of aggregates with the count for each value.

"aggregates": {
  "color": {
    "count": {
      "Aquamarine": 6,
      "Blue": 3,
      "Crimson": 3,
      "Yellow": 6
    }
  }
}
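
To make the shape of this response concrete, here is a minimal Python sketch of what a count aggregate computes. It is not Search.io's implementation: the `results` records and the `count_aggregate` helper are hypothetical, and treating the count variable as a comma-separated list of fields is an assumption.

```python
from collections import Counter

# Hypothetical result set; in practice these records come from the index.
results = [
    {"title": "Polo shirt", "color": "Blue"},
    {"title": "T-shirt", "color": "Crimson"},
    {"title": "Hoodie", "color": "Blue"},
    {"title": "Cap", "color": "Yellow"},
]


def count_aggregate(records, fields):
    """Count distinct values per field.

    `fields` is a comma-separated string (an assumption for this sketch).
    """
    aggregates = {}
    for field in (name.strip() for name in fields.split(",")):
        values = (record[field] for record in records if field in record)
        aggregates[field] = {"count": dict(Counter(values))}
    return aggregates


aggregates = count_aggregate(results, "color")
print(aggregates)
```

Each value in the counted field becomes a key, which is what a facet list needs in order to render the options together with their result counts.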

Max and min aggregates

- id: min-aggregate
  params:
    fields:
      bind: min
- id: max-aggregate
  params:
    fields:
      bind: max

Max and min aggregates calculate the available range for a facet. Pricing and ratings are common use cases. Instead of showing arbitrary pricing ranges, they let you tailor the available range to the results of the query.

For example, to calculate the minimum and maximum value of a price field, pass the following variables with the query:

{
  "q": "your query",
  // ...
  "min": "price",
  "max": "price"
}

The resulting JSON response contains two values: the price of the cheapest and of the most expensive product matching the query.

"aggregates": {
  "price": {
    "min": 89,
    "max": 299
  }
}
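
The following Python sketch shows the calculation behind such a response, for example to set the bounds of a price slider. The `products` list and the `range_aggregate` helper are hypothetical and only illustrate the idea; the real aggregation runs inside the pipeline.

```python
# Hypothetical products matching a query.
products = [
    {"name": "Runner", "price": 129},
    {"name": "Trainer", "price": 89},
    {"name": "Hiker", "price": 299},
]


def range_aggregate(records, field):
    """Return the min and max of `field` across the records that have it."""
    values = [record[field] for record in records if field in record]
    if not values:
        return {}  # nothing to aggregate, so no range is reported
    return {field: {"min": min(values), "max": max(values)}}


price_range = range_aggregate(products, "price")
print(price_range)
```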

Aggregate filters

An aggregate filter is a pair, consisting of an aggregate and a filter. Aggregate filters change the result set to reflect the filters that are specified with them. An aggregate filter turns an aggregate into a filterable facet.

- id: count-aggregate-filter
  params:
    fields:
      bind: count
    filters:
      bind: countFilters     

To calculate the count of unique values in the field size and filter the result set by the color blue, define a count and a countFilters variable matching the bindings in the pipeline step above.

{
    "variables": {
        "q": "t-shirt",
        "count": "size",
        "countFilters": "color = 'blue'"
    }
}
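
When the request body is assembled in code, serializing a dictionary avoids hand-written JSON mistakes such as trailing commas. The sketch below uses only Python's standard `json` module; the `build_variables` helper is hypothetical, and how the body is sent to Search.io is deliberately left out.

```python
import json


def build_variables(query, count_field, count_filter):
    """Assemble the variables for a count aggregate filter request."""
    return {
        "variables": {
            "q": query,
            "count": count_field,          # bound to `fields` in the step
            "countFilters": count_filter,  # bound to `filters` in the step
        }
    }


# Count sizes while filtering the result set by the color blue.
body = json.dumps(build_variables("t-shirt", "size", "color = 'blue'"), indent=2)
print(body)
```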

Sorting

- id: sort
  params:
    fields:
      bind: sort

Use the sort step to sort results by a particular field. For example, to sort by price, pass a sort variable with the query.

{
  "q": "your query",
  // ...
  "sort": "price"
}

The sort order can be reversed by adding a "-" in front of the field name.

{
  "q": "your query",
  // ...
  "sort": "-price"
}
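
The convention for the sort variable can be sketched in a few lines of Python. This only illustrates how a value such as "-price" is interpreted. The `apply_sort` helper and the sample records are hypothetical, and ascending order as the default is an assumption.

```python
def apply_sort(records, sort):
    """Sort records by the field named in `sort`; a leading "-" reverses the order."""
    descending = sort.startswith("-")
    field = sort[1:] if descending else sort
    return sorted(records, key=lambda record: record[field], reverse=descending)


# Hypothetical results.
items = [
    {"name": "B", "price": 129},
    {"name": "A", "price": 89},
    {"name": "C", "price": 299},
]

cheapest_first = [item["name"] for item in apply_sort(items, "price")]
priciest_first = [item["name"] for item in apply_sort(items, "-price")]
print(cheapest_first, priciest_first)
```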

Language settings

Search.io delivers great matching out of the box. With a few tweaks to language-specific settings, you can tailor the results even more closely to your business.

Index Spelling

- id: index-spelling
  params:
    model:
      constant: default
    phraseLabelWeights:
      constant: query:1.0,title:1.0
    text:
      bind: q

The index-spelling step performs spelling correction on the query. The phraseLabelWeights constant specifies that previously entered queries (recorded under the query label by the train-autocomplete step) and the title field carry equal weight for spelling suggestions.

Synonyms

- id: synonym
  params:
    model:
      constant: collection_name
    text:
      bind: q

Synonyms are words or phrases that share the same meaning in the same language. For example, car is a synonym of auto. This step augments the query with the synonyms defined in the collection, so that a search for car also matches documents that contain the word auto.
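
As a rough illustration of query augmentation, the Python sketch below expands each query term with its synonyms. It is a simplified model rather than the synonym step itself: the `SYNONYMS` table and `expand_query` helper are hypothetical, and matching on whitespace-separated, lowercased terms is an assumption.

```python
# Hypothetical synonym table; in Search.io, synonyms are defined in the collection.
SYNONYMS = {
    "car": ["auto"],
    "auto": ["car"],
}


def expand_query(query, synonyms):
    """Return, for each query term, the term followed by its synonyms."""
    expanded = []
    for term in query.lower().split():
        expanded.append([term] + synonyms.get(term, []))
    return expanded


alternatives = expand_query("red car", SYNONYMS)
print(alternatives)
```

Each inner list holds the alternatives for one term, so a document that contains any of them can match that position of the query.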

Index scoring

Steps in this category define which fields are searched and what weight each of these fields receives.

Reinforcement learning

- id: index-text-score-instance-boost
  params:
    minCount:
      constant: "5"
    threshold:
      constant: "0.5"

This is one of the most powerful steps in Search.io's pipelines. It adds an ML score boost to results with positive interactions and decreases the score of results with negative interactions. For the boost to take effect, a minimum of 5 interactions is required (the minCount value above).

Relying solely on textual matching leads to subpar results when language is ambiguous. By learning which results had positive interactions (clicks for a website, purchases for products), Search.io automatically improves the relevancy of results over time.

Index Text boost

- id: index-text-index-boost
  params:
    field:
      constant: description
    score:
      constant: "0.5000"
    text:
      bind: q

A pipeline typically has multiple index-text-index-boost steps defined, one for each searchable field. The step above assigns the description field a weight of 0.5. Assign each weight relative to the importance of the field; it contributes accordingly to the overall result score.
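
One way to picture relative field weights is a weighted sum of per-field match scores. The Python sketch below rests on that assumption: the exact formula Search.io uses is not described here, and the title weight, the per-field scores and the `combined_score` helper are hypothetical.

```python
# Weights per searchable field. The description weight comes from the step
# above; the title weight is a hypothetical value for illustration.
FIELD_WEIGHTS = {"title": 1.0, "description": 0.5}


def combined_score(field_scores, weights):
    """Weighted sum of per-field text match scores (assumed scoring model)."""
    return sum(
        weights.get(field, 0.0) * score
        for field, score in field_scores.items()
    )


# Hypothetical match scores for one record.
score = combined_score({"title": 0.8, "description": 0.4}, FIELD_WEIGHTS)
print(round(score, 4))
```

Under this model, a match in the description counts half as much as an equally strong match in the title.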

Feature scoring

Feature scoring can be used to fine-tune the textual matching results of the index scoring. By taking business data such as sales or margins into account, results can be promoted if they are more popular or have a bigger impact on the business. However, it is important to find the right balance between optimizing for business outcomes and the accuracy of the textual matching. Since every business is different, it often takes some experimentation to get this right.

Filter boost

- id: filter-boost
  params:
    filter:
      constant: title ~ q
    score:
      constant: "0.05"

The example above operates on the title, but the step works equally well on fields that are not searchable, such as margins or sales. Here, results that contain the query text in the title receive an additional boost of 5% (the score of 0.05). To require an exact match instead of a containing match, replace the "~" with "=".
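
The difference between the two operators can be sketched in Python as follows. This is an illustration only: the helpers are hypothetical, case-insensitive comparison is an assumption, and adding the score constant to a base score is an assumed way of applying the boost.

```python
BOOST = 0.05  # the score constant from the filter-boost step


def filter_matches(title, query, operator):
    """Evaluate `title <operator> q` for the two operators discussed here."""
    if operator == "~":  # title contains the query text
        return query.lower() in title.lower()
    if operator == "=":  # title equals the query text
        return title.lower() == query.lower()
    raise ValueError(f"unsupported operator: {operator}")


def boosted_score(base_score, title, query, operator="~"):
    """Add the boost when the filter matches (assumed additive)."""
    if filter_matches(title, query, operator):
        return base_score + BOOST
    return base_score


contains = filter_matches("Blue cotton t-shirt", "t-shirt", "~")
exact = filter_matches("Blue cotton t-shirt", "t-shirt", "=")
print(contains, exact)
```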

Data training

Post steps in the pipeline are executed after the results have been returned from the index. Typically, post steps further change the order of results, insert additional results (such as promotions) and train the data models in Search.io.

Promotions

- id: promotions
  params:
    text:
      bind: q

Promotions can be defined in the console. This step adds additional results for specific queries that match the promotion, even if those results don't appear in the regular result list.

Train autocomplete

- id: train-autocomplete
  params:
    label:
      constant: query
    model:
      constant: default
    text:
      bind: q

The last step in the pipeline takes the query text the user entered and trains the autocomplete model if the query successfully delivered results. This improves autocomplete suggestions over time, based on your users' search behavior.
