Spelling

Almost half of the users who can’t find what they need on their first search will abandon a site immediately.

Search.io’s spelling correction is designed to help users find the results they are looking for from the first search, even if they have not spelled their query terms correctly.

Spelling correction is available on all plans. This section explains how spelling correction works and how you can customise it.

How does spelling work?

Search.io builds a custom spelling model based on the words and phrases from a selection of fields stored in your records. This ensures that spelling correction works for custom terms, like brand names, that aren’t in a standard dictionary.

There are two scenarios where your spelling model is used to analyse user input and provide alternative word and phrase suggestions:

  1. As a user types a search query, what they are typing and any spelling suggestions from your spelling model are processed by autocomplete (if active).

  2. When a user submits their final search query, spelling suggestions are also generated. Each suggestion is given a weight indicating how confident the system is that it is a good alternative. The search is then processed using both the user’s final query and the spelling suggestions.

How are spelling suggestions determined?

Spelling suggestions are determined by a combination of the following:

Word edit distance

Edit distance quantifies how dissimilar two words are by counting the number of steps needed to transform one word into the other. Words of up to 4 characters can be corrected within an edit distance of 1, whilst words with more than 4 characters can be corrected within an edit distance of 2.

Examples of transformations with an edit distance of one from the incorrect word to the correct word:

  • bke -> bike - missing letter in a word

  • boke -> bike - letter substitution

  • biike -> bike - letter deletion

  • bkie -> bike - letter swap

  • handcream -> hand cream - word splitting

  • book store -> bookstore - word combination

Split or combination transformations will not be performed in combination with other transformations.

Examples:

  • 'bok store' will not transform to 'bookstore'. It could instead be suggested as 'book store' or 'box store'

  • 'handcreme' will not transform to 'hand cream'
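
The split and combination rules above can be sketched as follows. This is only an illustration, not Search.io's implementation: the function names and the small `vocabulary` set are hypothetical stand-ins for a trained spelling model. Because no other edit is applied alongside a split or join, 'handcreme' and 'bok store' produce no candidates here.

```python
def split_candidates(word, vocabulary):
    """Return every way to split `word` into two known words."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in vocabulary and word[i:] in vocabulary
    ]


def join_candidates(query, vocabulary):
    """Return known words formed by joining two adjacent query words."""
    words = query.split()
    return [a + b for a, b in zip(words, words[1:]) if a + b in vocabulary]


# Hypothetical vocabulary standing in for a trained spelling model.
vocabulary = {"hand", "cream", "book", "store", "bookstore"}

print(split_candidates("handcream", vocabulary))  # [('hand', 'cream')]
print(join_candidates("book store", vocabulary))  # ['bookstore']
print(split_candidates("handcreme", vocabulary))  # [] ('creme' is unknown)
print(join_candidates("bok store", vocabulary))   # [] ('bokstore' is unknown)
```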

Examples of transformations with an edit distance of two from the incorrect word to the correct word:

  • vecile -> vehicle - missing letter and letter swap

  • maintainance -> maintenance - letter substitution and letter deletion

'Colleague' would not be suggested for 'colege', as that transformation requires 3 edits.
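
To make the word examples concrete, here is a minimal sketch of an edit distance check. The text does not say which algorithm Search.io uses. This sketch assumes the optimal string alignment variant of Damerau-Levenshtein distance, chosen because it counts a swap of adjacent letters as a single edit. The names `edit_distance`, `max_edits` and `is_candidate` are hypothetical, and applying the length rule to the typed word (rather than the corrected word) is also an assumption.

```python
def edit_distance(source, target):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and swaps of adjacent letters each cost 1."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a letter
                dist[i][j - 1] + 1,         # insert a letter
                dist[i - 1][j - 1] + cost,  # substitute (or match)
            )
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                # swap two adjacent letters
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


def max_edits(word):
    """Length rule from the text: 1 edit up to 4 characters, otherwise 2."""
    return 1 if len(word) <= 4 else 2


def is_candidate(typed, known_word):
    """Assumption: the limit is based on the length of the typed word."""
    return edit_distance(typed, known_word) <= max_edits(typed)


print(edit_distance("bkie", "bike"))         # 1
print(edit_distance("vecile", "vehicle"))    # 2
print(is_candidate("colege", "colleague"))   # False (3 edits needed)
```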

Phrase edit distance

Spelling correction separately analyses your spelling model to make phrase suggestions. Phrase suggestions have a maximum edit distance of 2, and the whole phrase must be found in the spelling model.

Examples of transformations:

  • comput desk -> computer desk - 2 missing letters in a word

  • compute dek -> computer desk - 1 missing letter in each word

  • comput dek -> no phrase suggestion, as 3 missing letters exceed the maximum edit distance
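
The phrase examples above can be reproduced with a short sketch. It assumes the edit distance is measured across the whole phrase as a single string, which is consistent with the examples but is not confirmed by the text. `levenshtein`, `phrase_suggestions` and the `known_phrases` list are hypothetical names used only for illustration.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance using a single rolling row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


MAX_PHRASE_EDITS = 2  # limit stated in the text


def phrase_suggestions(query, known_phrases):
    """Return known phrases within the edit limit, closest first."""
    scored = sorted((levenshtein(query, p), p) for p in known_phrases)
    return [p for distance, p in scored if distance <= MAX_PHRASE_EDITS]


# Hypothetical phrases standing in for a trained spelling model.
known_phrases = ["computer desk", "standing desk"]

print(phrase_suggestions("comput desk", known_phrases))  # ['computer desk']
print(phrase_suggestions("compute dek", known_phrases))  # ['computer desk']
print(phrase_suggestions("comput dek", known_phrases))   # []
```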

Phrase suggestions increase the accuracy of a spelling suggestion. For example, a user query of "chep head phones" could be processed as "cheap headphones". If only single-word spelling suggestions were considered, then "chip head phones", "chef head phones" and "chew head phones" would all be equally valid suggestions.

For further information read our Phrase training guide.

Spelling correction takes a probabilistic approach, as an incorrect spelling correction can be harmful to a business. In the example above, the customer’s product data may contain both the correct and incorrect spelling of “headphones”, or there may be a new brand called “chep” on the site with only a few products.

Frequency

Words and phrases must be found at least 5 times in your data before they are used as candidates for spelling suggestions.

Frequency is also used to help determine probability. For example, suppose you have an ecommerce store that sells bicycles, and a customer enters the query 'clok'. Your store sells a lot of bike locks but few bike clocks. As 'lock' is likely to have a higher frequency, it will take priority over a spelling suggestion of 'clock'.
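
As a rough illustration of how frequency could filter and order candidates, the sketch below drops any candidate seen fewer than 5 times and sorts the rest by count. The counts and the `rank_by_frequency` helper are invented for this example. The real system combines frequency with other signals, which this sketch does not model.

```python
from collections import Counter

MIN_FREQUENCY = 5  # threshold stated in the text


def rank_by_frequency(candidates, term_counts):
    """Drop rare candidates, then order the rest by descending frequency."""
    eligible = [c for c in candidates if term_counts[c] >= MIN_FREQUENCY]
    return sorted(eligible, key=lambda c: term_counts[c], reverse=True)


# Hypothetical term counts for a bicycle store's records.
term_counts = Counter({"lock": 120, "clock": 7, "cloak": 2})

print(rank_by_frequency(["clock", "lock", "cloak"], term_counts))
# ['lock', 'clock']  ('cloak' appears fewer than 5 times)
```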

Spelling suggestion submission

Spelling correction will submit the top 5 phrase suggestions. If no phrase suggestions are found, then the top 5 word suggestions will be submitted. These are submitted in addition to the user’s original query, lowering the chance of returning zero results and frustrating the user. Following on from our previous example, "chep head phones" could be submitted as:

chep head phones
cheap headphones
cheap head phones
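
The selection logic described above can be sketched in a few lines. The function name `queries_to_submit` and its arguments are hypothetical, and both suggestion lists are assumed to arrive already sorted from best to worst.

```python
MAX_SUGGESTIONS = 5  # limit stated in the text


def queries_to_submit(original, phrase_suggestions, word_suggestions):
    """Return the original query followed by the chosen suggestions.

    Phrase suggestions take priority; word suggestions are used only
    when there are no phrase suggestions at all.
    """
    chosen = phrase_suggestions or word_suggestions
    queries = [original]
    for suggestion in chosen[:MAX_SUGGESTIONS]:
        if suggestion not in queries:  # avoid repeating a query
            queries.append(suggestion)
    return queries


print(queries_to_submit(
    "chep head phones",
    ["cheap headphones", "cheap head phones"],
    ["chip head phones"],
))
# ['chep head phones', 'cheap headphones', 'cheap head phones']
```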

Implementing and customising the spelling system

Read the Configuring spelling guide to learn how to train and customise your spelling model.

Spelling is automatically configured for website collections and requires no additional setup. It's trained on the words and phrases within your title, keywords, description and body HTML tags. Read the Configuring spelling guide if you would like to alter these.

You can also add words and phrases from custom fields to your spelling model. To do this, follow the Adding custom fields docs to set up the fields, and then follow the pipeline documentation to configure spelling.
