Phrase training

This guide supplements our Configuring Autocomplete and Configuring Spelling guides.

Why phrase training is important:

Spelling prioritises phrase-based suggestions over individual word suggestions, as phrases provide valuable context about what a user may be searching for.

As an example, if a user types 'winter tihgts' then 'winter thats', 'winter tints', 'winter tights' and 'winter lights' are all valid spelling suggestions. However, if your model has been trained with the phrases 'winter tights' and 'winter lights' then Spelling will only submit these, which means:

  • Autocomplete has a much better chance of returning suggestions to the user which reflect what they may be looking for

  • If the user ignores the Autocomplete suggestion and submits the search with the spelling mistake, the search will be processed with 'winter tihgts' and additionally with 'winter tights' and 'winter lights', increasing the accuracy of the search results

If Spelling does not find a phrase suggestion then it will make suggestions based on individual word spelling corrections, which means:

  • Autocomplete still has a chance of finding phrases in its model, but the suggestions are less likely to be what the user intends to type as a search

  • If the user submits the search these suggestions are likely to provide less relevant results
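
The phrase-first behaviour described above can be pictured with a small sketch. This is not the product's implementation: the function names, the sample vocabulary and the use of Python's difflib for fuzzy word matching are all assumptions made for illustration. Each query word is corrected individually, the corrections are combined, and only combinations that exactly match a trained phrase are kept. If none match, the individual word combinations are returned instead.

```python
from difflib import get_close_matches
from itertools import product


def word_candidates(word, vocabulary, cutoff=0.6):
    """Return spelling candidates for one word (toy fuzzy match via difflib)."""
    if word in vocabulary:
        return [word]  # already a known word, keep it as typed
    # sorted() gives get_close_matches a stable list to search
    close = get_close_matches(word, sorted(vocabulary), n=5, cutoff=cutoff)
    return close or [word]  # nothing close enough: leave the word unchanged


def spelling_suggestions(query, trained_phrases, vocabulary):
    """Return ("phrase", [...]) when a trained phrase matches exactly,
    otherwise ("word", [...]) built from individual word corrections."""
    per_word = [word_candidates(w, vocabulary) for w in query.lower().split()]
    combos = [" ".join(parts) for parts in product(*per_word)]
    phrase_hits = [c for c in combos if c in trained_phrases]
    if phrase_hits:
        return "phrase", phrase_hits
    return "word", combos


if __name__ == "__main__":
    # Hypothetical training data, loosely based on the example in this guide
    vocabulary = {"winter", "tights", "lights", "tints", "thats"}
    trained = {"winter tights", "winter lights"}
    print(spelling_suggestions("winter tihgts", trained, vocabulary))
    print(spelling_suggestions("winter tihgts", set(), vocabulary))
```

With trained phrases available, the sketch narrows the suggestions to those phrases. With no trained phrases, it falls back to every combination of individual word corrections, which is the less targeted outcome described in the bullets above.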

To improve user experience and search result relevance, we want Spelling to produce phrase-based corrections for as many user spelling mistakes as possible. We also want Autocomplete to contain phrases which are likely to reflect what your users are searching for.

Your field selection and data are key to this.

Why your data is key

To get the most from phrase training, your data needs to reflect what a user may type. Spelling and Autocomplete treat an entire field value as a single phrase and require exact matching.

For example, if phrase training had occurred on a field with the value 'winter tights' then 'winter tihgts' would be corrected to 'winter tights'. Autocomplete is then very likely to return this as a suggestion, and it also has the chance to return other suggestions it may have been trained on, like 'winter tights with extra insulation'. However, if the training was on 'warm winter tights' then the following is likely to occur:

  • Spelling would fall back to individual word correction, as the phrase 'warm winter tights' is not an exact match for the user's query of 'winter tihgts'

  • Autocomplete will not return 'warm winter tights' as a suggestion, as the user's query does not contain the word 'warm'

  • Autocomplete may still return the suggestion 'winter tights with extra insulation' as 'tights' is likely to be one of the individual word suggestions made by Spelling
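
A minimal sketch of the Autocomplete side of this example follows. It assumes that suggestions are found by prefix match against the trained phrases. The guide only says that 'warm winter tights' is not returned because the query lacks the word 'warm', so the prefix rule, the function name autocomplete_suggestions and the sample data are illustrative assumptions rather than the product's actual logic.

```python
def autocomplete_suggestions(corrected_queries, trained_phrases):
    """Return trained phrases that start with any corrected query.

    Prefix matching is an assumption made for this illustration.
    """
    matches = []
    for phrase in trained_phrases:
        if any(phrase.startswith(query) for query in corrected_queries):
            matches.append(phrase)
    return matches


if __name__ == "__main__":
    # Hypothetical phrases Autocomplete was trained on
    trained = ["warm winter tights", "winter tights with extra insulation"]
    # 'winter tights' stands in for an individual word correction of 'winter tihgts'
    print(autocomplete_suggestions(["winter tights"], trained))
```

In this sketch 'warm winter tights' is skipped because it does not begin with the corrected query, while 'winter tights with extra insulation' is still offered.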

Suitable fields for phrase training

Fields with short phrases, like a keywords field, can be good candidates for phrase training as they often contain phrases that reflect exactly what a user would type, like 'queen quilt cover' or 'reversible quilt cover'. A user who mistypes 'reversable' is then likely to be corrected by Spelling to 'reversible quilt cover', which is sent back to the user as an Autocomplete phrase suggestion.

The category field could be another good candidate with phrases like 'dressing tables' and 'computer desks'. As a final example the dir field (URL directory) can be a good candidate depending on your website hierarchy. It may produce phrases like 'vouchers-for-business' or 'outdoor-furniture'.

The brand field is also a good candidate for phrase training. This increases accuracy by submitting brand names that consist of more than one word as a phrase.

Unsuitable fields for phrase training

Any field which contains long bodies of text, like the body field, would be a poor candidate. For example, a description field which consisted of:

The 'Extra Suede Winter Tights' provide luxury with timeless appeal while embracing your curves with confidence. These tights are velvety to touch and are mid-weight for winter warmth. The opaque to waist design allows these tights to be worn with any hemline

would be a poor candidate, as it would be considered a single phrase and is therefore highly unlikely to reflect what a user would type or want their final query to be. In this example the query 'winter tihgts' is not an exact phrase match and would trigger individual word suggestions.

Other field selection considerations

The title field often contains short phrases which reflect what a user may type, making it a good candidate. However, if your titles contain brand names, e.g. 'jack wolfskin winter tights', then an exact match won't be made for a user whose query 'winter tihgts' does not contain the brand.

In circumstances like this, it is still recommended to use the title field for phrase training to accommodate users searching by brand.

Suitable fields for individual word training

Here a field value is broken up into individual words. Any field that contains individual words which a user may search with is a good candidate for individual word training. Long bodies of text, like the body or description field, can be good candidates. Training on fields like title, keywords and brand, which you may also have used for phrase training, ensures the individual words can be used if the exact match phrase conditions aren't met.
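
The difference between the two kinds of training can be sketched as follows. This is an illustration under stated assumptions, not how the product tokenises data: the record layout, the function names and the simple lowercase alphanumeric tokeniser are all hypothetical.

```python
import re


def phrase_values(records, fields):
    """Phrase training view: each whole field value counts as one phrase."""
    return {
        record[field].strip().lower()
        for record in records
        for field in fields
        if record.get(field)
    }


def word_values(records, fields):
    """Individual word training view: each field value is split into words."""
    words = set()
    for record in records:
        for field in fields:
            # Assumed tokeniser: runs of lowercase letters and digits
            words.update(re.findall(r"[a-z0-9]+", record.get(field, "").lower()))
    return words


if __name__ == "__main__":
    # Hypothetical record, reusing examples from this guide
    records = [{
        "title": "Jack Wolfskin Winter Tights",
        "description": "These tights are velvety to touch.",
    }]
    print(phrase_values(records, ["title"]))
    print(sorted(word_values(records, ["title", "description"])))
```

Here the title contributes a single phrase for phrase training, while the title and description together contribute individual words such as 'winter' and 'tights' that remain usable when no exact phrase match is found.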
