In the previous post we looked at techniques to help us create and articulate more effective queries. From auto-complete for lookup tasks to auto-suggest for exploratory search, these simple techniques can often make the difference between success and failure.
But occasionally things do go wrong. Sometimes our information journey is more complex than we’d anticipated, and we find ourselves straying off the ideal course. Worse still, in our determination to pursue our original goal, we may overlook other, more productive directions, leaving us endlessly finessing a flawed strategy. Sometimes we are in too deep to turn around and start again.
Conversely, there are times when we may consciously decide to take a detour and explore the path less trodden. As we saw earlier, what we find along the way can change what we seek. Sometimes we find the most valuable discoveries in the most unlikely places.
However, there’s a fine line between these two outcomes: one person’s journey of serendipitous discovery can be another’s descent into confusion and disorientation. And there’s the challenge: how can we support the former, while unobtrusively repairing the latter? In this post, we’ll look at four techniques that help us keep to the right path on our information journey.
1. Did you mean
As we saw in the previous post, auto-complete and auto-suggest are two of the most effective ways to prevent spelling mistakes and typographic errors (i.e. instances where we know how to spell something, but enter it incorrectly). By completing partial queries and suggesting meaningful alternatives, they avoid the problem at source. But, inevitably, some mistakes will slip through.
Fortunately, there are a variety of coping strategies. One of the simplest is to use spell checking algorithms to compare queries against common spellings of each word. The figure below shows the results on Google for the query “expolsion”. This isn’t necessarily a ‘failed’ search as such (as it does return results), but the more common spelling “explosion” would return more a productive result set. Of course, without knowing our intent, Google can never know for sure whether this spelling was intentional, so it offers the alternative as a “Did you mean” suggestion at the top of the search results page. Interestingly, Google repeats the suggestion at the bottom of the page, but with a slightly longer wording: “Did you mean to search for”. This is a subtle clarification, but one that may reflect the user’s shift in attention at this point (from query to results).
Likewise, most major online retailers apply a similar strategy for dealing with potential spelling mistakes and typographic errors. Amazon and eBay both conservatively apply Did You Mean to queries such as “guitr”, faithfully passing on the results for this query but offering the alternative as a highlighted suggestion immediately above the search results. And in Amazon’s case, the results for the corrected spelling are appended immediately below those of the original query:
Search engines may be capable of many things, but one thing they cannot do is read minds: they can never know the user’s intent. For that reason, when faced with queries like those above, it is wise to keep some distance. Offer a gentle nudge, but leave the choice with the user.
However, there are times when it seems much more apparent that a spelling mistake has occurred. In these cases, we may not know for sure what the user’s intent was, but we can be fairly certain what it wasn’t. In these instances, auto-correction may be the most appropriate response. For example, consider a query for “expolson” on Google: this time, instead of applying a Did You Mean, it is auto-corrected to “explosion”. As before, a message appears above the results (“Showing results for”), but this time, the choice has been made for us:
It seems that this time Google is more confident that our query was unintended. Without knowing our intent, how can it determine this? (In case you’re wondering, it’s not simply by looking for relatively low numbers of results: “expolsion” returns ~135,000 results, and “exploson” returns ~222,000, yet the latter auto-corrected while the former is not.) The answer lies in what Google researchers refer to as the “Unreasonable Effectiveness of Data”: in this instance, the collective behaviour of millions of users. By mining user data for patterns of query reformulation, Google can determine that “exploson” is more likely to be corrected than “expolsion”. Knowing this, it applies the correction for us.
In fact, Google applies the same insight to the auto-suggest function we saw in our previous post: in addition to completions based on the prefix, it also returns potential spelling. This is particularly important in a mobile context, when accurate typing on small, handheld keyboards is so much more difficult:
These strategies make a significant difference to the experience of searching the web. However, for site search, such vast quantities of user data may not be so readily available. In this case, perhaps a simple numeric test could suffice: for zero results, apply an auto-correction; for greater than zero but less than some threshold (say 20 results), offer a Did You Mean.
3. Partial matches
The techniques of auto-correct and Did You Mean are ideal for detecting and repairing simple errors such as spelling mistakes in short queries. But the reality of keyword search is that many users over-constrain their search by entering too many keywords, rather than too few. This is particularly apparent when confronted with a zero results page: for many users, the natural reaction is to add further keywords to their query, thus compounding the problem.
In these cases, it no longer makes sense to replace the entire query in the manner of an auto-correct or Did You Mean, particularly if certain sections of it might have actually returned productive results on their own. Instead, we need a more sophisticated strategy that considers the individual keywords and can determine which particular permutations are likely to produce useful results.
Amazon provides a particularly effective implementation of this strategy. For example, a keyword search for “fender strat maple 1976 USA” finds no matching results. However, rather than returning a zero results page, Amazon returns a number of partial matches based on various keyword permutations. Moreover, by communicating the non-matching elements of the query (using strikethrough text), it gently guides us along the path to more informed query reformulation:
Although conceptually simple, solving the partial match problem is non-trivial: a query with N keywords actually has N factorial permutations, of which only a fraction will return useful results. So for just the single query above, there are in principle 120 variations to consider. In addition, out of all those variations, there is only space to present results for a handful, so they need to be chosen to reflect the diversity of the matching products while avoiding duplicate results.
A similar strategy can be seen at eBay, which also finds no results for the same query we tried on Amazon. Instead of a zero results page, we see a list of the partial matches with an invitation to select one of them (or to “try the search again with fewer keywords”). These are ordered using what’s known as quorum-level ranking (Salton, 1989), which sorts results according to the number of matching keywords. In other words, products matching four keywords (such as “fender strat maple USA”) are ranked above those containing three or fewer (such as “fender strat USA”).
Partial matches are a very effective way to facilitate the process of query reformulation, providing us with a clear direction to take along our information journey. Together with auto-correct and Did You Mean, they act as signposts that help us decide which of the many paths to take. But sometimes we may see something that motivates us to take a deliberate detour. Like the auto-suggest function we saw in the previous post, related searches provides us with the inspiration to embrace new ideas that we might not otherwise have considered.
4. Related searches
All the major web search engines offer support for related searches. Bing, for example, shows them in a panel to the left of the main results:
Google, by contrast, shows them on demand (via a link in the sidebar) as a panel above the main search results. Both designs differentiate between extensions to the query and reformulations: any keywords that are not part of the original query are rendered in bold:
Apart from providing inspiration, related searches can be used to help clarify an ambiguous query. For example, query on Bing for “apple” returns results associated mainly with the computer manufacturer, but the related searches clearly indicate a number of other interpretations:
Related searches can also be used to articulate associated concepts in a taxonomy. At eBay, for example, a query for “acoustic guitar” returns a number of related searches at varying levels of specificity. These include subordinate (child) concepts, such as “yamaha acoustic guitar” and “fender acoustic guitar”, along with sibling concepts such as “electric guitar”, and superordinate (parent) concepts such as “guitar”. These taxonomic signposts offer a subtle form of guidance, helping us understand better the conceptual space to which our query belongs.
While related searches offer us a way to open our minds to new directions; they are not the only source of inspiration. Sometimes it is the results themselves that provide the stimulus. When we find a particularly good match for our information need, we try to find more of the same: a process that Peter Morville refers to as “pearl growing” (Morville, 2010). Sometimes the action to find more of the same is one we can directly initiate: Google’s image search, for example, offers us the opportunity to find images similar to a particular result:
For image search, the results certainly appear impressive, with a single click returning a remarkably homogenous set of results. But that is perhaps also its biggest shortcoming: by hiding the details of the similarity calculation, the user has no control over what it returns, and cannot see why certain items are deemed similar when others are not. For this type of information need, a faceted approach may be preferable, in which the user has control over exactly which dimensions are considered as part of the similarity calculation.
While Google shows how we can actively seek similar results, sometimes we may prefer to have related content pushed to us. Recommender systems like Last.fm and Netflix rely heavily on attributes, ratings and collaborative filtering data to suggest content we’re likely to enjoy. And from just a single item in our music collection, iTunes Genius can recommend many more for us to listen to as part of a playlist:
Query reformulation is a key component of information seeking behaviour, and one where we benefit most from automated support. Did You Mean and auto-correct apply spell checking strategies to keep us on track. Partial matching strategies provide signposts toward more productive keyword combinations. And related searches can inspire us to consider new directions and grow our own pearls. Together, these four techniques keep us on track along our information journey.
- Salton, G. Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer. Addison-Wesley, Reading, MA, 1989.
- Alon Halevy, Peter Norvig, Fernando Pereira, “The Unreasonable Effectiveness of Data,” IEEE Intelligent Systems, vol. 24, no. 2, pp. 8-12, Mar./Apr. 2009
- Peter Morville, Search Patterns, O’Reilly Media, 2009.