Unearth is a customisable cognitive search solution that extends Microsoft’s Azure Search service using Cognitive Services, Machine Learning and the Cognitive Toolkit (CNTK). This demo shows how adding contextual understanding can improve search results.

We have ingested several works by Lewis Carroll and trained a simple contextual model that understands this content.

If you search for ‘monster’ in the scope of ‘All Knowledge’ (no specific contextual model), you get hits on 4 pages from ‘Through the Looking Glass’.

If you run the same search in the Lewis Carroll corpus (same content, but with a contextual model), you get hits on 10 pages in ‘Through the Looking Glass’, as well as a picture of the Jabberwock and several pages in two (identical) versions of ‘The Hunting of the Snark’*.

The contextual model extends the concept of ‘monster’ in Lewis Carroll to include several mythical creatures.

The context also has a thesaurus that provides synonyms. Azure Search’s built-in Microsoft language analyzers already handle inflected word forms in 50 languages, so ‘mouse’ finds hits on both ‘mouse’ and ‘mice’; the Lewis Carroll context's thesaurus goes further and adds ‘dormouse’.

Support for abbreviations – whose meaning may change from context to context – is also important. In some contexts, ‘TEA’ could be an abbreviation while ‘tea’ is a drink. There aren’t a lot of abbreviations in Lewis Carroll but the model has been taught that ‘TEA’ (capitalisation counts) stands for ‘Mad Tea Party’.
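As a rough illustration of the three behaviours described above (concept expansion, a context thesaurus and case-sensitive abbreviations), here is a minimal Python sketch. It is not the Unearth implementation: the `Context` class, the `expand_query` function and the dictionary entries are hypothetical names and sample data chosen for the example, and a real contextual model would be trained rather than hand-written.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Context:
    """Hypothetical stand-in for a contextual model (illustrative only)."""
    concepts: Dict[str, List[str]] = field(default_factory=dict)   # concept -> related terms
    thesaurus: Dict[str, List[str]] = field(default_factory=dict)  # term -> synonyms
    abbreviations: Dict[str, str] = field(default_factory=dict)    # exact-case abbreviation -> meaning


def expand_query(term: str, context: Optional[Context] = None) -> List[str]:
    """Return the terms to search for, expanded by the context if one is given."""
    if context is None:
        # 'All Knowledge' scope: no contextual model, so search the term as typed.
        return [term]
    # Abbreviations are matched on exact case, so 'TEA' and 'tea' are different.
    if term in context.abbreviations:
        return [context.abbreviations[term]]
    key = term.lower()
    expanded = [key]
    # Add related concept terms first, then thesaurus synonyms, skipping duplicates.
    for extra in context.concepts.get(key, []) + context.thesaurus.get(key, []):
        if extra not in expanded:
            expanded.append(extra)
    return expanded


# Sample entries are assumptions for the example, not the demo's actual model.
lewis_carroll = Context(
    concepts={"monster": ["jabberwock", "snark"]},
    thesaurus={"mouse": ["dormouse"]},
    abbreviations={"TEA": "Mad Tea Party"},
)

print(expand_query("monster"))                 # no context: term unchanged
print(expand_query("monster", lewis_carroll))  # context adds related creatures
print(expand_query("TEA", lewis_carroll))      # abbreviation resolved by exact case
print(expand_query("tea", lewis_carroll))      # lower case: just the drink
```

In this sketch the inflected forms (‘mouse’ and ‘mice’) are deliberately left to the search engine's language analyzer; the context only adds what the analyzer cannot know, such as ‘dormouse’.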

The Unearth solution has a number of parts, as illustrated in the diagram below. This demo is a simple** web app that exercises the Unearth Knowledge Discovery API. We typically work with customers to build customised contexts and UIs for specific problem domains. See the main website for more information.

* There are two ‘Snarks’ to show how user-assigned document importance ratings affect the search results.
** Simple, but if you click on the ‘gears’ you can get more insight into the query parse/search process and try options such as turning off cognitive query analysis and cognitive search, which gives you… Azure Search…