Context-Sensitive Semantic Smoothing for the Language Modeling Approach to Genomic IR
Xiaohua Zhou, Xiaohua Hu, Xiaodan Zhang, Xia Lin, Il-Yeol Song
It is possible to disambiguate homonyms probabilistically by using Topic Signatures, which let you identify which topic an ambiguous term actually refers to. Topic Signatures are also more effective for finding documents than the ‘bag of words’ model.
So ‘terms’ -> ‘topics’ -> ‘find documents for topic’ is more effective, for both precision and relevance, than ‘terms’ -> ‘find documents for terms’. Mapping through the topic model this way is called ‘smoothing’ or ‘semantic smoothing.’
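Here’s a rough sketch of the smoothing step as I understand it: each document gets an ordinary background-smoothed term model, plus a topic model that maps the document’s topic signatures (multiword phrases) to related words, and the two are linearly interpolated. The interpolation form, the Dirichlet background, the weights `lam` and `mu`, and every phrase and probability below are my own assumptions for illustration, not numbers or code from the paper.

```python
import math
from collections import Counter

# All numbers below are invented for illustration.
COLLECTION = {"stem": 0.002, "cell": 0.01, "research": 0.005,
              "phone": 0.004, "battery": 0.003, "embryo": 0.001}
# Hypothetical p(w | topic signature): each multiword phrase "translates" into related words.
TRANSLATION = {
    "stem cell": {"embryo": 0.3, "differentiation": 0.2},
    "cell phone": {"mobile": 0.3, "battery": 0.2},
}
DOC_A = {"terms": ["stem", "cell", "research"], "topics": ["stem cell"]}
DOC_B = {"terms": ["cell", "phone", "battery"], "topics": ["cell phone"]}


def background_prob(word, terms, collection, mu=1000.0):
    """p_b(w|d): Dirichlet-smoothed unigram model of the document's own terms."""
    counts = Counter(terms)
    return (counts[word] + mu * collection.get(word, 1e-9)) / (len(terms) + mu)


def topic_prob(word, topics, translation):
    """p_t(w|d) = sum_k p(w|t_k) * p_ml(t_k|d) over the document's topic signatures."""
    if not topics:
        return 0.0
    counts = Counter(topics)
    return sum(translation.get(t, {}).get(word, 0.0) * c / len(topics)
               for t, c in counts.items())


def smoothed_prob(word, doc, collection, translation, lam=0.3, mu=1000.0):
    """p(w|d) = (1 - lam) * p_b(w|d) + lam * p_t(w|d)."""
    return ((1 - lam) * background_prob(word, doc["terms"], collection, mu)
            + lam * topic_prob(word, doc["topics"], translation))


def score(query, doc, collection, translation, lam=0.3, mu=1000.0):
    """Query log-likelihood; keep lam < 1 so the background term keeps every p(w|d) > 0."""
    return sum(math.log(smoothed_prob(w, doc, collection, translation, lam, mu))
               for w in query)


query = ["embryo"]  # appears in neither document's text
print(score(query, DOC_A, COLLECTION, TRANSLATION) > score(query, DOC_B, COLLECTION, TRANSLATION))  # True
```

With `lam=0` (pure term matching) both documents get exactly the same score, since neither contains ‘embryo’ and both are the same length. The topic component is what lets the ‘stem cell’ document pull ahead without the ‘cell phone’ document coming along for the ride just because it also contains ‘cell’.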
My reflection on this is that it’s a lot like using an automatically built controlled vocabulary and mapping both documents and terms to it algorithmically. Strangely, this presentation reminds me a lot of a math-intensive version of Jens-Erik’s classes on indexing, but I suspect only if you’ve already heard JEM talk and have that context.
It works better than WordNet (according to someone who asked a question), because it uses math to resolve ambiguity of meaning.
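To make ‘uses math to resolve ambiguity’ a little more concrete, here’s a toy sketch of one way to do it: a naive-Bayes-style posterior over the topic signatures that contain an ambiguous word, given the other words around it. This is my own illustration, not the authors’ method; the topic names, the word probabilities in `TOPICS`, and the `eps` floor for unseen words are all made up.

```python
import math

# Hypothetical topic signatures containing the ambiguous word "cell",
# each with an invented word distribution p(w | topic).
TOPICS = {
    "stem cell": {"embryo": 0.4, "differentiation": 0.3, "tissue": 0.2},
    "cell phone": {"mobile": 0.4, "battery": 0.3, "call": 0.2},
}


def topic_posterior(context, topics, prior=None, eps=1e-3):
    """Return p(topic | context), proportional to p(topic) * prod_w p(w | topic).

    eps is a crude floor so an unseen context word doesn't zero out a topic.
    """
    names = list(topics)
    prior = prior or {t: 1.0 / len(names) for t in names}
    log_scores = {}
    for t in names:
        s = math.log(prior[t])
        for w in context:
            s += math.log(topics[t].get(w, 0.0) + eps)
        log_scores[t] = s
    # Normalize in log space (subtract the max) to avoid underflow.
    m = max(log_scores.values())
    weights = {t: math.exp(s - m) for t, s in log_scores.items()}
    z = sum(weights.values())
    return {t: v / z for t, v in weights.items()}


post = topic_posterior(["embryo", "tissue"], TOPICS)
print(max(post, key=post.get))  # stem cell
```

The point is just that the context words, not the homonym itself, decide which sense wins; a thesaurus lookup on ‘cell’ alone can’t do that.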
Anyway, I plan to read the rest of their work; it looks promising. It would be interesting to see what sorts of ontologies (in the InfoSci sense) it can work with. However, I can’t see anything specific to genomic IR, other than that it’s probably the (undescribed) domain they’re using.
Technorati Tags: folksonomy, semantics, sigir, sigir2006, subjective