genre vs. aboutness
One of the problems I’m running into a bit with the whole cuneiverse whokno.ws project is distinguishing between discussions about ‘movies’ and discussions about particular movies.
You can currently see this in the whokno.ws test category ‘Shrek,’ which pretty much exists to help test the difference between talking about the concept of movies in general and talking about particular movies.
There’s an algorithm in the whokno.ws core that distinguishes between unrelated concepts that share a written representation: say, China the place and China the porcelain. A phrase like “China (porcelain) exports from China (country)” still gets a little rocky, but “China (porcelain) exports from, say, Ireland” gets figured out pretty handily. Ideally, though, these concepts get told apart automatically, and only rarely will a document turn up positive for both of them.
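To make the China/china idea concrete, here is a minimal sketch of one way to tell the senses apart: score each mention by the cue words around it. This is not the actual whokno.ws algorithm. The `SENSE_CUES` table, the `disambiguate` function, and the window size are all made up for illustration.

```python
import re

# Hypothetical cue words per sense. A real system would presumably pull
# these from much larger term lists; these few are illustrative only.
SENSE_CUES = {
    "China (country)": {"beijing", "government", "province", "population", "border"},
    "China (porcelain)": {"porcelain", "teacups", "plates", "glaze", "kiln", "dinnerware"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(text, term="china", window=5):
    """Return one guessed sense (or None) for each mention of `term`.

    Each mention is scored by how many cue words for each sense appear
    within `window` tokens on either side of it.
    """
    tokens = tokenize(text)
    results = []
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        scores = {sense: len(context & cues) for sense, cues in SENSE_CUES.items()}
        best = max(scores, key=scores.get)
        top = scores[best]
        # No cues at all, or a tie between senses: refuse to guess.
        if top == 0 or list(scores.values()).count(top) > 1:
            results.append(None)
        else:
            results.append(best)
    return results


print(disambiguate("Fine china teacups with a blue glaze, made in Ireland"))
print(disambiguate("china plates exported from China"))
```

The second call shows why the both-senses-in-one-sentence case is rocky for something this naive: the two mentions share almost the same context window, so both get labeled as porcelain.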
Whokno.ws uses (in its non-trivial web dump form) a SKOS concept map linked to large numbers of data files, term lists, and other impedimenta. A lot of these impedimenta are generated automatically, but some aren’t. The concept map isn’t an ontology in the most formal sense of the word, although it is going to approach that in September, as a source of incredibly accurate information gets mined somewhat to fill out some of the nascent semantic web bits.
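For a rough picture of what a concept map with attached term lists can look like, here is a small sketch in plain Python. The field names mirror real SKOS properties (skos:prefLabel, skos:altLabel, skos:broader), but everything else is an assumption: the `ex:` URIs, the file paths, and the `ancestors` helper are hypothetical and not taken from the whokno.ws data.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str                                         # hypothetical identifier
    pref_label: str                                  # cf. skos:prefLabel
    alt_labels: list = field(default_factory=list)   # cf. skos:altLabel
    broader: list = field(default_factory=list)      # cf. skos:broader (URIs)
    term_lists: list = field(default_factory=list)   # hypothetical data file paths


# A two-concept map: the general concept and one particular movie under it.
concepts = {
    "ex:movies": Concept(
        uri="ex:movies",
        pref_label="Movies",
        alt_labels=["films", "cinema"],
        term_lists=["terms/movies.txt"],
    ),
    "ex:shrek": Concept(
        uri="ex:shrek",
        pref_label="Shrek",
        broader=["ex:movies"],
        term_lists=["terms/shrek.txt"],
    ),
}


def ancestors(uri, concepts):
    """Collect every concept reachable by following broader links upward."""
    seen = []
    stack = list(concepts[uri].broader)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(concepts[current].broader)
    return seen


print(ancestors("ex:shrek", concepts))
```

Keeping ‘Shrek’ as its own concept with a broader link to ‘Movies’ is one way to keep talk about a particular movie separate from talk about movies in general, while still recording that the two are related.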
The nascent semantic web bits will enable me to accomplish the original goal of whokno.ws pretty easily.
Of course, this would be more impressive if there weren’t a bunch of bad data in today’s test run, probably causing it to be much… less relevant than usual.