Semantics: Parallel Derivation of Probabilistic Information Retrieval Models
Speaker: Thomas Roelleke
About the semantics of models.
Goal: figure out how to build abstract data models and languages for implementing IR models. (The three existing models covered are BIR, the binary independence retrieval model; LM, language modeling; and PM, the Poisson model.)
This was a very strange talk that explained a paper they’d written. It had many fine visual aids, including dice describing the different sorts of models. The dice had different things on them for the different models.
BIR has a die for each term, with one side for each document. Each side has a one or a zero on it (whether the term occurs in that document), and you judge probability that way.
LM has one die with a side for each word position in the collection, showing the word at that position.
PM is like BIR, but it involves numbers other than one or zero on a side?
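The dice are easy to mock up. What follows is only a rough sketch of the three dice as described above, not the paper's formalism: the toy collection `docs` and all function names are made up for illustration, and the PM die assumes the "numbers other than one or zero" are within-document term counts.

```python
# Toy collection (hypothetical): each document is a list of words.
docs = [
    ["sailing", "boats", "sailing"],
    ["boats", "east"],
    ["sailing", "east", "east", "east"],
]

def bir_die(term, docs):
    """One die per term, one side per document: 1 if the term occurs, else 0."""
    return [1 if term in doc else 0 for doc in docs]

def pm_die(term, docs):
    """Like the BIR die, but each side holds the term's count in that document
    (an assumption about what the non-binary numbers are)."""
    return [doc.count(term) for doc in docs]

def lm_die(docs):
    """A single die with one side per word position in the whole collection."""
    return [word for doc in docs for word in doc]

def p_bir(term, docs):
    """Chance of rolling a 1 on the term's BIR die."""
    sides = bir_die(term, docs)
    return sum(sides) / len(sides)

def p_lm(term, docs):
    """Chance of rolling the term on the LM die."""
    sides = lm_die(docs)
    return sides.count(term) / len(sides)

def pm_rate(term, docs):
    """Average number showing on the term's PM die (occurrences per document)."""
    sides = pm_die(term, docs)
    return sum(sides) / len(sides)

print(bir_die("sailing", docs), pm_die("sailing", docs))
print(p_bir("sailing", docs), p_lm("sailing", docs), pm_rate("sailing", docs))
```

The point of the picture is that the three models differ only in what counts as a "side": a document (BIR, PM) or a word position (LM).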
But what’s the point, really? This is fairly obvious based on the normal expression of these models… So what does this demonstrate?
1. BIR and PM assume the collection to be a set of non-relevant documents; the LM assumes you’re selecting terms from relevant documents.
2. Poisson Bridge: see the paper for this, I dropped out of applied math grad school for a reason.
3. BIR and LM deliver the same ranking if the BIR term weight == LM term weight. This seems unlikely to happen, but is true.
4. TF-IDF: if you apply the Poisson bridge to BIR + LM, you get TF-IDF.
This was more an ‘interesting alternate view’ of probability than anything new. It was pretty enjoyable, though.
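For the Poisson bridge in point 2, here is a hedged numeric sketch. It assumes the bridge is the identity linking the document-based probability (BIR side) to the location-based probability (LM side) through the Poisson rate: average term frequency times P_D(t|c) equals lambda(t,c), which equals average document length times P_L(t|c). That reading, the toy collection, and every name below are assumptions, so check the paper for the actual statement.

```python
import math

# Toy collection (hypothetical): each document is a list of words.
docs = [
    ["sailing", "boats", "sailing"],
    ["boats", "east"],
    ["sailing", "east", "east", "east"],
]

def poisson_bridge(term, docs):
    """Return (avgtf * P_D, lambda, avgdl * P_L) for a term that occurs
    at least once in the collection. Under the assumed identity, all
    three values are equal."""
    n_D = sum(1 for doc in docs if term in doc)  # documents containing the term
    n_L = sum(doc.count(term) for doc in docs)   # occurrences (locations) of the term
    N_D = len(docs)                              # documents in the collection
    N_L = sum(len(doc) for doc in docs)          # word positions in the collection

    avgtf = n_L / n_D   # average term frequency in documents containing the term
    p_D = n_D / N_D     # document-based probability (BIR-style die)
    avgdl = N_L / N_D   # average document length
    p_L = n_L / N_L     # location-based probability (LM-style die)
    lam = n_L / N_D     # Poisson rate: occurrences per document

    return avgtf * p_D, lam, avgdl * p_L

left, lam, right = poisson_bridge("sailing", docs)
assert math.isclose(left, lam) and math.isclose(lam, right)
```

Written this way the identity is just arithmetic on four counts, which fits the impression that the talk offered an alternate view rather than a new result.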