Subject: Log-likelihood ratio test as a probability


I think that this is a really bad thing to do.

The LLR is really good for finding interesting things.  Once you have done
that, directly using the LLR in any form to produce a weight reduces the
method to something akin to Naive Bayes.  This is bad generally and very,
very bad in the case of small counts.
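For concreteness, here is a minimal sketch of the LLR (G^2) score for a 2x2
cooccurrence table, computed via the identity G^2 = 2 * N * (mutual
information between rows and columns); the function names here are mine, not
from any particular library:

    import math

    def denorm_entropy(counts):
        # Entropy of the counts, scaled by their total.  The (k == 0)
        # term dodges 0 * log(0) without changing any nonzero term.
        total = float(sum(counts))
        return -sum(k * math.log(k / total + (k == 0)) for k in counts)

    def llr_2x2(k11, k12, k21, k22):
        # LLR (G^2) for a 2x2 table of counts: k11 = both events together,
        # k12/k21 = one without the other, k22 = neither.  This equals
        # 2 * N * (mutual information between rows and columns).
        return 2 * (denorm_entropy([k11 + k12, k21 + k22])    # row sums
                    + denorm_entropy([k11 + k21, k12 + k22])  # column sums
                    - denorm_entropy([k11, k12, k21, k22]))   # all cells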

Typically LLR works extremely well when you use it as a filter only and
then use some global measure to compute a weight.  See the Luduan method [1]
for an example.  The use of a text retrieval engine to implement a search
engine, as I have lately been nattering on about much too much, is another
example.  A major reason that such methods work so unreasonably well is
that they don't make silly weighting decisions based on very small counts.
It is slightly paradoxical that looking at global counts rather than counts
specific to the cases of interest produces much better weights, but the
empirical evidence is pretty overwhelming.
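To make that filter-then-weight split concrete, here is a hypothetical
sketch, continuing from llr_2x2 above; the threshold and the IDF-style
global weight are my own illustrative choices, not anything prescribed by
the Luduan paper:

    def select_and_weight(candidates, doc_freq, num_docs, threshold=10.0):
        # candidates: {feature: (k11, k12, k21, k22)} cooccurrence tables.
        # The LLR is used purely as a yes/no gate; the weight comes from a
        # global IDF-style measure and never from the LLR score itself.
        weights = {}
        for feature, table in candidates.items():
            if llr_2x2(*table) > threshold:  # filter on interestingness only
                # weight from global document counts, not from the LLR
                weights[feature] = math.log(num_docs / (1.0 + doc_freq[feature]))
        return weights

The threshold here plays the role of the one useful bit: above it the
feature is kept, below it the score is discarded entirely.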

Aside from such practical considerations, there is the fact that converting
a massive number of frequentist p-values into weights is either outright
heresy (from the frequentist point of view) or simply nutty (from the
Bayesian point of view).

In any case, I have never been able to get more than one bit of useful
information out of an LLR score.  That one bit is extremely powerful, but
trying to get more seems to be a very bad idea.

[1] http://arxiv.org/abs/1207.1847, especially chapter 7