Brown and DiPietro's algorithm for clustering based on entropy is
somewhat infamous for the difficulty of achieving usable performance.
Mike Collins was responsible for a famously speedy version. Having
build one that is just barely fast enough in C++, I wouldn't recommend
trying it in Java. Of course, you aren't proposing that, just
recommending the bigram entropy metric or something like it.
On Mon, Jul 27, 2009 at 9:42 PM, Ted Dunning<[EMAIL PROTECTED]> wrote: