Subject: Naive Bayes on Well Structured Data


This sounds great.  I would suggest you test the naive Bayes, complementary
naive Bayes, SVM and SGD implementations.  Given that naive Bayes has worked
well on a sample, you will probably be very happy with SVM and SGD since
they cope well with features that have very large cardinality.

You will need to vectorize your input.  Since you have many columns, you may
want to look at Drew's document style stuff.  See
https://issues.apache.org/jira/browse/MAHOUT-274
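
To make "vectorize" concrete, here is a minimal sketch of one common way to
turn a record with many columns into a sparse vector, namely feature
hashing.  This is only an illustration in Python, not the code from
MAHOUT-274 or any other Mahout patch; the function name hash_vectorize, the
num_features parameter and the sample record are all made up for the
example.

```python
import zlib


def hash_vectorize(record, num_features=1000):
    """Encode a dict of column -> value as a sparse {index: weight} vector.

    Hypothetical helper for illustration only; not a Mahout API.
    """
    vector = {}
    for column, value in record.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            # Continuous column: one slot per column, weighted by the value.
            key, weight = column, float(value)
        else:
            # Categorical column: one slot per (column, value) pair.
            key, weight = f"{column}={value}", 1.0
        # crc32 is stable across runs, unlike Python's built-in hash().
        index = zlib.crc32(key.encode("utf-8")) % num_features
        vector[index] = vector.get(index, 0.0) + weight
    return vector


# Made-up record: two categorical columns and one numeric column.
example = hash_vectorize({"color": "red", "shape": "round", "size": 3.0})
```

The point of hashing is that the vector size stays fixed no matter how many
distinct values a column takes, at the cost of occasional collisions between
unrelated features.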

There are the beginnings of some vectorization of the sort you will need in
the SGD patch: http://issues.apache.org/jira/browse/MAHOUT-228  That patch
also has a learning system that will build your classifier using on-line
logistic regression.
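
For anyone unfamiliar with the term, on-line logistic regression means the
model is updated one example at a time by a stochastic gradient step, so the
training data never has to fit in memory.  The following is a rough Python
sketch of that idea under simple assumptions (binary labels, a fixed
learning rate, no regularization).  It is not the MAHOUT-228 code; the class
name TinyOnlineLogReg and its parameters are hypothetical.

```python
import math


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class TinyOnlineLogReg:
    """Hypothetical illustration of binary logistic regression trained by SGD."""

    def __init__(self, num_features, learning_rate=0.1):
        self.weights = [0.0] * num_features
        self.learning_rate = learning_rate

    def predict(self, features):
        # features is a sparse vector: {index: value}.
        score = sum(self.weights[i] * v for i, v in features.items())
        return sigmoid(score)

    def train(self, features, label):
        # One gradient step on the log-likelihood of a single example.
        error = label - self.predict(features)
        for i, v in features.items():
            self.weights[i] += self.learning_rate * error * v


# Toy usage: feature 0 signals the positive class, feature 1 the negative.
model = TinyOnlineLogReg(num_features=2)
for _ in range(500):
    model.train({0: 1.0}, 1)
    model.train({1: 1.0}, 0)
```

Each call to train touches only the non-zero entries of the example, which
is why this style of learner scales to sparse, high-dimensional input.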

The SVM implementation is at http://issues.apache.org/jira/browse/MAHOUT-232

The NB and CNB implementations are already in Mahout itself.

On Wed, Feb 17, 2010 at 1:58 PM, Jason Surratt <[EMAIL PROTECTED]> wrote:

--
Ted Dunning, CTO
DeepDyve