This sounds great. I would suggest you test the naive Bayes, complementary
naive Bayes, SVM and SGD implementations. Given that naive Bayes has worked
well on a sample, you will probably be very happy with SVM and SGD since
they handle very large cardinality well.
You will need to vectorize your input. Since you have many columns, you may
want to look at Drew's document style stuff. See https://issues.apache.org/jira/browse/MAHOUT-274
There are the beginnings of some vectorization of the sort you will need in
the SGD patch: http://issues.apache.org/jira/browse/MAHOUT-228
That patch also has a learning system that will build your classifier on-line.
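To make the vectorize-then-train-on-line idea concrete, here is a minimal sketch in plain Python. It is not the code from MAHOUT-228 or MAHOUT-274 and uses no Mahout API. The names `vectorize`, `TinySgdClassifier` and `NUM_FEATURES`, the column names, and the choice of hashing each column=value pair into a fixed-size vector are all assumptions made for illustration.

```python
import math
import zlib

NUM_FEATURES = 2 ** 10  # hypothetical size of the hashed feature space


def vectorize(row):
    """Hash each column of a record into a sparse vector (dict: index -> value)."""
    vec = {}
    for column, value in row.items():
        if isinstance(value, (int, float)):
            # Numeric column: one slot per column, weighted by the value.
            key, weight = column, float(value)
        else:
            # Categorical column: one slot per distinct column=value pair.
            key, weight = f"{column}={value}", 1.0
        index = zlib.crc32(key.encode("utf-8")) % NUM_FEATURES
        vec[index] = vec.get(index, 0.0) + weight
    return vec


class TinySgdClassifier:
    """Binary logistic regression trained one example at a time (on-line SGD)."""

    def __init__(self, num_features=NUM_FEATURES, learning_rate=0.1):
        self.weights = [0.0] * num_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, vec):
        """Return the estimated probability that the label is 1."""
        score = self.bias + sum(self.weights[i] * x for i, x in vec.items())
        score = max(-30.0, min(30.0, score))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-score))

    def train(self, vec, label):
        """Take one gradient step on a single (vector, label) example."""
        error = label - self.predict(vec)
        self.bias += self.learning_rate * error
        for i, x in vec.items():
            self.weights[i] += self.learning_rate * error * x


# Made-up records, only to show the flow: vectorize, then train in a stream.
records = [
    ({"color": "red", "shape": "round"}, 1),
    ({"color": "green", "shape": "long"}, 0),
]
model = TinySgdClassifier()
for _ in range(20):
    for row, label in records:
        model.train(vectorize(row), label)
```

Because the vector size is fixed up front, the number of distinct values in a column does not change the model size, and the learner only ever sees one record at a time. Numeric columns would usually need scaling before being used with a fixed learning rate like this.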
The SVM implementation is at http://issues.apache.org/jira/browse/MAHOUT-232
The NB and CNB implementations are already in Mahout itself.
On Wed, Feb 17, 2010 at 1:58 PM, Jason Surratt <[EMAIL PROTECTED]> wrote:
Ted Dunning, CTO