Subject: Document Clustering


It sounds like a start.  Can you open a JIRA issue and attach a patch?
I am still not sure Lucene is the right way to go on it.  I suppose
eventually we need a way to put things in a common format like ARFF,
with transformers that convert other formats into it.  Come to think
of it, maybe it makes sense to have a Tika ContentHandler that can
output ARFF or whatever other format we want.  This would make
translating input docs dead simple.
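
To make the ContentHandler idea concrete, here is a rough sketch of what
I have in mind.  Tika parsers report content as SAX events, so the
handler below only depends on the JDK's org.xml.sax classes.  The class
name, the single "text" string attribute, and the quoting rules are all
my assumptions for illustration, not an existing Tika or Mahout API.

```java
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler: collects the character data of one document
 * from SAX events and renders it as a single ARFF data row.
 */
class ArffContentHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Accumulate the text content reported by the parser.
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // Keep a separator so adjacent words do not run together.
        text.append(' ');
    }

    /** ARFF header for a relation with one string attribute (assumed layout). */
    static String header(String relation) {
        return "@relation " + relation + "\n\n@attribute text string\n\n@data\n";
    }

    /** One data row: whitespace collapsed, single-quoted, backslashes and quotes escaped. */
    String toArffRow() {
        String collapsed = text.toString().replaceAll("\\s+", " ").trim();
        String escaped = collapsed.replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }
}
```

An instance could presumably be handed to a Tika parser wherever it
accepts a ContentHandler, and the row written out after the parse
finishes.  A real version would want more attributes (title, author,
etc.) pulled from the metadata.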

Then again, maybe a real Pipeline is the answer.  I know Solr, etc.  
could benefit from one too, but that is a whole different ball of wax.

On May 28, 2009, at 10:32 AM, Shashikant Kore wrote:

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search