It sounds like a start. Can you open a JIRA issue and attach a patch? I
am still not sure that Lucene is entirely the way to go on it. I suppose
eventually we need a way to put things in a common format like ARFF
and then just have transformers into it from other formats. Come to
think of it, maybe it makes sense to have a Tika ContentHandler that
can output ARFF or whatever other format we want. This would make
translating input docs dead simple.
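
To make the ContentHandler idea a bit more concrete, here is a rough sketch, not a worked-out design. It relies only on the JDK's SAX classes; the class name ArffContentHandler, the single "text" string attribute, and the quoting rules are all assumptions made for illustration, and nothing like this exists in Tika or Mahout as far as this message goes.

```java
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical sketch: gathers the character data from SAX events and
 * renders it as a one-instance ARFF document with a single string attribute.
 */
class ArffContentHandler extends DefaultHandler {
    private final String relation;                     // assumed to contain no spaces
    private final StringBuilder text = new StringBuilder();

    ArffContentHandler(String relation) {
        this.relation = relation;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Accumulate the text content of every element.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Keep text from adjacent elements from running together.
        text.append(' ');
    }

    /** Collapses whitespace, then escapes backslashes and single quotes (assumed quoting scheme). */
    static String quote(String s) {
        String normalized = s.replaceAll("\\s+", " ").trim();
        return "'" + normalized.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Renders the collected text as ARFF: header, one attribute, one data row. */
    String toArff() {
        return "@relation " + relation + "\n"
             + "@attribute text string\n"
             + "@data\n"
             + quote(text.toString()) + "\n";
    }
}
```

Since Tika parsers report document content to a SAX ContentHandler, an instance of something like this could presumably be handed to a parse call and toArff() read afterwards. A real version would need more attributes (labels, metadata) and a way to emit many documents into one relation.
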
Then again, maybe a real Pipeline is the answer. I know Solr, etc.
could benefit from one too, but that is a whole different ball of wax.
On May 28, 2009, at 10:32 AM, Shashikant Kore wrote: