Subject: Re: Clustering techniques, tips and tricks


Deploying a jar that contains a single class extending Analyzer results in
a NoClassDefFoundError for org.apache.lucene.analysis.Analyzer:

    mahout seq2sparse -i wp-seqfiles/part-r-00000 -o wp-vectors -ow -a
    com.custom.analyzers.LuceneStemmingAnalyzer -chunk 100 -wt tfidf -s
    2 -md 3 -x 95 -ng 2 -ml 50 -seq -n 2
    MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
    Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
    HADOOP_CONF_DIR=/usr/local/hadoop/conf
    MAHOUT-JOB: /usr/local/mahout/examples/target/mahout-examples-0.6-job.jar
    12/03/09 14:55:32 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 2
    12/03/09 14:55:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
    12/03/09 14:55:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/analysis/Analyzer
         at java.lang.ClassLoader.defineClass1(Native Method)
         at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
         at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
         at java.net.URLClassLoader.access$000(URLClassLoader.java:73)

It seems to be finding my custom Lucene analyzer but not the abstract
class it extends?

If I go back to the WhitespaceAnalyzer, all is well:

    mahout seq2sparse -i wp-seqfiles/part-r-00000 -o wp-vectors -ow -a
    org.apache.lucene.analysis.WhitespaceAnalyzer -chunk 100 -wt tfidf
    -s 2 -md 3 -x 95 -ng 2 -ml 50 -seq -n 2

org.apache.lucene.analysis.Analyzer and
org.apache.lucene.analysis.WhitespaceAnalyzer are in the same jar, so I'm
confused about why it finds one and not the other.

The same code seems to work on my laptop, so is my deployment environment
missing something? Any ideas?
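
To narrow this down, a small check like the one below could show which jar and which class loader each class actually comes from on the laptop versus the deployment box. This is only a rough sketch: ClassLocator and locate are names made up for illustration, not part of Mahout or Lucene, and the output only means something if the check runs with the same classpath that the mahout script ends up using.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Mahout or Lucene): reports where a class
// is loaded from, or why it could not be loaded.
final class ClassLocator {

    static String locate(String className) {
        try {
            // initialize=false: load the class without running static initializers.
            Class<?> c = Class.forName(className, false,
                    ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself typically have no code source.
            String where = (src == null || src.getLocation() == null)
                    ? "no code source (JDK/bootstrap class)"
                    : src.getLocation().toString();
            return className + " <- " + where + " via " + c.getClassLoader();
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so a missing superclass
            // (as in the stack trace above) is reported here too.
            return "NOT FOUND: " + className + " (" + e + ")";
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(locate(name));
        }
    }
}
```

Passing org.apache.lucene.analysis.Analyzer, org.apache.lucene.analysis.WhitespaceAnalyzer and com.custom.analyzers.LuceneStemmingAnalyzer as arguments on both machines should make it visible whether the custom jar and the Lucene classes are seen by the same class loader or by different ones.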

On 3/7/12 1:24 AM, Abbas wrote: