Subject: How to find the k most similar docs

I'm using Mahout 0.6, compiled from source via 'mvn install'. I used
Suneel's code below to get NumberOfColumns.

When I try to run the rowsimilarity job via:

    bin/mahout rowsimilarity -i wikipedia-clusters/tfidf-vectors/ \
        -o /wikipedia-similarity -r 87325 -s SIMILARITY_COSINE -m 10 -ess true

I get the following error:

    12/03/04 19:14:32 INFO common.AbstractJob: Command line arguments:
    {--endPhase=2147483647, --excludeSelfSimilarity=true,
    --maxSimilaritiesPerRow=10, --numberOfColumns=87325,
    --similarityClassname=SIMILARITY_COSINE, --startPhase=0, --tempDir=temp}
    2012-03-04 19:14:32.376 java[1090:1903] Unable to load realm info
    from SCDynamicStore
    12/03/04 19:14:33 INFO input.FileInputFormat: Total input paths to
    process : 1
    12/03/04 19:14:33 INFO mapred.JobClient: Running job: job_local_0001
    12/03/04 19:14:33 INFO mapred.MapTask: io.sort.mb = 100
    12/03/04 19:14:33 INFO mapred.MapTask: data buffer = 79691776/99614720
    12/03/04 19:14:33 INFO mapred.MapTask: record buffer = 262144/327680
    12/03/04 19:14:34 WARN mapred.LocalJobRunner: job_local_0001
    java.lang.ClassCastException: cannot be
    cast to
         at org.apache.hadoop.mapred.MapTask.runNewMapper(

The cast error (as I understand it) usually happens when a class name is
passed in incorrectly. That seems likely here, since cooccurrence
similarity is being used?

I've probably missed something obvious about how to pass in the
similarity measure to use.

On 2/19/12 9:00 PM, Suneel Marthi wrote: