Subject: can't get <point-id, cluster-id> thru "-p"


The -p parameter is an input. You should pass in the clusteredPoints/
directory that was generated by the cluster driver you used.

Here is how I used fkmeans, as an example:

    mahout fkmeans -i wikipedia-vectors/tfidf-vectors/ \
        -c wikipedia-fkmeans-centroids -o wikipedia-fkmeans-clusters \
        -k 100 -m 2 -ow -x 10 \
        -dm org.apache.mahout.common.distance.CosineDistanceMeasure

This will create
wikipedia-fkmeans-clusters/clusteredPoints/part-m-00000, which is the
file with the clustered points. I then ran clusterdump:

    mahout clusterdump \
        -s wikipedia-fkmeans-clusters/clusters/clusters-1/part-r-00000 \
        -p wikipedia-fkmeans-clusters/clusteredPoints/ \
        -d wikipedia-fkmeans-clusters/dictionary.file-0 -dt sequencefile \
        -dm org.apache.mahout.common.distance.CosineDistanceMeasure

This will output to the screen. Use -o to specify an output file.
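
As a rough sketch of the -o variant (untested as written here; OUT_DIR and
DUMP_FILE are placeholder names I made up, so adjust them to your own
layout), the same clusterdump could be wrapped like this. It prints the
command instead of running it when mahout is not on the PATH:

```shell
# Sketch only: OUT_DIR and DUMP_FILE are placeholder names, adjust to your setup.
OUT_DIR=wikipedia-fkmeans-clusters
DUMP_FILE=clusterdump.txt

# Same clusterdump arguments as above, plus -o for the output file.
set -- clusterdump \
    -s "$OUT_DIR/clusters/clusters-1/part-r-00000" \
    -p "$OUT_DIR/clusteredPoints/" \
    -d "$OUT_DIR/dictionary.file-0" -dt sequencefile \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
    -o "$DUMP_FILE"

# Run it if mahout is on the PATH, otherwise just print the command.
if command -v mahout >/dev/null 2>&1; then
    mahout "$@"
else
    echo "mahout $*"
fi
```

Keeping the output directory in one variable makes it harder to mix up the
-s and -p paths, which is the easy mistake here.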

Good advice for any user of Mahout is to read the help output very
carefully. IMHO it is very easy to misunderstand the parameters, inputs,
and outputs. I think I only understand about 10%. Try:

    mahout fkmeans --help

On 3/14/12 10:52 AM, Baoqiang Cao wrote: