Subject: RE: Problems running examples


Moving to @dev

Hi Drew,

I don't know what is happening, but I did a clean unpack of the 0.5 distro, ran mvn install, and then ran build-reuters.sh. It downloaded the data but failed exactly as before. Both continue to run just fine on my trunk build since I updated yesterday. IIRC, they were both failing with trunk before 0.5 too.

On MapR:
[dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
Downloading Reuters-21578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7959k  100 7959k    0     0  1769k      0  0:00:04  0:00:04 --:--:-- 1788k
Extracting...
Running on hadoop, using HADOOP_HOME=/opt/mapr/hadoop/hadoop-0.20.2
HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-0.20.2/conf.new
11/06/10 16:12:19 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
Deleting all files in mahout-work/reuters-out-tmp
11/06/10 16:12:24 INFO driver.MahoutDriver: Program took 4085 ms
MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Jun 10, 2011 4:12:25 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=mahout-work/reuters-out, --keyPrefix=, --output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
Exception in thread "main" java.io.IOException: No FileSystem for scheme: maprfs
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
        at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:62)
        at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
rmr: cannot remove mahout-work/reuters-out-seqdir: No such file or directory.
put: File mahout-work/reuters-out-seqdir does not exist.
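
The output above reports both "Running on hadoop" and "MAHOUT_LOCAL is set, running locally", so it may be worth checking what the shell actually has set before launching the script. A minimal sketch, assuming a POSIX shell; show_env is a hypothetical helper and not part of the Mahout distribution, and the three variable names are simply the ones that appear in the output above:

```shell
#!/bin/sh
# Hypothetical helper: report the variables named in the script output.
# It only reads the current shell environment and changes nothing.
show_env() {
  for name in HADOOP_HOME HADOOP_CONF_DIR MAHOUT_LOCAL; do
    # Indirect lookup of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name=$value"
    else
      echo "$name is not set"
    fi
  done
}

show_env
```

Running this in the same terminal, just before ./examples/bin/build-reuters.sh, shows whether MAHOUT_LOCAL is set alongside the Hadoop variables.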

And then, after changing HADOOP_HOME & HADOOP_CONF_DIR to CDH3 on a fresh untar/install of 0.5:
[dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
Downloading Reuters-21578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7959k  100 7959k    0     0  1707k      0  0:00:04  0:00:04 --:--:-- 1768k
Extracting...
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/hadoop1.conf
11/06/10 16:29:42 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
Deleting all files in mahout-work/reuters-out-tmp
11/06/10 16:29:45 INFO driver.MahoutDriver: Program took 3669 ms
MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Jun 10, 2011 4:30:02 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=mahout-work/reuters-out, --keyPrefix=, --output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
Exception in thread "main" java.io.IOException: Call to hadoop1.eng.narus.com/172.31.2.200:8020 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
        at org.apache.hadoop.ipc.Client.call(Client.java:743)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.hdfs.DFSC