Subject: Upgrade to Spark 1.2.1 using Guava


The root Spark pom pins Guava at a specific version, and the shading XML is very hard to read. Someone suggested that I try userClassPathFirst, but that sounds too heavy handed since I don’t really care which version of Guava I get — I’m not picky.
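For reference, the heavy-handed option would look roughly like the following (the app class, master URL, and jar name are placeholders, and the exact property name depends on the Spark release — in the 1.x line it appeared as the experimental spark.files.userClassPathFirst before later being renamed):

```shell
# Hypothetical invocation -- class name, master URL, and jar are placeholders.
# The property name varies by Spark version; check the config docs for your release.
spark-submit \
  --class com.example.MyJob \
  --master spark://localhost:7077 \
  --conf spark.files.userClassPathFirst=true \
  my-job-assembly.jar
```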

When I set my project to use the same Guava version as Spark, I get a NoClassDefFoundError, which usually means a version conflict.

At this point I am quite confused about what Guava classes are actually in Spark and how to coexist with them happily.

Let me rephrase my question: has anyone used Guava in a Spark project? Is there a recommended way to use it in a job?
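One common workaround — a sketch only, which I haven’t verified against Spark 1.2.1 — is to shade Guava into your own job jar and relocate its packages, so that whatever Guava Spark bundles on the classpath no longer matters:

```xml
<!-- Sketch of a maven-shade-plugin relocation; the shadedPattern prefix is illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myjob.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The shade plugin rewrites the bytecode references in your jar, so a classOf[com.google.common.collect.HashBiMap[...]] in your job code should resolve to the relocated class automatically.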

On Feb 25, 2015, at 3:50 PM, Pat Ferrel <[EMAIL PROTECTED]> wrote:

I pass in my own dependencies jar with the class in it when creating the context. I’ve verified that the jar is in the list and checked inside the jar to find Guava. This should work, right? So I must have made a mistake in my checking.
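One way to double-check — a minimal sketch using only the standard library, with the Guava class name taken from the stack trace below — is to probe the classloader at runtime rather than inspecting the jar list:

```scala
import scala.util.Try

object ClasspathProbe {
  // Returns true if the named class is loadable from the current classloader.
  def onClasspath(name: String): Boolean =
    Try(Class.forName(name)).isSuccess

  def main(args: Array[String]): Unit = {
    // Run probes like this inside a task (e.g. rdd.map { _ => onClasspath(...) })
    // to check the executor classpath, which can differ from the driver's.
    println(onClasspath("java.lang.String"))                    // true
    println(onClasspath("com.google.common.collect.HashBiMap")) // true only if Guava is actually present
  }
}
```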

On Feb 25, 2015, at 3:40 PM, Ted Yu <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:

Could this be caused by Spark using shaded Guava jar ?

Cheers

On Wed, Feb 25, 2015 at 3:26 PM, Pat Ferrel <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
Getting an error that confuses me. I’m running a largish app on a standalone cluster on my laptop. The app uses a Guava HashBiMap as a broadcast value. With Spark 1.1.0 I simply registered the class and its serializer with Kryo like this:

   kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer())

And all was well. I’ve also tried addSerializer instead of register. Now I get a class-not-found during deserialization. I checked the jar list used to create the context and found the jar that contains HashBiMap, but I still get this error. Any ideas:

15/02/25 14:46:33 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 8, 192.168.0.2): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
        at org.apache.mahout.drivers.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
        at org.apache.mahout.drivers.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
        at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
        at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
        ... 19 more

============== root error ================
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:274)
        at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:625)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:40)
        ... 24 more