Subject: Tasks run only on one machine


They are in HDFS, so they are available on all workers.

On Apr 23, 2015, at 10:29 AM, Pat Ferrel <[EMAIL PROTECTED]> wrote:

Physically? Not sure. They were written using the nano-batch RDDs in a streaming job that runs in a separate driver. The job is a Kafka consumer.

Would that affect all derived RDDs? If so, is there something I can do to mix it up, or does Spark know best about execution speed here?

On Apr 23, 2015, at 10:23 AM, Sean Owen <[EMAIL PROTECTED]> wrote:

Where are the file splits? Meaning, is it possible they were
(only) available on one node, and that node was also your driver?

On Thu, Apr 23, 2015 at 1:21 PM, Pat Ferrel <[EMAIL PROTECTED]> wrote:
