Subject: Tasks run only on one machine

Argh, I looked and there really isn’t that much data yet. There will be thousands, but we are starting small.

I bet the total data size is just too small to need all the workers. Sorry, never mind.
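For anyone hitting the same thing, here is a rough back-of-the-envelope sketch of why a small input can end up on a single machine. It is not Spark or Hadoop code. The function name `estimate_splits`, the file sizes, and the 128 MiB block size are all assumptions for illustration, and the real split logic has extra rules (minimum split size, a slop factor, the requested minimum partition count) that this ignores.

```python
import math

def estimate_splits(file_sizes_bytes, block_size_bytes=128 * 1024 * 1024):
    """Roughly estimate the number of input splits for a set of files.

    Assumption: each non-empty file is cut into block-sized pieces and
    contributes at least one split. Treat the result as approximate.
    """
    return sum(
        max(1, math.ceil(size / block_size_bytes))
        for size in file_sizes_bytes
        if size > 0
    )

# Hypothetical sizes: three small files written by a streaming job.
small_input = [5 * 1024 * 1024, 8 * 1024 * 1024, 2 * 1024 * 1024]
print(estimate_splits(small_input))  # 3

# Hypothetical single 10 GiB file with 128 MiB blocks.
print(estimate_splits([10 * 1024**3]))  # 80
```

With only a handful of splits there are only a handful of tasks in the first stage, and they can all fit in the free cores of one executor, so the other workers stay idle until the input grows.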

On Apr 23, 2015, at 10:30 AM, Pat Ferrel <[EMAIL PROTECTED]> wrote:

They are in HDFS, so they are available on all workers.

On Apr 23, 2015, at 10:29 AM, Pat Ferrel <[EMAIL PROTECTED]> wrote:

Physically? Not sure. They were written using the nano-batch RDDs in a streaming job that runs in a separate driver. The job is a Kafka consumer.

Would that affect all derived RDDs? If so, is there something I can do to mix it up, or does Spark know best about execution speed here?

On Apr 23, 2015, at 10:23 AM, Sean Owen <[EMAIL PROTECTED]> wrote:

Where are the file splits? Meaning, is it possible they were
(only) available on one node, and that node was also your driver?

On Thu, Apr 23, 2015 at 1:21 PM, Pat Ferrel <[EMAIL PROTECTED]> wrote: