Thanks all for the replies.
I am switching to HDFS since it seems like the easier solution.
To answer some of your questions: my HDFS storage sits on the same nodes I
use for Spark computation.
From what I understand, this helps because of data locality, which means
there is less network IO and data redistribution on the
Thanks for your help.
On Sat, 30 May, 2020, 10:48 am Jörn Franke, <[EMAIL PROTECTED]> wrote: