Usually this isn't done, because the data is meant to live on shared or
distributed storage, e.g. HDFS or S3.
Spark then reads that data into a DataFrame, and your code logic is applied
to the DataFrame in a distributed manner.
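
A rough PySpark sketch of that pattern, in case it helps. The bucket name, path, column name, and the helpers build_uri, count_by_key and main are made up for illustration, and it assumes the data is Parquet and that the cluster already has the S3A (or HDFS) connector configured:

```python
from typing import Any


def build_uri(scheme: str, location: str, path: str) -> str:
    """Join a storage scheme, a bucket or namenode, and a path into one URI."""
    return f"{scheme}://{location.strip('/')}/{path.lstrip('/')}"


def count_by_key(spark: Any, uri: str, key: str):
    """Read Parquet data from shared storage and count rows per key."""
    # The executors read their own splits of the files directly from the
    # shared storage, so nothing has to be copied to each node beforehand.
    df = spark.read.parquet(uri)
    # The aggregation is planned once on the driver and runs on the executors.
    return df.groupBy(key).count()


def main() -> None:
    # Imported here so the helpers above can be loaded without pyspark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("shared-storage-example").getOrCreate()
    # Hypothetical bucket, path and column; replace with your own.
    uri = build_uri("s3a", "example-bucket", "events/2020/01/")
    count_by_key(spark, uri, "user_id").show()
    spark.stop()
```

Calling main() from a script submitted with spark-submit would run the job; for HDFS the same code should work with something like build_uri("hdfs", "namenode:8020", "data/events").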
On Wed, 29 Jan 2020 at 09:37, Tharindu Mathew <[EMAIL PROTECTED]>