Subject: Submitting job with external dependencies to pyspark

Usually this isn't done, since the data is meant to live on shared/distributed
storage, e.g. HDFS, S3, etc.

Spark then reads that data into a DataFrame, and your code logic is applied
to the DataFrame in a distributed manner across the cluster.

On Wed, 29 Jan 2020 at 09:37, Tharindu Mathew <[EMAIL PROTECTED]>