Subject: Processing .wav files in PySpark

I don't think you can use textFile(), binaryFile(), or pickleFile()
here; they expect different formats than wav.

You could get a list of paths for all the files, then pass it to
sc.parallelize() and call foreach() on the result:

def process(path):
    # use subprocess to launch a process to do the job, and read its
    # stdout as the result
    pass

files = []  # list of paths to the wav files
sc.parallelize(files, len(files)).foreach(process)
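
If it helps, here is a slightly fuller sketch of the same idea. Everything
in it is an assumption on my side: the default command is only a placeholder
that echoes the path back (replace it with whatever tool actually reads your
wav files), run() is a hypothetical helper name, and the paths must be
readable from every worker (for example on a shared filesystem). It uses
map() and collect() rather than foreach(), because foreach() throws away
whatever process() returns.

```python
import subprocess
import sys


def process(path, cmd=None):
    """Run an external command on one file and return its stdout as text."""
    if cmd is None:
        # Placeholder command (assumption): a tiny Python one-liner that
        # just prints the path it was given. Swap in the real wav tool.
        cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    # check_output raises CalledProcessError if the command exits non-zero.
    out = subprocess.check_output(cmd + [path])
    return out.decode("utf-8").strip()


def run(sc, files):
    """Process each path in its own partition; return (path, stdout) pairs."""
    # max(..., 1) avoids asking for zero partitions when the list is empty.
    rdd = sc.parallelize(files, max(len(files), 1))
    return rdd.map(lambda p: (p, process(p))).collect()
```

With an existing SparkContext you would call run(sc, files) on the driver.
If the output per file is large, saving the RDD instead of calling
collect() is probably the safer choice.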

On Fri, Jan 16, 2015 at 2:11 PM, Venkat, Ankam