Results 1 to 10 of 53.
Best way to read batch from Kafka and Offsets - Spark - [mail # user]
...The most common delivery semantic for Kafka producers is at-least-once. So your consumers have to handle dedupe. Spark can do checkpointing but you have to be explicit about it. It only makes sense...
   Author: Chris Teoh, 2020-02-03, 21:51
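Because at-least-once delivery means the same event can arrive more than once, deduplication has to key on something stable. Below is a minimal plain-Python sketch, assuming each message carries a producer-assigned unique ID (the `Record` type and its `event_id` field are hypothetical, not part of any Kafka API). A producer retry can write the same event at a new offset, so the offset alone would not catch those duplicates.

```python
from typing import Iterable, Iterator, NamedTuple, Set


class Record(NamedTuple):
    """Hypothetical consumed record; event_id is an assumed unique ID set by the producer."""
    partition: int
    offset: int
    event_id: str
    value: str


def dedupe(records: Iterable[Record]) -> Iterator[Record]:
    """Yield each event once, keeping the first copy seen.

    Keyed on event_id rather than offset, because a retried produce can land
    the same event at a different offset.
    """
    seen: Set[str] = set()  # grows without bound; fine for a bounded batch only
    for rec in records:
        if rec.event_id not in seen:
            seen.add(rec.event_id)
            yield rec
```

In a Spark DataFrame the same idea is usually expressed with `dropDuplicates(["event_id"])`; in a streaming query it is normally paired with `withWatermark(...)` so the deduplication state does not grow forever, which is also the main weakness of the in-memory `seen` set here.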
Submitting job with external dependencies to pyspark - Spark - [mail # user]
...Usually this isn't done, as the data is meant to be on shared/distributed storage, e.g. HDFS, S3, etc. Spark should then read this data into a dataframe and your code logic applies to the datafr...
   Author: Chris Teoh, 2020-01-29, 07:27
RESTful Operations - Spark - [mail # user]
...Maybe something like Livy; otherwise roll your own REST API and have it start a Spark job. On Mon, 20 Jan 2020 at 06:55,  wrote: > I am new to Spark. The task I want to accomplish is le...
   Author: Chris Teoh, 2020-01-20, 00:26
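With Livy, a job is typically submitted through its REST API as a `POST /batches` call whose JSON body names the application file (`file`) and optionally `className` and `args`. The sketch below only builds that request with the Python standard library and does not send it; the function name is illustrative, and the host, port and HDFS path used in the test are placeholders.

```python
from __future__ import annotations

import json
import urllib.request


def livy_batch_request(livy_url: str, file: str, class_name: str | None = None,
                       args: list[str] | None = None) -> urllib.request.Request:
    """Build (but do not send) a POST /batches request for Apache Livy."""
    body: dict = {"file": file}  # path to the jar or .py the cluster can read
    if class_name is not None:
        body["className"] = class_name  # main class, for JVM applications
    if args:
        body["args"] = args  # command-line arguments passed to the job
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req)` would actually submit it; Livy responds with a batch ID whose status can then be polled with `GET /batches/{id}`.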
Does explode lead to more usage of memory - Spark - [mail # user]
...Depends on the use case. If you have to join, you're saving a join and a shuffle from having it already in an array. If you explode, at least sort within partitions to get you predicate pushdow...
   Author: Chris Teoh, 2020-01-19, 11:49
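Sorting within partitions helps predicate pushdown because Parquet keeps min/max statistics per row group, and a reader can skip any row group whose range cannot match the filter. The plain-Python simulation below involves no Spark or Parquet: fixed-size chunks stand in for row groups, and it counts how many chunks an equality filter would still have to read with and without sorting. The data sizes and the target value are arbitrary assumptions.

```python
import random
from typing import List, Tuple


def row_group_stats(values: List[int], group_size: int) -> List[Tuple[int, int]]:
    """Min/max per fixed-size chunk, standing in for Parquet row-group statistics."""
    chunks = (values[i:i + group_size] for i in range(0, len(values), group_size))
    return [(min(chunk), max(chunk)) for chunk in chunks]


def groups_to_read(stats: List[Tuple[int, int]], target: int) -> int:
    """Count chunks whose [min, max] range could contain target; the rest are skipped."""
    return sum(1 for lo, hi in stats if lo <= target <= hi)


def compare(seed: int = 0, n: int = 10_000, group_size: int = 1_000,
            target: int = 500) -> Tuple[int, int]:
    """Return (chunks read unsorted, chunks read sorted) for random values in [0, 1000)."""
    rng = random.Random(seed)
    values = [rng.randrange(1_000) for _ in range(n)]
    unsorted_reads = groups_to_read(row_group_stats(values, group_size), target)
    sorted_reads = groups_to_read(row_group_stats(sorted(values), group_size), target)
    return unsorted_reads, sorted_reads
```

With unsorted data nearly every chunk spans the whole value range, so almost nothing can be skipped; after sorting, only the one or two chunks around the target remain. In Spark the corresponding step would be something like `df.sortWithinPartitions("col")` before the write, and the real benefit depends on row-group size and the filter.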
Spark Executor OOMs when writing Parquet - Spark - [mail # user]
...Yes. Disk spill can be a huge performance hit; with smaller partitions you may avoid this and possibly complete your job faster. I hope you don't get OOM. On Sat, 18 Jan 2020 at 10:06, Arwin Ti...
   Author: Chris Teoh, 2020-01-18, 02:15
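One way to turn "smaller partitions" into a number is to divide the data size by a target partition size. This sketch assumes a 128 MiB target (a common starting point that matches the default of `spark.sql.files.maxPartitionBytes`, not a universal rule) and evenly distributed data; skewed keys will still produce oversized partitions. The helper name is hypothetical.

```python
import math


def partitions_for(total_bytes: int, target_bytes: int = 128 * 1024 ** 2) -> int:
    """Smallest partition count that keeps the *average* partition at or under target_bytes."""
    if total_bytes <= 0:
        return 1  # always at least one partition
    return max(1, math.ceil(total_bytes / target_bytes))
```

For example, `partitions_for(500 * 1024**3)` gives 4000 for 500 GiB of data, a value that could then be passed to `df.repartition(...)` before writing.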
Out of memory HDFS Read and Write - Spark - [mail # user]
...Does it work for just a single input path and a single output? Is the destPath a collection that is sitting on the driver? On Sun, 22 Dec 2019, 7:59 pm Ruijing Li,  wrote: > I was experim...
   Author: Chris Teoh, 2019-12-22, 09:55
Out of memory HDFS Multiple Cluster Write - Spark - [mail # user]
...I'm not entirely sure what the behaviour is when writing to a remote cluster. It could be that connections are being established for every element in your dataframe, perhaps having to use fo...
   Author: Chris Teoh, 2019-12-22, 03:58
Identify bottleneck - Spark - [mail # user]
...As far as I'm aware it isn't any better. The logic all gets processed by the same engine, so to confirm, compare the DAGs generated from both approaches and see if they're identical. On Fri, 20 ...
   Author: Chris Teoh, 2019-12-19, 22:33
Request more yarn vcores than executors - Spark - [mail # user]
...If that is the case, perhaps set the vcore-to-CPU-core ratio to 1:1 and just use --executor-cores 1; that would at least try to get you more threads per executor. Note that a vcore is a logical co...
   Author: Chris Teoh, 2019-12-08, 09:51
OOM Error - Spark - [mail # user]
...It says you have 3811 tasks in earlier stages and you're going down to 2001 partitions; that would make it more memory-intensive. I'm guessing the default Spark shuffle partition count was 200, so th...
   Author: Chris Teoh, 2019-09-07, 10:35
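The extra memory pressure follows from simple arithmetic. Assuming the total data volume $D$ stays roughly the same across the shuffle and is spread evenly (an idealisation; real partitions are rarely uniform), each of the 2001 downstream partitions holds roughly this multiple of what one of the 3811 upstream tasks processed:

```latex
\frac{\text{data per downstream partition}}{\text{data per upstream task}}
  \approx \frac{D / 2001}{D / 3811} = \frac{3811}{2001} \approx 1.90
```

So each downstream task needs to hold close to twice as much data as an upstream task did, before accounting for any skew.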