Apache Arrow support for Apache Spark - Spark - [mail # user]
...1. I'd also consider how you're structuring the data before applying the join; naively doing the join could be expensive, so doing a bit of data preparation may be necessary to improve join per...
   Author: Chris Teoh , 2020-02-17, 13:31
Best way to read batch from Kafka and Offsets - Spark - [mail # user]
...The most common delivery semantic for Kafka producer is at least once. So your consumers have to handle dedupe. Spark can do checkpoint but you have to be explicit about it. It only makes sense...
   Author: Chris Teoh , 2020-02-03, 21:51
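
The point above about consumers handling dedupe under at-least-once delivery can be illustrated with a minimal sketch in plain Python. It is not taken from the thread: the `event_id` field, the `dedupe` helper, and the in-memory `seen` set are all hypothetical, and a real job would need to persist the seen IDs (or use an idempotent sink) so they survive restarts.

```python
from typing import Dict, Iterable, List, Set


def dedupe(records: Iterable[Dict[str, str]], seen_ids: Set[str]) -> List[Dict[str, str]]:
    """Return only records whose (hypothetical) event_id has not been seen before.

    seen_ids is updated in place so that duplicates are also dropped
    across successive batches.
    """
    fresh = []
    for record in records:
        event_id = record["event_id"]  # assumed unique ID set by the producer
        if event_id in seen_ids:
            continue  # redelivered or retried message: skip it
        seen_ids.add(event_id)
        fresh.append(record)
    return fresh


# Two batches; "a" is duplicated inside batch_1 and "b" is redelivered in batch_2.
seen: Set[str] = set()
batch_1 = [
    {"event_id": "a", "value": "1"},
    {"event_id": "b", "value": "2"},
    {"event_id": "a", "value": "1"},
]
batch_2 = [
    {"event_id": "b", "value": "2"},
    {"event_id": "c", "value": "3"},
]

first = dedupe(batch_1, seen)
second = dedupe(batch_2, seen)
```

Keying on a producer-assigned ID rather than the Kafka offset is a deliberate choice in this sketch: a producer retry writes the same message at a new offset, so offsets alone would not catch it.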
Submitting job with external dependencies to pyspark - Spark - [mail # user]
...Usually this isn't done as the data is meant to be on a shared/distributed storage, e.g. HDFS, S3, etc. Spark should then read this data into a dataframe and your code logic applies to the datafr...
   Author: Chris Teoh , 2020-01-29, 07:27
RESTful Operations - Spark - [mail # user]
...Maybe something like Livy, otherwise roll your own REST API and have it start a Spark job. On Mon, 20 Jan 2020 at 06:55, wrote: > I am new to Spark. The task I want to accomplish is le...
   Author: Chris Teoh , 2020-01-20, 00:26
Does explode lead to more usage of memory - Spark - [mail # user]
...Depends on the use case, if you have to join, you're saving a join and a shuffle from having it already in an array. If you explode, at least sort within partitions to get you predicate pushdow...
   Author: Chris Teoh , 2020-01-19, 11:49
Spark Executor OOMs when writing Parquet - Spark - [mail # user]
...Yes. Disk spill can be a huge performance hit, with smaller partitions you may avoid this and possibly complete your job faster. I hope you don't get OOM. On Sat, 18 Jan 2020 at 10:06, Arwin Ti...
   Author: Chris Teoh , 2020-01-18, 02:15
Out of memory HDFS Read and Write - Spark - [mail # user]
...Does it work for just a single path input and single output? Is the destPath a collection that is sitting on the driver? On Sun, 22 Dec 2019, 7:59 pm Ruijing Li, wrote: > I was experim...
   Author: Chris Teoh , 2019-12-22, 09:55
Out of memory HDFS Multiple Cluster Write - Spark - [mail # user]
...I'm not entirely sure what the behaviour is when writing to remote cluster. It could be that the connections are being established for every element in your dataframe, perhaps having to use fo...
   Author: Chris Teoh , 2019-12-22, 03:58
Identify bottleneck - Spark - [mail # user]
...As far as I'm aware it isn't any better. The logic all gets processed by the same engine so to confirm, compare the DAGs generated from both approaches and see if they're identical. On Fri, 20 ...
   Author: Chris Teoh , 2019-12-19, 22:33
Request more yarn vcores than executors - Spark - [mail # user]
...If that is the case, perhaps set vcore to CPU core ratio as 1:1 and just do --executor-cores 1 and that would at least try to get you more threads per executor. Note that vcore is a logical co...
   Author: Chris Teoh , 2019-12-08, 09:51