Results 1 to 10 of 37.
[pyspark 2.3+] read/write huge data with smaller block size (128MB per block) - Spark - [mail # user]
...Thanks Sean! To combat the skew I do have another column I partitionby and that has worked well (like below). However in the image I attached in my original email - it looks like 2 tasks proce...
   Author: Rishi Shah , 2020-06-19, 14:12
[pyspark 2.3+] Add scala library to pyspark app and use to derive columns - Spark - [mail # user]
...Hi All, I have a use case where I need to utilize java/scala for regex mapping (as lookbehinds are not well supported with python). However our entire code is python based so was wondering if ...
   Author: Rishi Shah , 2020-06-06, 17:06
[PySpark] Tagging descriptions - Spark - [mail # user]
...Thanks everyone. While working on Tagging I stumbled upon another setback. There are about 5000 regex I am dealing with, out of which a couple of hundred have variable length lookbehind (origin...
   Author: Rishi Shah , 2020-06-04, 20:14
[PySpark 2.3+] Reading parquet entire path vs a set of file paths - Spark - [mail # user]
...Hi All, Just following up on below to see if anyone has any suggestions. Appreciate your help in advance. Thanks, Rishi On Mon, Jun 1, 2020 at 9:33 AM Rishi Shah wrote: > Hi All, >> ...
   Author: Rishi Shah , 2020-06-03, 18:15
[pyspark 2.3+] Dedupe records - Spark - [mail # user]
...Hi All, I have around 100B records where I get new, update & delete records. Update/delete records are not that frequent. I would like to get some advice on below: 1) should I use rdd + reducib...
   Author: Rishi Shah , 2020-05-30, 02:47
[spark streaming] checkpoint location feature for batch processing - Spark - [mail # user]
...Thanks Burak! Appreciate it. This makes sense. How do you suggest we make sure resulting data doesn't produce tiny files? If we are not on databricks yet and can not leverage delta lake featur...
   Author: Rishi Shah , 2020-05-02, 01:03
[pyspark 2.4+] BucketBy SortBy doesn't retain sort order - Spark - [mail # user]
...Hi All, Just checking in to see if anyone has any advice on this. Thanks, Rishi On Mon, Mar 2, 2020 at 9:21 PM Rishi Shah wrote: > Hi All, >> I have 2 large tables (~1TB), I used th...
   Author: Rishi Shah , 2020-03-04, 02:22
High level explanation of dropDuplicates - Spark - [mail # user]
...Thanks everyone for your contribution on this topic, I wanted to check in to see if anyone has discovered a different or has an opinion on a better approach to deduplicating data using pyspark....
   Author: Rishi Shah , 2020-01-11, 19:14
[pyspark2.4+] A lot of tasks failed, but job eventually completes - Spark - [mail # user]
...Thank you Hemant and Enrico. Much appreciated. Your input really got me closer to the issue, I realized every task didn't get enough memory and hence tasks with large partitions kept failing. ...
   Author: Rishi Shah , 2020-01-06, 13:36
[Pyspark 2.3+] Timeseries with Spark - Spark - [mail # user]
...Hi All, Checking in to see if anyone had input around time series libraries using Spark. I am interested in financial forecasting model & regression mainly at this point. Input is a bunch...
   Author: Rishi Shah , 2019-12-29, 16:30