Results 1 to 10 of 39.
preferredlocations for hadoopfsrelations based baseRelations - Spark - [mail # dev]
...AFAICT, `FileScanRDD` invokes the `FilePartition::preferredLocations()` method, which orders hosts by data size, to get the partition's preferred locations. If there are other vectors to sort, I'm ...
   Author: ZHANG Wei, 2020-06-04, 08:31
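The ordering mentioned in the snippet above can be shown with a minimal Python sketch. This is not Spark's code. It assumes a partition's preferred locations are computed in two steps: sum, for each host, the bytes of the partition's files stored on that host, then keep the hosts holding the most bytes. The `PartitionedFile` record, the `preferred_locations` function, the cap of 3 hosts and the skipping of `localhost` are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PartitionedFile:
    """Hypothetical stand-in for one file split: its size and the hosts storing it."""
    path: str
    length: int
    locations: list


def preferred_locations(files, max_hosts=3):
    """Return up to max_hosts hosts, ordered by bytes stored locally (largest first)."""
    host_to_bytes = defaultdict(int)
    for f in files:
        for host in f.locations:
            if host != "localhost":  # assumption: ignore a non-routable placeholder host
                host_to_bytes[host] += f.length
    # Sort hosts by total local bytes, descending; this is the "ordered by data size" step.
    ranked = sorted(host_to_bytes.items(), key=lambda kv: kv[1], reverse=True)
    return [host for host, _ in ranked[:max_hosts]]


files = [
    PartitionedFile("a", 100, ["h1", "h2"]),
    PartitionedFile("b", 300, ["h2", "h3"]),
    PartitionedFile("c", 50, ["h4", "localhost"]),
]
print(preferred_locations(files))  # ['h2', 'h3', 'h1']
```

Any other sort vector would replace the key passed to `sorted(...)`. The per-host byte totals would stay the same.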
Using Spark Accumulators with Structured Streaming - Spark - [mail # user]
...The following Java code works in my cluster environment: `.mapGroupsWithState((MapGroupsWithStateFunction) (key, values, state) -> {` ...
   Author: ZHANG Wei, 2020-06-04, 06:56
Unit testing Spark/Scala code with Mockito - Spark - [mail # user]
...AFAICT, it depends on the testing goal: unit test, integration test, or E2E test. A unit test mostly tests an individual class or its methods. Mockito can help mock and verify dependent instance...
   Author: ZHANG Wei, 2020-05-21, 02:12
CSV data source : Garbled Japanese text and handling multilines - Spark - [mail # user]
...May I know the CSV file's encoding? It can be checked with the `file` command. -- Cheers, -z. On Tue, 19 May 2020 09:24:24 +0900, Ashika Umagiliya wrote: > In my Spark job (Spark 2.4.1), I am...
   Author: ZHANG Wei, 2020-05-20, 14:47
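The encoding question matters because bytes decoded with the wrong charset come out garbled, and this can be shown without Spark. Below is a minimal Python sketch. It assumes the file was written in Shift_JIS but read as UTF-8. Shift_JIS is a common encoding for Japanese text and is used here only as an example, and the sample text is made up.

```python
text = "日本語のテキスト"        # made-up sample Japanese text
raw = text.encode("shift_jis")    # bytes as a Shift_JIS-encoded file would store them

# Reading with the wrong charset: bytes that are not valid UTF-8 become U+FFFD.
wrong = raw.decode("utf-8", errors="replace")

# Reading with the charset the file was actually written in round-trips cleanly.
right = raw.decode("shift_jis")

print(wrong)
print(right)  # 日本語のテキスト
```

Once the real encoding is known, Spark's CSV reader can be told about it through its `encoding` option, which defaults to UTF-8.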
Applying schema dynamically in dataframe - Spark - [mail # dev]
...May I have a sample scenario to understand the requirement? -- Cheers, -z. On Sat, 16 May 2020 11:45:03 +0530, rahul c wrote: > Hi dev, > Currently I have a scenario where I am readi...
   Author: ZHANG Wei, 2020-05-18, 06:03
[PySpark] Tagging descriptions - Spark - [mail # user]
...AFAICT, judging from the data size (25B rows, with a 300-character string in the key cell), it looks like a common Spark job. But the regex might be complex; I guess there are lots of items to match, such as (apple|banana|cola|...
   Author: ZHANG Wei, 2020-05-13, 09:49
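A long alternation like the `(apple|banana|cola|...` pattern in the snippet is usually built from a keyword list. Below is a minimal Python sketch of that approach, assuming plain keywords rather than arbitrary regexes. `KEYWORDS` and `tag_description` are hypothetical names, and the pattern is compiled once up front instead of once per row.

```python
import re

# Hypothetical keyword list; a real tagging job would have many more items.
KEYWORDS = ["apple", "banana", "cola", "apple juice"]

# Escape each keyword and try longer ones first, so "apple juice" wins over "apple".
# Word boundaries stop "cola" from matching inside "chocolate".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def tag_description(description: str) -> list:
    """Return the distinct keywords found in a description, in order of first appearance."""
    tags = []
    for match in _PATTERN.finditer(description):
        tag = match.group(1).lower()
        if tag not in tags:
            tags.append(tag)
    return tags


print(tag_description("Apple juice, a banana and some chocolate"))  # ['apple juice', 'banana']
```

In PySpark the function could be wrapped in a UDF. Where a simple match/no-match answer is enough, the built-in `rlike` column method evaluates a regex on the JVM and avoids sending rows to Python.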
Re:Re: Screen Shot 2020-05-11 at 5.28.03 AM - Spark - [mail # dev]
...Sometimes the thread dump table in the Spark UI can provide clues for tracking down thread lock issues, such as: Thread ID | Thread Name ...
   Author: ZHANG Wei, 2020-05-11, 07:42
Executor exceptions stacktrace omitted by HotSpot in long running application - Spark - [mail # dev]
...Hi, I'm considering improving the experience of hitting potentially omitted exception stack traces in long-running applications [1], which is a JVM HotSpot optimization, as Shixiong (Ryan) commented [...
   Author: ZHANG Wei, 2020-05-08, 06:16
PyArrow Exception in Pandas UDF GROUPEDAGG() - Spark - [mail # user]
...AFAICT, there might be data skew: some partitions got too many rows, which exceeded the memory limit. Trying .groupBy().count() or .aggregateByKey().count() may help check each partitio...
   Author: ZHANG Wei, 2020-05-07, 08:34
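A per-key count, like the `.groupBy().count()` check suggested in the snippet, is enough to spot skew. A key whose row count dwarfs the others produces a group that may not fit in memory inside a grouped-aggregate pandas UDF. Below is a plain-Python sketch of that check on an in-memory list of keys. `skew_report` and the 10x-median threshold are arbitrary assumptions. On a DataFrame, the counting step corresponds to `df.groupBy(key).count()`.

```python
from collections import Counter
from statistics import median


def skew_report(keys, factor=10.0):
    """Count rows per key; return (key, count) pairs whose count exceeds factor x the median.

    Pairs are sorted by count, largest first. An empty input yields an empty report.
    """
    counts = Counter(keys)  # the same numbers a per-key groupBy().count() would give
    if not counts:
        return []
    typical = median(counts.values())
    heavy = [(k, c) for k, c in counts.items() if c > factor * typical]
    return sorted(heavy, key=lambda kc: kc[1], reverse=True)


rows = ["a"] * 5 + ["b"] * 4 + ["c"] * 6 + ["hot"] * 500
print(skew_report(rows))  # [('hot', 500)]
```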
[DISCUSS] Java specific APIs design concern and choice - Spark - [mail # dev]
...I feel a little pushed... :-) I still don't see why it's urgent to make the decision now. AFAIK, it's common practice to handle Scala type conversions oneself when Java program...
   Author: ZHANG Wei, 2020-04-30, 09:59