Search criteria: author:"Davies Liu". Results 1 to 10 of 474 (0.0s).
[SPARK-18188] Add checksum for block of broadcast - Spark - [issue]
...There has been a long-standing issue (https://issues.apache.org/jira/browse/SPARK-4105): without any checksum for the blocks, it's very hard for us to identify where the bug cam...
http://issues.apache.org/jira/browse/SPARK-18188    Author: Davies Liu , 2018-07-20, 07:55
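A minimal plain-Python sketch of the idea behind the fix (the names here are illustrative, not Spark's actual API): attach a checksum to each broadcast block so the receiving side can detect corruption before deserializing.

```python
import zlib

# Sender side: compute a CRC32 checksum over the serialized block bytes.
block = b"serialized broadcast block bytes"
checksum = zlib.crc32(block)

# Receiver side: recompute and compare before using the block.
corrupted = block[:-1] + b"X"
print(zlib.crc32(block) == checksum)      # True: block is intact
print(zlib.crc32(corrupted) == checksum)  # False: corruption detected
```

With a checksum attached, a corrupted block can be reported (or re-fetched) instead of surfacing later as a mysterious deserialization failure.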
[SPARK-16011] SQL metrics include duplicated attempts - Spark - [issue]
...When I ran a simple scan-and-aggregate query, the number of rows in the scan could differ from run to run; the scanned result is actually correct, but the SQL metrics are wrong (should not incl...
http://issues.apache.org/jira/browse/SPARK-16011    Author: Davies Liu , 2018-09-11, 14:31
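A hypothetical sketch of the double-counting and one way to avoid it (the data shape is assumed, not Spark's internal representation): when a task is retried or speculated, naively summing every metrics update counts the same work twice; keeping only one attempt per task before aggregating fixes the total.

```python
# Each metrics update: (task_id, attempt, rows_scanned). Task 1 ran twice.
updates = [(1, 0, 100), (1, 1, 100), (2, 0, 50)]

# Naive sum double-counts the retried task's rows.
naive = sum(rows for _, _, rows in updates)

# Keep only the latest attempt per task, then aggregate.
latest = {}
for task, attempt, rows in updates:
    if task not in latest or attempt > latest[task][0]:
        latest[task] = (attempt, rows)
dedup = sum(rows for _, rows in latest.values())

print(naive, dedup)  # 250 150
```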
[SPARK-3554] handle large dataset in closure of PySpark - Spark - [issue]
...Sometimes a large dataset is used in a closure and the user forgets to broadcast it, so the serialized command becomes huge. Py4J can not handle large objects efficiently; we shou...
http://issues.apache.org/jira/browse/SPARK-3554    Author: Davies Liu , 2014-09-19, 01:12
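A plain-Python sketch of why this matters (a simulation, not Spark's actual serialization path): a command that embeds a large dataset inline pickles to a huge payload, while a broadcast-style scheme ships the data once under an id and keeps each serialized command tiny.

```python
import pickle

big = list(range(100_000))

# Inline: the "command" drags the whole dataset along when pickled.
inline_size = len(pickle.dumps(("filter", big)))

# Broadcast-style: ship the data once, let commands carry only a small id.
broadcast_store = {"b0": big}          # sent to workers once
small_command = ("filter", "b0")       # sent per task
ref_size = len(pickle.dumps(small_command))

print(inline_size > 100 * ref_size)  # True: inline is orders of magnitude larger
```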
[SPARK-3592] applySchema to an RDD of Row - Spark - [issue]
...Right now, we can not apply schema to an RDD of Row; this should be a bug:
>>> srdd = sqlCtx.jsonRDD(sc.parallelize(["""{"a":2}"""]))
>>> sqlCtx.applySchema(srdd.map(lambda x:x...
http://issues.apache.org/jira/browse/SPARK-3592    Author: Davies Liu , 2014-09-19, 22:33
[SPARK-3594] try more rows during inferSchema - Spark - [issue]
...If there are empty values in the first row of an RDD of Row, inferSchema will fail. It's better to try more rows and combine them together....
http://issues.apache.org/jira/browse/SPARK-3594    Author: Davies Liu , 2014-11-03, 21:18
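A minimal sketch of the idea (the `infer_type` helper is hypothetical, not Spark's implementation): inferring a column type from only the first row fails when that value is null, but sampling several rows and combining them succeeds.

```python
def infer_type(values):
    """Infer a column's type from sampled values, skipping nulls."""
    types = {type(v) for v in values if v is not None}
    if not types:
        raise ValueError("cannot infer type: all sampled values are null")
    if len(types) > 1:
        raise ValueError("conflicting types: %r" % types)
    return types.pop()

rows = [(None, "a"), (3, "b"), (4, "c")]

# The first row alone gives no information about column 0 (it is null),
# but combining several sampled rows recovers the type.
col0 = infer_type([r[0] for r in rows])
print(col0 is int)  # True
```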
[SPARK-3679] pickle the exact globals of functions - Spark - [issue]
...function.func_code.co_names has all the names used in the function, including names of attributes. It will pickle some unnecessary globals if a global has the same name as an attri...
http://issues.apache.org/jira/browse/SPARK-3679    Author: Davies Liu , 2014-09-24, 20:00
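A small demonstration of the underlying problem (plain Python, written for the Python 3 attribute names rather than the `func_code` of the era): `co_names` mixes attribute names in with global names, so a pickler that captures every global listed there would over-capture a global that merely shares a name with an attribute. The exact globals can be recovered from the bytecode instead.

```python
import dis

def f(x):
    return x.items()  # "items" is used only as an attribute, never as a global

# co_names contains "items" even though f never reads a global by that name.
print("items" in f.__code__.co_names)  # True

# The globals the function actually loads are only the LOAD_GLOBAL targets.
globals_used = {i.argval for i in dis.get_instructions(f)
                if i.opname == "LOAD_GLOBAL"}
print(globals_used)  # set() — f uses no globals at all
```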
[SPARK-3681] Failed to serialize ArrayType or MapType after accessing them in Python - Spark - [issue]
...files_schema_rdd.map(lambda x: x.files).take(1)
Also, it will lose the schema after iterating an ArrayType:
files_schema_rdd.map(lambda x: [f.batch for f in x.files]).take(1)...
http://issues.apache.org/jira/browse/SPARK-3681    Author: Davies Liu , 2014-09-27, 19:21
[SPARK-3463] Show metrics about spilling in Python - Spark - [issue]
...It should also show the number of bytes spilled to disk while doing aggregation in Python....
http://issues.apache.org/jira/browse/SPARK-3463    Author: Davies Liu , 2014-09-14, 05:31
[SPARK-3465] Task metrics are not aggregated correctly in local mode - Spark - [issue]
...In local mode, after onExecutorMetricsUpdate(), t.taskMetrics will be the same object as the one in TaskContext (because there is no serialization of MetricsUpdate in local mode), so all t...
http://issues.apache.org/jira/browse/SPARK-3465    Author: Davies Liu , 2014-09-12, 21:30
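A plain-Python sketch of the aliasing bug described above (the `TaskMetrics` class here is illustrative): without a serialization boundary, a "snapshot" can end up being a reference to the live object, so later mutations leak into it.

```python
class TaskMetrics:
    def __init__(self):
        self.records = 0

live = TaskMetrics()

# With serialization (as in cluster mode), the update carries a real copy.
snapshot_by_copy = TaskMetrics()
snapshot_by_copy.records = live.records

# Without serialization (local mode), the update aliases the live object.
snapshot_by_alias = live

live.records = 42
print(snapshot_by_copy.records, snapshot_by_alias.records)  # 0 42
```

Aggregating over such aliased snapshots counts the latest values repeatedly instead of each task's own contribution.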
[SPARK-3478] Profile Python tasks stage by stage in worker - Spark - [issue]
...The Python code in the driver is easy for users to profile, but the code run in workers is distributed across the cluster and is not easy for users to profile. So we need a way to do the profiling in the worker a...
http://issues.apache.org/jira/browse/SPARK-3478    Author: Davies Liu , 2014-09-27, 04:35
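A plain-Python sketch of the mechanism (standalone, not Spark's worker code): run the task function under `cProfile` in the worker, then collect the stats so they can be shipped back and aggregated per stage on the driver.

```python
import cProfile
import io
import pstats

def task():
    # stand-in for the user's per-partition work
    return sum(i * i for i in range(10_000))

prof = cProfile.Profile()
prof.enable()
result = task()
prof.disable()

# Render the collected stats; in a real worker these would be aggregated
# per stage and sent back to the driver instead of printed.
buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(5)
print("task" in buf.getvalue())  # True: the task function appears in the profile
```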