Search results 1 to 10 of 558 (0.0s).
[SPARK-25150] Joining DataFrames derived from the same source yields confusing/incorrect results - Spark - [issue]
...I have two DataFrames, A and B. From B, I have derived two additional DataFrames, B1 and B2. When joining A to B1 and B2, I'm getting a very confusing error: Join condition is missing or triv...
http://issues.apache.org/jira/browse/SPARK-25150    Author: Nicholas Chammas , 2019-08-19, 19:39
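A toy setup in the same shape as the report (two DataFrames derived from one source, both joined back to a third); the column names and data are illustrative, and whether this actually reproduces the "trivial join condition" behaviour depends on how the source is read, which is the subtlety in the ticket:

    # Hedged sketch: join a DataFrame against two derivatives of the same source.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    b = spark.createDataFrame([(1, 10), (1, 20), (2, 30)], ["id", "value"])
    a = spark.createDataFrame([(1,), (2,)], ["id"])

    b1 = b.groupBy("id").agg(F.sum("value").alias("total"))
    b2 = b.filter(F.col("value") > 15)

    # Because b1 and b2 share b's lineage, Spark may resolve the join condition
    # against the same attributes on both sides and, per the ticket, complain
    # that the condition is missing or trivial, or produce surprising results.
    result = a.join(b1, "id").join(b2, "id")
    result.show()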
[SPARK-18084] write.partitionBy() does not recognize nested columns that select() can access - Spark - [issue]
...Here's a simple repro in the PySpark shell:
from pyspark.sql import Row
rdd = spark.sparkContext.parallelize([Row(a=Row(b=5))])
df = spark.createDataFrame(rdd)
df.printSchema()
df.select('a.b').s...
http://issues.apache.org/jira/browse/SPARK-18084    Author: Nicholas Chammas , 2019-08-19, 19:06
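Building on the truncated repro above, a hedged sketch of the contrast the ticket describes; the output path is just a placeholder:

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([Row(a=Row(b=5))])

    df.select("a.b").show()   # selecting the nested field works

    # Writing partitioned by the same nested field reportedly fails with an
    # AnalysisException, because partitionBy() resolves only top-level columns.
    df.write.partitionBy("a.b").mode("overwrite").parquet("/tmp/spark-18084-demo")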
Recognizing non-code contributions - Spark - [mail # dev]
...On Mon, Aug 5, 2019 at 9:55 AM Sean Owen wrote: > On Mon, Aug 5, 2019 at 3:50 AM Myrle Krantz wrote: > > So... events coordinators? I'd still make them committers. ...
   Author: Nicholas Chammas , 2019-08-05, 18:19
Python API for mapGroupsWithState - Spark - [mail # dev]
...Can someone succinctly describe the challenge in adding the `mapGroupsWithState()` API to PySpark? I was hoping for some suboptimal but nonetheless working solution to be available in Python, a...
   Author: Nicholas Chammas , 2019-08-02, 22:57
[SPARK-17025] Cannot persist PySpark ML Pipeline model that includes custom Transformer - Spark - [issue]
...Following the example in this Databricks blog post under "Python tuning", I'm trying to save an ML Pipeline model. This pipeline, however, includes a custom transformer. When I try to save th...
http://issues.apache.org/jira/browse/SPARK-17025    Author: Nicholas Chammas , 2019-06-04, 23:12
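A rough sketch of the setup, assuming a trivial custom Transformer; whether save() succeeds depends on the Spark version and on whether the transformer implements the ML persistence interfaces, which is the crux of the ticket. The class, column name, and path below are placeholders:

    from pyspark.ml import Pipeline, Transformer
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    class ColumnDoubler(Transformer):
        """Toy custom transformer: doubles a hard-coded numeric column."""
        def _transform(self, dataset):
            return dataset.withColumn("value", F.col("value") * 2)

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([(1,), (2,)], ["value"])

    model = Pipeline(stages=[ColumnDoubler()]).fit(df)

    # Persisting a pipeline that contains a custom Python transformer is where
    # the ticket reports a failure, since the stage is not ML-writable.
    model.save("/tmp/spark-17025-demo")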
[SPARK-16824] Add API docs for VectorUDT - Spark - [issue]
...Following on the discussion here, it appears that VectorUDT is missing documentation, at least in PySpark. I'm not sure if this is intentional or not....
http://issues.apache.org/jira/browse/SPARK-16824    Author: Nicholas Chammas , 2019-05-22, 18:25
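For context, a small sketch of where VectorUDT surfaces in PySpark even though it is sparsely documented; the column name is arbitrary:

    from pyspark.ml.linalg import Vectors, VectorUDT
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([(Vectors.dense([1.0, 2.0]),)], ["features"])

    # The vector column's data type is an instance of VectorUDT.
    vec_type = df.schema["features"].dataType
    print(type(vec_type).__name__)          # VectorUDT
    print(isinstance(vec_type, VectorUDT))  # True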
[SPARK-18277] na.fill() and friends should work on struct fields - Spark - [issue]
...It appears that you cannot use fill() and friends to quickly modify struct fields. For example:
>>> df = spark.createDataFrame([Row(a=Row(b='yeah yeah'), c='alright'), Row(a=Row(b=Non...
http://issues.apache.org/jira/browse/SPARK-18277    Author: Nicholas Chammas , 2019-05-21, 07:05
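A hedged sketch of the limitation being described, using the same shape of data as the truncated repro above:

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([
        Row(a=Row(b='yeah yeah'), c='alright'),
        Row(a=Row(b=None), c=None),
    ])

    # na.fill() replaces the null in the top-level string column c, but the
    # null inside the struct field a.b is left untouched.
    df.na.fill('').show(truncate=False)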
[SPARK-4868] Twitter DStream.map() throws "Task not serializable" - Spark - [issue]
...(Continuing the discussion started here on the Spark user list.) The following Spark Streaming code throws a serialization exception I do not understand.
import twitter4j.auth.{Authorization, ...
http://issues.apache.org/jira/browse/SPARK-4868    Author: Nicholas Chammas , 2019-05-21, 05:37
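The original repro is Scala with twitter4j, but the underlying pitfall, capturing a non-serializable object in a closure shipped to executors, has a rough Python analogue; this sketch is an analogy, not the ticket's code:

    import threading
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    sc = spark.sparkContext

    lock = threading.Lock()  # not picklable; stands in for the ticket's non-serializable object

    # The lambda closes over `lock`, so Spark tries to serialize it along with
    # the task and fails: PySpark raises a pickling error, while the JVM
    # analogue is the "Task not serializable" exception in the ticket.
    rdd = sc.parallelize(range(10)).map(lambda x: (x, lock.locked()))
    rdd.collect()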
[SPARK-5685] Show warning when users open text files compressed with non-splittable algorithms like gzip - Spark - [issue]
...This is a usability or user-friendliness issue. It's extremely common for people to load a text file compressed with gzip, process it, and then wonder why only 1 core in their cluster is doin...
http://issues.apache.org/jira/browse/SPARK-5685    Author: Nicholas Chammas , 2019-05-21, 05:36
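A short sketch of why this surprises people, assuming some local gzipped text file; the path is a placeholder:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    sc = spark.sparkContext

    # gzip is not a splittable codec, so however large the file is, it comes
    # back as a single partition and is processed by a single core unless the
    # user explicitly repartitions.
    rdd = sc.textFile("/tmp/big-log.gz")   # placeholder path
    print(rdd.getNumPartitions())          # 1 for a single gzipped file
    rdd = rdd.repartition(sc.defaultParallelism)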
[SPARK-16921] RDD/DataFrame persist() and cache() should return Python context managers - Spark - [issue]
...Context managers are a natural way to capture closely related setup and teardown code in Python. For example, they are commonly used when doing file I/O:
with open('/path/to/file') as f: ...
http://issues.apache.org/jira/browse/SPARK-16921    Author: Nicholas Chammas , 2019-05-21, 04:33
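persist() and cache() do not return context managers today; a minimal sketch of the kind of wrapper the ticket is asking for, written as a hypothetical helper on top of the existing API:

    from contextlib import contextmanager

    @contextmanager
    def persisted(df):
        """Hypothetical helper: persist a DataFrame for the duration of a block."""
        df.persist()
        try:
            yield df
        finally:
            df.unpersist()

    # Usage sketch (assumes an existing DataFrame `df`):
    # with persisted(df) as cached:
    #     cached.count()
    #     cached.groupBy("some_column").count().show()
    # # unpersist() has run by here, mirroring `with open(...)` for file I/O.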