Search criteria: (none). Results 1 to 10 of 946 (0.0s).
[SPARK-16824] Add API docs for VectorUDT - Spark - [issue]
...Following on the discussion here, it appears that VectorUDT is missing documentation, at least in PySpark. I'm not sure if this is intentional or not....    Author: Nicholas Chammas, 2019-05-22, 18:25
[SPARK-18277] na.fill() and friends should work on struct fields - Spark - [issue]
...It appears that you cannot use fill() and friends to quickly modify struct fields. For example: >>> df = spark.createDataFrame([Row(a=Row(b='yeah yeah'), c='alright'), Row(a=Row(b=Non...    Author: Nicholas Chammas, 2019-05-21, 07:05
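The entry above asks for na.fill() to reach into struct (nested) fields. A minimal pure-Python sketch of the semantics being requested, using nested dicts as a stand-in for struct rows (the `fill_nested` helper is illustrative, not part of the Spark API):

```python
def fill_nested(record, value):
    """Recursively replace None with `value` inside a nested dict,
    mimicking what na.fill() applied to struct fields would do."""
    if record is None:
        return value
    if isinstance(record, dict):
        return {k: fill_nested(v, value) for k, v in record.items()}
    return record

# A row shaped like Row(a=Row(b=None), c='alright') from the excerpt.
row = {"a": {"b": None}, "c": "alright"}
filled = fill_nested(row, "yeah yeah")
# filled == {"a": {"b": "yeah yeah"}, "c": "alright"}
```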
[SPARK-4868] Twitter throws "Task not serializable" - Spark - [issue]
...(Continuing the discussion started here on the Spark user list.) The following Spark Streaming code throws a serialization exception I do not understand. import twitter4j.auth.{Authorization, ...    Author: Nicholas Chammas, 2019-05-21, 05:37
[SPARK-5685] Show warning when users open text files compressed with non-splittable algorithms like gzip - Spark - [issue]
...This is a usability or user-friendliness issue. It's extremely common for people to load a text file compressed with gzip, process it, and then wonder why only 1 core in their cluster is doin...    Author: Nicholas Chammas, 2019-05-21, 05:36
[SPARK-16921] RDD/DataFrame persist() and cache() should return Python context managers - Spark - [issue]
...Context managers are a natural way to capture closely related setup and teardown code in Python. For example, they are commonly used when doing file I/O: with open('/path/to/file') as f: ...    Author: Nicholas Chammas, 2019-05-21, 04:33
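The entry above proposes that persist() and cache() return context managers, so caching gets the same setup/teardown treatment as file I/O. A minimal sketch of the idea, assuming only that the wrapped object exposes persist()/unpersist() (the `persisted` helper and the `FakeRDD` stand-in are illustrative, not the Spark API):

```python
from contextlib import contextmanager

@contextmanager
def persisted(rdd):
    """Persist `rdd` on entry and unpersist it on exit.
    Works with anything exposing persist()/unpersist()."""
    rdd.persist()
    try:
        yield rdd
    finally:
        rdd.unpersist()

# Stand-in for an RDD/DataFrame, so the sketch runs without Spark.
class FakeRDD:
    def __init__(self):
        self.is_cached = False
    def persist(self):
        self.is_cached = True
        return self
    def unpersist(self):
        self.is_cached = False
        return self

rdd = FakeRDD()
with persisted(rdd) as r:
    assert r.is_cached       # cached inside the block
assert not rdd.is_cached     # automatically unpersisted on exit
```

The try/finally guarantees unpersist() runs even if the body raises, which is the main ergonomic win the issue describes.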
[SPARK-15191] createDataFrame() should mark fields that are known not to be null as not nullable - Spark - [issue]
...Here's a brief reproduction: >>> numbers = sqlContext.createDataFrame(...     data=[(1,), (2,), (3,), (4,), (5,)],...     samplingRatio=1  # go through all t...    Author: Nicholas Chammas, 2019-05-21, 04:33
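The issue above argues that when schema inference scans all the data (samplingRatio=1) and finds no nulls in a field, that field could be marked non-nullable. A pure-Python sketch of that inference step (the `infer_nullable` helper is illustrative, not Spark code):

```python
def infer_nullable(rows, col):
    """Return True if any scanned value in column `col` is None.
    After a full scan (samplingRatio=1), a False result means the
    field is known not to be null and could be marked non-nullable."""
    return any(row[col] is None for row in rows)

data = [{"n": 1}, {"n": 2}, {"n": 3}, {"n": 4}, {"n": 5}]
infer_nullable(data, "n")  # False: safe to mark the field not nullable
```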
[SPARK-19216] LogisticRegressionModel is missing getThreshold() - Spark - [issue]
...Say I just loaded a logistic regression model from storage. How do I check that model's threshold in PySpark? From what I can see, the only way to do that is to dip into the Java object: mode...    Author: Nicholas Chammas, 2019-05-21, 04:17
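For context on what the missing getThreshold() would report: a logistic regression model predicts label 1 when the predicted probability exceeds the threshold. A pure-Python sketch of that decision rule (the `predict` function and its weights are illustrative, not the Spark model):

```python
import math

def predict(weights, features, threshold=0.5):
    """Classify with a logistic model: label 1 iff the predicted
    probability exceeds `threshold` (the value getThreshold()
    would expose)."""
    margin = sum(w * x for w, x in zip(weights, features))
    prob = 1.0 / (1.0 + math.exp(-margin))
    return 1 if prob > threshold else 0

predict([2.0], [1.0])                  # prob ~0.88 > 0.5  -> 1
predict([2.0], [1.0], threshold=0.95)  # prob ~0.88 < 0.95 -> 0
```

Raising or lowering the threshold trades precision against recall, which is why being able to read it back from a loaded model matters.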
[SPARK-19553] Add GroupedData.countApprox() - Spark - [issue]
...We already have a pyspark.sql.functions.approx_count_distinct() that can be applied to grouped data, but it seems odd that you can't just get a regular approximate count for grouped data. I ima...    Author: Nicholas Chammas, 2019-05-21, 04:15
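One common way to get an approximate group count is to count a random sample and scale up. A pure-Python sketch of that idea (the `approx_group_counts` helper sketches a hypothetical GroupedData.countApprox(); it is not the Spark API, and Spark's actual countApprox uses a timeout-based partial aggregation on RDDs):

```python
import random

def approx_group_counts(rows, key, fraction=0.1, seed=42):
    """Estimate per-group counts by counting a Bernoulli sample of
    the rows and scaling by 1/fraction."""
    rng = random.Random(seed)
    counts = {}
    for row in rows:
        if rng.random() < fraction:
            counts[row[key]] = counts.get(row[key], 0) + 1
    return {k: round(v / fraction) for k, v in counts.items()}

rows = [{"g": "a"} for _ in range(1000)]
approx_group_counts(rows, "g")  # estimate close to {"a": 1000}
```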
[SPARK-2141] Add sc.getPersistentRDDs() to PySpark - Spark - [issue]
...PySpark does not appear to have sc.getPersistentRDDs()....    Author: Nicholas Chammas, 2019-05-21, 04:11
Suggestion on Join Approach with Spark - Spark - [mail # dev]
...This kind of question is for the User list, or for something like StackOverflow. It's not on topic here. The dev list (i.e. this list) is for discussions about the development of Spark itself....
   Author: Nicholas Chammas, 2019-05-15, 18:04