Results 1 to 10 of 10 (0.0s).
[SPARK-17025] Cannot persist PySpark ML Pipeline model that includes custom Transformer - Spark - [issue]
...Following the example in this Databricks blog post under "Python tuning", I'm trying to save an ML Pipeline model. This pipeline, however, includes a custom transformer. When I try to save th...    Author: Nicholas Chammas, 2019-06-04, 23:12
[SPARK-16824] Add API docs for VectorUDT - Spark - [issue]
...Following on the discussion here, it appears that VectorUDT is missing documentation, at least in PySpark. I'm not sure if this is intentional or not....    Author: Nicholas Chammas, 2019-05-22, 18:25
[SPARK-18277] na.fill() and friends should work on struct fields - Spark - [issue]
...It appears that you cannot use fill() and friends to quickly modify struct fields. For example: >>> df = spark.createDataFrame([Row(a=Row(b='yeah yeah'), c='alright'), Row(a=Row(b=Non...    Author: Nicholas Chammas, 2019-05-21, 07:05
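As a rough illustration of what this issue asks for, the sketch below models rows as nested Python dicts (a nested dict standing in for a struct field) and fills `None` values at any depth. It does not use Spark at all; `fill_nested` and the sample rows are hypothetical. Unlike `na.fill()`, which only fills columns whose type matches the fill value, this sketch replaces every `None` regardless of type.

```python
from typing import Any, Dict


def fill_nested(record: Dict[str, Any], value: Any) -> Dict[str, Any]:
    """Return a copy of `record` with every None replaced by `value`,
    recursing into nested dicts (stand-ins for struct fields)."""
    filled: Dict[str, Any] = {}
    for key, field in record.items():
        if isinstance(field, dict):
            # A nested dict plays the role of a struct column: recurse into it.
            filled[key] = fill_nested(field, value)
        elif field is None:
            filled[key] = value
        else:
            filled[key] = field
    return filled


# Hypothetical rows, loosely shaped like the example in the issue.
rows = [
    {"a": {"b": "yeah yeah"}, "c": "alright"},
    {"a": {"b": None}, "c": "alright"},
]
print([fill_nested(row, "N/A") for row in rows])
# → [{'a': {'b': 'yeah yeah'}, 'c': 'alright'}, {'a': {'b': 'N/A'}, 'c': 'alright'}]
```

Returning a new dict instead of mutating the input keeps the sketch close in spirit to Spark, where `na.fill()` returns a new DataFrame.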
[SPARK-4868] Twitter throws "Task not serializable" - Spark - [issue]
...(Continuing the discussion started here on the Spark user list.) The following Spark Streaming code throws a serialization exception I do not understand. import twitter4j.auth.{Authorization, ...    Author: Nicholas Chammas, 2019-05-21, 05:37
[SPARK-5685] Show warning when users open text files compressed with non-splittable algorithms like gzip - Spark - [issue]
...This is a usability or user-friendliness issue. It's extremely common for people to load a text file compressed with gzip, process it, and then wonder why only 1 core in their cluster is doin...    Author: Nicholas Chammas, 2019-05-21, 05:36
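A minimal sketch of the kind of check this issue proposes, written as a standalone Python helper rather than as a change to Spark itself. `warn_if_unsplittable` is a hypothetical name, and the extension set (only `.gz` here) is an assumption; whether a file can be split depends on its codec (bzip2, for example, is splittable).

```python
import os
import warnings

# Assumption: only gzip is treated as non-splittable in this sketch.
NON_SPLITTABLE_EXTENSIONS = {".gz"}


def warn_if_unsplittable(path: str) -> bool:
    """Warn and return True if `path` looks gzip-compressed.

    A non-splittable file is read as a single partition, so one task
    (and one core) ends up doing all the work for that file.
    """
    _, ext = os.path.splitext(path)
    if ext.lower() in NON_SPLITTABLE_EXTENSIONS:
        warnings.warn(
            f"{path} uses a non-splittable compression format and will be "
            "read by a single task; consider repartition() after loading.",
            stacklevel=2,
        )
        return True
    return False
```

The check looks only at one path's extension, so directories and glob patterns passed to `textFile()` would need to be expanded first. The usual remedy after loading such a file is `repartition()`, which spreads the data across more tasks.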
[SPARK-16921] RDD/DataFrame persist() and cache() should return Python context managers - Spark - [issue]
...Context managers are a natural way to capture closely related setup and teardown code in Python. For example, they are commonly used when doing file I/O: with open('/path/to/file') as f: ...    Author: Nicholas Chammas, 2019-05-21, 04:33
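The behavior this issue proposes can be approximated today with `contextlib.contextmanager`. The sketch assumes nothing beyond the `persist()` and `unpersist()` methods that PySpark RDDs and DataFrames already expose; the helper name `persisted` is hypothetical and not part of the PySpark API.

```python
from contextlib import contextmanager
from typing import Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def persisted(dataset: T) -> Iterator[T]:
    """Persist `dataset` for the duration of a `with` block, then unpersist it.

    Works with any object exposing persist()/unpersist(), such as a
    PySpark RDD or DataFrame (hypothetical helper, not a Spark API).

    Usage (df is any PySpark DataFrame or RDD):
        with persisted(df) as cached:
            cached.count()
    """
    cached = dataset.persist()  # in PySpark, persist() returns the dataset itself
    try:
        yield cached
    finally:
        # Teardown runs even if the body of the with-block raises.
        cached.unpersist()
```

Placing `unpersist()` in a `finally` clause mirrors how `with open(...)` closes a file even when the block raises an exception.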
[SPARK-15191] createDataFrame() should mark fields that are known not to be null as not nullable - Spark - [issue]
...Here's a brief reproduction: >>> numbers = sqlContext.createDataFrame( ...     data=[(1,), (2,), (3,), (4,), (5,)], ...     samplingRatio=1  # go through all t...    Author: Nicholas Chammas, 2019-05-21, 04:33
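The proposal reduces to a simple rule, sketched below in plain Python: if every row has been inspected and a column never holds `None`, that column could be declared non-nullable. This illustrates the idea only and is not Spark's schema-inference code; `infer_nullable` is a hypothetical helper. The rule is only safe when all rows are examined, because unsampled rows might still contain nulls.

```python
from typing import List, Optional, Sequence, Tuple


def infer_nullable(rows: Sequence[Tuple[Optional[object], ...]]) -> List[bool]:
    """Return, per column, whether any of the given rows holds None there.

    Assumes all rows have the same width and that `rows` is the full
    dataset (the analogue of inspecting every row), not a sample.
    """
    if not rows:
        return []
    width = len(rows[0])
    return [any(row[i] is None for row in rows) for i in range(width)]


# The data from the reproduction: no None anywhere, so the column is not nullable.
print(infer_nullable([(1,), (2,), (3,), (4,), (5,)]))  # → [False]
```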
[SPARK-19216] LogisticRegressionModel is missing getThreshold() - Spark - [issue]
...Say I just loaded a logistic regression model from storage. How do I check that model's threshold in PySpark? From what I can see, the only way to do that is to dip into the Java object: mode...    Author: Nicholas Chammas, 2019-05-21, 04:17
[SPARK-19553] Add GroupedData.countApprox() - Spark - [issue]
...We already have a pyspark.sql.functions.approx_count_distinct() that can be applied to grouped data, but it seems odd that you can't just get regular approximate count for grouped data. I ima...    Author: Nicholas Chammas, 2019-05-21, 04:15
[SPARK-2141] Add sc.getPersistentRDDs() to PySpark - Spark - [issue]
...PySpark does not appear to have sc.getPersistentRDDs()....    Author: Nicholas Chammas, 2019-05-21, 04:11