I finally had time to look at the updated SPIP 10 mins ago. I support the high level idea and +1 on the SPIP.
That said, I think the proposed API is too complicated and too invasive a change to the existing internals. A much simpler API would be to expose a columnar batch iterator interface, i.e. an uber column-oriented UDF with the ability to manage its own life cycle. Once we have that, we could also refactor the existing Python UDFs to use that interface.
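To make the suggestion concrete, here is a minimal sketch of what such a columnar batch iterator interface could look like. This is purely illustrative and not the SPIP's actual API: the class names, the dict-of-lists batch representation (a stand-in for an Arrow/ColumnarBatch structure), and the driver loop are all my own assumptions.

```python
# Hypothetical sketch, NOT the SPIP's proposed API: a column-oriented UDF
# expressed as an iterator-to-iterator transform over column batches,
# with explicit lifecycle hooks. Batches are modeled here as dicts of
# column-name -> list-of-values purely for illustration; a real version
# would use Arrow record batches or Spark's ColumnarBatch.

class ColumnarBatchUDF:
    def open(self):
        """Acquire resources (e.g. GPU memory) before the first batch."""

    def close(self):
        """Release resources after the last batch."""

    def __call__(self, batches):
        """Transform an iterator of column batches into another iterator."""
        raise NotImplementedError


class AddOne(ColumnarBatchUDF):
    """Example UDF: add 1 to every value of column 'x' in each batch."""

    def __call__(self, batches):
        for batch in batches:
            yield {"x": [v + 1 for v in batch["x"]]}


def run_udf(udf, batches):
    """Hypothetical driver loop: the engine manages the UDF's life cycle
    around the whole batch stream, not per row or per batch."""
    udf.open()
    try:
        yield from udf(batches)
    finally:
        udf.close()
```

For example, `list(run_udf(AddOne(), iter([{"x": [1, 2]}])))` yields `[{"x": [2, 3]}]`. The point of the shape above is that life-cycle management and batch iteration live behind one narrow interface at the boundary, rather than threading a second processing mode through the engine's internals.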
As I said earlier (a couple of months ago when this was first surfaced?), I support the idea of enabling *external* column-oriented processing logic, but not changing Spark itself to have two processing modes, which is simply very complicated and would create a very high maintenance burden for the project.
On Wed, May 29, 2019 at 9:49 PM, Thomas Graves < [EMAIL PROTECTED] > wrote: