A 26% improvement is underwhelming if it requires a massive refactoring of the codebase. Also, you can't just add the benefits up this way, because:
- Both vectorization and codegen reduce the overhead of virtual function calls
- Vectorized code is more friendly to compilers / CPUs, but requires materializing a lot of data in memory (or cache)
- Codegen reduces the amount of data that flows through memory, but for complex queries the generated code might not be very compiler / CPU friendly
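To make the trade-off above concrete, here is a minimal illustrative sketch (not Spark's actual implementation) contrasting Volcano-style row-at-a-time execution, which pays a dynamic-dispatch cost on every row, with vectorized execution, which amortizes that cost over a batch but must materialize the whole batch in memory:

```python
# Illustrative sketch only; operator and function names are made up.

class RowAtATimeFilter:
    """Volcano-style operator: one dynamic dispatch per row."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def __iter__(self):
        for row in self.child:  # per-row call overhead, tiny memory footprint
            if self.predicate(row):
                yield row

def vectorized_filter(batch, predicate):
    """Vectorized operator: one call per batch of rows.
    The entire batch (a materialized column) must fit in memory/cache."""
    return [row for row in batch if predicate(row)]

rows = list(range(10))
pred = lambda x: x % 2 == 0

row_result = list(RowAtATimeFilter(rows, pred))  # streams row by row
vec_result = vectorized_filter(rows, pred)       # processes the whole batch
assert row_result == vec_result == [0, 2, 4, 6, 8]
```

Both produce the same result; the difference is where the cost lands: per-row dispatch overhead versus memory pressure from materialized batches, which is why the two techniques' benefits overlap rather than add up.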
I see massive benefits in leveraging GPUs (and other accelerators) for numeric workloads (e.g. machine learning), so I think it makes a lot of sense to be able to get data out of Spark quickly into UDFs for such workloads.
I don't see as much benefit for general data processing, for a few reasons:
1. GPU machines are much more expensive & difficult to get (e.g. in the cloud they are 3 to 5x more expensive, with limited availability based on my experience), so it is difficult to build a farm
2. Bandwidth from the host system to GPUs is usually limited, so if you can fit the working set in GPU memory and repeatedly work on it (e.g. machine learning), it's great; otherwise it's not.
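A quick back-of-the-envelope calculation shows why point 2 matters. The bandwidth numbers below are illustrative assumptions (roughly PCIe 3.0 x16 versus high-end GPU memory), not measurements:

```python
# All figures are assumed, order-of-magnitude numbers for illustration.
pcie_gb_s = 16.0    # assumed host-to-GPU bandwidth (PCIe 3.0 x16, ~16 GB/s)
hbm_gb_s = 900.0    # assumed on-device GPU memory bandwidth (~900 GB/s)
data_gb = 64.0      # size of the working set

transfer_s = data_gb / pcie_gb_s  # one-time cost to move data onto the GPU
one_pass_s = data_gb / hbm_gb_s   # one full scan once the data is resident

# How many on-device passes equal the cost of a single transfer:
passes_to_amortize = transfer_s / one_pass_s

print(f"transfer: {transfer_s:.1f}s, one GPU pass: {one_pass_s:.3f}s")
print(f"~{passes_to_amortize:.0f} GPU passes cost as much as one transfer")
```

With these assumed numbers, one transfer over the bus costs as much as dozens of passes over the data on-device, so workloads that scan the data once (typical general data processing) pay the transfer cost without amortizing it, while iterative workloads like ML training amortize it well.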
3. It's a massive effort.
In general it's a cost-benefit trade-off. I'm not aware of any general framework that allows us to write code once and have it work against both GPUs and CPUs reliably. If such a framework existed, it would change the equation.
On Tue, Mar 26, 2019 at 6:57 AM, Bobby Evans < [EMAIL PROTECTED] > wrote: