Thanks for writing this up.
Usually when people talk about push-based shuffle, they motivate it primarily as a way to reduce the latency of short queries by pipelining the map phase, shuffle phase, and reduce phase (which this design isn't going to address). It's interesting that you are instead targeting throughput by optimizing away the random reads.
My main questions are ...
1. This is designed for HDDs. But SSD prices have dropped below HDD prices this year, so most new data center storage will use SSDs going forward. Are we introducing a lot of complexity to address a problem that only exists with legacy hardware that will be phased out soon?
2. Is there a simpler way to address this? E.g., you could simply merge map outputs on each node locally, without involving any kind of push. It seems to me that would address the same issues, with the same limitation (the memory buffer bounding the number of concurrent streams you can write to).
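To make point 2 concrete, here is a rough sketch of what I mean by a node-local merge (Python for brevity; the file-naming scheme `map_<mapId>_part_<reduceId>.bin` and the function name are hypothetical, not anything from the actual design). Each map task writes one file per reduce partition; a local merger then concatenates all files on that node belonging to the same partition into one consolidated file, so a reducer does one sequential read per node instead of one small random read per map task:

```python
import os
from collections import defaultdict

def merge_map_outputs_locally(shuffle_dir: str, merged_dir: str) -> dict:
    """Concatenate per-map-task shuffle files for the same reduce
    partition into one consolidated file per partition.

    Assumes (hypothetically) that map outputs are named
    'map_<mapId>_part_<reduceId>.bin' in shuffle_dir.
    Returns a mapping of reduce partition id -> merged file path.
    """
    os.makedirs(merged_dir, exist_ok=True)

    # Group this node's map output files by reduce partition id.
    by_partition = defaultdict(list)
    for name in sorted(os.listdir(shuffle_dir)):
        if name.startswith("map_") and "_part_" in name:
            reduce_id = int(name.rsplit("_", 1)[1].split(".")[0])
            by_partition[reduce_id].append(os.path.join(shuffle_dir, name))

    # Sequentially concatenate each group into one file, giving the
    # reducer a single large read per node.
    merged = {}
    for reduce_id, paths in by_partition.items():
        out_path = os.path.join(merged_dir, f"merged_part_{reduce_id}.bin")
        with open(out_path, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    out.write(src.read())
        merged[reduce_id] = out_path
    return merged
```

This gets the same consolidation of small fragments into large sequential reads, without any network push, which is why I'd like to understand what the push adds beyond this.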
On Tue, Jan 21, 2020 at 6:13 PM, mshen < [EMAIL PROTECTED] > wrote: