Subject: Large Scale Language Models

Yes.  But not so badly.  As long as the highest-probability item accounts for
only a small fraction of the work any single reducer must do, the compute load
imbalance across reducers will be very small.
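A minimal sketch of the argument, under assumptions that are not in the post.
Each distinct key (say, an n-gram) goes to one reducer along with all of its
records. Per-key counts follow a hypothetical Zipf-like distribution. Keys are
assigned greedily to the currently least-loaded reducer (Graham's list
scheduling) rather than by hashing. Under that greedy rule the busiest reducer
carries at most the average load T/R plus the largest single key's count
c_max, so max_load / mean_load <= 1 + c_max / (T/R).  The helper names
(greedy_reducer_loads, imbalance, zipf_counts) are made up for illustration.
A real hash partitioner adds random variance on top of this, so treat the
bound as illustrative only.

```python
import heapq


def greedy_reducer_loads(key_counts, num_reducers):
    """Assign each key (with all of its records) to the least-loaded reducer.

    Hypothetical helper: greedy list scheduling stands in for a real
    partitioner, which would typically hash keys instead.
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    loads = [0] * num_reducers
    for count in key_counts:
        load, r = heapq.heappop(heap)  # least-loaded reducer right now
        loads[r] = load + count
        heapq.heappush(heap, (loads[r], r))
    return loads


def imbalance(loads):
    """Busiest reducer's load divided by the average load (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))


def zipf_counts(vocab_size, total, s=1.0):
    """Hypothetical Zipf-like record counts: rank r gets weight 1 / r**s."""
    weights = [1.0 / (r ** s) for r in range(1, vocab_size + 1)]
    norm = sum(weights)
    return [max(1, round(total * w / norm)) for w in weights]


if __name__ == "__main__":
    counts = zipf_counts(100_000, 10_000_000)
    for num_reducers in (4, 64, 1024):
        loads = greedy_reducer_loads(counts, num_reducers)
        mean = sum(loads) / num_reducers
        top_share = max(counts) / mean  # biggest item relative to one reducer's work
        print(num_reducers, round(top_share, 3), round(imbalance(loads), 3))
```

With only a few reducers the top item is a small share of each reducer's work
and the ratio stays near 1.0. With very many reducers the single biggest key
can exceed an average reducer's entire load, and then no assignment can
balance it, because that key cannot be split.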

