Hi ShaoFeng, thanks for the quick response. We are on Kylin version 1.6.

The slow step is #3, and most of its time is spent in the map phase;
sort/shuffle and reduce seem to be OK. Yes, we went through that document.
The input per mapper is set to about 1.1 million, giving us 225 mappers for
an input of 234 million records. All mappers run at the same time, since
that is the number of mapper slots we have. The mappers all seem to take
the same amount of time (we didn't notice any long runners at the end).

The M/R stats output for that step is below. What troubles us is the nearly
4.7 billion output records from the map phase. Is there a general place we
can look to troubleshoot the "Extract Fact Table Distinct Columns" step?
Thanks.

Map-Reduce Framework
Map input records=234707850
Map output records=4687531086
Map output bytes=49568802916
Map output materialized bytes=9852827353
Input split bytes=965025
Combine input records=4687531086
Combine output records=33878243
Reduce input groups=281301
Reduce shuffle bytes=9852827353
Reduce input records=33878243
Reduce output records=0
Spilled Records=67756486
Shuffled Maps =5850
Failed Shuffles=0
Merged Map outputs=5850
GC time elapsed (ms)=49602314
CPU time spent (ms)=759218400
Physical memory (bytes) snapshot=418766036992
Virtual memory (bytes) snapshot=898566012928
Total committed heap usage (bytes)=391907901440
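
As a rough sanity check on the counters above, here is a small Python sketch
that derives a few ratios from them. The helper name `summarize` and the
dictionary keys are made up for illustration; the 225 mapper count comes from
the text of this message rather than from a counter, and the "implied
reducers" figure assumes every reducer fetched output from every mapper.

```python
# Counter values copied from the job output above.
counters = {
    "map_input_records": 234_707_850,
    "map_output_records": 4_687_531_086,
    "map_output_bytes": 49_568_802_916,
    "combine_output_records": 33_878_243,
    "spilled_records": 67_756_486,
    "shuffled_maps": 5_850,
    "gc_ms": 49_602_314,
    "cpu_ms": 759_218_400,
}
MAPPERS = 225  # stated in the message, not a Hadoop counter


def summarize(c, mappers):
    """Derive a few ratios from the counters (hypothetical helper, not Kylin code)."""
    return {
        # How many records each input row turns into in the map phase.
        "out_per_in": c["map_output_records"] / c["map_input_records"],
        # Average size of one map output record, in bytes.
        "bytes_per_out_record": c["map_output_bytes"] / c["map_output_records"],
        # How much the combiner shrinks the map output.
        "combine_reduction": c["map_output_records"] / c["combine_output_records"],
        # Spilled records relative to what survives the combiner.
        "spills_per_combined_record": c["spilled_records"] / c["combine_output_records"],
        # Average input rows handled by each mapper.
        "rows_per_mapper": c["map_input_records"] / mappers,
        # Assumes Shuffled Maps = mappers x reducers.
        "implied_reducers": c["shuffled_maps"] / mappers,
        # Fraction of reported CPU time that the GC counter accounts for.
        "gc_share_of_cpu": c["gc_ms"] / c["cpu_ms"],
    }


if __name__ == "__main__":
    for name, value in summarize(counters, MAPPERS).items():
        print(f"{name}: {value:,.3f}")
```

With these numbers, each input row produces about 20 map output records of
roughly 10.6 bytes each, and the combiner cuts the record count by a factor
of about 138. If the step emits one record per column per row (an assumption,
not something these counters confirm), a ratio near 20 would simply reflect
the number of columns being processed.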

On Tue, Dec 19, 2017 at 10:13 PM, ShaoFeng Shi <[EMAIL PROTECTED]> wrote: