It's just a flat select (via a view), basically:

    select field1, field2, ... field100 from view_mytable where dir0 = '2016-05-01'

There is no aggregation or anything else happening.

As for the dictionary encoding and the new reader, some thoughts:

1. Based on what I've read, the new reader is faster for flat data. In my
case, it's the only thing that lets me read the data created on a CDH
cluster by a MapReduce job; the "old" reader gives me an array index out of
bounds error (see the other thread). So to clean up my data I'd like to use
the new reader here, but now you have me worried about incorrect data.

2. The files are already dictionary encoded. When I do the CTAS without the
encoding, the resulting files are quite a bit bigger than the originals.
Not a huge issue, but substantial (10-20 GB per day). That's why I tried to
combine the two.
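A minimal sketch of that combination, assuming the Drill session options
`store.parquet.use_new_reader` and `store.parquet.enable_dictionary_encoding`
are the relevant switches; `dfs.tmp.mytable_clean` is a hypothetical target
table, not a real one:

```sql
-- Sketch only: option names assumed from Drill's configuration options list.
-- Read the source files with the new Parquet reader.
ALTER SESSION SET `store.parquet.use_new_reader` = true;
-- Write the CTAS output with dictionary encoding to keep file sizes down.
ALTER SESSION SET `store.parquet.enable_dictionary_encoding` = true;

-- Hypothetical target table; SELECT * stands in for the explicit column list.
CREATE TABLE dfs.tmp.`mytable_clean` AS
SELECT *
FROM view_mytable
WHERE dir0 = '2016-05-01';
```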

3. I am now worried about both the encoding and the reader producing
incorrect data. Are there any JIRAs, etc., tracking their status or warning
about their use?


On Tue, May 31, 2016 at 11:02 AM, Abdel Hakim Deneche <[EMAIL PROTECTED]