I am looking at the GC logs for some big queries (now that I know how to enable them, thanks Paul!) and found the item below. It worries me (it says "failure", and it took 8 seconds). Should I be worried about that? I know my heap is set fairly high (24GB); how should I interpret this?
1924.138: [GC concurrent-root-region-scan-end, 0.0019536 secs]

John
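For what it's worth, a line like the one above can be pulled apart with a small script. The regex and field names below are just my own sketch for this G1 log format, not anything official:

```python
import re

# Matches G1 log lines of the form:
#   1924.138: [GC concurrent-root-region-scan-end, 0.0019536 secs]
# Pattern and field names are a sketch; adapt to your JVM's exact output.
GC_LINE = re.compile(
    r"^(?P<uptime>\d+\.\d+): \[GC (?P<event>[^,]+), (?P<secs>\d+\.\d+) secs\]"
)

def parse_gc_line(line):
    """Return (uptime, event, pause_seconds), or None if the line doesn't match."""
    m = GC_LINE.match(line.strip())
    if not m:
        return None
    return float(m.group("uptime")), m.group("event"), float(m.group("secs"))
```

Note that the quoted event took about 2 ms (0.0019536 secs), and the concurrent phases run alongside the application; they are usually not the pauses to worry about.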
Also: doing a CTAS using the new reader and dictionary encoding is producing this, and everything is hung at this point. The query in sqlline is not returning, and the web UI is running extremely slowly; when it does return, it shows the running query, but when I click on it, the profile shows an error saying the profile was not found. The Full GCs are happening quite a bit and take a long time (>10 seconds). In my tailed GC log, it actually writes part of the "allocation error" message and then waits a while before anything else happens. This is "the scary" state my cluster can get into, and I am trying to avoid it :) Any tips on what may be happening here would be appreciated.
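To spot those long Full GCs in a tailed log, a quick filter helps. The line format assumed below is a guess (it varies by JVM version and flags), so treat the pattern as something to adapt to your own gc.log:

```python
import re

# Typical Full GC lines look roughly like:
#   2048.512: [Full GC (Allocation Failure)  23G->22G(24G), 12.3456789 secs]
# This exact layout is an assumption; adjust the regex for your JVM's output.
FULL_GC = re.compile(r"\[Full GC \((?P<cause>[^)]+)\).*?,\s*(?P<secs>\d+\.\d+) secs\]")

def long_full_gcs(lines, threshold_secs=10.0):
    """Yield (cause, pause_seconds) for Full GC pauses longer than threshold_secs."""
    for line in lines:
        m = FULL_GC.search(line)
        if m and float(m.group("secs")) > threshold_secs:
            yield m.group("cause"), float(m.group("secs"))
```

Feeding it the tailed log would flag only the >10 second "Allocation Failure" pauses described above, while ignoring the short concurrent-phase entries.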
My understanding (which is incomplete) is that both the "new reader" and "dictionary encoding" are not stable yet and can cause failures or, worse, incorrect data. That's why they are disabled by default.
The "Allocation Failure" means that the JVM had to run a Full GC because it couldn't allocate more heap for Drill. It looks like Drill is using more than 24GB of heap, which is most likely a bug.
What happens if you run the select part of the CTAS? Does it also use too much heap?

On Tue, May 31, 2016 at 8:54 AM, John Omernik <[EMAIL PROTECTED]> wrote:
It's just a flat select (via a view), basically select field1, field2, ..., field100 from view_mytable where dir0 = '2016-05-01'. There is no aggregation or anything happening.
As to the dictionary encoding and the new reader some thoughts:
1. Based on what I've read, the new reader is faster for flat data. In my case, it's also the only thing that allows me to read the data created in a CDH cluster with a MapReduce job; the "old" reader gives me the array index out of bounds error (see the other thread). So in order to clean up my data, I'd like to use the new reader here; however, now you have me worried about incorrect data.
2. The files are already dictionary encoded. When I do the CTAS without the encoding, the resulting files are quite a bit bigger than the originals. Not a huge issue, but substantial (10-20 GB per day). That's why I tried to combine the two.
3. I am now worried about both the encoding and the reader producing incorrect data... Are there any JIRAs, etc., with status on this and warnings on their use?
John

On Tue, May 31, 2016 at 11:02 AM, Abdel Hakim Deneche <[EMAIL PROTECTED]