After some digging into the code, it looks like this bug also affects bulk
loads done with LoadIncrementalHFiles (i.e. bulk loading programmatically).
We fixed the code in Compression.class (in the Algorithm enum):

GZ("gz") {
            private transient GzipCodec codec;

            DefaultCodec getCodec(Configuration conf) {
                if (codec == null) {
                    synchronized (this) {
                        if (codec == null) {
                            codec = new ReusableStreamGzipCodec(new
                return codec;

That way the codec always has a configuration.
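For anyone who wants to see the pattern in isolation, here is a minimal
sketch of lazy, thread-safe codec creation that always hands the codec a
configuration. It does not use Hadoop or HBase at all: SketchConf,
SketchCodec and SketchAlgorithm are hypothetical stand-ins for
Configuration, the gzip codec and Compression.Algorithm. Two things here are
my assumptions and not taken from the snippet above: the field is declared
volatile (which double-checked locking needs to be safe), and the codec gets
a copy of the caller's configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
class SketchConf {
    private final Map<String, String> values = new HashMap<>();

    SketchConf() {}

    // Copy constructor, so the codec does not share state with the caller.
    SketchConf(SketchConf other) { values.putAll(other.values); }

    void set(String key, String value) { values.put(key, value); }

    String get(String key) { return values.get(key); }
}

// Hypothetical stand-in for a codec that must be given a configuration.
class SketchCodec {
    private final SketchConf conf;

    SketchCodec(SketchConf conf) { this.conf = conf; }

    SketchConf getConf() { return conf; }
}

// Hypothetical stand-in for the Algorithm enum.
enum SketchAlgorithm {
    GZ("gz") {
        // volatile (an assumption of this sketch) makes the
        // double-checked locking below safe.
        private volatile SketchCodec codec;

        @Override
        SketchCodec getCodec(SketchConf conf) {
            SketchCodec local = codec;
            if (local == null) {
                synchronized (this) {
                    local = codec;
                    if (local == null) {
                        // The codec is never created without a configuration.
                        local = new SketchCodec(new SketchConf(conf));
                        codec = local;
                    }
                }
            }
            return local;
        }
    };

    private final String name;

    SketchAlgorithm(String name) { this.name = name; }

    String getName() { return name; }

    abstract SketchCodec getCodec(SketchConf conf);
}
```

Note that the codec is created once and keeps the configuration from the
first call; later calls with a different configuration get the same
instance back.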

In addition, since we pre-create regions before bulk loading, we wanted the
MR job to cover only those regions. By subclassing HFileOutputFormat you can
set only the split points that are relevant to the job and save a lot of
reduce time (especially if you have hundreds or thousands of regions).
This works for us since each bulk load we do is relevant to a specific
timestamp. Hope it helps anyone...
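
To illustrate the split point idea, here is a rough sketch of the filtering
step only. It assumes the rows of one job fall into a contiguous key range
[lo, hi), which may not match your schema. SplitPointFilter and its methods
are hypothetical names, not part of HBase, and the sketch does not show how
the filtered keys are handed to the job. Keys are compared as unsigned
bytes in lexicographic order, the way row keys sort.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an HBase class.
class SplitPointFilter {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Keeps the region start keys that lie strictly between lo and hi.
    // A start key equal to lo is dropped, because a split there would
    // only produce an empty partition below the job's range.
    // hi == null means the range has no upper bound.
    static List<byte[]> relevantSplits(List<byte[]> startKeys, byte[] lo, byte[] hi) {
        List<byte[]> result = new ArrayList<>();
        for (byte[] key : startKeys) {
            if (key.length == 0) {
                continue; // the first region's empty start key is not a split point
            }
            boolean aboveLo = compareUnsigned(key, lo) > 0;
            boolean belowHi = hi == null || compareUnsigned(key, hi) < 0;
            if (aboveLo && belowHi) {
                result.add(key);
            }
        }
        return result;
    }
}
```

With N split points kept, the job needs N + 1 partitions instead of one
per region in the whole table.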


On Wed, Nov 7, 2012 at 9:44 AM, Amit Sela <[EMAIL PROTECTED]> wrote: