Here is my understanding (please feel free to correct me if I am wrong):

For your first question: I think that, more than efficiency, TableOutputFormat
gives you convenience. It is an output format available out of the box that
does the Puts for you with the default recommended config settings (flush,
WAL, etc.). You can certainly extend it and customize it if you want.

As for your second question, the job creates HFiles per region, not per
region server:
"At this stage, one HFile will be created per region in the output folder.
Keep in mind that the input data is almost completely re-written, so you
will need at least twice the amount of disk space available than the size
of the original data set. For example, for a 100GB mysqldump you should
have at least 200GB of available disk space in HDFS. You can delete the
dump file at the end of the process."

Source & ref:

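To make the "one HFile per region" point concrete, here is a minimal sketch
in plain Java with no HBase dependencies. It only models the idea: each row
key goes to the region whose start key is the greatest one not exceeding it,
and every region that receives rows ends up with one output file. The class
name, the start keys and the row keys are all made up for illustration, and
String ordering stands in for HBase's byte-wise key ordering, so please do
not read this as the HBase API itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of per-region output grouping; not HBase code.
final class RegionPartitionSketch {

    // Returns the start key of the region that would hold rowKey,
    // i.e. the greatest start key that is <= rowKey.
    static String regionFor(NavigableSet<String> startKeys, String rowKey) {
        String start = startKeys.floor(rowKey);
        if (start == null) {
            throw new IllegalArgumentException("no region covers row key: " + rowKey);
        }
        return start;
    }

    // Groups row keys by region start key. Each map entry stands in for
    // one output file written for that region.
    static Map<String, List<String>> groupByRegion(NavigableSet<String> startKeys,
                                                   List<String> rowKeys) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String rowKey : rowKeys) {
            files.computeIfAbsent(regionFor(startKeys, rowKey), k -> new ArrayList<>())
                 .add(rowKey);
        }
        return files;
    }

    public static void main(String[] args) {
        // Assumed layout: three regions. The first region has an empty start key.
        NavigableSet<String> startKeys = new TreeSet<>(Arrays.asList("", "g", "p"));
        List<String> rows = Arrays.asList("apple", "banana", "grape", "zebra");

        Map<String, List<String>> files = groupByRegion(startKeys, rows);
        System.out.println("output files: " + files.size());
        files.forEach((start, keys) ->
                System.out.println("region starting at '" + start + "' -> " + keys));
    }
}
```

In this toy layout the number of groups depends on how many regions actually
receive rows, not on how many region servers host them, which is the
distinction that matters for your question.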

On Thu, Jul 16, 2015 at 9:37 AM, Shushant Arora <[EMAIL PROTECTED]>