You don’t need to do this.

It’s already done for you by the existing APIs.

A scan will allow you to do either a full table scan (no range limits provided) or a range scan where you provide the boundaries.
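
To make the two scan modes concrete, here is a minimal sketch that models a table as a sorted map of row keys. This is not the HBase client API: the class and the scan() helper are hypothetical, and a null boundary stands in for "no limit provided". It only illustrates the semantics, assuming the usual HBase convention that the start row is inclusive and the stop row is exclusive.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of scan semantics; a TreeMap stands in for the sorted table.
class ScanModelSketch {

    // null startRow / stopRow means "no boundary on that side".
    static SortedMap<String, String> scan(TreeMap<String, String> table,
                                          String startRow, String stopRow) {
        if (startRow == null && stopRow == null) {
            return table;                       // full table scan
        }
        if (startRow == null) {
            return table.headMap(stopRow);      // everything before stopRow (exclusive)
        }
        if (stopRow == null) {
            return table.tailMap(startRow);     // everything from startRow (inclusive)
        }
        return table.subMap(startRow, stopRow); // start inclusive, stop exclusive
    }

    static TreeMap<String, String> exampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("a", "v1");
        table.put("b", "v2");
        table.put("c", "v3");
        table.put("d", "v4");
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = exampleTable();
        System.out.println(scan(table, null, null).keySet()); // [a, b, c, d]
        System.out.println(scan(table, "b", "d").keySet());   // [b, c]
    }
}
```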

So if you’re using a client connection to HBase, it’s done for you.

If you’re writing an M/R job, you already get one mapper task assigned per region, so the parallelism is already done for you.

It’s possible that the Input Format is smart enough to pre-check the regions against the scan boundaries and skip generating a mapper task for any region that falls outside them.
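
As a rough illustration of what such a pre-check could look like, here is a small self-contained sketch. It is not the actual Input Format code: the Range class, overlaps() and splitsFor() are hypothetical names, and it assumes each region covers a half-open key range [start, end) with an empty key meaning "unbounded". One split (and so one mapper task) is produced per region that overlaps the scan range.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of region pruning; not taken from the HBase source.
class RegionPruningSketch {

    // Half-open key range [start, stop); an empty array means "unbounded".
    static final class Range {
        final byte[] start;
        final byte[] stop;

        Range(String start, String stop) {
            this.start = start.getBytes(StandardCharsets.UTF_8);
            this.stop = stop.getBytes(StandardCharsets.UTF_8);
        }
    }

    // True when the region and the scan share at least one possible row key.
    static boolean overlaps(Range region, Range scan) {
        boolean scanStartsBeforeRegionEnds = region.stop.length == 0
                || Arrays.compareUnsigned(scan.start, region.stop) < 0;
        boolean regionStartsBeforeScanEnds = scan.stop.length == 0
                || Arrays.compareUnsigned(region.start, scan.stop) < 0;
        return scanStartsBeforeRegionEnds && regionStartsBeforeScanEnds;
    }

    // One split per overlapping region; regions outside the scan get no mapper.
    static List<Range> splitsFor(List<Range> regions, Range scan) {
        List<Range> splits = new ArrayList<>();
        for (Range region : regions) {
            if (overlaps(region, scan)) {
                splits.add(region);
            }
        }
        return splits;
    }

    // Four made-up regions covering the whole key space.
    static List<Range> exampleRegions() {
        return Arrays.asList(
                new Range("", "g"), new Range("g", "n"),
                new Range("n", "t"), new Range("t", ""));
    }

    public static void main(String[] args) {
        List<Range> regions = exampleRegions();
        System.out.println(splitsFor(regions, new Range("h", "p")).size()); // 2
        System.out.println(splitsFor(regions, new Range("", "")).size());   // 4
    }
}
```

With a full table scan every region gets a mapper; with the range scan only the two regions that hold keys between "h" and "p" do.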


Michael Segel
michael_segel (AT)