We are using HBase 0.98.6 and Hadoop 2.5.0 (CDH 5.3.5).
We have a couple of questions regarding splits and merges:
1) Since both split and merge are asynchronous calls on the client side, we use a naïve workaround to make them synchronous: we query each RS info server with an HttpClient and wait for its pending tasks to finish. Once all tasks have finished on all RSs, we issue the next split/merge for the same table. Is there a better way to do it?
2) Regarding merges, we've noticed that sometimes a merge simply does not happen. In the master log we see that the merge request is issued and forwarded to the corresponding RS, but then nothing happens on the RS. We always merge two regions on the same region server. Are there minimum requirements for two regions to be merged? If those requirements are not met, does the merge process exit silently?
Hi Ivan, For #2, let's consider regions A, B, C and D, all on server S.
If you merge A and B together and C and D together, still on S, you will have only regions A' and C' on S.
Now, if you try to merge A' and C' into a new, bigger region, this will fail silently until A' and C' have been major compacted.
Indeed, before major compaction, A' and C' will both still contain references to the files of their parent regions (A and B, C and D). To be able to merge A' and C', these references need to be cleaned up, so you need to major compact both regions first.
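To make the rule above concrete, here is a toy in-memory model in Java. It is not HBase code: the `MergeModel` class, its `Region` type and the `hasReferences` flag are all made up for illustration, and the only thing it encodes is the behaviour described above (a freshly merged region cannot be merged again until it has been major compacted).

```java
public class MergeModel {

    /** Hypothetical stand-in for a region; not an HBase class. */
    public static final class Region {
        public final String name;
        // True while the region's files still reference its parent regions.
        public boolean hasReferences;

        public Region(String name, boolean hasReferences) {
            this.name = name;
            this.hasReferences = hasReferences;
        }
    }

    /**
     * Models a merge request. Returns the merged region, or null when the
     * request is dropped because one of the inputs still holds references.
     * The null return mimics the "fails silently" behaviour from the thread.
     */
    public static Region merge(Region a, Region b) {
        if (a.hasReferences || b.hasReferences) {
            return null;
        }
        // A freshly merged region references the files of both parents.
        return new Region(a.name + "+" + b.name, true);
    }

    /** Models a major compaction: files are rewritten, references go away. */
    public static void majorCompact(Region r) {
        r.hasReferences = false;
    }
}
```

In this model, merging A with B and C with D succeeds, merging the two results returns null, and the same call succeeds once `majorCompact` has been applied to both results.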
I can confirm that it was as Jean Marc said. Thanks JM!
Regarding question #1, any help? I am rephrasing it in case it was not clear.
1) Since both split and merge are asynchronous calls on the client side, we use a naïve workaround to make them synchronous: we query each RS info server with an HttpClient and wait for its pending tasks to finish. Once all tasks have finished on all RSs, we issue the next split/merge for the same table. Is there a better way to check whether the major compaction has finished?
Iván 2015-07-15 17:49 GMT+02:00 Ivan Brondino <[EMAIL PROTECTED]>:
Is it possible to poll the number of regions (e.g. in a loop) after invoking a split or merge, to confirm that the action has been performed? I know it is a crude way, but maybe something can be done along these lines. Or are you already doing this when you said 'look for Region Info'?
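A rough sketch of such a polling loop, in plain Java with no HBase dependency. The `RegionCountPoller` class and its parameters are hypothetical names chosen for this example. The `regionCount` supplier is where a real client would ask the cluster for the current number of regions; with the 0.98 client that might be something like `admin.getTableRegions(tableName).size()`, but that is an assumption to check against your version.

```java
import java.util.function.IntSupplier;

public class RegionCountPoller {

    /**
     * Polls regionCount until it returns the expected value or the timeout
     * elapses. Returns true if the expected count was observed, false on
     * timeout or if the waiting thread was interrupted.
     */
    public static boolean awaitRegionCount(IntSupplier regionCount,
                                           int expected,
                                           long timeoutMs,
                                           long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (regionCount.getAsInt() == expected) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

After a split of one region in a table that had N regions, the caller would wait for N + 1; after a merge, for N - 1. A region count alone does not say whether the new regions have been major compacted, so it confirms the split or merge itself but not that a follow-up merge will be accepted.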
On Thu, Jul 16, 2015 at 9:46 AM, Ivan Brondino <[EMAIL PROTECTED]> wrote:
We issue the splits on the table and, right away, the major compaction of the table. Then we connect to the HRegionServer info server at http://host:60030 and check the HTML output for the task to be completed, using an HttpClient. So we actually wait for the major compaction and not for the split. We've been using this since 0.89, but we believe there must be a better approach.