HBase Digest, July 2010

Big news first: HBase 0.20.6 is out and available for download. It fixes only 8 issues (including 2 blockers), but some of them might be significant in particular cases (like scan recovery in case of region server failure). You can find the release notes here. Message from the HBase dev team: “we recommend that all users, particularly those running 0.20.4, upgrade”.

The very sweet piece of functionality is under active development right now (and looks like it’s nearly complete).  This new functionality makes it possible to take HBase table snapshots: HBASE-50. This might be extremely useful in production. Design plan and implementation are looking so good that “committers should read it as they might learn something”.

Community news & trends:

  • Summary notes of HBase meetup (#11) at Facebook give a comprehensive overview of development activities and what’s in coming releases. Slides are available here.
  • Welcome the official HBase Blog.
  • It is strongly recommended to use HDFS patched with HDFS-630 with HBase. To save time, you can use Cloudera’s distribution: HDFS-630 is included in 0.20.1+169.89 (the latest CDH2, i.e. CDH2u1) a well as both betas of CDH3.
  • From the operations standpoint, is setup of a HBase cluster and their maintenance a fairly complex task? Can a single person manage it? People are sharing their experiences in this thread.
  • It makes sense to disable WAL (e.g. by Put.setWriteToWAL(false)) during one-time large import into HBase (of course this makes things unreliable, but it can be OK when doing import once given the resulting speedup).

Notable efforts:

  • HBase RowLog Library: a component to build WALs and queues backed by HBase.
  • Lily is out: meet Proof of Architecture release. Lily is the cloud-scalable NoSQL-based content store and search repository, built on top of Apache HBase and SOLR.

FAQ:

  • Why are recently added/modified records not present in result of scan and get operations? How can one make them available?
    Check autoFlush option: autoFlush=false causes the client to accumulate puts without sending them to the server. Gets and scans only talk to the server and thus ignore the client write cache. You can either set autoFlush to true or perform HTable.flushCommits() before reading data.

Are you on Twitter?  You can follow @sematext on Twitter.

Leave a Reply