The only leak worse than a memory leak is a submarine leak. Late last week we found a memory leak in the client/agent piece of SPM, our performance monitoring service. We fixed it immediately and published a new version of the SPM client. If you are an SPM user and if you have an SPM agent version 1.6.0 that was built before July 27, 2012, you may want to get the latest SPM client. We acted quickly to avoid having too many users affected by this, but luckily it appears this leak happens only on systems using Java 7. More precisely, we first spotted this leak in a production system running a 64-bit version of Java 7 update 5, the very latest version of Java as of this writing. We were able to reproduce this issue with Java 7, but have not seen it with java 6.
To check your SPM agent version:
# cat /spm/VERSION
Build date: Thu Jun 28 17:12:45 EDT 2012
Install date: Thu Jun 28 21:35:10 UTC 201
To check your Java version:
# java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) Client VM (build 20.6-b01, mixed mode, sharing)
Please let us know if you have any issues or questions by emailing the secret spm-user mailing list or SPM support.
In the previous part of the two – parts series about ActionGenerator we showed how to develop and run your own ActionGenerator. Today, we want to show you what action generators are there for you to use out-of-the-box and we want to share some insights about the future of this project.
ActionGenerator for ElasticSearch
The ag-player-es project contain all the code specific to action generators for ElasticSearch. You can find two sinks implemented – one for sending simple queries to ElasticSearch and the other for indexing data. They both use ElasticSearch REST API, so no dependencies are needed for those to work. The other piece available in ag-player-es are three players configured and ready to use:
SimpleEsPlayerMain – ActionGenerator for running random queries to ElasticSearch.
DictionaryEsPlayerMain – ActionGenerator for running queries to ElasticSearch. Queries are generated using a provided dictionary.
DictionaryDataEsPlayerMain – ActionGenerator for indexing data to ElasticSearch. Fields content is generated using provided dictionary.
ActionGenerator for Solr
Similar to ag-player-es one can expect that ag-player-solr will contain all the code specific to action generators for Apache Solr. In this project you can find two sinks implementation – one for sending queries to ElasticSearch and one for indexing data. The first one uses Solr HTTP API and the other one uses XML to index data to Solr. Similar to ag-player-es, no dependencies are needed, so you should be able to use action generator for Solr with all recent Solr versions. Apart from that, there are three players configured and ready for use:
DictionaryDataSolrPlayerMain – ActionGenerator for indexing data to Apache Solr. Fields content is generated using provided dictionary.
DictionarySolrPlayerMain – ActionGenerator for running queries to Apache Solr. Queries are generated using provided dictionary.
RandomQueriesSolrPlayerMain – ActionGenerator for running random queries to Apache Solr.
Using ActionGenerator for ElasticSearch
Lets concentrate on players available for ElasticSearch in ActionGenerator. As we wrote above, you have three main players for ElasticSearch that can be used out of the box.
The simplest of the three generators available. It lets you generate random queries to a given index and with the use of the given field name. In order to use that player, you need to provide the following parameters:
ElasticSearch base URL
ElasticSearch index name
Name of the field queries should be run against
Number of events that should be generated
For example, you could run the following command and have 1000 queries sent to name field of the documents index on your local ElasticSearch instance:
java -cp ag-player-es-0.1.0-withdeps.jar
com.sematext.ag.es.SimpleEsPlayerMain http://localhost:9200/ documents text 1000
The second generator that enables you to run queries uses a dictionary to generate text of your queries. It is similar to the SimpleEsPlayerMain, except for dictionary usage. In order to use that player, you need to provide the same parameters as to the SimpleEsPlayerMain and one additional parameters:
For example, you could run the following command and have 1000 queries sent to name field of the documents index on your local ElasticSearch instance. Queries would be generated using dict.txt dictionary (each line containing different query string):
java -cp ag-player-es-0.1.0-withdeps.jar com.sematext.ag.es.DictionaryEsPlayerMain http://localhost:9200/ documents text 1000 dict.txt
The one and only player that enables you to index data to your ElasticSearch instance. Just as the player discussed above, DictionaryDataEsPlayerMain also works with the help of a dictionary. You need to provide the following parameters in order to use this player:
ElasticSearch base URL
ElasticSearch index name
ElasticSearch type name
Number of events that should be generated
One or more fields and their types
So, for example if you would like to index 100.000 documents to documents index under document type to your local ElasticSearch instance you could run the following
Right now, the following field types are available:
We plan to add more types in the future. Pull requests with patches are welcome!
Using ActionGenerator for Apache Solr
Players available for Apache Solr are similar to the ones available for ElasticSearch, but lets quickly look at them, too.
This player is similar to the SimpleEsPlayerMain – it also generates random queries, but to Apache Solr. In order to use it, you need to provide the following parameters:
Apache Solr core search handler URL
The field queries should be run against
Number of queries to be generated
For example, you could run the following command and have 1000 queries sent to name field of the documents core of your local Apache Solr instance:
java -cp ag-player-solr-0.1.0-withdeps.jar com.sematext.ag.solr.RandomQueriesSolrPlayerMain http://localhost:8983/solr/documents name 1000
DictonarySolrPlayerMain is similar to RandomQueriesSolrPlayerMain except it uses dictionary to generate queries. In order to use this player you need to provide one additional parameter compared to the ones you provided to DictionarySolrPlayerMain:
For example, you could run the following command and have 1000 queries sent to name field of the documents core of your local Apache Solr instance. Queries would be generated using dict.txt dictionary (each line containing different query string):
java -cp ag-player-solr-0.1.0-withdeps.jar com.sematext.ag.solr.DictionarySolrPlayerMain http://localhost:8983/solr/documents name 1000 dict.txt
The last player I’d like to tell you about is the one that enables data indexation to your Apache Solr instance. DictionaryDataSolrPlayerMain is similar to its ElasticSearch counterpart. You need to provide the following parameters in order to use this player:
Apache Solr core update handler URL
Number of events that should be generated
One or more fields and their types
For example, if you would like to index 100000 documents to documents core of your local Apache Solr instance you could run the following
Lets omit the field types description as it was provided during DictionaryDataEsPlayerMain description above.
Currently, the AbstractHttpSink class has the ability to gather metrics about how the system to which you are sending events is behaving. You can choose between two methods of metrics output – to the standard output or to a file. To enable metrics tracking you need to pass the -DenableMetrics=true parameter when running ActionGenerator. This parameter enables metrics gathering and outputs those to standard output. In order to change that behavior and output the metrics to a file you need to pass the -DmetricsType=file parameter. In addition, you need to specify which directory the metrics should be written to – you do that by passing -DmetricsDir=/path/to/output/dir/ parameter with the value of the directory. Please remember that the directory needs to be created before running your action generator.
We plan to release action generators for SenseiDB, as well as expand the number of sinks and players ready to be used out of the box. Of course, as always, patches are welcome and if you find any problems with ActionGenerator or if you identify missing features, please open an issue.
In this post we discuss what HBase users should know about one of the internal parts of HBase: the Memstore. Understanding underlying processes related to Memstore will help to configure HBase cluster towards better performance.
Let’s take a look at the write and read paths in HBase to understand what Memstore is, where how and why it is used.
When RegionServer (RS) receives write request, it directs the request to specific Region. Each Region stores set of rows. Rows data can be separated in multiple column families (CFs). Data of particular CF is stored in HStore which consists of Memstore and a set of HFiles. Memstore is kept in RS main memory, while HFiles are written to HDFS. When write request is processed, data is first written into the Memstore. Then, when certain thresholds are met (obviously, main memory is well-limited) Memstore data gets flushed into HFile.
The main reason for using Memstore is the need to store data on DFS ordered by row key. As HDFS is designed for sequential reads/writes, with no file modifications allowed, HBase cannot efficiently write data to disk as it is being received: the written data will not be sorted (when the input is not sorted) which means not optimized for future retrieval. To solve this problem HBase buffers last received data in memory (in Memstore), “sorts” it before flushing, and then writes to HDFS using fast sequential writes. Note that in reality HFile is not just a simple list of sorted rows, it is much more than that.
Apart from solving the “non-ordered” problem, Memstore also has other benefits, e.g.:
It acts as a in-memory cache which keeps recently added data. This is useful in numerous cases when last written data is accessed more frequently than older data
There are certain optimizations that can be done to rows/cells when they are stored in memory before writing to persistent store. E.g. when it is configured to store one version of a cell for certain CF and Memstore contains multiple updates for that cell, only most recent one can be kept and older ones can be omitted (and never written to HFile).
Important thing to note is that every Memstore flush creates one HFile per CF.
On the reading end things are simple: HBase first checks if requested data is in Memstore, then goes to HFiles and returns merged result to the user.
What to Care about
There are number of reasons HBase users and/or administrators should be aware of what Memstore is and how it is used:
There are number of configuration options for Memstore one can use to achieve better performance and avoid issues. HBase will not adjust settings for you based on usage pattern.
Frequent Memstore flushes can affect reading performance and can bring additional load to the system
The way Memstore flushes work may affect your schema design
Let’s take a closer look at these points.
Configuring Memstore Flushes
Basically, there are two groups of configuraion properties (leaving out region pre-close flushes):
First determines when flush should be triggered
Second determines when flush should be triggered and updates should be blocked during flushing
First group is about triggering “regular” flushes which happen in parallel with serving write requests. The properties for configuring flush thresholds are:
Memstore will be flushed to disk if size of the memstore
exceeds this number of bytes. Value is checked by a thread that runs
<description>Maximum size of all memstores in a region server before
flushes are forced. Defaults to 35% of heap.
This value equal to hbase.regionserver.global.memstore.upperLimit causes
the minimum possible flushing to occur when updates are blocked due to
Note that the first setting is the size per Memstore. I.e. when you define it you should take into account the number of regions served by each RS. When number of RS grows (and you configured the setting when there were few of them) Memstore flushes are likely to be triggered by the second threshold earlier.
Second group of settings is for safety reasons: sometimes write load is so high that flushing cannot keep up with it and since we don’t want memstore to grow without a limit, in this situation writes are blocked unless memstore has “manageable” size. These thresholds are configured with:
<description>Maximum size of all memstores in a region server before new
updates are blocked and flushes are forced. Defaults to 40% of heap.
Updates are blocked and flushes are forced until size of all memstores
in a region server hits hbase.regionserver.global.memstore.lowerLimit.
Block updates if memstore has hbase.hregion.block.memstore
time hbase.hregion.flush.size bytes. Useful preventing
runaway memstore during spikes in update traffic. Without an
upper-bound, memstore fills such that when it flushes the
resultant flush files take a long time to compact or split, or
worse, we OOME.
Blocking writes on particular RS on its own may be a big issue, but there’s more to that. Since in HBase by design one Region is served by single RS when write load is evenly distributed over the cluster (over Regions) having one such “slow” RS will make the whole cluster work slower (basically, at its speed).
Hint: watch for Memstore Size and Memstore Flush Queue size. Memstore Size ideally should not reach upper Memstore limit and Memstore Flush Queue size should not constantly grow.
Frequent Memstore Flushes
Since we want to avoid blocking writes it may seem a good approach to flush earlier when we are far from “writes-blocking” thresholds. However, this will cause too frequent flushes which can affect read performance and bring additional load to the cluster.
Every time Memstore flush happens one HFile created for each CF. Frequent flushes may create tons of HFiles. Since during reading HBase will have to look at many HFiles, the read speed can suffer.
To prevent opening too many HFiles and avoid read performance deterioration there’s HFiles compaction process. HBase will periodically (when certain configurable thresholds are met) compact multiple smaller HFiles into a big one. Obviously, the more files created by Memstore flushes, the more work (extra load) for the system. More to that: while compaction process is usually performed in parallel with serving other requests, when HBase cannot keep up with compacting HFiles (yes, there are configured thresholds for that too;)) it will block writes on RSagain. As mentioned above, this is highly undesirable.
Hint: watch for Compaction Queue size on RSs. In case it is constantly growing you should take actions before it will cause problems.
More on HFiles creation & Compaction can be found here.
So, ideally Memstore should use as much memory as it can (as configured, not all RS heap: there are also in-memory caches), but not cross the upper limit. This picture (screenshot was taken from our SPM monitoring service) shows somewhat good situation:
“Somewhat”, because we could configure lower limit to be closer to upper, since we barely ever go over it.
Multiple Column Families & Memstore Flush
Memstores of all column families are flushed together (this might change). This means creating N HFiles per flush, one for each CF. Thus, uneven data amount in CF will cause too many HFiles to be created: when Memstore of one CF reaches threshold all Memstores of other CFs are flushed too. As stated above too frequent flush operations and too many HFiles may affect cluster performance.
Hint: in many cases having one CF is the best schema design.
HLog (WAL) Size & Memstore Flush
On RegionServer write/read paths picture above you may also noticed a Write-ahead Log (WAL) where data is getting written by default. It contains all the edits of RegionServer which were written to Memstore but were not flushed into HFiles. As data in Memstore is not persistent we need WAL to recover from RegionServer failures. When RS crushes and data which was stored in Memstore and wasn’t flushed is lost, WAL is used to replay these recent edits.
When WAL (in HBase it is called HLog) grows very big, it may take a lot of time to replay it. For that reason there are certain limits for WAL size, which when reached cause Memstore to flush. Flushing Memstores decreases WAL as we don’t need to keep in WAL edits which were written to HFiles (persistent store). This is configured by two properties: hbase.regionserver.hlog.blocksize and hbase.regionserver.maxlogs. As you probably figured out, maximum WAL size is determined by hbase.regionserver.maxlogs * hbase.regionserver.hlog.blocksize (2GB by default). When this size is reached, Memstore flushes are triggered. So, when you increase Memstore size and adjust other Memstore settings you need to adjust HLog ones as well. Otherwise WAL size limit may be hit first and you will never utilize all the resources dedicated to Memstore. Apart from that, triggering of Memstore flushes by reaching WAL limit is not the best way to trigger flushing, as it may create “storm” of flushes by trying to flush many Regions at once when written data is well distributed across Regions.
Hint: keep hbase.regionserver.hlog.blocksize * hbase.regionserver.maxlogs just a bit above hbase.regionserver.global.memstore.lowerLimit * HBASE_HEAPSIZE.
Compression & Memstore Flush
With HBase it is advised to compress the data stored on HDFS (i.e. HFiles). In addition to saving on space occupied by data this reduces the disk & network IO significantly. Data is compressed when it is written to HDFS, i.e. when Memstore flushes. Compression should not slow down flushing process a lot, otherwise we may hit many of the problems above, like blocking writes caused by Memstore being too big (hit upper limit) and such.
Hint: when choosing compression type favor compression speed over compression ratio. SNAPPY showed to be a good choice here.
[UPDATE 2012/07/23: added notes about HLog size and Compression types]
Plug: if this sort of stuff interests you, we are hiring people who know about Hadoop, HBase, MapReduce…
Last week we had our inaugural HBase NYC meetup. About 30 people turned up – not bad for the first meetup. Etsy, Sematext old customer and Brooklyn neighbours, provided the space, AV equipment and help, as well as their fridge with beer – thanks! Alex gave two talks, first the Introduction to HBase whose slides (audio) are below and then Intro to HBase Internals and Schema Design.
We’re hiring people who want to work WITH and ON HBase and other Big Data technologies. Seejobs @ sematext.