We are back to talking about the backend. Our Search Analytics and Scalable Performance Monitoring services accept, process, and store huge amounts of data. One thing both of these services do is process a stream of events in real time (and in batch, of course). So what solutions are out there that help one process data in real time and perform operations on a rolling window of data, such as the last 5 or 30 minutes of an incoming event stream? We know of several solutions that fit the bill, so we put together a matrix of their essential attributes in order to compare them and make our pick. Below is the matrix we came up with. If you are viewing this on our site, the table is likely going to be too wide, but it should look fine in a proper feed reader.
If you like working on systems that handle large volumes of data, like our Search Analytics and Scalable Performance Monitoring services, we are hiring world-wide.
Matrix part 1:
| Tool | License | Language | Scaling | Add or change rules on the fly | Other infra needed | Rule types |
|---|---|---|---|---|---|---|
| Esper | GPL2, commercial | Java | Scale up | yes | none | Declarative, query-based |
| Drools Fusion | ASL 2.0 | Java | Scale up | yes | none | Declarative, mostly rule-based, but supports queries too |
| FlumeBase | ASL 2.0 | Java | Horizontal: natural sharding on top of Flume | yes | Flume | Declarative, query-based |
| Storm | EPL 1.0 | Clojure | Horizontal | Can be implemented on top of ZooKeeper | ZeroMQ, ZooKeeper | Provides only low-level primitives (like grouping). A rule engine has to be implemented manually on top. |
| S4 | ASL 2.0 | Java | Horizontal | Can be implemented on top of ZooKeeper | ZooKeeper | Provides a set of low-level primitives. Some correlation support via joins. The documentation has a “windowing” section, but it is empty. |
| Activeinsight | CPAL 1.0, commercial | Java | Horizontal | yes | | Declarative, query-like |
| Kafka | ASL 2.0 | Java | Horizontal | | ZooKeeper | Set of low-level primitives |
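To give a feel for what "implemented manually on top" means for the low-level options in the Rule types column, here is a minimal Python sketch of a rolling five-minute count with one hard-coded rule. The class name, the function name, and the sample events are made up for illustration and are not part of any of the tools above. It also assumes events arrive in timestamp order, which a real stream does not guarantee.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events per key over the last `window_seconds` seconds (hypothetical helper)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key) pairs in arrival order
        self.counts = {}       # key -> number of events currently in the window

    def add(self, timestamp, key):
        """Record one event. Assumes timestamps are non-decreasing."""
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._expire(timestamp)

    def _expire(self, now):
        """Drop events that are `window_seconds` old or older."""
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def count(self, key):
        return self.counts.get(key, 0)


def keys_over_threshold(window, threshold):
    """A hand-written 'rule': keys seen more than `threshold` times in the window."""
    return sorted(k for k, n in window.counts.items() if n > threshold)


# Sample data (invented): timestamps in seconds, keys are search queries.
window = SlidingWindowCounter(window_seconds=5 * 60)
for ts, query in [(0, "solr"), (60, "solr"), (120, "hbase"), (200, "solr"), (400, "solr")]:
    window.add(ts, query)

print(window.count("solr"))            # events for "solr" still inside the window
print(keys_over_threshold(window, 1))  # keys that currently trigger the rule
```

With the low-level tools, this bookkeeping (plus sharding it across nodes and changing the rule at runtime) is yours to write. The declarative, query-based engines in the matrix would typically express the same thing as a single windowed query.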
Matrix part 2:
| Tool | Docs / examples | Maturity | Community | URL | Notes |
|---|---|---|---|---|---|
| Esper | very good | mature, stable | medium | esper.codehaus.org | |
| Drools Fusion | good | 3 years, stable | small | jboss.org/drools/drools-fusion.html | |
| Storm | exists | used in production | growing very fast | tech.backtype.com | good deployment features |
| S4 | average | alpha, but used in production | medium (will grow under ASF) | s4.io | |
| Kafka | good | used in production | small (will grow under ASF) | incubator.apache.org/kafka | |
So there you have it – we hope you find this useful. If you have any comments or questions, tweet us (@sematext) or leave a comment here.