Kafka Consumer Lag Monitoring

SPM is one of the most comprehensive Kafka monitoring solutions, capturing some 200 Kafka metrics, including Kafka Broker, Producer, and Consumer metrics. While lots of those metrics are useful, there is one particular metric everyone wants to monitor – Consumer Lag.

What is Consumer Lag

When people talk about Kafka or about a Kafka cluster, they are typically referring to Kafka Brokers. You can think of a Kafka Broker as a Kafka server. A Broker is what actually stores and serves Kafka messages. Kafka Producers are applications that write messages into Kafka (Brokers). Kafka Consumers are applications that read messages from Kafka (Brokers).

Inside Kafka Brokers data is stored in one or more Topics, and each Topic consists of one or more Partitions. When writing data a Broker actually writes it into a specific Partition. As it writes data it keeps track of the last “write position” in each Partition. This is called the Latest Offset, also known as the Log End Offset. Each Partition has its own independent Latest Offset.

Just like Brokers keep track of their write position in each Partition, each Consumer keeps track of its “read position” in each Partition whose data it is consuming. That is, it keeps track of which data it has read. This is known as the Consumer Offset. The Consumer Offset is periodically persisted (to ZooKeeper or a special Topic in Kafka itself) so it can survive Consumer crashes or unclean shutdowns and avoid re-consuming too much old data.
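To make the offset-persistence part concrete, here’s a minimal sketch of a Consumer that commits its Consumer Offset periodically. It assumes the kafka-python client and a hypothetical “metrics” topic and consumer group – it’s just an illustration, not how SPM itself consumes:

from kafka import KafkaConsumer   # assumption: kafka-python client

consumer = KafkaConsumer(
    "metrics",                        # hypothetical topic name
    bootstrap_servers="localhost:9092",
    group_id="metrics-processor",     # offsets are tracked per Consumer (Group)
    enable_auto_commit=True,          # persist the read position periodically...
    auto_commit_interval_ms=5000,     # ...every 5 seconds
)

for message in consumer:
    print(message.value)              # processing would happen here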

Kafka Consumer Lag and Read/Write Rates

In the diagram above we can see yellow bars, which represent the rate at which Brokers are writing messages created by Producers. The orange bars represent the rate at which Consumers are consuming messages from Brokers. The rates look roughly equal – and they need to be, otherwise the Consumers will fall behind. However, there is always going to be some delay between the moment a message is written and the moment it is consumed. Reads are always going to be lagging behind writes, and that is what we call Consumer Lag. The Consumer Lag is simply the delta between the Latest Offset and the Consumer Offset.
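Here’s a rough sketch of that delta computed from the outside, again assuming the kafka-python client and the same hypothetical topic and group (SPM collects these offsets through its own agent, not with this code):

from kafka import KafkaConsumer, TopicPartition   # assumption: kafka-python client

consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                         group_id="metrics-processor")

partitions = [TopicPartition("metrics", p)
              for p in consumer.partitions_for_topic("metrics")]

latest = consumer.end_offsets(partitions)           # Latest Offset (Log End Offset)
for tp in partitions:
    committed = consumer.committed(tp) or 0         # Consumer Offset (None if never committed)
    print("partition %d lag: %d" % (tp.partition, latest[tp] - committed))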

Why is Consumer Lag Important

Many applications today are based on being able to process (near) real-time data. Think of a performance monitoring system like SPM or a log management service like Logsene. They continuously process infinite streams of near real-time data. If they were to show you metrics or logs with too much delay – if the Consumer Lag were too big – they’d be nearly useless. The Consumer Lag tells us how far behind each Consumer (Group) is in each Partition. The smaller the lag, the more real-time the data consumption.

Monitoring Read and Write Rates

Kafka Consumer Lag and Broker Offset Changes

As we just learned, the delta between the Latest Offset and the Consumer Offset is what gives us the Consumer Lag. In the above chart from SPM you may have noticed a few other metrics:

  • Broker Write Rate
  • Consume Rate
  • Broker Earliest Offset Changes

The rate metrics are derived metrics. If you look at Kafka’s metrics you won’t find them there. Under the hood, SPM collects a few metrics with various offsets from which these rates are computed. In addition, it charts Broker Earliest Offset Changes, which is the earliest known offset in each Broker’s Partition. Put another way, this offset is the offset of the oldest message in a Partition. While this offset alone may not be super useful, knowing how it’s changing can be handy when things go awry. Data in Kafka has a certain TTL (Time To Live) to allow for easy purging of old data. This purging is performed by Kafka itself. Every time such purging kicks in, the offset of the oldest data changes. SPM’s Broker Earliest Offset Change surfaces this information for your monitoring pleasure. This metric gives you an idea how often purges are happening and how many messages they’ve removed each time they ran.
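To show what “derived” means here, the sketch below samples the Latest Offset of one partition twice and divides the delta by the elapsed time to get a write rate. The kafka-python client, topic and interval are assumptions – SPM computes its rates from the offsets its agent collects, not like this:

import time
from kafka import KafkaConsumer, TopicPartition   # assumption: kafka-python client

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
tp = TopicPartition("metrics", 0)                  # hypothetical topic, partition 0

first = consumer.end_offsets([tp])[tp]             # Latest Offset, sample 1
time.sleep(10)
second = consumer.end_offsets([tp])[tp]            # Latest Offset, sample 2

print("broker write rate: %.1f msg/sec" % ((second - first) / 10.0))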

There are several other Kafka monitoring tools out there, like LinkedIn’s Burrow, whose Offset and Consumer Lag monitoring approach is used in SPM. If you need a good Kafka monitoring solution, give SPM a go. Ship your Kafka and other logs into Logsene and you’ve got yourself a DevOps solution that will make troubleshooting easy instead of dreadful.

 

Monitoring Kafka on Docker Cloud

For those of you using Apache Kafka and Docker Cloud or considering it, we’ve got a Sematext user case study for your reading pleasure. In this use case, Ján Antala, a Software Engineer in the DevOps Team at @pygmalios, talks about the business and technical needs that drove their decision to use Docker Cloud, how they are using Docker Cloud, and how their Docker and Kafka monitoring is done.

Pygmalios – Future of data-driven retail.


Pygmalios helps companies monitor how customers and staff interact in real time. Our retail analytics platform tracks sales, display conversions, and customer and staff behavior to deliver better service, targeted sales, faster check-outs and the optimal amount of staffing for a given time and location. Among our partners are big names such as dm drogerie and BMW.

I am a software engineer in a DevOps position, so I know the challenges from both sides – infrastructure as well as software development.

Our infrastructure

At Pygmalios we have decided to use a microservices-based architecture for our analytics platform. We have a complex system of Apache Spark, Apache Kafka, Cassandra and InfluxDB databases, Node.js backend and JavaScript frontend applications, where every service has its own single responsibility, which makes them easy to scale. We run them mostly in Docker containers, apart from Spark and Cassandra, which run on the DataStax Enterprise stack.

We have around 50 different Docker services in total. Why Docker? Because it’s easy to deploy and scale, and you don’t have to care about where you run your applications. You can even transfer them between node clusters in seconds. We don’t have our own servers but use cloud providers instead, especially AWS. We have been using Tutum to orchestrate our Docker containers for the past year (Tutum was acquired by Docker recently and the service is now called Docker Cloud).
Docker Cloud is the best service for Docker container management and deployment and totally matches our needs. You can create servers on any cloud provider or bring your own, add a Docker image and write a stack file where you list rules that specify what to deploy and where. Then you can manage all your services and nodes via a dashboard. We really love the CI & CD features. When we push a new commit to GitHub, the Docker image is built and then automatically deployed to production.

DevOps Challenges

As we use a microservices architecture, we have a lot of applications across multiple servers, so we need to orchestrate them. We also have many physical sensors out in the retail stores, which are our data sources. In the end, there are a lot of things we have to think about, including the correlations between them:

Server monitoring

Basic metrics for the hardware layer, such as memory, CPU and network.

Docker monitoring

In the software layer we want to know whether our applications inside Docker containers are running properly.

Kafka, Spark and Cassandra monitoring

Our core services. They are crucial, so monitoring is a must.

Sensors monitoring

Sensors are deployed outside in the retail stores. We have to monitor them as well and use custom metrics.

Notifications

We want alerts whenever anything breaks.

Centralized logging

Store all logs in one place, combine them with hardware usage and then analyze anomalies.

Monitoring & Logging on Docker Cloud

There is already a great post about Docker Cloud monitoring and logging, so for more information head over to that blog post: Docker Cloud Monitoring and Logging.

Kafka on Docker Cloud

Because Kafka is crucial for us, we run a cluster of 3 brokers, each in a Docker container on a separate node. We are not collecting any data when Kafka is not available, so that data is lost forever if Kafka is ever down. Sure, we have buffers inside the sensors, but we don’t want to rely on them. All topics are also replicated across all brokers, so we can handle an outage of 2 nodes. Our goal is also to be able to scale easily.
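As an illustration, here’s a sketch of creating a topic replicated across all 3 brokers, matching the KAFKA_DEFAULT_REPLICATION_FACTOR=3 setting in the stack file below. The kafka-python admin client and the topic name are assumptions, not part of our actual setup:

from kafka.admin import KafkaAdminClient, NewTopic   # assumption: kafka-python client

admin = KafkaAdminClient(
    bootstrap_servers="kafka-1:9092,kafka-2:9092,kafka-3:9092")

# every partition gets a replica on each of the 3 brokers
admin.create_topics([
    NewTopic(name="sensor-events", num_partitions=3, replication_factor=3)
])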

Kafka and Zookeeper live together, so you have to link them using connection parameters. Kafka doesn’t have a master broker; instead, the leader is automatically elected by Zookeeper from the available brokers. Zookeeper elects its own leader automatically. To scale Kafka and Zookeeper to more nodes, we just have to add them to the Docker Cloud cluster (we use the every_node deployment strategy) and update the connection link in the stack file.

We use our own forks of the wurstmeister/kafka and signalfx/docker-zookeeper Docker images, and I would encourage you to do the same so you can easily tune them to your needs.

To run a Kafka + Zookeeper cluster, launch the following stack on Docker Cloud.

Code from https://gist.github.com/janantala/c93a284e3f93bc7d7942f749aae520af

kafka:
  image: 'pygmalios/kafka:latest'
  deployment_strategy: every_node
  environment:
    - JMX_PORT=9999
    - KAFKA_ADVERTISED_HOST_NAME=$DOCKERCLOUD_CONTAINER_HOSTNAME
    - KAFKA_ADVERTISED_PORT=9092
    - KAFKA_DEFAULT_REPLICATION_FACTOR=3
    - KAFKA_DELETE_TOPIC_ENABLE=true
    - KAFKA_LOG_CLEANER_ENABLE=true
    - 'KAFKA_ZOOKEEPER_CONNECT=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181'
    - KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS=6000
  ports:
    - '9092:9092'
    - '9999:9999'
  restart: always
  tags:
    - kafka
  volumes:
    - '/var/run/docker.sock:/var/run/docker.sock'
zookeeper:
  image: 'pygmalios/zookeeper-cluster:latest'
  deployment_strategy: every_node
  environment:
    - CONTAINER_NAME=$DOCKERCLOUD_CONTAINER_HOSTNAME
    - SERVICE_NAME=zookeeper
    - 'ZOOKEEPER_INSTANCES=zookeeper-1,zookeeper-2,zookeeper-3'
    - 'ZOOKEEPER_SERVER_IDS=zookeeper-1:1,zookeeper-2:2,zookeeper-3:3'
    - ZOOKEEPER_ZOOKEEPER_1_CLIENT_PORT=2181
    - ZOOKEEPER_ZOOKEEPER_1_HOST=zookeeper-1
    - ZOOKEEPER_ZOOKEEPER_1_LEADER_ELECTION_PORT=3888
    - ZOOKEEPER_ZOOKEEPER_1_PEER_PORT=2888
    - ZOOKEEPER_ZOOKEEPER_2_CLIENT_PORT=2181
    - ZOOKEEPER_ZOOKEEPER_2_HOST=zookeeper-2
    - ZOOKEEPER_ZOOKEEPER_2_LEADER_ELECTION_PORT=3888
    - ZOOKEEPER_ZOOKEEPER_2_PEER_PORT=2888
    - ZOOKEEPER_ZOOKEEPER_3_CLIENT_PORT=2181
    - ZOOKEEPER_ZOOKEEPER_3_HOST=zookeeper-3
    - ZOOKEEPER_ZOOKEEPER_3_LEADER_ELECTION_PORT=3888
    - ZOOKEEPER_ZOOKEEPER_3_PEER_PORT=2888
  ports:
    - '2181:2181'
    - '2888:2888'
    - '3888:3888'
  restart: always
  tags:
    - kafka
  volumes:
    - '/var/lib/zookeeper:/var/lib/zookeeper'
    - '/var/log/zookeeper:/var/log/zookeeper'

We use private networking and hostname addressing (the KAFKA_ADVERTISED_HOST_NAME environment variable) for security reasons in our stack. However, you can use IP addressing directly by replacing the hostname with an IP address. To connect to Kafka from an outside environment you have to add records to the /etc/hosts file:

KAFKA_NODE.1.IP.ADDRESS kafka-1
KAFKA_NODE.2.IP.ADDRESS kafka-2
KAFKA_NODE.3.IP.ADDRESS kafka-3
KAFKA_NODE.1.IP.ADDRESS zookeeper-1
KAFKA_NODE.2.IP.ADDRESS zookeeper-2
KAFKA_NODE.3.IP.ADDRESS zookeeper-3

Or, on Docker Cloud, add extra_hosts to the service configuration.

extra_hosts:
  - 'kafka-1:KAFKA_NODE.1.IP.ADDRESS'
  - 'kafka-2:KAFKA_NODE.2.IP.ADDRESS'
  - 'kafka-3:KAFKA_NODE.3.IP.ADDRESS'
  - 'zookeeper-1:KAFKA_NODE.1.IP.ADDRESS'
  - 'zookeeper-2:KAFKA_NODE.2.IP.ADDRESS'
  - 'zookeeper-3:KAFKA_NODE.3.IP.ADDRESS'

Then you can use the following Zookeeper connection string to connect to Kafka:

zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181

And the Kafka broker list:

kafka-1:9092,kafka-2:9092,kafka-3:9092
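Any Kafka client that accepts a bootstrap broker list can connect with exactly that string. A minimal sketch, assuming the kafka-python client and a hypothetical topic:

from kafka import KafkaProducer   # assumption: kafka-python client

producer = KafkaProducer(
    bootstrap_servers="kafka-1:9092,kafka-2:9092,kafka-3:9092")

producer.send("sensor-events", b"hello from outside the cluster")   # hypothetical topic
producer.flush()                  # make sure the message actually went out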

Kafka + SPM

To monitor Kafka we use SPM by Sematext, which provides out-of-the-box monitoring of all Broker, Producer and Consumer metrics available via the JMX interface. They also provide monitoring for the other apps we use, such as Spark, Cassandra and Docker, and we can also collect logs, so we have it all in one place. With this information we can find out not only when something happened, but also why.

Our Kafka node cluster with Docker containers is displayed in the following diagram:

Kafka node cluster with Docker containers (diagram)

SPM Performance Monitoring for Kafka

SPM collects Kafka performance metrics. First you have to create an SPM application of type Kafka in the Sematext dashboard and run the SPM client Docker container from the sematext/spm-client image. We run the SPM client in in-process mode, as a Java agent, so it is easy to set up. Just add the SPM_CONFIG environment variable to the SPM client Docker container, specifying the monitor configuration for Kafka Brokers, Consumers and Producers. Note that you have to use your own SPM token instead of YOUR_SPM_TOKEN.

create new SPM app

sematext-agent-kafka:
  image: 'sematext/spm-client:latest'
  deployment_strategy: every_node
  environment:
    - 'SPM_CONFIG=YOUR_SPM_TOKEN kafka javaagent kafka-broker;YOUR_SPM_TOKEN kafka javaagent kafka-producer;YOUR_SPM_TOKEN kafka javaagent kafka-consumer'
  restart: always
  tags:
    - kafka

Kafka

You also have to connect Kafka and the SPM monitor together. This can be done by mounting a volume from the SPM monitor service into the Kafka container using the volumes_from option. To enable the SPM monitor, add the KAFKA_JMX_OPTS environment variable to the Kafka container, passing the following arguments to the JVM startup script for the Kafka Broker, Producer & Consumer.

KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote -javaagent:/opt/spm/spm-monitor/lib/spm-monitor-kafka.jar=YOUR_SPM_TOKEN:kafka-broker:default -Dcom.sun.management.jmxremote -javaagent:/opt/spm/spm-monitor/lib/spm-monitor-kafka.jar=YOUR_SPM_TOKEN:kafka-producer:default -Dcom.sun.management.jmxremote -javaagent:/opt/spm/spm-monitor/lib/spm-monitor-kafka.jar=YOUR_SPM_TOKEN:kafka-consumer:default -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

Done! Your Kafka cluster monitoring is set up. Now you can monitor requests, topics and other JMX metrics out of the box or you can create custom dashboards by connecting other apps.


Kafka metrics overview in SPM


Requests


Topic Bytes/Messages

Stack file

To run a Zookeeper + Kafka + SPM monitoring cluster, just launch the following stack and update these environment variables in your stack file:

  • YOUR_SPM_TOKEN inside SPM_CONFIG in the Sematext monitoring service and KAFKA_JMX_OPTS in the Kafka service

Code from https://gist.github.com/janantala/d816071a7a00eefeea934ec630a57c07

Kafka, Zookeeper, SPM Stack File

kafka:
  image: 'pygmalios/kafka:latest'
  deployment_strategy: every_node
  environment:
    - JMX_PORT=9999
    - KAFKA_ADVERTISED_HOST_NAME=$DOCKERCLOUD_CONTAINER_HOSTNAME
    - KAFKA_ADVERTISED_PORT=9092
    - KAFKA_DEFAULT_REPLICATION_FACTOR=3
    - KAFKA_DELETE_TOPIC_ENABLE=true
    - 'KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote -javaagent:/opt/spm/spm-monitor/lib/spm-monitor-kafka.jar=YOUR_SPM_TOKEN:kafka-broker:default -Dcom.sun.management.jmxremote -javaagent:/opt/spm/spm-monitor/lib/spm-monitor-kafka.jar=YOUR_SPM_TOKEN:kafka-producer:default -Dcom.sun.management.jmxremote -javaagent:/opt/spm/spm-monitor/lib/spm-monitor-kafka.jar=YOUR_SPM_TOKEN:kafka-consumer:default -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false'
    - KAFKA_LOG_CLEANER_ENABLE=true
    - 'KAFKA_ZOOKEEPER_CONNECT=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181'
    - KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS=6000
  ports:
    - '9092:9092'
    - '9999:9999'
  restart: always
  tags:
    - kafka
  volumes:
    - '/var/run/docker.sock:/var/run/docker.sock'
  volumes_from:
    - sematext-agent-kafka
sematext-agent-kafka:
  image: 'sematext/spm-client:latest'
  deployment_strategy: every_node
  environment:
    - 'SPM_CONFIG=YOUR_SPM_TOKEN kafka javaagent kafka-broker;YOUR_SPM_TOKEN kafka javaagent kafka-producer;YOUR_SPM_TOKEN kafka javaagent kafka-consumer'
  restart: always
  tags:
    - kafka
zookeeper:
  image: 'pygmalios/zookeeper-cluster:latest'
  deployment_strategy: every_node
  environment:
    - CONTAINER_NAME=$DOCKERCLOUD_CONTAINER_HOSTNAME
    - SERVICE_NAME=zookeeper
    - 'ZOOKEEPER_INSTANCES=zookeeper-1,zookeeper-2,zookeeper-3'
    - 'ZOOKEEPER_SERVER_IDS=zookeeper-1:1,zookeeper-2:2,zookeeper-3:3'
    - ZOOKEEPER_ZOOKEEPER_1_CLIENT_PORT=2181
    - ZOOKEEPER_ZOOKEEPER_1_HOST=zookeeper-1
    - ZOOKEEPER_ZOOKEEPER_1_LEADER_ELECTION_PORT=3888
    - ZOOKEEPER_ZOOKEEPER_1_PEER_PORT=2888
    - ZOOKEEPER_ZOOKEEPER_2_CLIENT_PORT=2181
    - ZOOKEEPER_ZOOKEEPER_2_HOST=zookeeper-2
    - ZOOKEEPER_ZOOKEEPER_2_LEADER_ELECTION_PORT=3888
    - ZOOKEEPER_ZOOKEEPER_2_PEER_PORT=2888
    - ZOOKEEPER_ZOOKEEPER_3_CLIENT_PORT=2181
    - ZOOKEEPER_ZOOKEEPER_3_HOST=zookeeper-3
    - ZOOKEEPER_ZOOKEEPER_3_LEADER_ELECTION_PORT=3888
    - ZOOKEEPER_ZOOKEEPER_3_PEER_PORT=2888
  ports:
    - '2181:2181'
    - '2888:2888'
    - '3888:3888'
  restart: always
  tags:
    - kafka
  volumes:
    - '/var/lib/zookeeper:/var/lib/zookeeper'
    - '/var/log/zookeeper:/var/log/zookeeper'


Summary

Thanks to Sematext you can easily monitor all the important metrics. The basic setup should take only a few minutes, and then you can tune it to your needs, connect other applications and create custom dashboards.

If you have feedback about monitoring a Kafka cluster, get in touch with me @janantala or email me at j.antala@pygmalios.com. You can also follow us @pygmalios for more cool stuff. If you have problems setting up your monitoring and logging, don’t hesitate to send an email to support@sematext.com or tweet @sematext.

Kafka Real-time Stream Multi-topic Catch up Trick

Half of the world, Sematext included, seems to be using Kafka.

Kafka is the spinal cord that connects various components in SPM, Site Search Analytics, and Logsene.  If Kafka breaks, we’re in trouble (but we have anomaly detection all over the place to catch issues early).  In many Kafka deployments, ours included, the most recent data is the most valuable.  Consider the case of Kafka in SPM, which processes massive amounts of performance metrics for monitoring applications and servers.  Clearly, in a performance monitoring system you primarily care about current performance numbers.  Thus, if SPM’s Kafka pipeline were to break and we restore it, what we’d really like to avoid is processing all data sequentially, oldest to newest.  What we’d prefer is processing new metrics data first and then processing older data using any spare capacity we have in order to “fill the gap” caused by Kafka downtime.

Here’s a very quick “video” that shows this in action:

Kafka Catch Up

 

How does this work?

We asked about it back in 2013, but didn’t really get good tips.  Shortly after that we implemented the following logic that’s been working well for us, as you can see in the animation above.

The catch-up logic assumes having multiple topics to consume from, with one of these topics being the “active” topic to which the Producer is publishing messages. The Consumer sets which topic is active, although the Producer can also set it if it has not already been set. The active topic is stored in ZooKeeper.

The Consumer determines the lag by looking at the timestamp that the Producer adds to each message published to Kafka. If the lag is over N minutes, the Consumer starts paying attention to the offset. If the offset starts getting smaller and keeps getting smaller M times in a row, the Consumer knows it is able to keep up (i.e. the offset is not getting bigger) and sets another topic as active. This signals the Producer to switch publishing to this new topic, while the Consumer keeps consuming from all topics. A minimal sketch of this decision logic is shown right below.
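In the sketch, the thresholds, the lag samples and the set_active_topic callback (which would write the active topic to ZooKeeper) are hypothetical stand-ins – it mirrors the rules described above, not our actual implementation:

class CatchUpDecider:
    LAG_THRESHOLD_SECONDS = 5 * 60   # "N minutes"
    SHRINK_STREAK_NEEDED = 3         # "M times in a row"

    def __init__(self, set_active_topic):
        self.set_active_topic = set_active_topic   # e.g. writes the active topic to ZooKeeper
        self.previous_lag = None
        self.shrink_streak = 0

    def on_lag_sample(self, lag_seconds, lag_in_offsets):
        """Called periodically with the current lag (from Producer timestamps)
        and the current offset delta between Producer and Consumer."""
        if lag_seconds <= self.LAG_THRESHOLD_SECONDS:
            return                                   # not far enough behind to care
        if self.previous_lag is not None and lag_in_offsets < self.previous_lag:
            self.shrink_streak += 1                  # we are catching up
        else:
            self.shrink_streak = 0                   # lag grew again, start over
        self.previous_lag = lag_in_offsets
        if self.shrink_streak >= self.SHRINK_STREAK_NEEDED:
            self.set_active_topic("metrics-topic-b") # hypothetical "other" topic name
            self.shrink_streak = 0

# usage: decider = CatchUpDecider(set_active_topic=write_active_topic_to_zookeeper)
#        decider.on_lag_sample(lag_seconds=420, lag_in_offsets=12345)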

As a result, the Consumer is able to consume both new data and the delayed/old data, and avoids lacking fresh data while in catch-up mode busy processing the backlog. Consuming from one topic is what causes new data to be processed (this corresponds to the right-most part of the chart above “moving forward”), and consuming from the other topic is where we get the data to fill in the gap.

If you run Kafka and want a good monitoring tool for Kafka, check out SPM for Kafka monitoring.

 

Recipe: rsyslog + Kafka + Logstash

This recipe is similar to the previous rsyslog + Redis + Logstash one, except that we’ll use Kafka as a central buffer and connecting point instead of Redis. You’ll have more of the same advantages:

  • rsyslog is light and crazy-fast, including when you want it to tail files and parse unstructured data (see the Apache logs + rsyslog + Elasticsearch recipe)
  • Kafka is awesome at buffering things
  • Logstash can transform your logs and connect them to N destinations with unmatched ease

There are a couple of differences to the Redis recipe, though:

  • rsyslog already has Kafka output packages, so it’s easier to set up
  • Kafka has a different set of features than Redis (trying to avoid flame wars here) when it comes to queues and scaling

As with the other recipes, I’ll show you how to install and configure the needed components. The end result would be that local syslog (and tailed files, if you want to tail them) will end up in Elasticsearch, or a logging SaaS like Logsene (which exposes the Elasticsearch API for both indexing and searching). Of course you can choose to change your rsyslog configuration to parse logs as well (as we’ve shown before), and change Logstash to do other things (like adding GeoIP info).

Getting the ingredients

First of all, you’ll probably need to update rsyslog. Most distros come with ancient versions and don’t have the plugins you need. From the official rsyslog packages you can install the base package and rsyslog-kafka, which provides the omkafka output module used below.

If you don’t have Kafka already, you can set it up by downloading the binary tar. And then you can follow the quickstart guide. Basically you’ll have to start Zookeeper first (assuming you don’t have one already that you’d want to re-use):

bin/zookeeper-server-start.sh config/zookeeper.properties

And then start Kafka itself and create a simple 1-partition topic that we’ll use for pushing logs from rsyslog to Logstash. Let’s call it rsyslog_logstash:

bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic rsyslog_logstash

Finally, you’ll need Logstash. At the time of writing this, we have a beta of 2.0, which comes with lots of improvements (including huge performance gains for the GeoIP filter I touched on earlier). After downloading and unpacking, you can start it via:

bin/logstash -f logstash.conf

There are also packages, in which case you’d put the configuration file in /etc/logstash/conf.d/ and start it with the init script.

Configuring rsyslog

With rsyslog, you’d need to load the needed modules first:

module(load="imuxsock")  # will listen to your local syslog
module(load="imfile")    # if you want to tail files
module(load="omkafka")   # lets you send to Kafka

If you want to tail files, you’d have to add definitions for each group of files like this:

input(type="imfile"
  File="/opt/logs/example*.log"
  Tag="examplelogs"
)

Then you’d need a template that will build JSON documents out of your logs. You would publish these JSONs to Kafka and consume them with Logstash. Here’s one that works well for plain syslog and tailed files that aren’t parsed via mmnormalize:

template(name="json_lines" type="list" option.json="on") {
  constant(value="{")
  constant(value="\"timestamp\":\"")
  property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"message\":\"")
  property(name="msg")
  constant(value="\",\"host\":\"")
  property(name="hostname")
  constant(value="\",\"severity\":\"")
  property(name="syslogseverity-text")
  constant(value="\",\"facility\":\"")
  property(name="syslogfacility-text")
  constant(value="\",\"syslog-tag\":\"")
  property(name="syslogtag")
  constant(value="\"}")
}

By default, rsyslog has a memory queue of 10K messages and a single thread that works with batches of up to 16 messages (you can find all queue parameters here). You may want to change:
– the batch size, which also controls the maximum number of messages to be sent to Kafka at once
– the number of threads, which would parallelize sending to Kafka as well
– the size of the queue and its nature: in-memory (default), disk or disk-assisted

In an rsyslog -> Kafka -> Logstash setup I assume you want to keep rsyslog light, so these numbers would be small, like:

main_queue(
  queue.workerthreads="1"      # threads to work on the queue
  queue.dequeueBatchSize="100" # max number of messages to process at once
  queue.size="10000"           # max queue size
)

Finally, to publish to Kafka you’d mainly specify the brokers to connect to (in this example we have one listening on localhost:9092) and the name of the topic we just created:

action(
  broker=["localhost:9092"]
  type="omkafka"
  topic="rsyslog_logstash"
  template="json_lines"
)

Assuming Kafka is started, rsyslog will keep pushing to it.
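Before wiring up Logstash, you can sanity-check that rsyslog’s JSON documents are actually arriving on the topic. A minimal sketch, assuming the kafka-python client (pip install kafka-python):

import json
from kafka import KafkaConsumer   # assumption: kafka-python client

consumer = KafkaConsumer(
    "rsyslog_logstash",                    # the topic we created earlier
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",          # read from the beginning of the topic
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for record in consumer:
    print(record.value["host"], record.value["syslog-tag"], record.value["message"])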

Configuring Logstash

This is the part where we pick up the JSON logs (as defined in the earlier template) and forward them to the preferred destinations. First, we have the input, which will read from the Kafka topic we created. To connect, we’ll point Logstash to Zookeeper, and it will fetch all the info about Kafka from there:

input {
  kafka {
    zk_connect => "localhost:2181"
    topic_id => "rsyslog_logstash"
  }
}

At this point, you may want to use various filters to change your logs before pushing to Logsene or Elasticsearch. For this last step, you’d use the Elasticsearch output:

output {
  elasticsearch {
    hosts => "logsene-receiver.sematext.com:443" # it used to be "host" and "port" pre-2.0
    ssl => "true"
    index => "your Logsene app token goes here"
    manage_template => false
    #protocol => "http" # removed in 2.0
    #port => "443" # removed in 2.0
  }
}

And that’s it! Now you can use Kibana (or, in the case of Logsene, either Kibana or Logsene’s own UI) to search your logs!

Monitoring Stream Processing Tools: Cassandra, Kafka and Spark

One of the trends we see in our Elasticsearch and Solr consulting work is that everyone is processing one kind of data stream or another. Including us, actually – we process endless streams of metrics, continuous log and event streams, high-volume clickstreams, etc. Kafka is clearly the de facto messaging standard. In the data processing layer Storm used to be a big hit and, while we still encounter it, we see more and more Spark. What comes out of Spark typically ends up in Cassandra, or Elasticsearch, or HBase, or some other scalable distributed data store capable of high write rates and analytical read loads.

Kafka-Spark-Cassandra - mish-mash tooling

The left side of this figure shows a typical data processing pipeline, and while this is the core of a lot of modern data processing applications, there are often additional pieces of technology involved – you may have Nginx or Apache exposing REST APIs, perhaps Redis for caching, and maybe MySQL for storing information about users of your system.

Once you put an application like this in production, you better keep an eye on it – there are lots of moving pieces and if any one of them isn’t working well you may start getting angry emails from unhappy users or worse – losing customers.

Imagine you had to use multiple different tools, open-source or commercial, to monitor your full stack, and perhaps an additional tool (ELK or Splunk or …) to handle your logs (you don’t just write them to the local FS, compress, and rotate, do you?). Yuck! But that is what that whole right side of the above figure is about. Each of the tools on that right side is different – they are different open-source projects with different authors, have different versions, are possibly written in different languages, are released on different cycles, are using different deployment and configuration mechanisms, etc. There are also a number of arrows there. Some carry metrics to Graphite (or Ganglia or …), others connect to Nagios, which provides alerting, while another set of arrows represents log shipping to the ELK stack. One could then further ask – well, what/who monitors your ELK cluster then?!? (don’t say Marvel, because then the question is who watches Marvel’s own ES cluster and we’ll get stack overflow!) That’s another set of arrows going from the Elasticsearch object. This is a common picture. We’ve all seen similar setups!

Our goal with SPM and Logsene is to simplify this picture and thus the lives of DevOps, Ops, and Engineers who need to manage deployments like this stream processing pipeline.  We do that by providing a monitoring and log management solution that can handle all these technologies really well.   Those multiple tools, multiple types of UIs, multiple logins….. they are a bit of a nightmare or at least a not very efficient setup.  We don’t want that.  We want this:

Kafka-Spark-Cassandra

Once you have something like SPM & Logsene in place you can see your complete application stack, all tiers from frontend to backend, in a single pane of glass. Nirvana… almost, because what comes after monitoring? Alerting, Anomaly Detection, notifications via email, HipChat, Slack, PagerDuty, WebHooks, etc. The reality of the DevOps life is that we can’t just watch pretty, colourful, real-time charts – we also need to set up and handle various alerts. But in the case of SPM & Logsene, at least this is all in one place – you don’t need to fiddle with Nagios or some other alerting tool that needs to be integrated with the rest of the toolchain.

So there you have it.  If you like what we described here, try SPM and/or Logsene – simply sign up here – there’s no commitment and no credit card required.  Small startups, startups with no or very little outside investment money, non-profit and educational institutions get special pricing – just get in touch with us.  If you’d like to help us make SPM and Logsene even better, we are hiring!

Poll Results: Kafka Version Distribution

The results for Apache Kafka version distribution poll are in.  Thanks to everyone who took the time to vote!

The distribution pie chart is below, but we could summarize it as follows:

  • Only about 5% of Kafka 0.7.x users didn’t indicate they will upgrade to 0.8.2.x in the next 2 months
  • Only about 14% of Kafka 0.8.1.x users didn’t indicate they will upgrade to 0.8.2.x in the next 2 months
  • Over 42% of Kafka users are already using 0.8.2.x!
  • Over 80% of Kafka users say they will be using 0.8.2.x within the next 2 months!

It’s great to see Kafka users being so quick to migrate to the latest version of Kafka! We’re extra happy to see such quick 0.8.2 adoption because we put a lot of effort into improving Kafka metrics, as well as making all 100+ Kafka metrics available via SPM Kafka 0.8.2 monitoring a few weeks ago, right after Kafka 0.8.2 was released.

Apache Kafka Version Distribution

 

You may also want to check out the results of our recent Kafka Producer/Consumer language poll.

 

Kafka Poll: Version You Use?

UPDATE: Poll Results!

With Kafka 0.8.2 and 0.8.2.1 being released and with the updated SPM for Kafka monitoring over 100 Kafka metrics, we thought it would be good to see which Kafka versions are being used in the wild. Kafka 0.7.x was a strong and stable release used by many. The 0.8.1.x release has been out since March 2014. Kafka 0.8.2.x has been out for just a little while, but…. are there any people who are either already using it (we are!) or are about to upgrade to it? Please tweet this poll and help us spread the word, so we can get good, statistically significant results. We’ll publish the results here and via @sematext (follow us!) in a week.


Kafka 0.8.2 Monitoring Support

SPM Performance Monitoring is the first Apache Kafka monitoring tool to support Kafka 0.8.2.  Here are all the details:

Shiny, New Kafka Metrics

Kafka 0.8.2 has a pile of new metrics for all three main Kafka components: Producers, Brokers, and Consumers.  Not only does it have a lot of new metrics, the whole metrics part of Kafka has been redone — we worked closely with Kafka developers for several weeks to bring order and structure to all Kafka metrics and make them easy to collect, parse and interpret.

We could list all the Kafka metrics you can get via SPM, but in short — SPM monitors all Kafka metrics and, as with all things SPM monitors, all these metrics are nicely graphed and are filterable by server name, topic, partition, and everything else that makes sense in Kafka deployments.

103 Kafka metrics:

  • Broker: 43 metrics
  • Producer: 9 metrics
  • New Producer: 38 metrics
  • Consumer: 13 metrics

You will be hard-pressed to find another solution that can monitor that many Kafka metrics out of the box! And if you want to do something with your Kafka logs, Logsene will gladly make them searchable for you!

Needless to say, SPM shows the most sought after Kafka metric – the Consumer Lag (see the screenshot below).

Screenshot – Kafka Metrics Overview  (click to enlarge)


Screenshot – Consumer Lag  (click to enlarge)


Monitoring Kafka in Context

Running Kafka alone is pointless. On one side you process or collect data and push it into Kafka.  On the other side you consume that data (maybe processing it some more) and in the end this data typically ends up landing in some data store. Kafka is often used with data processing frameworks like Spark, Storm and Hadoop, or data stores like Cassandra and HBase, search engines like Elasticsearch and Solr, and so on.  Wouldn’t it be nice to have a single place to monitor all of these systems?  With alerts and anomaly detection?  And letting you collect and search all their logs?  Guess what?  SPM and Logsene do exactly that — they can monitor all of these technologies and make all their logs searchable!

Take a Test Drive — It’s Easy and Free to Get Started

Like what you see here?  Sound like something that could benefit your organization?  Then try SPM for Free for 30 days by registering here.  There’s no commitment and no credit card required.

Poll Results: Kafka Producer/Consumer

About 10 days ago we ran a poll about which languages/APIs people use when writing their Apache Kafka Producers and Consumers. See Kafka Poll: Producer & Consumer Client. We have collected 130 votes so far. The results were actually somewhat surprising! Let’s share the numbers first!

Kafka Producer/Consumer Languages

What do you think?  Is that the breakdown you expected?  Here is what surprised us:

  • Java is the dominant language on the planet today, but less than 50% of people use it with Kafka! Read: possible explanation for Java & Kafka.
  • Python is clearly popular and gaining in popularity, but at 13% it looks like it’s extra popular in Kafka context.
  • Go at 10-11% seems quite popular for a relatively young language.  One might expect Ruby to have more adoption here than Go because Ruby has been around much longer.
  • We put C/C++ in the poll because these languages are still in use, though we didn’t expect them to get 6% of the votes. However, considering C/C++ are still quite heavily used generally speaking, that’s actually a pretty low percentage.
  • JavaScript and NodeJS are surprisingly low at just 4%.  Any idea why?  Is the JavaScript Kafka API not up to date or bad or ….?
  • The “Other” category is relatively big, at a bit over 12%.  Did we forget some major languages people often use with Kafka?  Scala?  See info about the Kafka Scala API here.

Everyone and their cousin is using Kafka nowadays, or at least that’s what it looks like from where we at Sematext sit.  However, because of the relatively high percentage of people using Python and Go, we’d venture to say Kafka adoption is much stronger among younger, smaller companies, where Python and Go are used more than “enterprise languages”, like Java, C#, and C/C++.

Kafka Poll: Producer & Consumer Client

Kafka has become the de facto standard for handling real-time streams in high-volume, data-intensive applications, and there are certainly a lot of those out there. We thought it would be valuable to conduct a quick poll to find out which implementation of Kafka Producers and Consumers people use – specifically, which programming languages do you use to produce and consume Kafka messages?

Please tweet this poll and help us spread the word, so we can get good, statistically significant results. We’ll publish the results here and via @sematext (follow us!) in a week.

NOTE #1: If you choose “Other”, please leave a comment with additional info, so we can share it when we publish the results, too!

NOTE #2: The results are in! See http://blog.sematext.com/2015/01/28/kafka-poll-results-producer-consumer/
