
Open Source Docker Monitoring & Logging

Pets ⇒ Cattle ⇒ Orchestration

Docker is growing by leaps and bounds, and along with it its ecosystem.  Being light, the predominant container deployment involves running just a single app or service inside each container.  Most software products and services are made up of at least several such apps/services.  We all want all our apps/services to be highly available and fault tolerant.  Thus, Docker containers in an organization quickly start popping up like mushrooms after the rain.  They multiply faster than rabbits.  While in the beginning we play with them like cute little pets, as their number quickly grows we realize we are dealing with a herd of cattle, implying we’ve become cowboys.  Managing a herd with your two hands, a horse, and a lasso will get you only so far.  You won’t be able to ride after each and every calf that wanders in the wrong direction.  To get back to containers from this zoological analogy – operating so many moving pieces at scale is impossible without orchestration – which is why we’ve seen the rise of Docker Swarm, Kubernetes, Mesos, CoreOS, RancherOS, and so on.

 

Containers multiply faster than Gremlins


Pets ⇒ Cattle ⇒ Orchestration + Operational Insights

Container orchestration helps you manage your containers, their placement, their resources, and their whole life cycle.  While containers and the applications in them are running, in addition to life cycle management we need container monitoring and log management so we can troubleshoot performance or stability issues, debug or tune applications, and so on.  Just like with orchestration, there are a number of open-source container monitoring and logging tools.  It’s great to have choices, but having lots of them means you need to evaluate and compare them to pick the one that best matches your needs.

DevOps Tools Comparison

We’ve open-sourced our Sematext Docker Agent (SDA for short), which works with SPM for monitoring and Logsene for log management (think of it as ELK as a Service), and wanted to provide a high-level comparison of SDA and several popular Docker monitoring and logging tools, like cAdvisor, Logspout, and others.  In the following table we group tools by functionality and include monitoring agents, log collectors and shippers, storage backends, and tools that provide the UI and visualizations.  For each functionality we list in the “Common Tools” column one or more popular open-source tools that provide that functionality.  An empty “Common Tools” cell means there are no popular open-source tools that provide it, or at least none we are aware of – if we messed something up, please leave a comment or tweet @sematext.

Functionality | Common Tools | Sematext Tools
Collect Logs from Docker API (including auto-discovery of new containers) | Logspout | Sematext Docker Agent
Log routing | Logspout (routing setup for containers via HTTP API to syslog, redis, kafka, logstash); Docker Logging Drivers (e.g. syslog, journald, fluentd, etc.) | Sematext Docker Agent (routing of logs to different indices based on container labels)
Automatic log tagging (with Docker Compose or Swarm or Kubernetes metadata) | For Kubernetes: fluentd-elasticsearch (assumes Elasticsearch deployed locally) | Sematext Docker Agent
Collect Docker Metrics | cAdvisor | Sematext Docker Agent
Collect Docker Events | ? | Sematext Docker Agent
Logs format detection (most tools need a static setup per logfile/application) | ? | Sematext Docker Agent (out-of-the-box format detection and parsing; the parser and the logagent-js pattern library are open source)
Logs parsing and shipping | Fluentd, Logstash, rsyslog, syslog-ng | Sematext Docker Agent
Logs storage and indexing | Elasticsearch, Solr | Logsene (exposes the Elasticsearch API)
Logs anomaly detection and alerting | ? | Logsene
Log search and analytics | Kibana, Grafana | Logsene (Logsene’s own UI, integrated Kibana, or Grafana connected to Logsene via the Elasticsearch data source)
Metrics storage and aggregation | Graphite, OpenTSDB, KairosDB, Elasticsearch, InfluxDB, Prometheus | SPM
Metrics charts and dashboards | Grafana, Kibana | SPM
Metrics anomaly detection and alerting | InfluxDB, Prometheus | SPM
Correlation of Metrics, Logs and Events | ? | SPM & Logsene integration

This table shows a few things:

  • Some of the functionality provided by SPM and Logsene is not available in some of the most popular open-source monitoring and logging tools included here
  • Some of the SPM and Logsene functionality is indeed provided by some of the open-source tools, however none of them seems to encompass all the features, forcing one to mix and match and head down the tech debt-ridden Franken-monitoring path
  • Try it yourself in the MindMap below – pick a few functionalities and see how many different tools you might have to use.

 

Avoid building technical debt & Franken-monitoring by using a limited number of Docker monitoring & logging tools

Again, if we missed something, please leave a comment or tweet @sematext.
If you want to try Sematext Docker Agent sign up for a free trial.
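
For a sense of what a typical Sematext Docker Agent deployment looks like, here is a minimal sketch of running it as a single container per Docker host.  Treat the image name and environment variable names (sematext/sematext-agent-docker, SPM_TOKEN, LOGSENE_TOKEN) as assumptions based on the project’s README at the time of writing, and use the tokens from your own SPM and Logsene apps.

# Hedged sketch: run Sematext Docker Agent on one Docker host.
# The image and env var names are assumptions; check the SDA README for your version.
# The docker.sock mount lets the agent discover containers and collect their
# metrics, logs, and events.
docker run -d --name sematext-agent \
  -e SPM_TOKEN=YOUR_SPM_TOKEN \
  -e LOGSENE_TOKEN=YOUR_LOGSENE_TOKEN \
  -v /var/run/docker.sock:/var/run/docker.sock \
  sematext/sematext-agent-docker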

P.S.: Sematext Docker Agent is available in the RancherOS Community Catalog and shows up with our new mascot “Octi” – just one more pet 🙂 – so if you use RancherOS, search for “sematext” in the RancherOS Catalog and within a few clicks you’ll have the Sematext Docker Agent deployed to your RancherOS clusters!




Monitoring Docker Datacenter Logs & Metrics

Docker Datacenter (DDC) simplifies container orchestration and increases the flexibility and scalability of application deployments.  However, the high level of automation creates new challenges for monitoring and log management.  Organizations that introduce Docker Datacenter manage container deployments in various scenarios, e.g., on bare metal, virtual machines, or hybrid clouds.  That’s why at Sematext we are seeing a shift from traditional server monitoring to container-centric monitoring.  This post is an excerpt from the newly published “Reference Architecture: Monitoring and Logging for Docker Datacenter” and shows how Docker Datacenter can be extended with logging and monitoring services.

Download Reference Architecture Logging & Monitoring for Docker Datacenter

The Docker Universal Control Plane (UCP) management functionalities include real-time monitoring of the cluster state and real-time metrics and logs for each container.  However, operating larger infrastructures requires a longer retention time for logs and metrics and the capability to correlate metrics, logs, and events on several levels (cluster, nodes, applications, and containers).  A comprehensive monitoring and logging solution ought to provide the following operational insights:

Read More


Container Monitoring: Top Docker Metrics to Watch

Monitoring of Docker environments is challenging.  Why?  Because each container typically runs a single process, has its own environment, utilizes virtual networks, or has various methods of managing storage.  Traditional monitoring solutions take metrics from each server and the applications they run.  These servers and the applications running on them are typically very static, with very long uptimes.  Docker deployments are different: a set of containers may run many applications, all sharing the resources of one or more underlying hosts.  It’s not uncommon for Docker servers to run thousands of short-term containers (e.g., for batch jobs) while a set of permanent services runs in parallel.  Traditional monitoring tools, not built for such dynamic environments, are not suited for such deployments.  On the other hand, some modern monitoring solutions (e.g. SPM from Sematext) were built with such dynamic systems in mind and even have out-of-the-box reporting for Docker monitoring.  Moreover, container resource sharing calls for stricter enforcement of resource usage limits, an additional issue you must watch carefully.  To make appropriate adjustments for resource quotas you need good visibility into any limits containers have reached or errors they have caused.  We recommend setting alerts on the defined limits; this way you can adjust limits or resource usage even before errors start happening.
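
To make the resource-limit point concrete, here is a small sketch using only standard Docker CLI commands: it starts a container with explicit memory and CPU limits and then looks at live usage and OOM-kill state, which is exactly the kind of data a monitoring tool would alert on.  The image name and limit values are arbitrary placeholders.

# Start a container with explicit resource limits (image and values are placeholders).
docker run -d --name limited-app --memory=256m --cpu-shares=512 nginx

# One-shot snapshot of CPU, memory, network, and block I/O usage per container.
docker stats --no-stream

# Check whether the kernel OOM-killed the container for exceeding its memory limit.
docker inspect --format '{{.State.OOMKilled}}' limited-app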

Read More

Kafka Consumer Lag Monitoring

SPM is one of the most comprehensive Kafka monitoring solutions, capturing some 200 Kafka metrics, including Kafka Broker, Producer, and Consumer metrics. While lots of those metrics are useful, there is one particular metric everyone wants to monitor – Consumer Lag.

What is Consumer Lag

When people talk about Kafka or about a Kafka cluster, they are typically referring to Kafka Brokers. You can think of a Kafka Broker as a Kafka server. A Broker is what actually stores and serves Kafka messages. Kafka Producers are applications that write messages into Kafka (Brokers). Kafka Consumers are applications that read messages from Kafka (Brokers).
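
If you want a quick hands-on feel for these roles, the Kafka distribution ships console producer and consumer scripts.  The sketch below is hedged: the topic name and broker address are placeholders, and the exact flag names vary a bit between Kafka versions.

# A minimal Producer: lines typed on stdin are written to the Brokers.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic

# A minimal Consumer: reads the messages back from the Brokers.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic my-topic --from-beginning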

Inside Kafka Brokers data is stored in one or more Topics, and each Topic consists of one or more Partitions.  When writing data a Broker actually writes it into a specific Partition.  As it writes data it keeps track of the last “write position” in each Partition.  This is called the Latest Offset, also known as the Log End Offset.  Each Partition has its own independent Latest Offset.

Just like Brokers keep track of their write position in each Partition, each Consumer keeps track of its “read position” in each Partition whose data it is consuming.  That is, it keeps track of which data it has read.  This is known as the Consumer Offset.  The Consumer Offset is periodically persisted (to ZooKeeper or a special Topic in Kafka itself) so it can survive Consumer crashes or unclean shutdowns and avoid re-consuming too much old data.

Kafka Consumer Lag and Read/Write Rates

In the diagram above the yellow bars represent the rate at which Brokers are writing messages created by Producers.  The orange bars represent the rate at which Consumers are consuming messages from Brokers.  The rates look roughly equal – and they need to be, otherwise the Consumers will fall behind.  However, there is always going to be some delay between the moment a message is written and the moment it is consumed.  Reads are always going to lag behind writes, and that is what we call Consumer Lag.  The Consumer Lag is simply the delta between the Latest Offset and the Consumer Offset.
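
If you want to eyeball this delta by hand, the Kafka distribution’s consumer-groups tool reports the Consumer Offset, the Latest Offset, and the resulting lag per Partition.  A hedged sketch follows; the group name and broker address are placeholders, and flag names differ slightly across Kafka versions.

# Describe a Consumer Group: for each Partition this prints CURRENT-OFFSET
# (the Consumer Offset), LOG-END-OFFSET (the Latest Offset), and LAG, which is
# simply LOG-END-OFFSET minus CURRENT-OFFSET.
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-consumer-group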

Why is Consumer Lag Important

Many applications today are based on being able to process (near) real-time data.  Think about a performance monitoring system like SPM or a log management service like Logsene.  They continuously process infinite streams of near real-time data.  If they were to show you metrics or logs with too much delay – if the Consumer Lag were too big – they’d be nearly useless.  The Consumer Lag tells us how far behind each Consumer (Group) is in each Partition.  The smaller the lag, the more real-time the data consumption.

Monitoring Read and Write Rates

Kafka Consumer Lag and Broker Offset Changes

As we just learned, the delta between the Latest Offset and the Consumer Offset is what gives us the Consumer Lag.  In the above chart from SPM you may have noticed a few other metrics:

  • Broker Write Rate
  • Consume Rate
  • Broker Earliest Offset Changes

The rate metrics are derived metrics.  If you look at Kafka’s metrics you won’t find them there.  Under the hood SPM collects a few metrics with various offsets from which these rates are computed.  In addition, it charts Broker Earliest Offset Changes, which is the earliest known offset in each Broker’s Partition.  Put another way, this offset is the offset of the oldest message in a Partition.  While this offset alone may not be super useful, knowing how it’s changing can be handy when things go awry.  Data in Kafka has a certain TTL (Time To Live) to allow for easy purging of old data.  This purging is performed by Kafka itself.  Every time such purging kicks in, the offset of the oldest data changes.  SPM’s Broker Earliest Offset Change surfaces this information for your monitoring pleasure.  This metric gives you an idea how often purges are happening and how many messages they’ve removed each time they ran.
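
How often those purges happen is driven by per-topic retention settings.  As a hedged illustration, this is how a time-based retention could be set with the stock Kafka config tool; the topic name, broker address, and the 7-day value are placeholders, and older Kafka versions take --zookeeper instead of --bootstrap-server.

# Set time-based retention on a topic to 7 days (604800000 ms).  Once segments
# age out, Kafka purges them and the Earliest Offset of the Partition advances.
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config retention.ms=604800000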

There are several Kafka monitoring tools out there, like LinkedIn’s Burrow, whose Offset and Consumer Lag monitoring approach is used in SPM.  If you need a good Kafka monitoring solution, give SPM a go.  Ship your Kafka and other logs into Logsene and you’ve got yourself a DevOps solution that will make troubleshooting easy instead of dreadful.

 

5 Minute Recipe: Heroku Log Drain Setup

Since we wrote about how to ship Heroku Logs to ELK we’ve received good feedback from Heroku users and, encouraged by that feedback, deployed a log ingestion service for apps running on Heroku. This makes it super easy to get structured Heroku Logs into Logsene, the hosted ELK logging service.  Let’s see how that’s done in under five minutes (check the current time!):

Step 1 – Create your Logsene App

If you don’t have a Logsene account already simply get a free account and create a Logsene App. This will get you a Logsene Application Token.

Step 2 – Configure Log Drain for your Heroku App

Once you create your Logsene app you’ll see a command to set up the Heroku Log Drain including the Logsene Token.

Simply copy that command and run it in one of two ways:

  1. in the Heroku app directory, like this:

heroku drains:add https://logsene-heroku-receiver.sematext.com/LOGSENE_TOKEN

  2. alternatively, specify your app name in the command instead of calling the command from your Heroku app directory:

heroku drains:add https://logsene-heroku-receiver.sematext.com/LOGSENE_TOKEN -a YOUR_HEROKU_APP_NAME
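
To confirm the drain was registered, you can list the drains configured for your app with the standard Heroku CLI (YOUR_HEROKU_APP_NAME is, as above, a placeholder for your app’s name):

# List the log drains for the app; the Logsene receiver URL added above should
# show up in the output.
heroku drains -a YOUR_HEROKU_APP_NAME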

Step 3 – Watch your Logs in Logsene

If you now access your Heroku App, Heroku should log your HTTP request and a few seconds later the logs will be visible in Logsene.  And not in just any format!  You’ll see PERFECTLY STRUCTURED HEROKU LOGS:


Parsed Heroku Logs in Logsene

 

Check the time!  Under five minutes?  If you like your Heroku app logs in Logsene tweet us your setup time. 🙂

Sematext Elastic Stack Training

Elasticsearch / Elastic Stack Training – NYC June 13-16

Next month, June 13-16, 2016, we will be running three Elastic Stack (aka ELK Stack) classes in New York City:

  1. June 13 & 14: Elasticsearch for Developers Training Workshop
  2. June 15: Elasticsearch Operations Training Workshop
  3. June 16: Elasticsearch for Logging Training Workshop

All classes cover Elasticsearch 2.x as well as Elasticsearch 5.x!

You can see the complete course outlines under Training Overview.  All three classes include lots of valuable hands-on exercises.  Be prepared to learn a lot!

Cost:

  • 2-day course: $1,200 early bird rate (valid through June 1) and $1,500 afterwards.
  • 1-day course: $700 early bird rate (valid through June 1) and $800 afterwards.

There’s also a 50% discount for the purchase of a 2nd seat!

Location:
462 7th Avenue, New York, NY 10018 – see map

If you have any questions please get in touch.

Sematext Solr Training

Apache Solr Training in NYC June 13-14

If you missed our Core Solr training in October 2015 in New York, here is another chance – we’re running the 2-day Core Solr class again next month, June 13 & 14, 2016.
This course covers Solr 5.x as well as Solr 6.x!  You can see the complete course outline under Solr & Elasticsearch Training Overview.  The course is very comprehensive – it includes over 20 chapters and lots of hands-on exercises.  Be prepared to learn a lot!

Cost:
$1,200 early bird rate (valid through June 1) and $1,500 afterwards.

There’s also a 50% discount for the purchase of a 2nd seat!

Location:

462 7th Avenue, New York, NY 10018 – see map

If you have any questions please get in touch.  To sign up, just register here.


DocValues Reindexing with Solr Streaming Expressions

Last time, when talking about Solr 6, we learned how to use streaming expressions to automatically update data in a collection.  As you can imagine, this is not the only cool thing you can do with streaming expressions.  Today, we will see how to re-index data in your collection for fields that are using doc values.  For that we will use Solr 6.1, because of a simple bug that was fixed in that version (see SOLR-9015 for details).

Let’s assume we have two collections.  The first, called video, will be the source of the data.  The second collection, video_new, will be the target collection.  We assume the collections have slightly different structures – specifically, slightly different field names.  The video collection will have the following fields:

  • id – document identifier
  • url – URL of the video
  • likes – number of likes
  • views – number of views

The second collection, video_new, will have the following fields:

  • id – document identifier
  • url – URL of the video
  • num_likes – number of likes
  • num_views – number of views

Exporting the data

The first thing we need to figure out is a way to export data from the source collection in an efficient fashion.  We can’t just set the rows parameter to a gazillion, because that is not efficient and can lead to Solr running out of memory.  So we will use the /export request handler.  The only limitation of that request handler is that the data needs to be sorted and needs to use doc values.  That is not a problem for our data, but you should be aware of this requirement.
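
If you are not sure whether a field qualifies, one way to check is the Schema API, which returns the field definition; the same API can also add missing target fields with doc values enabled.  The snippets below are a hedged sketch: the field type is an assumption, and whether docValues shows up explicitly depends on your schema and the field type’s defaults.

# Check the definition of the likes field in the source collection; docValues
# may be set on the field itself or implied by the field type's defaults.
curl 'localhost:8983/solr/video/schema/fields/likes'

# Add a target field to video_new with docValues enabled (the "long" type is an
# assumption; adjust it to the types used in your schema).
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field": {"name":"num_likes", "type":"long", "docValues":true}
}' 'localhost:8983/solr/video_new/schema'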

We will start by exporting the data using the standard Solr way – using the request params with the /export handler. The request looks like this:

curl -XGET 'localhost:8983/solr/video/export?q=*:*&sort=id+desc&fl=id,url,likes,views'

The above will result in Solr using the /export handler and returning all data, not only the first page of the results.

However, we want to use streaming expressions to re-index the data. Because of that we can change the above request to use the search streaming expression, which looks as follows:

search(
  video,
  zkHost="localhost:9983",
  qt="/export",
  q="*:*",
  fl="id,url,likes,views",
  sort="id desc")

The working command with the request looks like this:

curl --data-urlencode 'expr=search(video,zkHost="localhost:9983",qt="/export",q="*:*",fl="id,url,likes,views",sort="id desc")' http://localhost:8983/solr/video/stream

We use the search streaming expression and provide the name of the collection (video in our case), the ZooKeeper host (yes, we can read from other clusters), and the name of the request handler, which is /export in our case and is required.  Finally, we provide the match-all query, the list of fields that we are interested in, and the sorting expression.  Please remember that when using the /export handler all fields listed in the fl parameter must use doc values.

Changing field names

Our collections have different field names and because of that the above search request is not enough. We need to alter the name of the fields by using the select streaming expression. We will change the name of the likes field to num_likes and the name of the views field to num_views. The expression that does that is:

select(
  search(video,zkHost="localhost:9983",qt="/export",q="*:*",fl="id,url,likes,views",sort="id desc"),
  id, 
  url, 
  likes as num_likes,
  views as num_views
)

The select streaming expression lets us choose which fields should be used in the resulting tuples and how they will be named. In our case we take the id and url fields as is and we change the name of the likes and views fields.

To test the result of that expression you can simply use the following command:

curl --data-urlencode 'expr=select(search(video,zkHost="localhost:9983",qt="/export",q="*:*",fl="id,url,likes,views",sort="id desc"),id,url,likes as num_likes,views as num_views)' http://localhost:8983/solr/video/stream

Running the re-indexing

Finally, we have the data prepared and read in an efficient way, so we can send it to Solr for indexing.  We do that using the update streaming expression, simply by specifying the target collection name and the batch size, like this:

update(
  video_new, 
  batchSize=100, 
  select(search(video,zkHost="localhost:9983",qt="/export",q="*:*",fl="id,url,likes,views",sort="id desc"),id,url,likes as num_likes,views as num_views))

And the command that we would send to Solr:

curl --data-urlencode 'expr=update(video_new,batchSize=100,select(search(video,zkHost="localhost:9983",qt="/export",q="*:*",fl="id,url,likes,views",sort="id desc"),id,url,likes as num_likes,views as num_views))' http://localhost:8983/solr/video/stream

Please note that we send the command to the /stream handler of the source collection – in our case, the video collection.  This is important.
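
Depending on your autoCommit and autoSoftCommit settings, the documents written to video_new may not be searchable right away.  If the counts in the next step do not match at first, a hedged option is to trigger a commit on the target collection explicitly:

# Force a commit on the target collection so the re-indexed documents become
# visible to searches before we compare document counts.
curl 'localhost:8983/solr/video_new/update?commit=true'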

Verifying the re-indexing

Once Solr has finished the task, we can check the number of documents returned by each collection to verify that the data has been re-indexed properly.  We can do that by running these commands:

curl -XGET 'localhost:8983/solr/video/select?q=*:*&indent=true&rows=0'

and

curl -XGET 'localhost:8983/solr/video_new/select?q=*:*&indent=true&rows=0'

Both result in the following number of documents:

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">38</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="indent">true</str>
    <str name="rows">0</str>
  </lst>
</lst>
<result name="response" numFound="18" start="0">
</result>
</response>

And that means that everything works as intended 🙂

Interested in Solr Streaming Expressions? Subscribe to this blog or follow @sematext – we have more Streaming Expressions blog posts in the queue. If you need any help with Solr / SolrCloud – don’t forget @sematext does Solr Consulting, Production Support, as well as Solr Training!