Docker Container Monitoring with Sematext

Stefan Thies

Everyone’s infrastructure is growing – today mostly in the container space. As we learned in Part 1 of this series – Docker Container Monitoring and Management Challenges, monitoring for containers is different from traditional server monitoring. In Part 2 we had a glance at key container metrics and in Part 3 we compared several open source tools for container monitoring.

Open-source monitoring tools are free, but your time is not.  Relatively speaking, it’s actually rather expensive.  Thus, Sematext aims to save you time, effort… and the hair you might lose in the process.

Here are a few things you will NOT have to do when using Sematext for container monitoring:

  • figure out which metrics to collect and which ones to ignore
  • give metrics meaningful labels
  • hunt for metric descriptions in the docs so that you know what each of them actually shows
  • build charts to group metrics that you really want on the same charts, not N separate charts
  • figure out, for each metric, which aggregation to use (min? max? avg? something else?)
  • build dashboards to combine charts with metrics you typically want to see together
  • set up basic alert rules

The above is not even a complete story.  Do you want to collect container logs?  Want to structure them?  Prepare to do more legwork.

In this post, we will look at how Sematext provides more comprehensive – and easy to set up – monitoring for containers in your infrastructure by combining events, logs, and metrics together in one integrated full stack observability platform. By using Sematext Agent, you can monitor your whole infrastructure and apps, not just your containers. You can also get deeper visibility into your full stack by collecting, processing and shipping your Logs with Logagent and analyzing metrics, events and logs together in Sematext Cloud. Let’s explore each of them.

Container Logs

Most non-containerized applications write logs into files, while containers write their logs to the standard output and standard error streams (console). Thus, container logs are console output streams from containers. This data might be a mix of plain text messages from start scripts and structured logs from applications.  The problem is obvious – you can’t just take a stream of log events all mixed up and treat them like a blob.  You need to be able to tell which container and which app each log event belongs to, parse it correctly, etc.

Docker log drivers help to ship logs to log management servers. However, most logging drivers don’t help with parsing container logs, so most logging solutions use separate tools like Logstash or rsyslog to structure logs before forwarding them to log storage. When we chain multiple interdependent components for log processing, the chance that the logging pipeline will break increases with each new component. There are a few Docker Log Driver Alternatives. One of those alternatives is Logagent – an all-in-one solution for container log processing.

Logagent is a general purpose log shipper. The Logagent Docker image is pre-configured for log collection on container platforms. It runs as a tiny container on every host and collects logs for all cluster nodes and their containers. All container logs are enriched with Kubernetes, Docker Enterprise, and Docker Swarm metadata.

A rich set of configuration options for logs is available, and Logagent provides the option to use all input plugins, filter plugins, and output plugins. The format for log parser patterns remains the same. Logagent recognizes log formats from various applications / official container images out of the box. Custom log parser patterns can be specified to process any kind of log format.

Log format detection and intelligent pattern matching:

  • Pattern library included covering a set of common databases, web servers, message queues, etc.
    • Apache Kafka
    • Apache Solr
    • Apache Zookeeper
    • Apache web server
    • MySQL
    • ClickHouse
    • CouchDB
    • Cassandra
    • MongoDB
    • Elasticsearch
    • nsq.io
    • Redis
    • Nginx
    • RabbitMQ
    • PostgreSQL
    • HBase HDFS Data Node
    • HBase Region Server
    • Hadoop YARN Node Manager
    • Traefik
  • Parsing of JSON logs
  • Easy to extend with custom patterns and JS transform functions
  • Hot reload of changed pattern definitions without service/container restart
  • Auto-detection of date and numeric fields
  • Masking of sensitive data with a configurable hashing algorithm (SHA-1, SHA-256, SHA-512, …)
  • GeoIP lookup with automatic GeoIP DB updates (Maxmind GeoIP-Lite files)
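
To make the masking feature from the list above more concrete, here is a minimal Python sketch of hashing sensitive values with a selectable algorithm. It is not Logagent's implementation: the IPv4 pattern, the function name `mask_sensitive`, and the sample log line are assumptions chosen for illustration.

```python
import hashlib
import re

# Hypothetical pattern: treat anything that looks like an IPv4 address as sensitive.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_sensitive(line, algorithm="sha256"):
    """Replace every IPv4-looking token with the hex digest of that token."""
    def _hash(match):
        return hashlib.new(algorithm, match.group(0).encode("utf-8")).hexdigest()

    return IPV4.sub(_hash, line)


# Sample log line (made up for this example).
masked = mask_sensitive("GET /login from 203.0.113.7")
```

Because the same input always produces the same digest, masked values can still be grouped and counted in searches without exposing the original value.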

Logagent does not collect only container logs. It can also collect log files from all cluster nodes when the log file directories are mounted into the Logagent container.

Getting started with Logagent

To run Logagent you will need a Logs App token. If you don’t have any Logs Apps yet, you can create one now.
The Sematext UI displays copy and paste instructions for running Logagent.

See more in Logagent on Docker docs, which also show how to run Logagent in Kubernetes and OpenShift, Docker Swarm and Docker Enterprise, Apache Mesos, and more.

Setup Logagent to collect container and host logs

We start with a simple deployment on a single Docker host to explain the usage of Logagent. The most basic way to start it is with the docker run command:

docker pull sematext/logagent
docker run -d --restart=always --name st-logagent \
-e LOGS_TOKEN=YOUR_LOGS_TOKEN \
-e LOGS_RECEIVER_URL="https://logsene-receiver.sematext.com" \
-v /var/run/docker.sock:/var/run/docker.sock \
sematext/logagent

We see here three mandatory parameters:

  • LOGS_RECEIVER_URL: The URL of your Elasticsearch endpoint (defaults to Sematext Cloud US https://logsene-receiver.sematext.com). For Sematext Europe use https://logsene-receiver.eu.sematext.com.

  • LOGS_TOKEN: The logs token of your Sematext logs App.

  • Binding to Docker engine socket (-v /var/run/docker.sock:…) to retrieve logs via the Docker API.

Once you run the command above, all container logs will show up in the Sematext UI.

Nginx web server logs captured, parsed and enriched by Logagent

Tail server log files from Docker hosts

To tail log files from the Docker host filesystem, we need to mount the log directory /var/log from the host to /host/log in the Logagent container and specify a filename pattern (GLOB pattern) to activate Logagent’s file watcher. To scan all subdirectories for log files, we can use a wildcard, e.g. LOG_GLOB=/host/log/**/*.log:

docker run -d --restart=always --name st-logagent \
-e LOGS_TOKEN=YOUR_LOGS_TOKEN \
-e LOGS_RECEIVER_URL="https://logsene-receiver.sematext.com" \
-e LOG_GLOB="/host/log/**/*.log" \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /var/log:/host/log \
sematext/logagent

You will see logs from containers and the host in the Sematext Logs App. Host and container logs get tagged with a “logSource” field holding the log filename for hosts and image name and container name for containers.
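
To get a feeling for which files a recursive pattern such as /host/log/**/*.log selects, the following Python sketch builds a throwaway directory tree and applies the same kind of pattern with the standard `glob` module. Logagent uses its own glob engine, so edge cases may differ; the file names and the helper `find_logs` are assumptions made for this example.

```python
import glob
import os
import tempfile


def find_logs(root):
    """Return paths matching <root>/**/*.log, relative to root and sorted."""
    pattern = os.path.join(root, "**", "*.log")
    matches = glob.glob(pattern, recursive=True)
    return sorted(os.path.relpath(path, root) for path in matches)


# Build a temporary tree that stands in for the mounted /host/log directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "nginx"))
    for name in ("syslog.log", "auth.txt", os.path.join("nginx", "access.log")):
        with open(os.path.join(root, name), "w") as handle:
            handle.write("sample\n")
    found = find_logs(root)
```

In this sketch the pattern picks up .log files both directly in the root directory and in subdirectories, while auth.txt is skipped because it does not end in .log.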

Add metadata from container labels

Container logs processed by Logagent get enriched with container metadata such as container name, image name and container labels. The tagging of log events with container labels or environment variables is configurable. The default value for TAGGING_LABELS is configured to collect all Docker labels, container environment variables, Kubernetes labels, and Kubernetes annotations:

TAGGING_LABELS=com.docker.*,io.kubernetes.*,annotation.io.*

A JSON view of a log entry, showing all labels and Kubernetes metadata

To configure tagging labels simply set your patterns, matching the label name or environment variable name, e.g.:

TAGGING_LABELS=myapp.*,ROLE,USER,com.docker.*,io.kubernetes.*,annotation.io.*
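
The effect of such a pattern list can be sketched in a few lines of Python. This is only an illustration that assumes shell-style wildcard matching on label and environment variable names; Logagent's actual matching rules may differ, and the metadata values and the helper `select_tags` are made up.

```python
from fnmatch import fnmatchcase


def select_tags(metadata, tagging_labels):
    """Keep only the metadata keys that match one of the comma-separated patterns."""
    patterns = [p.strip() for p in tagging_labels.split(",") if p.strip()]
    return {
        key: value
        for key, value in metadata.items()
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    }


# Hypothetical labels and environment variables of one container.
metadata = {
    "com.docker.swarm.service.name": "web",
    "io.kubernetes.pod.name": "web-0",
    "myapp.tier": "frontend",
    "ROLE": "api",
    "PATH": "/usr/bin",
}

tags = select_tags(
    metadata,
    "myapp.*,ROLE,USER,com.docker.*,io.kubernetes.*,annotation.io.*",
)
```

Here PATH is dropped because no pattern matches it, and USER simply produces no tag because the container does not define it.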

Skip verbose log messages

If you run health checks for your containers, you might see every health check in the log output. The information is not very valuable and consumes log storage and network resources.

Setting IGNORE_LOGS_PATTERN=/healthcheck|/ping will remove the noisy health check URLs in your server logs.
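
As a rough illustration of what such a pattern does, the Python sketch below treats the value as a regular expression and drops every line that matches it. That interpretation, the helper `drop_ignored`, and the sample lines are assumptions for this example, not Logagent's code.

```python
import re


def drop_ignored(lines, ignore_pattern):
    """Return only the lines that do not match the ignore pattern."""
    regex = re.compile(ignore_pattern)
    return [line for line in lines if not regex.search(line)]


# Made-up access log lines.
lines = [
    "GET /healthcheck HTTP/1.1 200",
    "GET /ping HTTP/1.1 200",
    "POST /api/orders HTTP/1.1 500",
]

kept = drop_ignored(lines, r"/healthcheck|/ping")
```

Note that an unanchored pattern like /ping would also drop lines containing paths such as /pingback, so it is worth making the pattern as specific as your URLs require.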

We will stop here and move on to more examples for Docker Enterprise, Swarm and Kubernetes. Feel free to explore the documentation, which addresses more topics like blacklisting/whitelisting of container logs or log routing.

Logagent on Docker Enterprise with Swarm

The setup with “docker run” on a Docker host is handy for development environments. In production environments we want to automate the deployment of Logagent. Unlike the services in your application stack, Logagent needs to run on every cluster node to ensure log collection works for all hosts and containers in the cluster.  Therefore we deploy Logagent as a global Swarm service. The service will automatically install Logagent on new nodes when you scale the Docker cluster.  The “docker service” command is similar to the docker run command:

docker service create --mode global --name st-logagent \
--restart-condition any \
--mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
--mount type=bind,src=/var/log,dst=/host/log \
-e LOGS_TOKEN="YOUR LOGS TOKEN HERE" \
-e LOGS_RECEIVER_URL="https://logsene-receiver.sematext.com" \
-e LOG_GLOB="/host/log/**/*.log" \
sematext/logagent

We can check the service status with docker service ps st-logagent. We should see that Swarm deployed Logagent to all nodes.

Because Logagent adds rich metadata, including host and IP address fields, to all logs, it is easy to track logs by host, Swarm service, or container.

Docker Enterprise supports not only Swarm mode but also Kubernetes deployments. The Logagent on Docker docs mentioned earlier show how to run Logagent in Kubernetes.

Log Search and Dashboards

Once you have container logs in Sematext you can search them when troubleshooting, save queries you run frequently or create your individual logs dashboard to have better insights into information in your logs.

Search for container logs

Log Search Syntax

If you know how to search with Google, you’ll know how to search your logs in Sematext Cloud.

  • Use AND, OR, NOT operators – e.g. (error OR warn) NOT exception
  • Use explicit field references – e.g. message:timeout
  • Group AND, OR, NOT clauses – e.g. message:(exception OR error OR timeout) AND severity:(error OR warn)
  • Don’t like Booleans? Use + and - to include and exclude – e.g. +message:error -message:timeout -host:db1.example.com
  • Need a phrase search? Use quotation marks – e.g. message:"fatal error"

When digging through logs you might find yourself running the same searches again and again.  To solve this annoyance you can save queries so you can re-execute them quickly without having to retype them. Please watch how using logs for troubleshooting simplifies your work.

Alerting on Container Logs

To create an alert on logs, we start by running a query that matches exactly those log events that we want to be alerted about. To create the alert, just click the floppy disk icon.

Similar to the setup of metric alert rules, we can define threshold-based or anomaly detection alerts based on the number of matching log events the alert query returns.

Please watch Alerts in Sematext Cloud for more details.

Container Metrics and Log Correlation

A typical troubleshooting workflow starts with detecting slowness through metrics and then digging into logs to find the root cause of the problem. Sematext makes this really simple and fast.  Your metrics and logs live under one roof.  Logs are centralized, the search is fast, and the powerful log search syntax is simple to use.  Correlation of metrics and logs is literally a click away.

Container logs and metrics in a single view

Container Monitoring

Sematext Agent collects metrics about hosts (CPU, memory, disk, network, processes), containers and orchestrator platforms and ships them to Sematext Cloud. To gain deep insight into the Linux kernel, Sematext Agent relies on eBPF to implant instrumentation points on kernel functions (it attaches eBPF programs to kprobes). This kernel instrumentation gives Sematext Agent a very efficient and powerful way to explore the system. Network tracing uses kernel-level eBPF to collect information about network connections being made and established.  Unlike traditional pcap-based network monitoring, the eBPF approach incurs negligible overhead.

Sematext Agent can auto-discover services deployed on physical/virtual hosts and containers.  It also collects data about your infrastructure to provide you with infrastructure inventory reports. It collects events from different sources such as OOM notifications, container or Kubernetes events.

Information collected by Sematext Agent:

  • Container runtime agnostic discovery and monitoring
    • Containers are discovered from cgroupfs hierarchies
    • Supports Docker and Rkt container engines
  • Container metrics fetched directly from cgroupfs
    • CPU usage
    • Disk space usage and IO stats
    • Memory usage, memory limits, and memory fail counters
    • Network IO stats
  • Collection of host inventory information
    • Host kernel version/system information
    • Information about installed software packages
  • Collection of container metadata
    • Container name
    • Image name
    • Container networks
    • Container volumes
    • Container environment
    • Container labels including relevant information about orchestration
    • Kubernetes metadata such as Pod name, UUID, Namespace
    • Docker Swarm metadata such as Service name, Swarm Task etc.
  • Collection of container events
    • Docker events such as start/stop/die/volume mount, etc.
    • Kubernetes events such as Pod status changes (deployed, destroyed, etc.)
    • Tracking deployment status and Pod restarts over time
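
To illustrate what "fetched directly from cgroupfs" means for the memory metrics in the list above, here is a minimal Python sketch that reads memory usage, limit, and fail counter from a cgroup directory. It assumes the cgroup v1 memory controller file names (cgroup v2 uses different files), and it is not Sematext Agent's implementation. The demo runs against a temporary directory with made-up values rather than a real container cgroup, whose location varies between systems.

```python
import os
import tempfile

# File names of the cgroup v1 memory controller (assumption: cgroup v1 layout).
MEMORY_FILES = {
    "usage_bytes": "memory.usage_in_bytes",
    "limit_bytes": "memory.limit_in_bytes",
    "fail_count": "memory.failcnt",
}


def read_memory_stats(cgroup_dir):
    """Read the single-integer memory files of one cgroup directory."""
    stats = {}
    for key, filename in MEMORY_FILES.items():
        with open(os.path.join(cgroup_dir, filename)) as handle:
            stats[key] = int(handle.read().strip())
    return stats


# Demo: a temporary directory stands in for a container's cgroup directory.
with tempfile.TemporaryDirectory() as fake_cgroup:
    samples = {
        "memory.usage_in_bytes": "52428800",
        "memory.limit_in_bytes": "104857600",
        "memory.failcnt": "3",
    }
    for filename, value in samples.items():
        with open(os.path.join(fake_cgroup, filename), "w") as handle:
            handle.write(value + "\n")
    stats = read_memory_stats(fake_cgroup)
    usage_ratio = stats["usage_bytes"] / stats["limit_bytes"]
```

Reading these files needs no container runtime API at all, which is why this style of collection works regardless of the container engine in use.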

That is a lot of information and Sematext organizes this information in reports for infrastructure monitoring, container monitoring, and Kubernetes cluster monitoring.

Docker Engine and Kubernetes aware cluster agent

The Sematext agent is fully Docker Engine and Kubernetes-aware. It collects Kubernetes metrics efficiently: when deployed to worker nodes, Sematext agent relies on the Kubernetes leader election mechanism to elect one instance of the agent to act as the leader, thus minimizing the agent’s impact. This cluster agent leader collects cluster-level metrics (deployment, pod, stateful set stats) and Kubernetes events, while the other agent instances are in charge of gathering kubelet-specific metrics, as well as container metrics for workloads collocated on the node where the agent is running. Kubernetes clusters running lots of nodes will see the most benefit from this newly optimized Sematext agent. Sematext agent also gathers information about processes inside and outside containers, thus providing data for the new Sematext process monitoring functionality.

Let’s see how Sematext Agent is deployed.

Getting started with Sematext Agent

To run Sematext Agent you will need a Docker App token. If you don’t have a Docker App yet, you can create one now. The Sematext UI displays copy and paste instructions for various deployment options: Docker, Docker Enterprise/Swarm, Kubernetes DaemonSets, or Helm charts.

The Sematext Agent Documentation contains all configuration options. After a short time, you will see container information in the infrastructure monitoring, Docker, and Kubernetes reports.

Setup Sematext Agent on Docker

A basic Docker setup for Sematext Agent shows the mandatory options. First, the agent mounts system directories to collect all relevant information.  The tokens are used to ship the metrics to the right Sematext App. The JOURNAL_DIR is used to buffer metrics locally in case of any error during metric transmission. Then we set logging options to configure the log output. For troubleshooting, the logging options can be modified to get a verbose output of the agent activity. The parameter CONTAINER_SKIP_BY_IMAGE is used to exclude containers from being monitored.

docker run -d  --restart always --privileged -P --name st-agent \
-v /:/hostfs:ro \
-v /sys/:/hostfs/sys:ro \
-v /var/run/:/var/run/ \
-v /sys/kernel/debug:/sys/kernel/debug \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-e INFRA_TOKEN=YOUR_INFRA_TOKEN \
-e CONTAINER_TOKEN=YOUR_DOCKER_APP_TOKEN \
-e NODE_NAME="`hostname`" \
-e REGION=US \
sematext/agent:latest
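
The idea behind a local journal directory such as JOURNAL_DIR can be sketched as follows: when sending a batch of metrics fails, the batch is written to disk and replayed later. This Python sketch only illustrates that buffering pattern under assumed names (`ship_or_journal`, `replay_journal`, and the JSON file layout are hypothetical); it says nothing about how Sematext Agent actually stores its journal.

```python
import json
import os
import tempfile
import time
import uuid


def ship_or_journal(batch, send, journal_dir):
    """Try to send a batch; on a transmission error, buffer it on disk instead."""
    try:
        send(batch)
        return True
    except OSError:
        os.makedirs(journal_dir, exist_ok=True)
        # Timestamp prefix keeps files roughly in arrival order; uuid avoids collisions.
        name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
        with open(os.path.join(journal_dir, name), "w") as handle:
            json.dump(batch, handle)
        return False


def replay_journal(send, journal_dir):
    """Resend buffered batches oldest first; stop at the first failure."""
    replayed = 0
    for name in sorted(os.listdir(journal_dir)):
        path = os.path.join(journal_dir, name)
        with open(path) as handle:
            batch = json.load(handle)
        try:
            send(batch)
        except OSError:
            break
        os.remove(path)
        replayed += 1
    return replayed


sent = []


def failing_send(batch):
    raise ConnectionError("receiver unreachable")  # ConnectionError is an OSError


def working_send(batch):
    sent.append(batch)


with tempfile.TemporaryDirectory() as journal_dir:
    delivered = ship_or_journal({"cpu": 0.42}, failing_send, journal_dir)
    buffered = len(os.listdir(journal_dir))
    replayed = replay_journal(working_send, journal_dir)
    remaining = len(os.listdir(journal_dir))
```

A batch is deleted from the journal only after it was sent successfully, so a crash between sending and deleting can at worst lead to a duplicate, not to a lost data point.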

Setup Sematext Agent on Docker Enterprise with Swarm

The setup with “docker run” on a Docker host is handy for development environments. In production environments, we want to automate the deployment of Sematext Agent. Unlike the services in your application stack, Sematext Agent needs to run on every cluster node to ensure metric and event collection works for all hosts and containers in the cluster.  Therefore we deploy Sematext Agent as a global Swarm service. The service will automatically install the agent on new nodes when you scale the Docker cluster.  The “docker service” command is similar to the docker run command:

docker service create --mode global --name st-agent \
--restart-condition any \
--mount type=bind,src=/,dst=/hostfs,readonly \
--mount type=bind,src=/etc/passwd,dst=/etc/passwd,readonly \
--mount type=bind,src=/etc/group,dst=/etc/group,readonly \
--mount type=bind,src=/var/run,dst=/var/run/ \
--mount type=bind,src=/sys/kernel/debug,dst=/sys/kernel/debug \
--mount type=bind,src=/sys,dst=/hostfs/sys,readonly \
-e INFRA_TOKEN=YOUR_INFRA_TOKEN \
-e CONTAINER_TOKEN=YOUR_DOCKER_APP_TOKEN \
-e NODE_NAME="{{.Node.Hostname}}" \
-e REGION=US \
sematext/agent:latest

Container Infrastructure Dashboard

Behind the scenes, Sematext Cloud automatically tags all containers with their Docker hostnames. Therefore it is easy to see where a container is running. You can use these tags to filter and to slice and dice your containers’ metrics as you see fit.  Grouping by tags lets you quickly drill down to specific containers to see what’s happening within each one, should you need that level of detail. Beyond that, the automatic generation of dynamic container performance heat maps will instantly lead you to the “hottest” containers in your entire infrastructure. The containers view discovers all your containers across all nodes in all your Kubernetes, OpenShift, Docker Swarm clusters or any other container platform supported by Sematext Docker Agent, such as Amazon ECS, Mesos, Rancher, or Portainer.
The main view is very much like “top for containers” – applied across all of your containers and their hosts. You can sort containers by memory or CPU usage to find the hottest containers, or simply display the Top N containers. The detail view of each container displays container metrics in real-time. For more massive infrastructures, you can quickly drill down by using the filter and grouping functionality. Grouping by hosts or images turns the list view into a hierarchical view.

The UI shows a heatmap with the hottest containers according to the grouping criteria.
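
The "top for containers" idea (sort by a metric, keep the Top N, group by host) boils down to very little logic. The Python sketch below shows it on made-up data; the container names, hosts, and metric values are hypothetical, and the code has no connection to how the Sematext UI is implemented.

```python
from collections import defaultdict

# Hypothetical snapshot of container metrics.
containers = [
    {"name": "api", "host": "node-1", "cpu_percent": 72.5, "memory_mb": 512},
    {"name": "db", "host": "node-2", "cpu_percent": 35.0, "memory_mb": 2048},
    {"name": "cache", "host": "node-1", "cpu_percent": 5.1, "memory_mb": 256},
]


def top_n(items, metric, n):
    """Return the n containers with the highest value for the given metric."""
    return sorted(items, key=lambda item: item[metric], reverse=True)[:n]


def group_by(items, field):
    """Turn the flat list into a mapping of field value to container names."""
    groups = defaultdict(list)
    for item in items:
        groups[item[field]].append(item["name"])
    return dict(groups)


hottest_cpu = [c["name"] for c in top_n(containers, "cpu_percent", 2)]
hottest_memory = [c["name"] for c in top_n(containers, "memory_mb", 1)]
by_host = group_by(containers, "host")
```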

Container monitoring with heatmap

In Sematext Cloud you will also find historical metrics, logs, container events, and a lot more.  All this observability data is seamlessly integrated so you can access it with just a few mouse clicks, starting from the new container view, container reports, and combined metrics & log dashboards.

Container metrics, logs, and events in a single view

Container Metrics Dashboard

Your Docker App shows container metrics grouped into several individual container monitoring dashboards:

  • Overview: CPU, memory, network and Kubernetes key metrics
  • Container count: number of running containers grouped by nodes and images
  • Container CPU: CPU usage of containers
  • Container Disk: IO throughput, IO wait times
  • Container Memory: memory usage, memory fail counters, memory paging activity, page faults, …
  • Container Network: send and receive rates, network packets, network error rates, …

Container Metrics in Sematext Cloud

Setup Container Alerts

To save you time Sematext automatically creates a set of default alert rules such as alerts for low disk space. You can create additional alerts on any metric. Watch Alerts in Sematext Cloud for more details.

Alerting on Container Metrics

There are 3 types of alerts in Sematext:

  • Heartbeat alerts, which notify you when a monitored host is down
  • Classic threshold-based alerts that notify you when a metric value crosses a predefined threshold
  • Alerts based on statistical anomaly detection that notify you when metric values suddenly change and deviate from the baseline
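
The difference between the three alert types can be shown with a small Python sketch. It is a simplification under stated assumptions: the anomaly check uses a plain "mean plus k standard deviations" baseline, which is a textbook stand-in and not Sematext's anomaly detection algorithm, and all function names, parameters, and sample values are hypothetical.

```python
from statistics import mean, stdev


def heartbeat_alert(last_seen, now, timeout_seconds):
    """Fire when no data has arrived for longer than the timeout."""
    return now - last_seen > timeout_seconds


def threshold_alert(value, threshold):
    """Fire when the metric value crosses a fixed threshold."""
    return value > threshold


def anomaly_alert(history, value, sigmas=3.0):
    """Fire when the value deviates strongly from the recent baseline."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return value != baseline
    return abs(value - baseline) > sigmas * spread


# Made-up history of restart counts per hour.
restarts_per_hour = [0, 1, 0, 0, 1, 0]
```

A threshold alert needs you to know a sensible limit in advance, while the anomaly check derives its limit from the recent history of the metric, which helps for metrics whose normal level you cannot predict.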

Let’s see how to actually create some alert rules for container metrics in the animation below. The pod restart chart below shows a growing number of restarts. We normally have no pod restarts, but we see it can jump to over 400 restarts caused by a failing Kubernetes cronjob. To create an alert rule on a metric go to the pulldown in the top right corner of a chart and choose “Create alert”. The alert rule applies the filters from the current view and you can choose various notification options such as email or configured notification hooks (PagerDuty, Slack, VictorOps, BigPanda, OpsGenie, Pusher, generic webhooks etc.). Alerts are triggered either by anomaly detection, watching metric changes in a given time window or through the use of classic threshold-based alerts.

Alerting on container metrics –  pod restart count

Monitor Container Events

Container events are very valuable for monitoring. Events reflect changes in your infrastructure – from node restarts to container deployments or changes on running containers. You can track every docker command, which is valuable not only for tracking configuration changes: events are also a good source for security audits. Sematext agent collects events from the Docker engine and the Kubernetes API. Whenever something goes wrong in your container stack, you can correlate logs or metrics with the time of container events!

Examples:

  1. Your HTTP API server was not reachable and a load balancer had HTTP 500 server errors. When you then check the container events, you might see that the API server container was restarted at the same time – root cause found with a few mouse clicks!
  2. One of your applications in a container behaves differently from others. You might find Docker “exec”, “commit” and “restart” events indicating that somebody modified one container at runtime! If you are not hacked, one of the developers might have done a “hotfix” and forgotten to fix the problem in the application image.
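
The first example is essentially a time-window lookup: given the time of an incident, find the container events that happened close to it. A minimal Python sketch of that correlation step, using made-up events and a hypothetical helper `events_near`:

```python
from datetime import datetime, timedelta

# Hypothetical container events.
events = [
    {"time": datetime(2019, 2, 13, 12, 30, 5), "container": "api", "action": "die"},
    {"time": datetime(2019, 2, 13, 12, 30, 9), "container": "api", "action": "start"},
    {"time": datetime(2019, 2, 13, 9, 0, 0), "container": "db", "action": "start"},
]


def events_near(all_events, incident_time, window=timedelta(minutes=1)):
    """Return the events that happened within the window around the incident."""
    return [e for e in all_events if abs(e["time"] - incident_time) <= window]


# Time at which the load balancer started to report errors (made up).
incident = datetime(2019, 2, 13, 12, 30, 10)
suspects = events_near(events, incident)
```

In this sample the "die" and "start" events of the api container fall into the window, which points to a container restart as the likely cause.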

We could continue here with an endless list of examples …  once you get used to monitoring container events for security audits or troubleshooting you will never want to miss it again!

Docker Lifecycle Events / source [1]

Check the list of container events:

  • Docker containers trigger the following events:
    • Lifecycle events:
      • Create – when a container is created
      • Start – when a container starts
      • Restart – when a container gets restarted
      • Stop – when a container stops
      • Oom – when a container runs out of memory
      • Pause – when a container gets paused
      • Unpause – when a container continues to run after a pause
      • Die – when the main process in a container dies
      • Kill – when the container gets killed
      • Destroy – when a container gets destroyed
    • Runtime events
      • Commit – when changes to the container filesystem are committed. Modifying deployed containers in production is not a common practice, therefore the commit could indicate a “hack” and should be watched carefully.
      • Copy – when files are copied from/to a container. Could indicate a potential data leak.
      • Attach – when a process connects to container console – somebody is reading your container logs …
      • Detach – when a process disconnects from container console streams
      • Exec – when a command is executed in container console, very helpful when investigating potential attacks
      • Export – when a container gets exported
      • Health_status – when health_status is checked
      • Rename – when a container gets renamed
      • Resize – when a container gets resized
      • Top – when somebody lists top processes in a container
      • Update – when a container is updated e.g. with new labels
  • Docker images report the following events:
    • Delete – when an image gets deleted
    • Import – when an image gets imported
    • Load – when an image is loaded
    • Pull – when an image is pulled from a registry
    • Push – when an image is pushed to a registry
    • Save – when an image is saved
    • Tag – when an image is tagged
    • Untag – when an image tag is removed
  • Docker plugins report the following events:
    • Enable – when a plugin gets enabled
    • Disable – when a plugin gets disabled
    • Install – when a plugin gets installed
    • Remove – when a plugin gets removed
  • Docker volumes report the following events:
    • Create – when a volume is created
    • Destroy – when a volume gets destroyed
    • Mount – when a volume is mounted to a container
    • Unmount – when a volume is unmounted from a container
  • Docker networks report the following events:
    • Create – when a  network is created
    • Connect – when a container connects to a network
    • Remove – when the network is removed
    • Destroy – when a network is destroyed
    • Disconnect – when a container disconnects from a network
  • Docker daemons report the following events:
    • Reload
  • Docker services, nodes, secrets, and configs report the following events:
    • Create – on the creation of a resource
    • Remove – on the removal of a resource
    • Update – on the update of a resource
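
For security audits, the runtime events called out above (commit, copy, attach, exec) are the interesting ones. The Python sketch below filters a stream of JSON-encoded events for those actions. It assumes events shaped like the JSON output of the Docker events API, with "Type" and "Action" fields, and it matches by prefix because exec actions usually carry a suffix with the executed command. The sample lines are made up, and the action names are taken from the list above rather than from an exhaustive reference.

```python
import json

# Actions treated as audit-relevant in this sketch (taken from the list above).
AUDIT_ACTIONS = ("commit", "copy", "attach", "exec")


def is_audit_relevant(event):
    """True for container events whose action starts with an audit-relevant name."""
    if event.get("Type") != "container":
        return False
    return event.get("Action", "").startswith(AUDIT_ACTIONS)


# Made-up event lines, one JSON object per line.
sample_lines = [
    '{"Type": "container", "Action": "start", "Actor": {"Attributes": {"name": "api"}}}',
    '{"Type": "container", "Action": "exec_start: /bin/sh", "Actor": {"Attributes": {"name": "api"}}}',
    '{"Type": "container", "Action": "commit", "Actor": {"Attributes": {"name": "api"}}}',
    '{"Type": "image", "Action": "pull", "Actor": {"Attributes": {"name": "nginx"}}}',
]

parsed = [json.loads(line) for line in sample_lines]
flagged = [event["Action"] for event in parsed if is_audit_relevant(event)]
```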

Sematext shows all events in a chart and helps you search for container events. While the event view exists as a standalone view, you can also add it to any other dashboard in Sematext with a simple switch to correlate events, logs, and metrics.

Docker Events in Sematext

Summary

Comprehensive monitoring for containers involves identifying key metrics for the cluster nodes, orchestration layer, and containers, collecting metrics, logs and events, and connecting everything in a meaningful way. In this post, we’ve shown you how to monitor container metrics and logs in one place. We used out of the box and customized dashboards, metrics correlation, log correlation, anomaly detection, and alerts. Using open-source monitoring integrations, you can easily start monitoring containers alongside metrics, logs, and distributed request traces from all of the other technologies in your infrastructure. Get deeper visibility into containers today with a free Sematext trial.

References:

[1] http://docker-saigon.github.io/post/Docker-Internals/
