Centralized Log Management and Monitoring for CoreOS Clusters

 Note: Click here for the Docker Monitoring webinar video recording and slides. And click here for the Docker Logging webinar video recording and slides.

SPM Agent for Docker was renamed to “sematext/sematext-agent-docker” on Docker Hub (see Sematext joins Docker ETP program for Logging). The latest CoreOS service files and instructions are available in the new GitHub repository.

---

If you’re interested in things like CoreOS, logs, and monitoring, then you should check out our previous CoreOS-related posts on Monitoring CoreOS Clusters and how to get CoreOS logs into ELK in 5 minutes. And they are only the start of SPM integrations with CoreOS! Case in point: we have recently optimized the SPM setup on CoreOS and integrated a logging gateway to Logsene into the SPM Agent for Docker. And that’s not all…

In this post we want to share the current state of CoreOS Monitoring and Log Management from Sematext, so you know what’s coming and what might be helpful for your organization. We cover:

  1. Feature Overview
  2. Fleet Units for SPM
  3. How to Set Up Monitoring and Logging Services

1. Feature Overview

  • Quick setup
    • add monitoring and logging for the whole cluster in 5 minutes
  • Collection of Performance Metrics for the CoreOS Cluster
    • Metrics for all CoreOS cluster nodes (hosts)
      • CPU, Memory, Disk usage
    • Detailed metrics for all containers on each host
      • CPU, Memory, Limits, Failures, Network and Disk I/O, …
    • Anomaly detection and alerts for all metrics
    • Anomaly detection and alerts for all logs
  • Correlated Container Events, Metrics and Logs
    • Docker Events like start/stop/destroy are related to deployments, maintenance or sometimes to errors and unwanted restarts;  correlation of metrics, events and logs is the natural way to discover problems using SPM.

Docker Events

  • Centralized configuration via etcd
    • There is often a mix of configurations in environment variables, static settings in cloud configuration files, and combinations of confd and etcd. We decided to have all settings stored in etcd, so the settings are done only once and are easy to access.
  • Automatic Log Collection
    • Logging gateway integrated into SPM Agent
      • SPM Agent for Docker includes a logging gateway service that receives log messages via TCP. Service discovery is handled via etcd (where the exposed TCP port is stored). All received messages are parsed, and the following formats are supported:
        • journalctl -o short | short-iso | json
        • integrated messages parser (e.g. for dockerd time, level and message text)
        • line delimited JSON
        • plain text messages
        • In cases where the parsing fails, the gateway adds a timestamp and keeps the message 1:1.
      • The logging gateway is configured with the Logsene App Token, so senders do not need a token of their own. This makes it compatible with most Unix tools, e.g. journalctl -o json -n 10 | netcat localhost 9000
      • SPM for Docker collects all logs from containers directly from the Docker API. The logging gateway is typically used for system logs – or anything else configured in journald (see “Log forwarding service” below)
      • The transmission to Logsene receivers is encrypted via HTTPS.
    • Log forwarding service
      • The log forwarding service streams logs to the logging gateway by pulling them from journald. In addition, it saves the ‘last log time’ to recover after a service restart. Most people take this for granted; but not all logging services have such a recovery function.  There are many tools which just capture the current log stream. Often people realize this only when they miss logs one day because of a reboot, network outage, software update, etc.  But these are exactly the types of situations where you would like to know what is going on!
SPM integrations into CoreOS
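
To make the ‘last log time’ idea from the log forwarding service more concrete, here is a minimal sketch of a forwarder loop in shell. It is not the code behind logsene.service: the function name forward_journal, the state file, and the choice of wall-clock time as the saved position are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: forward journald entries to the logging gateway and remember
# roughly where we stopped. Function name and state file are hypothetical.

forward_journal() {
  gateway_port="$1"   # TCP port of the logging gateway, e.g. 9000
  state_file="$2"     # file holding the time of the last forwarded line

  if [ -s "$state_file" ]; then
    # Resume from the saved time after a restart.
    journalctl -o json -f --since "$(cat "$state_file")"
  else
    # First run: follow the journal from its most recent entries.
    journalctl -o json -f
  fi | while IFS= read -r line; do
    printf '%s\n' "$line"
    # Save the current time after each line; simple, but not cheap.
    date '+%Y-%m-%d %H:%M:%S' > "$state_file"
  done | ncat localhost "$gateway_port"
}
```

You could call it as forward_journal 9000 /var/tmp/last-log-time (the path is just an example). Note the trade-offs: the saved value is the forwarding time, not the timestamp of the journal entry, so a restart may re-send lines from the same second, and the time is saved before delivery is confirmed. Journal cursors (journalctl --show-cursor and --after-cursor) would be a more precise way to remember the position.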

2. Fleet Units for SPM

SPM agent services are installed via fleet (a distributed init system) across the whole cluster. Let’s look at those unit files before we fire them up into the Cloud.

The first unit file, sematext-agent.service, starts SPM Agent for Docker. It reads the SPM and Logsene app tokens and the port for the logging gateway from etcd. It starts on every CoreOS host (global unit).

Fleet Unit File – SPM Agent incl. Log Gateway: spm-agent.service
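
For readers who cannot see the unit file, here is a rough sketch of what a global fleet unit for the agent could look like, wrapped in a small shell function so it can be written to a file. This is an untested illustration and not the file from the repository: the function name, the container name, the restart settings and the container-side port 9000 are assumptions. The etcd keys are the ones used in the setup section of this post.

```shell
#!/bin/sh
# Sketch only: print a hypothetical global fleet unit for the agent.
# Compare it with the real sematext-agent.service before using it.

print_agent_unit() {
  # The quoted 'EOF' keeps $(...) literal, so etcd is queried when the
  # unit starts on a host, not when this function runs.
  cat <<'EOF'
[Unit]
Description=Sematext Agent for Docker (sketch)
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
Restart=always
RestartSec=30s
ExecStartPre=-/usr/bin/docker kill sematext-agent
ExecStartPre=-/usr/bin/docker rm sematext-agent
ExecStartPre=/usr/bin/docker pull sematext/sematext-agent-docker
ExecStart=/bin/sh -c '/usr/bin/docker run --name sematext-agent \
  -e SPM_TOKEN=$(etcdctl get /sematext.com/myapp/spm/token) \
  -e LOGSENE_TOKEN=$(etcdctl get /sematext.com/myapp/logsene/token) \
  -p $(etcdctl get /sematext.com/myapp/logsene/gateway_port):9000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  sematext/sematext-agent-docker'
ExecStop=/usr/bin/docker stop sematext-agent

[X-Fleet]
# Global units are scheduled on every machine in the cluster.
Global=true
EOF
}
```

Running print_agent_unit > my-agent.service gives you a file to diff against the official one. The two parts that matter for the design described here are the etcdctl lookups at start time (settings live in etcd only) and Global=true in the [X-Fleet] section (one agent per host).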

The second unit file, logsene.service, forwards logs from journald to the logging gateway running as part of sematext-agent-docker. All fields stored in the journal (down to source-code level and line numbers provided by Go modules) are then available in Logsene.

Fleet Unit File – Log forwarder: logsene.service

3. Set Up Monitoring and Logging Services

Preparation:

  1. Get a free account at apps.sematext.com
  2. Create an SPM App of type “Docker” and copy the SPM Application Token
  3. Store the configuration in etcd
# PREPARATION
# set your application tokens for SPM and Logsene
export SPM_TOKEN=YOUR-SPM-TOKEN
export LOGSENE_TOKEN=YOUR-LOGSENE-TOKEN
# set the port for the Logsene Gateway
export LG_PORT=9000
# Store the tokens in etcd
# please note the same key is used in the unit file!
etcdctl set /sematext.com/myapp/spm/token $SPM_TOKEN
etcdctl set /sematext.com/myapp/logsene/token $LOGSENE_TOKEN
etcdctl set /sematext.com/myapp/logsene/gateway_port $LG_PORT
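
Before starting the services it can be worth reading the keys back, because a missing key only shows up later as an agent that fails to start. The following is a small sketch of such a check; the function name is made up, and it assumes the same etcdctl get/set style commands used above.

```shell
#!/bin/sh
# Sketch only: verify that the three settings exist in etcd.
# Values are not printed, so tokens do not end up in terminal history.

check_sematext_config() {
  missing=0
  for key in \
    /sematext.com/myapp/spm/token \
    /sematext.com/myapp/logsene/token \
    /sematext.com/myapp/logsene/gateway_port
  do
    if value=$(etcdctl get "$key" 2>/dev/null) && [ -n "$value" ]; then
      echo "ok      $key"
    else
      echo "MISSING $key"
      missing=1
    fi
  done
  # Non-zero exit status if at least one key is missing or empty.
  return "$missing"
}
```

Calling check_sematext_config prints one line per key and returns a non-zero status if anything is missing, so it can be chained in front of the fleetctl commands.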
 

Download the fleet unit files and start the services via fleetctl

# INSTALLATION
# Download the unit file for SPM
wget https://raw.githubusercontent.com/sematext/sematext-agent-docker/master/coreos/sematext-agent.service
# Start SPM Agent in the whole cluster
fleetctl load sematext-agent.service; fleetctl start sematext-agent.service
# Download the unit file for Logsene
wget https://raw.githubusercontent.com/sematext/sematext-agent-docker/master/coreos/logsene.service
# Start the log forwarding service
fleetctl load logsene.service; fleetctl start logsene.service

Check the installation

systemctl status sematext-agent.service
systemctl status logsene.service
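
systemctl status only reports on the host you are logged in to. To look at the whole cluster, a small filter over the output of fleetctl list-units can help. The sketch below assumes the default column order (UNIT, MACHINE, ACTIVE, SUB) with a header line; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: read `fleetctl list-units` output on stdin and print every
# unit whose ACTIVE column is not "active". Exits non-zero if any is found.

inactive_units() {
  awk 'NR > 1 && $3 != "active" {
         print $1 " on " $2 " is " $3
         bad = 1
       }
       END { exit bad }'
}
```

Typical use would be fleetctl list-units | inactive_units, which prints nothing and returns success when all units are active on all machines.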

Send a few log lines to see them in Logsene.

journalctl -o json -n 10 | ncat localhost 9000
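
Since the gateway also accepts line-delimited JSON, you can send your own test messages in the same way. Here is a minimal sketch that builds one JSON line and looks up the gateway port in etcd instead of hard-coding it. The function names and the field names severity and message are assumptions, not a documented schema.

```shell
#!/bin/sh
# Sketch only: send a single JSON test message to the logging gateway.

# Escape backslashes and double quotes so the text is safe inside a JSON
# string. Control characters such as tabs or newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print one line of line-delimited JSON (field names are assumptions).
json_log_line() {
  printf '{"severity":"%s","message":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Look up the gateway port in etcd and send one message via ncat.
send_test_log() {
  port=$(etcdctl get /sematext.com/myapp/logsene/gateway_port) || return 1
  json_log_line "$1" "$2" | ncat localhost "$port"
}
```

For example, send_test_log info "hello from $(hostname)" should show up in Logsene next to the journald entries, assuming the agent is running on the local host.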

After about a minute you should see Metrics in SPM and Logs in Logsene.

Cluster Health in ‘Birds Eye View’
Host and Container Metrics Overview for the whole cluster
Logs and Metrics

Open-Source Resources

Some of the things described here are open-sourced:

Summary – What this gets you

Here’s what this setup provides for you:

  • Operating System metrics of each CoreOS cluster node
  • Container and Host Metrics on each node
  • All Logs from Docker containers and Hosts (via journald)
  • Docker Events from all nodes
  • CoreOS logs from all nodes

Having this setup allows you to take full advantage of SPM and Logsene by defining intelligent alerts for metrics and logs (delivered via channels like e-mail, PagerDuty, Slack, HipChat or any WebHook), as well as making correlations between performance metrics, events, logs, and alerts.

Running CoreOS? Need any help getting CoreOS metrics and/or logs into SPM & Logsene? Let us know! Oh, and if you’re a small startup, ping @sematext: you can get a good discount on both SPM and Logsene!
