UPDATE: We have recently optimized the SPM setup on CoreOS and integrated a logging gateway to Logsene into the SPM Agent for Docker. You can read about it in Centralized Log Management and Monitoring for CoreOS Clusters.
In this post you’ll learn how to get operational insights (e.g. performance metrics and container events) from CoreOS, and how etcd, fleet, and SPM make that simple.
We’ll use:
- SPM for Docker to run the monitoring agent as a Docker container and collect Docker metrics and events for all other containers on the same host, plus metrics for the host itself
- fleet to distribute this container to all hosts in the CoreOS cluster by providing it with the fleet unit file shown below
- etcd to set a property to hold the SPM App token for the whole cluster
The Big Picture
Before we get started, let’s take a step back and look at our end goal. What do we want? We want charts with Performance Metrics, we want Event Collection, we’d love integrated Anomaly Detection and Alerting, and we want that not only for containers, but also for hosts running containers. CoreOS has no package manager and deploys services in containers, so we want to run the SPM agent in a Docker container, as shown in the following figure:
By the end of this post each of your Docker hosts could look like the above figure, with one or more of your own containers running your own apps, and a single SPM Docker Agent container that monitors all your containers and the underlying hosts.
3 Simple Steps
1) Create a new SPM App of type “Docker” and copy the SPM App Token
2) Set the SPM App Token via etcd. This makes the token instantly available to all SPM agent instances in the cluster:
etcdctl set /sematext.com/myapp/spm/token YOUR_SPM_APP_TOKEN
Of course, you can change the “myapp” part to whatever you want. It simply acts as a namespace in etcd in case you have multiple SPM Apps (and thus multiple SPM App Tokens).
3) Grab the spm-agent.service fleet unit file and submit it to fleet:
# Download the service file for sematext-agent-docker
wget https://raw.githubusercontent.com/sematext/sematext-agent-docker/master/coreos/spm-agent.service
# Load and start the service with fleetctl
fleetctl load spm-agent.service
fleetctl start spm-agent.service
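To make the namespacing from step 2 concrete, here is a minimal sketch of how several SPM Apps could share one etcd cluster. The helper names `spm_token_key` and `set_spm_token` and the `staging` namespace are made up for illustration; only the `/sematext.com/<namespace>/spm/token` key layout and the `etcdctl set` and `etcdctl get` commands come from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: build the etcd key that holds the SPM App Token
# for a given namespace (the "myapp" part of the key in step 2).
spm_token_key() {
  printf '/sematext.com/%s/spm/token' "$1"
}

# Hypothetical helper: store a token under a namespace.
# Needs etcdctl and a reachable etcd cluster, so it is only defined here.
set_spm_token() {
  etcdctl set "$(spm_token_key "$1")" "$2"
}

# Usage on a CoreOS node (not run here):
#   set_spm_token myapp   YOUR_SPM_APP_TOKEN
#   set_spm_token staging ANOTHER_SPM_APP_TOKEN
#   etcdctl get "$(spm_token_key myapp)"
```

If you pick a namespace other than “myapp”, the key in the ExecStart line of the unit file has to be changed to match.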
Fleet unit file
What’s this fleet unit file about? It’s simple. It reads the SPM App Token from etcd and then starts the Docker container with sematext-agent-docker inside. This is what it looks like:
[Unit]
Description=SPM Docker Agent
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
EnvironmentFile=/etc/environment
Restart=always
RestartSec=30s
# The leading "-" tells systemd to ignore failures, e.g. when no old container exists
ExecStartPre=-/usr/bin/docker kill spm-agent
ExecStartPre=-/usr/bin/docker rm spm-agent
ExecStartPre=/usr/bin/docker pull sematext/sematext-agent-docker:latest
# The container name must match the one used in the kill, rm and stop commands
ExecStart=/bin/sh -c 'set -ex; /usr/bin/docker run --name spm-agent -e SPM_TOKEN=$(etcdctl get /sematext.com/myapp/spm/token) -e HOSTNAME=$HOSTNAME -v /var/run/docker.sock:/var/run/docker.sock sematext/sematext-agent-docker'
ExecStop=/usr/bin/docker stop spm-agent

[Install]
WantedBy=multi-user.target

[X-Fleet]
# Schedule this unit on every machine in the cluster
Global=true
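One detail worth knowing about the ExecStart line: `set -e` does not stop the shell when a command substitution used as an argument fails, so if `etcdctl get` cannot find the key, `docker run` is still invoked with an empty SPM_TOKEN. The following is a sketch of a stricter variant, not part of the published unit file. `fetch_token` is a hypothetical helper name, and using it would mean shipping it as a script to each host.

```shell
#!/bin/sh
# Hypothetical helper (not part of sematext-agent-docker).
# Runs the command given as arguments and succeeds only if that command
# succeeds AND prints a non-empty value.
fetch_token() {
  token=$("$@") || return 1     # the lookup command itself failed
  [ -n "$token" ] || return 1   # the lookup succeeded but returned nothing
  printf '%s\n' "$token"
}

# Usage on a CoreOS node (not run here):
#   SPM_TOKEN=$(fetch_token etcdctl get /sematext.com/myapp/spm/token) || exit 1
#   then pass -e SPM_TOKEN="$SPM_TOKEN" to docker run as in the unit file
```

Because the unit uses Restart=always with RestartSec=30s, exiting early like this would simply make systemd retry every 30 seconds until the token is present in etcd.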
After about a minute, you should see Docker metrics and events in SPM.
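If nothing shows up after a few minutes, fleet itself can tell you whether the agent is running on every machine. The sketch below assumes the default `fleetctl list-units` column layout (UNIT, MACHINE, ACTIVE, SUB); `inactive_agents` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `fleetctl list-units` output on stdin and prints
# the MACHINE column of every spm-agent.service row whose ACTIVE column
# is anything other than "active".
# Assumes the default column order: UNIT MACHINE ACTIVE SUB.
inactive_agents() {
  awk '$1 == "spm-agent.service" && $3 != "active" { print $2 }'
}

# Usage on a CoreOS node (not run here):
#   fleetctl list-units | inactive_agents
#   journalctl -u spm-agent.service    # run on an affected machine
```

An empty result means every listed instance of the unit reports as active; any machine that is printed is a good place to look at the journal.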
Open Sourced Everything
Everything described here is open-sourced:
- Documentation, source code and support information are available here: https://github.com/sematext/sematext-agent-docker
- The CoreOS directory contains additional examples, such as a cloud-config file that enables the SPM and Logsene services for shipping journalctl output to the Logsene ELK Stack.
Summary – What this gets you
What we get after this setup is the following:
- Operating System metrics of each CoreOS cluster node
- Container Metrics for all containers on each node
- Docker Events from all nodes
- CoreOS logs from all nodes
Having this little setup lets you take full advantage of SPM and Logsene, e.g. by defining intelligent alerts for metrics and logs, delivered to channels like e-mail, PagerDuty, Slack, HipChat or any WebHook, as well as making correlations between performance metrics, events, logs, and alerts.
Running CoreOS? Need any help getting CoreOS metrics and/or logs into SPM & Logsene? Let us know! Oh, and if you’re a small startup — ping @sematext — you can get a good discount on both SPM and Logsene!