2017 is almost here and, like last year, we thought we’d share how 2016 went for us. We remain committed to being your “one-stop shop” for all things Elasticsearch and Solr: from Consulting, Production Support, and Training to Logsene for all your logs and SPM for all your monitoring needs.
It’s safe to say 2016 was the year of Docker and, by extension, of Kubernetes, Mesos, Docker Swarm, and others. They stopped being just early adopters’ toys and became production-ready technologies used by many. This year we’ve added excellent support for Docker monitoring with SPM and logging with Logsene via the open-source Sematext Docker Agent.
Running Elasticsearch on Docker sounds like a natural fit – both technologies promise elasticity. However, running a truly elastic Elasticsearch cluster on Docker Swarm became somewhat difficult with Docker 1.12 in Swarm mode. Why? Elasticsearch gave up on multicast discovery by moving it into a plugin and not including that plugin by default. As a result, one has to specify the IP addresses of all master nodes to join the cluster. Unfortunately, this creates a chicken-and-egg problem: these IP addresses are not known in advance when you start Elasticsearch as a Swarm service! It would be easy if we could use the shared Docker bridge or host network and simply specify the Docker host IP addresses, as we are used to doing with the “docker run” command. However, “docker service create” rejects the use of the bridge or host network. Thus, the question remains: how can we deploy Elasticsearch in a Docker Swarm cluster?
This is a guest post by Prof. Chanwit Kaewkasi, Docker Captain who organized Swarm3K – the largest Docker Swarm cluster to date.
Swarm3K was the second collaborative project that tried to form a very large Docker cluster in Swarm mode. It took place on 28th October 2016, and more than 50 individuals and companies joined the project.
Sematext was one of the very first companies to offer help, providing their Docker monitoring and logging solution. They became the official monitoring system for Swarm3K. Stefan, Otis and their team provided wonderful support from the very beginning.
To my knowledge, Sematext is currently the one and only Docker monitoring company that allows its monitoring agents to be deployed as a global Docker service. This deployment model greatly simplifies the monitoring process.
Swarm3K Setup and Workload
There were two planned workloads:
MySQL with WordPress cluster
The 25 nodes formed a MySQL cluster. We experienced some mixing of IP addresses from both the mynet and ingress networks. We had already run into the same issue when forming an Apache Spark cluster in the past (see https://github.com/docker/docker/issues/24637). We prevented it by binding the cluster to a single overlay network only.
A WordPress node was scheduled somewhere on our huge cluster, and we intentionally didn’t control where it should be. When we tried to connect the WordPress node to the backend MySQL cluster, the connection kept timing out. We concluded that a WordPress / MySQL combo would only run correctly if we put them together in the same data center (DC).
We aimed for 3000 nodes, but in the end we successfully formed a working, geographically distributed 4,700-node Docker Swarm cluster.
What we also learned from this issue was that the performance of the overlay network greatly depends on the correct tuning of network configuration on each host.
When the MySQL / WordPress test failed, we changed the plan to try NGINX on Routing Mesh.
The Ingress network is a /16 network, which supports up to 64K (65,536) IP addresses. At Alex Ellis’s suggestion, we then started 4,000 NGINX containers on the formed cluster. During this test, nodes were still joining and leaving. The NGINX service started and the Routing Mesh was formed. It kept serving requests correctly even as some nodes kept failing.
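To put the /16 figure in perspective, here is a small check using Python’s standard ipaddress module. The subnet 10.255.0.0/16 is an assumption. It is the usual default for Docker’s ingress network, but your cluster may use a different one. Only the arithmetic matters here: a /16 leaves 16 host bits, so 2^16 = 65,536 addresses, far more than the 4,000 NGINX tasks.

```python
import ipaddress

# Assumed subnet for illustration; Docker's ingress network commonly uses 10.255.0.0/16,
# but check `docker network inspect ingress` on your own cluster.
INGRESS_SUBNET = ipaddress.ip_network("10.255.0.0/16")

def usable_hosts(net: ipaddress.IPv4Network) -> int:
    """Addresses left after excluding the network and broadcast addresses."""
    return net.num_addresses - 2

total = INGRESS_SUBNET.num_addresses   # 2**16 addresses in a /16
usable = usable_hosts(INGRESS_SUBNET)
nginx_tasks = 4000                     # number of NGINX containers started in Swarm3K

print(f"{INGRESS_SUBNET}: {total} addresses, {usable} usable")
print(f"{nginx_tasks} tasks use {nginx_tasks / usable:.1%} of the usable range")
```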
We concluded that the Routing Mesh in 1.12 is rock solid and production ready.
We then stopped the NGINX service and started to test the scheduling of as many containers as possible.
This time we simply used “alpine top”, as we did for Swarm2K. However, the scheduling rate was quite slow: we reached 47,000 containers in approximately 30 minutes, so filling the cluster with 1M containers would have taken ~10.6 hours. Because that would take too long, we decided to shut down the manager, as there was no point in going further.
Scheduling a huge batch of containers stressed the cluster. We scheduled the launch of a large number of containers using “docker service scale alpine=70000”. This created a large scheduling queue that would not commit until all 70,000 containers had finished scheduling. This is why, when we shut down the managers, all scheduling tasks disappeared and the cluster became unstable: the Raft log got corrupted.
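The ~10.6-hour estimate is a linear extrapolation of the observed rate. It assumes the scheduling rate would have stayed constant, although it may well have degraded as the cluster filled up.

```python
# Scheduling throughput observed during the Swarm3K run (figures from the text above)
scheduled = 47_000        # containers scheduled
elapsed_minutes = 30      # approximately

rate_per_minute = scheduled / elapsed_minutes   # containers scheduled per minute
target = 1_000_000        # containers needed to fill the cluster

# Linear extrapolation: assumes the rate stays constant all the way to the target
hours_to_target = target / rate_per_minute / 60
print(f"{rate_per_minute:.0f} containers/min -> {hours_to_target:.1f} hours for {target:,}")
```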
One of the most interesting things was that we were able to collect enough CPU profile information to show us what was keeping the cluster busy.
Here we can see that only 0.42% of the CPU was spent on the scheduling algorithm. I think we can say with certainty:
The Docker Swarm scheduling algorithm in version 1.12 is quite fast.
I would like to thank Sematext again for their best-in-class Docker monitoring system, DigitalOcean for providing all the resources for the huge Docker Swarm managers, and the Docker Engineering team for making this great software and supporting us during the run.
While this time around we didn’t manage to launch all 150,000 containers we wanted to have, we did manage to create a nearly 5,000-node Docker Swarm cluster distributed over several continents. Lessons we’ve learned from this experiment will help us launch another huge Docker Swarm cluster next year. Thank you all and I’m looking forward to the new run!
SwarmZilla/swarm3k by Docker Captain Chanwit Kaewkasi is a unique community project/event aimed at launching a Docker Swarm cluster with 3000+ community-sponsored nodes. The previous project – Swarm2k – successfully demonstrated a 2000+ node Swarm cluster with only 3 Swarm managers running a workload with 95,000 tasks/containers on worker nodes.
The Swarm3k goal is to run more than 3,000 nodes with a very large subnet (4,096 IP addresses) and several workloads, such as a 20-node MySQL cluster and 2,980 WordPress tasks on top of it.
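A 4,096-address subnet corresponds to a /20 prefix. One way to check that the planned workload fits is to compute the smallest prefix that can hold all tasks. The sketch below makes two assumptions: each task gets one overlay IP, and only the network and broadcast addresses are reserved. Docker may reserve a few more, such as a gateway, so leave some headroom. The 10.0.0.0 base address is hypothetical.

```python
import ipaddress

def smallest_prefix(required_addresses: int, reserved: int = 2) -> int:
    """Smallest IPv4 prefix length whose block holds `required_addresses`
    plus `reserved` extra addresses (network + broadcast by default)."""
    needed = required_addresses + reserved
    host_bits = (needed - 1).bit_length()  # exact integer ceil(log2(needed))
    return 32 - host_bits

# Planned Swarm3k workload from the text: 20 MySQL nodes + 2,980 WordPress tasks
tasks = 20 + 2_980
prefix = smallest_prefix(tasks)
net = ipaddress.ip_network(f"10.0.0.0/{prefix}")  # hypothetical base address
print(f"{tasks} tasks -> /{prefix} ({net.num_addresses} addresses)")
```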
Test run: October 19, 2016, 3 PM UTC – smaller test run – creating nodes, deploying services, etc.
Sematext is the official Docker Swarm3k monitoring partner – see Live Swarm3k Dashboard at http://sematext.com/swarm3k. Compared to the setup for Swarm2k we will enrich the monitoring setup with host metrics, container metrics and logging of Docker events and task errors. All these bits of operational data are collected by Sematext Docker Agent.
Until recently, automating the deployment of Performance Monitoring agents in Docker Swarm clusters was challenging. Monitoring agents had to be deployed to each cluster node, and previous Docker releases (before Docker Engine v1.12 / Docker Swarm 1.2.4) had no global service scheduler (GitHub issue #601). Scheduling services via docker-compose and scheduling constraints required manual updates whenever the number of nodes in the Swarm cluster changed. That is definitely not convenient for dynamic scaling of clusters! In Docker Swarm Monitoring and Logging we shared some Linux shell acrobatics as a workaround for this issue.
The good news: All this has changed with Docker Engine v1.12 and new Swarm Mode. The latest release of Docker v1.12 provides many new features for orchestration and the new Swarm mode made it much easier to deploy Swarm clusters.
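A toy model shows why a global service removes the manual updates. The names below are hypothetical, and this is not how Docker implements Swarm mode. The desired state is simply “one task per active node”. Whenever the set of nodes changes, the scheduler starts or removes tasks to match, and nobody has to edit a compose file.

```python
from dataclasses import dataclass, field

@dataclass
class GlobalService:
    """Toy model of a *global* service: exactly one task per active node.

    Illustrates the scheduling idea only; it is not Docker's implementation.
    """
    name: str
    tasks: dict[str, str] = field(default_factory=dict)  # node -> task id

    def reconcile(self, active_nodes: set[str]) -> tuple[list[str], list[str]]:
        """Start a task on every node that lacks one; drop tasks on departed nodes."""
        started = sorted(n for n in active_nodes if n not in self.tasks)
        stopped = sorted(n for n in self.tasks if n not in active_nodes)
        for node in started:
            self.tasks[node] = f"{self.name}.{node}"
        for node in stopped:
            del self.tasks[node]
        return started, stopped

agent = GlobalService("sematext-agent-docker")
print(agent.reconcile({"node-1", "node-2"}))            # initial deployment
print(agent.reconcile({"node-1", "node-2", "node-3"}))  # a node joins: no manual update needed
print(agent.reconcile({"node-2", "node-3"}))            # a node leaves
```

Running reconcile again with an unchanged node set does nothing, which is the property that makes the global mode safe to apply repeatedly.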
RancherOS is one of the few “container only” operating systems and it evolved into an excellent orchestration tool for containers, competing e.g. with CoreOS. It supports several types of schedulers such as its own “Cattle” scheduler, as well as Kubernetes, Docker Swarm, and Mesos. A unique feature of RancherOS is its GUI for container orchestration based on templates for application stacks. Rancher Labs maintains a catalog for such templates and has integrated a community catalog, which includes Sematext Docker Agent for the collection of metrics, events and logs from all RancherOS cluster nodes and containers.
Monitoring all RancherOS nodes can be done several different ways, depending on which orchestration tool you use:
Deployment via rancher-compose for the whole RancherOS cluster using the Cattle scheduler
Deployment via the GUI (rancher server) using the Community Catalog entry (available for the Cattle scheduler)
Deployment as Kubernetes DaemonSet via kubectl for the Kubernetes scheduler
Deployment as Swarm global service using Swarm scheduler
This post provides the walk-throughs for all these deployment/orchestration options, with the common goal of collecting metrics, logs and events from each RancherOS node and all containers.
Sematext Logging and Monitoring on RancherOS
Setup via Sematext Catalog Entry
When you run the Rancher server user interface, simply search in the community catalog for “sematext”, “monitoring” or “logs” and select “Sematext Docker Agent”.
Sematext Docker Agent in RancherOS Community Catalog
Choose “View Details”, and in the “Configuration Options” enter the SPM and Logsene App tokens. You can obtain these from https://apps.sematext.com, where you can sign up and create your SPM and Logsene apps. If your Rancher cluster runs behind a firewall, you might need to specify the proxy URL in the HTTPS_PROXY or HTTP_PROXY environment variable.
If you’d like to collect all logs, just press “Launch” without specifying any filter for containers or images.
If you prefer the command line over GUIs and use the Cattle scheduler, rancher-compose is the right tool to deploy Sematext Docker Agent. The following configuration will activate Sematext Docker Agent on every node in the RancherOS cluster. You’ll need to replace the SPM/Logsene App tokens, of course:
As with Kubernetes on RancherOS, the Cattle scheduler is deactivated for Swarm. Thus, you can deploy Sematext Docker Agent as a global service on Swarm (Docker Engine 1.12 or later). Connect your Docker client to the RancherOS Swarm API endpoint and run the following global service definition with your SPM/Logsene App tokens. This will add Sematext Docker Agent to each Swarm node as soon as the node is launched.
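As a hedged sketch only, here is how such a compose service entry could be assembled in Python. It relies on Rancher’s io.rancher.scheduler.global label, which tells the Cattle scheduler to run one container per host. The token placeholders, the restart policy, and the image name sematext/sematext-agent-docker are assumptions; verify the image against Sematext’s documentation before use.

```python
import json

# Hypothetical placeholders: replace with your own App tokens.
SPM_TOKEN = "YOUR_SPM_TOKEN"
LOGSENE_TOKEN = "YOUR_LOGSENE_TOKEN"
# Assumed image name; verify it against Sematext's documentation.
AGENT_IMAGE = "sematext/sematext-agent-docker"

def agent_compose_service(spm_token: str, logsene_token: str) -> dict:
    """Build a docker-compose style service entry for the agent.

    The `io.rancher.scheduler.global` label asks Rancher's Cattle scheduler
    to run one container of this service on every host.
    """
    return {
        "sematext-agent": {
            "image": AGENT_IMAGE,
            "environment": {
                "SPM_TOKEN": spm_token,
                "LOGSENE_TOKEN": logsene_token,
            },
            # The agent reads metrics, events and logs through the Docker socket.
            "volumes": ["/var/run/docker.sock:/var/run/docker.sock"],
            "restart": "always",  # assumed restart policy
            "labels": {"io.rancher.scheduler.global": "true"},
        }
    }

compose = {"version": "2", "services": agent_compose_service(SPM_TOKEN, LOGSENE_TOKEN)}
# Printed as JSON for illustration; write it to docker-compose.yml with a YAML library.
print(json.dumps(compose, indent=2))
```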
docker service create --mode global --name sematext-agent-docker \
--mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
-e SPM_TOKEN=YOUR_SPM_TOKEN -e LOGSENE_TOKEN=YOUR_LOGSENE_TOKEN \
Setup via Mesos Marathon
The following configuration will activate Sematext Docker Agent on every node in the Mesos cluster. Please note that you have to specify the number of Mesos nodes (instances), SPM App Token, and Logsene App Token. Example call to the Marathon API:
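A minimal sketch of such a call in Python, assuming Marathon’s REST endpoint POST /v2/apps and its 1.x-era Docker container schema. The Marathon URL, node count, resource limits and image name are hypothetical placeholders. The ["hostname", "UNIQUE"] constraint allows at most one instance per host. That is why instances has to equal the number of Mesos nodes.

```python
import json
import urllib.request

# Hypothetical values: adjust to your cluster and Sematext account.
MARATHON_URL = "http://marathon.example.com:8080"  # hypothetical host
MESOS_NODES = 3                                    # number of agents in your cluster
AGENT_IMAGE = "sematext/sematext-agent-docker"     # assumed image name

def agent_app(instances: int, spm_token: str, logsene_token: str) -> dict:
    """Marathon app definition running one agent container per Mesos node."""
    return {
        "id": "sematext-agent",
        "instances": instances,
        "cpus": 0.1,   # illustrative resource limits
        "mem": 256,
        # At most one instance per host, so `instances` must equal the node count.
        "constraints": [["hostname", "UNIQUE"]],
        "env": {"SPM_TOKEN": spm_token, "LOGSENE_TOKEN": logsene_token},
        "container": {
            "type": "DOCKER",
            "docker": {"image": AGENT_IMAGE, "network": "BRIDGE"},
            "volumes": [{
                "containerPath": "/var/run/docker.sock",
                "hostPath": "/var/run/docker.sock",
                "mode": "RW",
            }],
        },
    }

def build_request(app: dict) -> urllib.request.Request:
    """Prepare POST /v2/apps; pass the result to urllib.request.urlopen() to send it."""
    return urllib.request.Request(
        f"{MARATHON_URL}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(agent_app(MESOS_NODES, "YOUR_SPM_TOKEN", "YOUR_LOGSENE_TOKEN"))
print(req.get_method(), req.full_url)
```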
Users can pick their preferred solution together with Sematext Docker Agent for the collection of metrics, events and logs. We have seen users moving from one orchestration platform to another when they hit some issues and having the flexibility to do so easily is extremely valuable in the rapidly changing container world.
As we hope you can see, RancherOS provides this flexibility, while Sematext Docker Agent support for all leading orchestration platforms ensures continuous visibility into Docker operational intelligence – metrics, events, and logs, regardless of which orchestration platform you use.
Docker is growing by leaps and bounds, and along with it its ecosystem. Because containers are lightweight, the predominant deployment pattern is to run just a single app or service inside each container. Most software products and services are made up of at least several such apps/services. We all want our apps/services to be highly available and fault tolerant. Thus, Docker containers in an organization quickly start popping up like mushrooms after the rain. They multiply faster than rabbits. In the beginning we play with them like cute little pets. As their number quickly grows, however, we realize we are dealing with a herd of cattle, implying we’ve become cowboys. Managing a herd with your two hands, a horse, and a lasso will get you only so far. You won’t be able to ride after each and every calf that wanders in the wrong direction. To get back to containers from this zoological analogy: operating so many moving pieces at scale is impossible without orchestration. This is why we’ve seen the rise of Docker Swarm, Kubernetes, Mesos, CoreOS, RancherOS and so on.
Container orchestration helps you manage your containers, their placement, their resources, and their whole life cycle. While containers and applications in them are running, in addition to the whole life cycle management we need container monitoring and log management so we can troubleshoot performance or stability issues, debug or tune applications, and so on. Just like with orchestration, there are a number of open-source container monitoring and logging tools. It’s great to have choices, but having lots of them means you need to evaluate and compare them to pick the one that best matches your needs.
DevOps Tools Comparison
We’ve open-sourced our Sematext Docker Agent (SDA for short), which works with SPM for monitoring and Logsene for log management (think of it as ELK as a Service). We also wanted to provide a high-level comparison of SDA and several popular Docker monitoring and logging tools, like CAdvisor, Logspout, and others. In the following table we group tools by functionality and include monitoring agents, log collectors and shippers, storage backends, and tools that provide the UI and visualizations. For each functionality we list in the “Common Tools” column one or more popular open-source tools that provide that functionality. An empty “Common Tools” cell means there are no popular open-source tools that provide it, or at least we are not aware of any. If we messed something up, please leave a comment or tweet @sematext.
Collect Logs from Docker API (including auto-discovery of new containers)
Sematext Docker Agent
Logspout (routing setup for containers via HTTP API to syslog, redis, kafka, logstash); Docker Logging Drivers (e.g. syslog, journald, fluentd, etc.)
Sematext Docker Agent (routing of logs to different indices based on container labels)
Automatic log tagging (with Docker Compose or Swarm or Kubernetes metadata)
For Kubernetes: fluentd-elasticsearch, assumes Elasticsearch deployed locally
Sematext Docker Agent
Collect Docker Metrics
Sematext Docker Agent
Collect Docker Events
Sematext Docker Agent
Logs format detection (most tools need a static setup per logfile/application)
Sematext Docker Agent (out of the box format detection and parsing; the parser and the logagent-js pattern library is open source)
Logs parsing and shipping
Fluentd, Logstash, rsyslog, syslog-ng
Sematext Docker Agent
Logs storage and indexing
Logsene (exposes Elasticsearch API)
Logs anomaly detection and alerting
Log search and analytics
Logsene (Logsene’s own UI or integrated Kibana, or Grafana connected to Logsene via Elasticsearch data source)
Some of the functionality provided by SPM and Logsene is not available in some of the most popular open-source monitoring and logging tools included here
Some of the SPM and Logsene functionality is indeed provided by some of the open-source tools; however, none of them seems to encompass all the features, forcing one to mix and match and head down the tech debt-ridden Franken-monitoring path
Try it yourself in the MindMap below: pick a few functionalities and see how many different tools you might have to use.
Avoid building technical debt & Franken-monitoring by using a limited number of Docker monitoring & logging tools.
P.S.: Sematext Docker Agent is available in the Rancher Community Catalog and shows up with our new mascot “Octi”, just one more pet 🙂. So if you use RancherOS, search for “sematext” in the RancherOS Catalog, and within a few clicks you’ll have Sematext Docker Agent deployed to your RancherOS clusters!
Monitoring of Docker environments is challenging. Why? Because each container typically runs a single process, has its own environment, utilizes virtual networks, and may manage storage in various ways. Traditional monitoring solutions take metrics from each server and the applications it runs. These servers and their applications are typically very static, with very long uptimes. Docker deployments are different: a set of containers may run many applications, all sharing the resources of one or more underlying hosts. It’s not uncommon for Docker servers to run thousands of short-term containers (e.g., for batch jobs) while a set of permanent services runs in parallel. Traditional monitoring tools were not designed for such dynamic environments and are poorly suited to these deployments. On the other hand, some modern monitoring solutions (e.g. SPM from Sematext) were built with such dynamic systems in mind and even have out-of-the-box reporting for Docker monitoring. Moreover, container resource sharing calls for stricter enforcement of resource usage limits, an additional issue you must watch carefully. To make appropriate adjustments to resource quotas, you need good visibility into any limits containers have reached and any errors they have caused. We recommend setting alerts based on the defined limits; this way you can adjust limits or resource usage before errors start happening.
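A limit-based alert can be sketched as a simple ratio check. The example below assumes the JSON shape returned by the Docker Engine API endpoint GET /containers/{id}/stats?stream=false and reads only memory_stats.usage and memory_stats.limit. The 90% threshold and the function name are hypothetical choices. Also note that when no memory limit is set, the reported limit is typically the host’s total memory, so the ratio then reflects host pressure rather than a container quota.

```python
from typing import Optional

# Assumed alert threshold: warn when a container uses 90% of its memory limit.
MEMORY_ALERT_RATIO = 0.9

def memory_alert(stats: dict, threshold: float = MEMORY_ALERT_RATIO) -> Optional[str]:
    """Return an alert message if memory usage is close to the container's limit.

    `stats` is shaped like the JSON from the Docker Engine API endpoint
    GET /containers/{id}/stats?stream=false (only memory_stats is used here).
    """
    mem = stats.get("memory_stats", {})
    usage, limit = mem.get("usage"), mem.get("limit")
    if not usage or not limit:
        return None  # missing or zero values, e.g. the container is not running
    ratio = usage / limit
    if ratio >= threshold:
        return f"{stats.get('name', '?')}: memory at {ratio:.0%} of limit"
    return None

sample = {"name": "/web-1", "memory_stats": {"usage": 480 * 2**20, "limit": 512 * 2**20}}
print(memory_alert(sample))
```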
May 5, 2016 – Sematext, a global, Brooklyn-based products and services company that builds innovative Cloud and On Premises solutions for application performance monitoring, log management and analytics, today announced that it has been recognized by Docker as the Ecosystem Technology Partner (ETP) for monitoring and logging. This designation indicates that SPM Performance Monitoring and Logsene have demonstrated working integration with the Docker platform via the Docker API and are available to users and organizations that seek solutions to monitor their Dockerized distributed applications.
Sematext Docker Agent is extremely easy to deploy on Docker Swarm, Docker Cloud and Docker Datacenter. It discovers new and existing containers, collects Docker performance metrics, events and logs, and runs in a tiny container on every Docker Host. In addition to standard log collection functionality the agent performs automatic log format detection and field extraction for a number of log formats, including Docker Swarm, Elasticsearch, Solr, Nginx, Apache, MongoDB, Kubernetes, etc.
Many organizations invest a lot of time in monitoring and logging setups because monitoring and logging changed dramatically with the introduction of Docker and related orchestration tools. We’ve observed that organizations and teams that use different tools for logging and monitoring often have difficulties correlating logs, events and metrics. Sematext automates performance monitoring and logging for Docker. Operational insights are provided in a single UI, which helps one efficiently correlate metrics, logs and events. Sematext Docker Agent detects many log formats and structures the logs automatically for analysis in Logsene.
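The general idea behind automatic format detection can be sketched as follows. Structured JSON is tried first, then a list of regular-expression patterns in order, and anything unrecognized falls back to the raw message. The patterns and function name below are hypothetical. They are far simpler than the open-source logagent-js pattern library the agent relies on, and this is not its implementation.

```python
import json
import re

# Hypothetical, deliberately tiny pattern library; logagent-js ships a much larger one.
PATTERNS = [
    # Common/combined access log format, as written by Nginx and Apache
    ("access_log", re.compile(
        r'^(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
        r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
    )),
    # Simple "LEVEL message" lines, optionally with the level in brackets
    ("level_message", re.compile(
        r'^\[?(?P<level>DEBUG|INFO|WARN|WARNING|ERROR|FATAL)\]?\s+(?P<message>.*)$'
    )),
]

def parse_line(line: str) -> dict:
    """Detect the format of one log line and extract its fields.

    JSON objects are decoded directly; otherwise each regex is tried in order and
    the first match wins. Unknown formats fall back to a plain message field.
    """
    line = line.strip()
    if line.startswith("{"):
        try:
            return {"_format": "json", **json.loads(line)}
        except json.JSONDecodeError:
            pass  # looked like JSON but was not; try the patterns instead
    for name, pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            return {"_format": name, **match.groupdict()}
    return {"_format": "unknown", "message": line}

print(parse_line('203.0.113.7 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612'))
```

Ordering matters in this design: more specific patterns should come before generic ones, because the first match wins.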
“We would like to congratulate Sematext on their inclusion into Docker’s Ecosystem Technology Partner program for logging and monitoring,” said Nick Stinemates, VP of Business Development and Technical Alliances. “The ETP program recognizes organizations like Sematext that have demonstrated integration with the Docker platform to provide users with intelligent insights and increased visibility into their Dockerized environments. The goal is to provide users with the data needed to ensure the highest degree of availability and performance for all their business-critical applications”.
Sematext Group, Inc. is a global, Brooklyn-based products and services company that builds innovative Cloud and On Premises solutions for application performance monitoring, log management and analytics, and site search analytics. Sematext Docker Agent is extremely easy to deploy; it collects Docker performance metrics, events and logs and runs in a container on every Docker Host. In addition to standard log collection functionality the agent performs automatic log format detection and field extraction for a number of log formats. Besides monitoring Docker, Sematext SPM agents also monitor applications running inside and outside containers, such as Elasticsearch, Nginx, Apache, Kafka, Cassandra, Spark, Node.js, MongoDB, Solr, MySQL, etc.
Sematext also provides professional services around Elasticsearch, the ELK / Elastic Stack, and Apache Solr – Consulting, Training, and Production Support.
Docker Cloud is a hosted service for Docker Container Management, originally based on Tutum Cloud, which was acquired by Docker in October 2015. Sematext supported the deployment of Sematext Docker Agent on Tutum Cloud from the get-go, so naturally we were quick to add support for Docker Cloud as well.
What is Docker Cloud?
Docker Cloud is a container management service that supports multiple cloud providers such as Amazon, DigitalOcean, IBM SoftLayer, MS Azure and Packet.net. This makes it much easier to move Docker deployments between cloud providers, or to use a mix of providers (including on-premises nodes) for hybrid cloud applications. The Docker Cloud user interface makes it easy to manage nodes on all supported cloud platforms and can deploy application stacks in containers, defined in a “Stack YAML” file. These stack files are very similar to Docker Compose files but have additional options, e.g. to define deployment strategies for the containers. The graphical user interface helps you view and modify container configurations.