SwarmZilla/swarm3k by Docker Captain Chanwit Kaewkasi is a unique community project/event aimed at launching a Docker Swarm cluster with 3000+ community-sponsored nodes. The previous project – Swarm2k – successfully demonstrated a 2000+ node Swarm cluster with only 3 Swarm managers running a workload with 95,000 tasks/containers on worker nodes.
The goal of Swarm3k is to run more than 3,000 nodes on a very large subnet with 4,096 IP addresses, together with several workloads on top of it, such as a 20-node MySQL cluster and 2,980 WordPress tasks.
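To put those numbers side by side, here is a minimal Python sketch of the arithmetic. The base address 10.0.0.0 is purely hypothetical; only the /20 prefix length follows from the 4,096 figure (2^12 addresses). It also assumes each task takes one address on that subnet and ignores any addresses the network reserves for itself.

```python
import ipaddress

# Hypothetical base address; a /20 prefix is what yields 4,096 addresses.
subnet = ipaddress.ip_network("10.0.0.0/20")

# Workload sizes as stated in the announcement.
mysql_tasks = 20
wordpress_tasks = 2980
planned_tasks = mysql_tasks + wordpress_tasks

# Assumption: one address per task, nothing reserved by the network.
headroom = subnet.num_addresses - planned_tasks

print(subnet.num_addresses)  # 4096
print(planned_tasks)         # 3000
print(headroom)              # 1096
```

Under these assumptions the planned 3,000 tasks fit into the subnet with roughly a thousand addresses to spare.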
- Test run: October 19, 2016, 3 PM UTC – smaller test run – creating nodes, deploying services, etc.
- Real run: October 28, 2016, 3 PM UTC
Observing / Monitoring:
Sematext is the official Docker Swarm3k monitoring partner; see the Live Swarm3k Dashboard at http://sematext.com/swarm3k. Compared to the Swarm2k setup, we will enrich the monitoring with host metrics, container metrics, and logging of Docker events and task errors. All of this operational data is collected by Sematext Docker Agent.
We also tweaked Sematext Docker Agent to deal with the special requirements of Swarm3k. In addition, we decided to sponsor the resources for the monitoring, including a dedicated extension of our infrastructure. As a result we get:
- Easy deployment of the monitoring agent as a global Swarm service.
- Public Grafana dashboards for the Docker community (see screenshots below).
- Display of the number of Swarm nodes, split into manager and worker nodes, along with the number of running containers.
- Docker Swarm task status monitoring, including logging of task errors. Swarm tasks can be in many states, such as preparing, starting, running, shutdown, failed, etc. Watching state changes is a good way to follow the progress of service deployments on Swarm. When the deployment of a service fails, it is very useful to see the task error messages. Obtaining this information without a Swarm dashboard takes several docker commands, so the Swarm3k Live Dashboard shows all of it in one place.
Swarm cluster overview including task monitoring (updated screenshot, Oct. 28, 2016)
- Tracking of Docker events, such as pulling images, starting or stopping containers, mounting volumes, or creating networks. We believe the Docker event log will be a great resource for analyzing the results of the Swarm3k test.
Docker event and task error details
- Metrics from all nodes and containers. With this many containers and nodes, it makes sense to compare aggregated performance metrics, for example the total system CPU usage against the CPU usage of all containers.
Aggregated node performance metrics for the swarm cluster
Container Performance Metrics by service
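The task status monitoring described in the list above can be illustrated with a small Python sketch that tallies task states and collects task error messages. This is not how Sematext Docker Agent is implemented. The field names (`ServiceID`, `Status.State`, `Status.Err`) follow the task objects of the Docker Engine API as we understand them, while the function name `summarize_tasks` and the sample records are hypothetical and use readable service names instead of real IDs.

```python
from collections import Counter


def summarize_tasks(tasks):
    """Count tasks per state and collect (service, error) pairs for tasks with an error."""
    states = Counter(task["Status"]["State"] for task in tasks)
    errors = [
        (task["ServiceID"], task["Status"]["Err"])
        for task in tasks
        if task["Status"].get("Err")
    ]
    return states, errors


# Hypothetical sample data shaped like Swarm task records (not real output).
sample_tasks = [
    {"ServiceID": "wordpress", "Status": {"State": "running"}},
    {"ServiceID": "wordpress", "Status": {"State": "preparing"}},
    {"ServiceID": "mysql", "Status": {"State": "failed", "Err": "task: non-zero exit (1)"}},
    {"ServiceID": "wordpress", "Status": {"State": "running"}},
]

states, errors = summarize_tasks(sample_tasks)
print(dict(states))
print(errors)
```

Watching how such a tally shifts from preparing to running over time gives the same kind of deployment progress view that the dashboard provides, and the error list points at the services that need attention.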
If you are interested in this topic, stay tuned: follow Chanwit Kaewkasi and @sematext, and check out the public Docker Swarm monitoring dashboard. If you need monitoring for your own Docker deployments, check out SPM for Docker, the Sematext Docker Agent, and Logsene.