swarm3k-review

Docker Swarm Lessons from Swarm3K

This is a guest post by Prof. Chanwit Kaewkasi, Docker Captain who organized Swarm3K – the largest Docker Swarm cluster to date.

Swarm3K Review

Swarm3K was the second collaborative project trying to form a very large Docker cluster with the Swarm mode. It happened on 28th October 2016 with more than 50 individuals and companies joining this project.

Sematext was one of the very first companies that offered to help us by offering their Docker monitoring and logging solution. They became the official monitoring system for Swarm3K. Stefan, Otis and their team provided wonderful support for us from the very beginning.

Swarm3K public dashboard by Sematext
Swarm3K public dashboard by Sematext

To my knowledge, Sematext is one and the only Docker monitoring company which allow to deploy the monitoring agents as the global Docker service at the moment. This deployment model provides for a greatly simplified the monitoring process.

Swarm3K Setup and Workload

There were two planned workloads:

  1. MySQL with WordPress cluster
  2. C1M

The 25 nodes formed a MySQL cluster. We experiences some mixing of IP addresses from both mynet and ingress networks. This was the same issue found when forming a cluster of Apache Spark in the past (see https://github.com/docker/docker/issues/24637). We prevented this by binding the cluster only to a single overlay network.

A WordPress node was scheduled somewhere on our huge cluster, and we intentionally didn’t control where it should be. When we were trying to connect a WordPress node to the backend MySQL cluster, the connection kept timing out. We concluded that a WordPress / MySQL combo would be set to run correctly if we put them together in the same DC.

We aimed for 3000 nodes, but in the end we successfully formed a working, geographically distributed 4,700-node Docker Swarm cluster.

Swarm3K Observations

What we also learned from this issue was that the performance of the overlay network greatly depends on the correct tuning of network configuration on each host.

When the MySQL / WordPress test failed, we changed the plan to try NGINX on Routing Mesh.

The Ingress network is a /16 network which supports up to 64K IP addresses. Suggested by Alex Ellis, we then started 4,000 NGINX containers on the formed cluster. During this test, nodes were still coming in and out. The NGINX service started and the Routing Mesh was formed. It could correctly serve even as some nodes kept failing.

We concluded that the Routing Mesh in 1.12 is rock solid and production ready.

We then stopped the NGINX service and started to test the scheduling of as many containers as possible.

This time we simply used “alpine top” as we did for Swarm2K. However, the scheduling rate was quite slow. We went to 47,000 containers in approximately 30 minutes. Therefore it was going to be ~10.6 hours to fill the cluster with 1M containers. Unfortunately, because that would take too long, we decided to shut down the manager as it made no point to go further.

Swarm3k Task Status
Swarm3k Task Status

Scheduling with a huge batch of containers stressed out the cluster. We scheduled the launch of a large number of containers using “docker scale alpine=70000”.  This created a large scheduling queue that would not commit until all 70,000 containers were finished scheduling. This is why when we shut down the managers all scheduling tasks disappeared and the cluster became unstable, for the Raft log got corrupted.

One of the most interesting things was that we were able to collect enough CPU profile information to show us what was keeping the cluster busy.

dockerd-flamegraph-01

Here we can see that only 0.42% of the CPU was spent on the scheduling algorithm. I think we can say with certainty: 

The Docker Swarm scheduling algorithm in version 1.12 is quite fast.

This means that there is an opportunity to introduce a more sophisticated scheduling algorithm that could result in even better resource utilization.

dockerd-flamegraph-02

We found that a lot of CPU cycles were spent on node communication. Here we see the Libnetwork’s member list layer. It used ~12% of the overall CPU.

dockerd-flamegraph-03

Another major CPU consumer was the Raft communication, which also caused the GC here. This used ~30% of the overall CPU.

Docker Swarm Lessons Learned

Here’s the summarized list of what we learned together.

  1. For a large set of nodes like this, managers require a lot of CPUs. CPUs will spike whenever the Raft recovery process kicks in.
  2. If the Leading manager dies, you better stop “docker daemon” on that node and wait until the cluster becomes stable again with n-1 managers.
  3. Don’t use “dockerd -D” in production. Of course, I know you won’t do that.
  4. Keep snapshot reservation as small as possible. The default Docker Swarm configuration will do. Persisting Raft snapshots uses extra CPU.
  5. Thousands of nodes require a huge set of resources to manage, both in terms of CPU and Network bandwidth. In contrast, hundreds of thousand tasks require high Memory nodes.
  6. 500 – 1000 nodes are recommended for production. I’m guessing you won’t need larger than this in most cases, unless you’re planning on being the next Twitter.
  7. If managers seem to be stuck, wait for them. They’ll recover eventually.
  8. The parameter –advertise-addr is mandatory for Routing Mesh to work.
  9. Put your compute nodes as close to your data nodes as possible. The overlay network is great and will require tweaking Linux net configuration for all hosts to make it work best.
  10. Despite slow scheduling, Docker Swarm mode is robust. There were no task failures this time even with unpredictable network connecting this huge cluster together.

“Ten Docker Swarm Lessons Learned” by @chanwit

Credits
Finally, I would like to thank all Swarm3K heroes: @FlorianHeigl, @jmaitrehenry from PetalMD, @everett_toews from Rackspace,  Internet Thailand, @squeaky_pl, @neverlock, @tomwillfixit from Demonware, @sujaypillai from Jabil, @pilgrimstack from OVH, @ajeetsraina from Collabnix, @AorJoa and @PNgoenthai from Aiyara Cluster, @f_soppelsa, @GroupSprint3r, @toughIQ, @mrnonaki, @zinuzoid from HotelQuickly,  @_EthanHunt_,  @packethost from Packet.io, @ContainerizeT – ContainerizeThis The Conference, @_pascalandy from FirePress, @lucjuggery from TRAXxs, @alexellisuk, @svega from Huli, @BretFisher,  @voodootikigod from Emerging Technology Advisors, @AlexPostID,  @gianarb from ThumpFlow, @Rucknar,  @lherrerabenitez, @abhisak from Nipa Technology, and @enlamp from NexwayGroup.

I would like to thanks Sematext again for the best-of-class Docker monitoring system, DigitalOcean for providing all resources for huge Docker Swarm managers, and the Docker Engineering team for making this great software and supporting us during the run.

While this time around we didn’t manage to launch all 150,000 containers we wanted to have, we did manage to create a nearly 5,000-node Docker Swarm cluster distributed over several continents.  Lessons we’ve learned from this experiment will help us launch another huge Docker Swarm cluster next year.  Thank you all and I’m looking forward to the new run!

 

docker-swarm-sematext-agent

Docker “Swarm Mode”: Full Cluster Monitoring & Logging with 1 Command

Until recently, automating the deployment of Performance Monitoring agents in Docker Swarm clusters was challenging because monitoring agents had to be deployed to each cluster node and the previous Docker releases (<Docker engine v1.12 / Docker Swarm 1.2.4) had no global service scheduler (Github issue #601).  Scheduling services with via docker-compose and scheduling constraints required manual updates when the number of nodes changed in the swarm cluster – definitely not convenient for dynamic scaling of clusters! In Docker Swarm Monitoring and Logging we shared some Linux shell acrobatics as workaround for this issue.

The good news: All this has changed with Docker Engine v1.12 and new Swarm Mode. The latest release of Docker v1.12 provides many new features for orchestration and the new Swarm mode made it much easier to deploy Swarm clusters.  

 

With Docker v1.12 services can be scheduled globally – similar to Kubernetes DaemonSet, RancherOS global services or CoreOS global fleet services

 

Read More

Sematext Solr Training

Apache Solr Training in NYC June 13-14

If you’ve missed our Core Solr training in October 2015 in New York, here is another chance – we’re running the 2-day Core Solr class again next month – June 13 & 14, 2016.
This course covers Solr 5.x as well as Solr 6.x!  You can see the complete course outline under Solr & Elasticsearch Training Overview . The course is very comprehensive — it includes over 20 chapters and lots of hands-on exercises.  Be prepared to learn a lot!

Cost:
$1,200 early bird rate (valid through June 1) and $1,500 afterwards.

There’s also a 50% discount for the purchase of a 2nd seat!

Location:

462 7th Avenue, New York, NY 10018 – see map

If you have any questions please get in touch.  To sign up, just register here.

Monitoring rsyslog with Kibana and SPM

A while ago we published this post where we explained how you can get stats about rsyslog, such as the number of messages enqueued, the number of output errors and so on. The point was to send them to Elasticsearch (or Logsene, our logging SaaS, which exposes the Elasticsearch API) in order to analyze them.

This is part 2 of that story, where we share how we process these stats in production. We’ll cover:

  • an updated config, working with Elasticsearch 2.x
  • what Kibana dashboards we have in Logsene to get an overview of what rsyslog is doing
  • how we send some of these metrics to SPM as well, in order to set up alerts on their values: both threshold-based alerts and anomaly detection

Read More

2015 in Review

Another year is behind us, and it’s been another good year for us at Sematext.  Here are the highlights in the chronological order.  If you prefer looking non-chronological overview, look further below.

January

We started the year by doing a ton of publishing on the blog – about Solr-Redis, about SPM and Slack, about Solr vs. Elasticsearch – always a popular topic, Spark, Kafka, Cassandra, Solr, etc.  Logsene being ELK as a Service means we made sure users have the freedom and flexibility to create custom Elasticsearch Index Templates in Logsene.

February

We added Account Sharing to all our products, thus making it easier to share SPM, Logsene, and Site Search Analytics apps by teams.  We made a big contribution to Kafka 0.8.2 by reworking pretty much all Kafka metrics and making them much more useful and consumable by Kafka monitoring agents.  We also added support for HAProxy monitoring to SPM.

March

We announced Node.js / io.js monitoring.  This was a release of our first Node.js-based monitoring monitoring agent – spm-agent-nodejs, and our first open-source agent.  The development of this agent resulted in creation of spm-agent an extensible framework for Node.js-based monitoring agents.  HBase is one of those systems with tons of metrics and with metrics that change a lot from release to release, so we updated our HBase monitoring support for HBase 0.98.

April

The SPM REST API was announced in April, and a couple of weeks later the spm-metrics-js npm module for sending custom metrics from Node.js apps to SPM was released on Github.

May

A number of us from several different countries gathered in Krakow in May.  The excuse was to give a talk about Tuning Elasticsearch Indexing Pipeline for Logs at GeeCon and give away our eBook – Log Management & Analytics – A Quick Guide to Logging Basics while sponsoring GeeCon, but in reality it was really more about Żubrówka and Vișinată, it turned out.  Sematext grew a little in May with 3 engineers from 3 countries joining us in a span of a couple of weeks.  We were only dozen people before that, so this was decent growth for us.

Right after Krakow some of us went to Berlin to give another talk: Solr and Elasticsearch – Side by Side with Elasticsearch and Solr: Performance and Scalability.  While in Berlin we held our first public Elasticsearch training and, following that, quickly hopped over to Hamburg to give a talk at a local search meetup.

June

In June we gave a talk on the other side of the Atlantic – in NYC – Beyond POC: Processing Metrics, Logs and Traces … at Scale.  We were conference sponsors there as well and took part in the panel about microservices.  We published our second eBook – Elasticsearch Monitoring Essentials eBook.  The two most important June happenings were the announcement of Docker monitoring – SPM for Docker – our solution for monitoring Docker containers, as well as complete, seamless integration of Kibana 4 into Logsene.  We’ve added Servers View to SPM and Logsene got much needed Alerting and Anomaly Detection, as well as Saved Searches and Scheduled Reporting.

July

In July we announced public Solr and Elasticsearch trainings, both in New York City, scheduled for October.  We built and open-sourced Logsene Command Line Interfacelogsene-cli – and we added Tomcat monitoring integration to SPM.

August

At Sematext we use Akka, among other things, and in August we introduced Akka monitoring integration for SPM and open-sourced the Kamon backend for SPM.  We also worked on and announced Transaction Tracing that lets you easily find slow transactions and bottlenecks that caused their slowness, along with AppMaps, which are a wonderful way to visualize all your infrastructure along applications running on it and see, in real-time, which apps and servers are communicating, how much, how often there are errors in each app, and so on.

September

In September we held our first 2 webinars on Docker Monitoring and Docker Logging.  You can watch them both in Sematext’s YouTube channel.

October

We presented From zero to production hero: Log Analysis with Elasticsearch at O’Reilly’s Velocity conference in New York and then Large Scale Log Analytics with Solr at Lucene/Solr Revolution in Austin.  After Texas we came back to New York for our Solr and Elasticsearch trainings.

November

Logsene users got Live Tail in November, while SPM users welcomed the new Top Database Operations report.  Live Tail comes in very handy when you want to semi-passively watch out for errors (or other types of logs) without having to constantly search for them.  While most SPM users have SPM monitoring agents on their various backend components, Top Database Operations gives them the ability to gain more insight in performance between the front-end/web applications and backend servers like Solr, Elasticsearch, or other databases by putting the monitoring agents on applications that act as clients for those backend services.  We worked with O’Reilly and produced a 3-hour Working with Elasticsearch Training Video.

December

We finished off 2015 by adding MongoDB monitoring to SPM, joining Docker’s ETP Program for Logging, further integrating monitoring and logging, ensuring Logsene works with Grafana, writing about monitoring Solr on Docker, publishing the popular Top 10 Node.js Metrics to Watch, as well as a SPM vs. New Relic APM comparison.  

Pivoting the above and grouping it by our products and services:

Logsene:

  • Live Tail
  • Alerting
  • Anomaly Detection
  • logsene-cli + logsene.js + logagent-js
  • Saved Searches
  • Scheduled Email Reporting
  • Integrated Kibana
  • Compatibility with Grafana
  • Search AutoComplete
  • Powerful click-and-filter
  • Native charting of numerical fields
  • Account Sharing
  • REST API

 

SPM:

  • Transaction Tracing
  • SPM Tracing API
  • AppMap
  • NetMap
  • On Demand Profiling
  • Integration with Logsene
  • Expanded monitoring for Elasticsearch, Solr, HBase, and Kafka
  • Added monitoring for Docker, Node.js, Akka, MongoDB, HAProxy, and Tomcat
  • Birds Eye Servers View
  • Account Sharing
  • REST API

 

Webinars:

  • Docker Monitoring
  • Docker Logging

 

Trainings:

  • Elasticsearch training in Berlin
  • Solr and Elasticsearch trainings in New York

 

eBooks:

  • Elasticsearch Monitoring Essentials
  • Log Management & Analytics – A Quick Guide to Logging Basics

 

Talks / Presentations / Conferences:

  • Lucene/Solr Revolution, Austin, TX – Large Scale Log Analytics with Solr
  • Velocity Conference, NYC, NY – Log Analysis with Elasticsearch
  • Berlin Buzzwords, Berlin, Germany – Side by Side with Elasticsearch and Solr: Performance and Scalability
  • GeeCon, Krakow, Poland – Tuning Elasticsearch Indexing Pipeline for Logs
  • DevOps Days, Warsaw, Poland – Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
  • DevOps Expo, NYC, NY – Process Metrics, Logs, and Traces at Scale

 

Trends:

All numbers are up  – our SPM and Logsene signups are up, product revenue is up a few hundred percent from last year, we’ve nearly doubled our blogging volume, our site traffic is up,we’ve made several UI-level facelifts for both apps.sematext.com and www.sematext.com, our team has grown, we’ve increased the number of our Solr and Elasticsearch Production Support customers, and we’ve added Solr and Elasticsearch Training to the list of our professional services.

 

Poll Results: Kafka Producer/Consumer

About 10 days ago we ran a a poll about which languages/APIs people use when writing their Apache Kafka Producers and Consumers.  See Kafka Poll: Producer & Consumer Client.  We collected 130 votes so far.  The results were actually somewhat surprising!  Let’s share the numbers first!

Kafka Producer/Consumer Languages
Kafka Producer/Consumer Languages

What do you think?  Is that the breakdown you expected?  Here is what surprised us:

  • Java is the dominant language on the planet today, but less than 50% people use it with Kafka! Read: possible explanation for Java & Kafka.
  • Python is clearly popular and gaining in popularity, but at 13% it looks like it’s extra popular in Kafka context.
  • Go at 10-11% seems quite popular for a relatively young language.  One might expect Ruby to have more adoption here than Go because Ruby has been around much longer.
  • We put C/C++ in the poll because these languages are still in use, though we didn’t expect it to get 6% of votes.  However, considering C/C++ are still quite heavily used generally speaking, that’s actually a pretty low percentage.
  • JavaScript and NodeJS are surprisingly low at just 4%.  Any idea why?  Is the JavaScript Kafka API not up to date or bad or ….?
  • The “Other” category is relatively big, at a bit over 12%.  Did we forget some major languages people often use with Kafka?  Scala?  See info about the Kafka Scala API here.

Everyone and their cousin is using Kafka nowadays, or at least that’s what it looks like from where we at Sematext sit.  However, because of the relatively high percentage of people using Python and Go, we’d venture to say Kafka adoption is much stronger among younger, smaller companies, where Python and Go are used more than “enterprise languages”, like Java, C#, and C/C++.

Kafka Poll: Producer & Consumer Client

Kafka has become the de-facto standard for handling real-time streams in high-volume, data-intensive applications, and there are certainly a lot of those out there.  We thought it would be valuable to conduct a quick poll to find out which which implementation of Kafka Producers and Consumers people use – specifically, which programming languages do you use to produce and consume Kafka messages?

Please tweet this poll and help us spread the word, so we can get a good, statistically significant results.  We’ll publish the results here and via @sematext (follow us!) in a week.

NOTE #: If you choose “Other”, please leave a comment with additional info, so we can share this when we publish the results, too!

NOTE #2: The results are in! See http://blog.sematext.com/2015/01/28/kafka-poll-results-producer-consumer/

Please tweet this poll and help us spread the word, so we can get a good, statistically significant results.  We’ll publish the results hereand via @sematext (follow us!) in a week.

Solr Redis Plugin Use Cases and Performance Tests

The Solr Redis Plugin is an extension for Solr that provides a query parser that uses data stored in Redis. It is open-sourced on Github by Sematext. This tool is basically a QParserPlugin that establishes a connection to Redis and takes data stored in SET, ZRANGE and other Redis data structures in order to build a query. Data fetched from Redis is used in RedisQParser and is responsible for building a query. Moreover, this plugin provides a highlighter extension which can be used to highlight parts of aliased Solr Redis queries (this will be described in a future).

Use Case: Social Network

Imagine you have a social network and you want to implement a search solution that can search things like: events, interests, photos, and all your friends’ events, interests, and photos. A naive, Solr-only-based implementation would search over all documents and filter by a “friends” field. This requires denormalization and indexing the full list of friends into each document that belongs to a user. Building a query like this is just searching over documents and adding something like a “friends:1234” clause to the query. It seems simple to implement, but the reality is that this is a terrible solution when you need to update a list of friends because it requires a modification of each document. So when the number of documents (e.g., photos, events, interests, friends and their items) connected with a user grows, the number of potential updates rises dramatically and each modification of connections between users becomes a nightmare. Imagine a person with 10 photos and 100 friends (all of which have their photos, events, interests, etc.).  When this person gets the 101th friend, the naive system with flattened data would have to update a lot of documents/rows.  As we all know, in a social network connections between people are constantly being created and removed, so such a naive Solr-only system could not really scale.

Social networks also have one very important attribute: the number of connections of a single user is typically not expressed in millions. That number is typically relatively small — tens, hundreds, sometimes thousands. This begs the question: why not carry information about user connections in each query sent to a search engine? That way, instead of sending queries with clause “friends:1234,” we can simply send queries with multiple user IDs connected by an “OR” operator. When a query has all the information needed to search entities that belong to a user’s friends, there is no need to store a list of friends in each user’s document. Storing user connections in each query leads to sending of rather large queries to a search engine; each of them containing multiple terms containing user ID (e.g., id:5 OR id:10 OR id:100 OR …) connected by a disjunction operator. When the number of terms grows the query requests become very big. And that’s a problem, because preparing it and sending it to a search engine over the network becomes awkward and slow.

How Does It Work?

The image below presents how Solr Redis Plugin works.

Read More

Solr Presentations from Lucene/Solr Revolution 2014

Thanks to everyone who stopped by the Sematext booth at last week’s Lucene/Solr Revolution event in Washington, DC and attended our two talks:

The attendance, questions and interest are very much appreciated.  As a company that prides itself on its Solr expertise (and Elasticsearch expertise too, for that matter), it was nice to spend a couple days talking about search and Big Data challenges, performance monitoring and logging with fellow experts from around the world. Here are the slides for the two talks we gave (summaries of the talks can be found here):

 

  Videos of the talks will be posted here soon.  Hope to see everyone again next year!