The Complete Guide to Kubernetes Monitoring
Learn which Kubernetes metrics to monitor, how to collect them, and which open-source and commercial tools can help you ensure peak performance of your cluster.
Monitoring is not easy when working with a monolithic architecture, and it has become even harder since microservices were introduced. Kubernetes solved many challenges that came with distributed environments, such as speed, scalability and resilience. However, it also added complexity to logging and monitoring.
Today, we’re going to tackle Kubernetes monitoring – why it’s important, various monitoring approaches, and the tools available to help you ensure end-to-end visibility of your Kubernetes environment.
Why Monitor Kubernetes
Kubernetes monitoring gives you insight into your cluster’s current health, including performance metrics, resource counts and a top-level overview of what is happening inside your Kubernetes cluster. Crucially, you need to be alerted when errors occur, so you can quickly act on them and fix any issues that arise.
Sitting between your bare metal or virtual infrastructure (whether it’s Google, Azure, or AWS) and the services that power your applications, Kubernetes adds more complexity to an already complex infrastructure. Similar to the move from a monolith to a microservice architecture, adopting Kubernetes means that you have to monitor many smaller components. There are hosts, the Kubernetes platform itself, containers, and the containerized apps, all of which ought to be monitored.
Combined with microservices, you now have on your hands a distributed and dynamic environment that requires a dynamic and granular approach to performance monitoring. Therefore, you need a monitoring system that enables you to track each individual component, whether it’s the available resources in a Kubernetes deployment or the health of your deployed apps and containers.
Monitoring Kubernetes also helps with cost control. By giving you a full picture of resource usage and availability, it lets you make sure that pods, individual containers, and namespaces use the underlying resources efficiently.
In fact, monitoring Kubernetes is not that different from monitoring Docker. Similarly, running Kubernetes optimally has some of the same challenges and follows similar best practices as Docker container management.
[Figure: Kubernetes Pod Overview]
What Metrics Should You Measure?
Monitoring Kubernetes means monitoring the cluster as a whole, all of its nodes, and the pods, deployments and services running on them.
Kubernetes Cluster Metrics
You first need to monitor the health of your entire Kubernetes cluster. It will help to know how many resources your entire cluster uses, how many applications are running on each node, and if your nodes are working properly and at what capacity.
Here are some of the most useful cluster-level metrics:
- Node resource usage metrics, such as disk and memory utilization, CPU, and network bandwidth, help you decide whether you need to increase or decrease the number and size of the nodes in the cluster.
- The number of nodes available shows you how much capacity the cluster has and, if you’re using a cloud provider, what you’re paying for.
- The number of running pods shows you whether the available nodes are sufficient and whether they could handle the workload if a node fails.
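The last point, whether the remaining nodes could absorb the workload if one node fails, comes down to simple arithmetic once you have the numbers. Here is a minimal Python sketch, assuming you have already collected the allocatable CPU of each node and the CPU requested by the pods on it. The node names and values are hypothetical, and the check is aggregate only: it ignores how individual pods fit onto individual nodes.

```python
from typing import Dict


def survives_node_failure(allocatable: Dict[str, float],
                          requested: Dict[str, float]) -> Dict[str, bool]:
    """For each node, report whether the other nodes have enough spare CPU
    in total to absorb the CPU requested on that node if it fails.

    A True result is necessary but not sufficient: real scheduling also
    depends on memory, pod sizes, affinity rules, and so on.
    """
    result = {}
    for failed in allocatable:
        # Spare capacity left on every node except the one that failed.
        spare = sum(allocatable[n] - requested.get(n, 0.0)
                    for n in allocatable if n != failed)
        result[failed] = requested.get(failed, 0.0) <= spare
    return result


# Hypothetical cluster: CPU cores allocatable and requested per node.
allocatable = {"node-a": 4.0, "node-b": 4.0, "node-c": 8.0}
requested = {"node-a": 1.0, "node-b": 1.5, "node-c": 6.0}

print(survives_node_failure(allocatable, requested))
```

In this made-up cluster, losing either small node is fine, but the workload on the large node would not fit on the two small ones, which is a hint to add capacity.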
Kubernetes Pod Metrics
Pod-level monitoring involves looking at three types of metrics: Kubernetes metrics, container metrics, and application metrics.
Kubernetes metrics help you ensure all pods in a deployment are running and healthy. They provide information such as how many instances of a pod are currently running and how many were expected. If the number running is lower than expected, your cluster may have run out of resources. It’s also important to know how the rollout of your deployment is progressing, and to track network throughput and data transfer.
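Comparing running instances against expected ones is easy to automate. The following Python sketch assumes you already have the desired and available replica counts for each deployment; the `DeploymentStatus` type and the deployment names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeploymentStatus:
    name: str
    desired: int    # replicas requested for the deployment
    available: int  # replicas currently available


def under_replicated(deployments: List[DeploymentStatus]) -> List[str]:
    """Return a message for every deployment with fewer available
    replicas than desired."""
    return [f"{d.name}: {d.available}/{d.desired} available"
            for d in deployments if d.available < d.desired]


# Hypothetical sample data.
statuses = [
    DeploymentStatus("checkout", desired=3, available=3),
    DeploymentStatus("search", desired=4, available=2),
]

for line in under_replicated(statuses):
    print(line)
```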
Container metrics help you determine how close you are to the limits you’ve configured. Here you’re interested in metrics such as CPU, memory, and network usage. They also allow you to detect containers stuck in a CrashLoopBackOff state.
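Both checks can be sketched in a few lines of Python. The dictionary keys below (`memory_usage_bytes`, `memory_limit_bytes`, `waiting_reason`) are hypothetical names chosen for this example, and the 90% threshold is an arbitrary assumption; `CrashLoopBackOff` is the reason Kubernetes reports for a container that keeps crashing and is waiting to be restarted.

```python
from typing import Dict, List


def container_warnings(containers: List[Dict], threshold: float = 0.9) -> List[str]:
    """Flag containers close to their memory limit or stuck in CrashLoopBackOff."""
    warnings = []
    for c in containers:
        limit = c.get("memory_limit_bytes")
        if limit:  # a container without a limit has nothing to compare against
            ratio = c["memory_usage_bytes"] / limit
            if ratio >= threshold:
                warnings.append(f"{c['name']}: memory at {ratio:.0%} of limit")
        if c.get("waiting_reason") == "CrashLoopBackOff":
            warnings.append(f"{c['name']}: stuck in CrashLoopBackOff")
    return warnings


MIB = 2 ** 20
# Hypothetical sample data.
containers = [
    {"name": "api", "memory_usage_bytes": 475 * MIB,
     "memory_limit_bytes": 500 * MIB, "waiting_reason": None},
    {"name": "worker", "memory_usage_bytes": 100 * MIB,
     "memory_limit_bytes": 512 * MIB, "waiting_reason": "CrashLoopBackOff"},
    {"name": "cache", "memory_usage_bytes": 200 * MIB,
     "memory_limit_bytes": None, "waiting_reason": None},
]

for warning in container_warnings(containers):
    print(warning)
```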
Application metrics measure the performance and availability of applications running inside your Kubernetes cluster and are usually exposed by the applications themselves. The available metrics depend on the business scope of each application. If you run a database application, for example, you’ll likely be interested in data related to tables, relationships, and the state of indices. If you work with an e-commerce application, you’ll want to keep an eye on the number of users online.
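As an illustration of an application exposing its own metrics, the Python sketch below renders a gauge in the Prometheus text exposition format, which is one common choice and an assumption here, not something this guide prescribes. The metric name `shop_users_online` and its value are hypothetical; a real application would typically serve this text over HTTP, often with a client library instead of hand-written formatting.

```python
from typing import Dict, Tuple


def render_metrics(metrics: Dict[str, Tuple[str, float]]) -> str:
    """Render gauges as Prometheus-style text: a HELP line, a TYPE line,
    and one sample line per metric."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical e-commerce metric.
metrics = {"shop_users_online": ("Number of users currently online.", 42)}

print(render_metrics(metrics), end="")
```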
How to Monitor Kubernetes
Even though you’re dealing with a complex and multilayered system, you should have a consistent monitoring strategy over the entire cluster. That means that metric collection should happen the same way and with the same reliability, no matter whether your nodes are deployed in the same location, in the cloud, across clouds, or in a hybrid cloud.
That said, there is one standard approach you can take to collect the metrics that help you assess the state of your Kubernetes clusters and all their resources in production.
Collecting metrics using DaemonSets
This method consists of running metric-collection software, usually called an agent, as a pod managed by a DaemonSet. A DaemonSet ensures that every node in the cluster runs a copy of the agent pod: as nodes are added, the pod is scheduled onto them, and as nodes are terminated, the pod is removed as well.
Remember: a DaemonSet makes sure one pod with the agent is running on each Kubernetes node.
With a DaemonSet you can deploy an agent that monitors each node, and all the resources on that node, across the whole Kubernetes cluster. Many monitoring solutions use this method to deploy their agent to each node. Then, their agents collect and expose metrics.
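To give an idea of what such a deployment looks like, here is a Python sketch that builds a bare-bones DaemonSet manifest and prints it as JSON. The agent name, the `monitoring` namespace, and the image `example.com/monitoring-agent:1.0` are placeholders, not a real agent; a production agent would usually also need host mounts, resource limits, and credentials.

```python
import json


def agent_daemonset(name: str, image: str, namespace: str = "monitoring") -> dict:
    """Build a minimal DaemonSet manifest that runs one agent pod per node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Placeholder name and image, for illustration only.
manifest = agent_daemonset("node-agent", "example.com/monitoring-agent:1.0")
print(json.dumps(manifest, indent=2))
```

Saved to a file, a manifest like this can be applied with `kubectl apply -f`, which accepts JSON as well as YAML, provided the namespace already exists.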
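Here’s an example. You can use the Helm chart for Sematext Agent to deploy both metrics and logs collection and ship it all to Sematext Cloud. The command below uses Helm 2 syntax; with Helm 3, drop the --name flag and pass the release name (st-agent) as the first argument instead.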
helm install --name st-agent \
  --set infraToken=<INFRA_TOKEN> \
  --set containerToken=<CONTAINER_TOKEN> \
  --set logsToken=<LOGS_TOKEN> \
  --set region=<US/EU> \
  stable/sematext-agent
As simple as that, and you have full insight into your Kubernetes cluster and its resources.
Find out about other essential Kubernetes commands from our Kubernetes Tutorial or download our Kubernetes commands cheat sheet.
Best Kubernetes Monitoring Tools: Free, Open-source & Paid Comparison
Now that you know what metrics to keep an eye on when working with Kubernetes clusters, let’s see what tools you can use to collect and expose them.
We made a list of the most popular monitoring tools in the Kubernetes landscape, both open-source and commercial, to help you get started. Due to the complex nature of Kubernetes, we included solutions with various functionalities: some tackle logs, others just metrics; some are Kubernetes-native, others general-purpose; and some act as data collectors, while others serve as the interface.
Let’s start with the best fully-featured tools for monitoring Kubernetes in production.
Rolling your own monitoring solution is great if you have the time and resources for it. Many do not. Instead, you can use hosted SaaS monitoring tools. Let the people who have years of experience in managing monitoring tools do it for you, while you focus on building your own product and creating value.
Sematext
Sematext offers a unified solution to monitor both traditional and microservice-based applications deployed on Kubernetes, capturing logs, metrics, and events in real time. You can then structure, visualize and analyze this data.
Our cloud monitoring solution allows you to set up alerts on both logs and metrics, and to build log analytics reports and customizable monitoring dashboards, making it easier and faster to pinpoint problematic pods. It also makes sure you get notified through email, Slack, or any other notification hook of your choosing.
Sematext takes care of logging and monitoring in Kubernetes, so you don’t have to run any storage or monitoring infrastructure yourself.
All you need to do is install the metrics and logs shipper provided by Sematext, or use any existing tool. With Sematext’s Elasticsearch API you can send data with any tool of your choice.
It’s easy to install the Sematext Agent as a Helm chart, DaemonSet, or Kubernetes Operator.
Pros:
- Easy to install
- Custom monitoring dashboards
Elastic Stack (ELK)
The ELK stack is among the most popular open-source log management solutions, including for Kubernetes. But it can easily be used – and it is used by many – for monitoring purposes too.
It’s a collection of four tools that together provide an end-to-end logging pipeline. Elasticsearch is a full-text search and analytics engine where you can store Kubernetes logs. Logstash is a log aggregator that captures and processes logs before shipping them to Elasticsearch. Kibana provides reporting and visualization functionalities. And finally, Beats are lightweight data shippers used to send logs and metrics to Elasticsearch.
ELK comes equipped with Kubernetes and Docker monitoring beats with auto-discovery. The Beats collect Kubernetes and Docker logs, metrics and metadata, thus helping you monitor performance at application and system level.
Pros:
- Rich analytics capabilities
- Easy to deploy and run in a Kubernetes environment
- Large community
Cons:
- Challenges at scale
Datadog
Datadog is an APM solution that enables you to extract logs, metrics, events and service states from Kubernetes in real time. It enables you to monitor, troubleshoot and optimize application performance. Datadog features dashboards and high-resolution metrics and events for manipulation and graphing. You can also set up alerts and receive notifications on various channels, including Slack and PagerDuty.
Datadog is easy to install. You can run it using their DaemonSet agent that will be deployed to every cluster node.
Pros:
- Easy to install
- Great APM integration
Cons:
- Confusing logs integrations
New Relic
New Relic features Kubernetes integration, giving you an overview of your servers, hosts, applications, and services. You can capture data and metadata for nodes, pods, containers, deployments, replica sets, and namespaces. It also features powerful searching capabilities, as well as tag-driven alerting and dashboarding.
Pros:
- Cluster health visualization
- Integration with APM
Cons:
- Confusing onboarding
Besides these fully-featured products, we also looked at the best free and open-source monitoring tools. When you first jump into the Kubernetes ecosystem, you’ll most likely bump into CNCF projects, and you’ll tend to try them first. You can run such tools in your own cluster and manage them yourself.
However, this always comes with overhead. We touched on this a bit further up, where we listed tools from well-established monitoring SaaS products.
Prometheus
Prometheus is an open-source monitoring system with a built-in time-series database, and a graduated CNCF project. What sets it apart from other databases used for time-series data, such as Cassandra, Graphite, and InfluxDB, is its simple yet powerful multidimensional data model and its flexible query language (PromQL). Furthermore, it follows a pull model rather than a push model and has built-in real-time alerting mechanisms. Finally, being open-source, it has gathered a large community interested in helping and bringing innovation.
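To make the multidimensional data model more concrete: every sample is identified by a metric name plus a set of key-value labels, and queries select samples by matching on those labels. The Python sketch below is a toy in-memory illustration of that idea, not how Prometheus itself is implemented; the metric name, labels, and values are made up.

```python
from typing import Dict, List, Tuple

# A sample is (metric name, labels, value).
Sample = Tuple[str, Dict[str, str], float]


def select(samples: List[Sample], name: str, **matchers: str) -> List[Sample]:
    """Return the samples with the given metric name whose labels equal
    every provided matcher."""
    return [s for s in samples
            if s[0] == name
            and all(s[1].get(k) == v for k, v in matchers.items())]


# Made-up request counters, one time series per label combination.
samples: List[Sample] = [
    ("http_requests_total", {"namespace": "shop", "pod": "api-1", "code": "200"}, 1200.0),
    ("http_requests_total", {"namespace": "shop", "pod": "api-1", "code": "500"}, 7.0),
    ("http_requests_total", {"namespace": "shop", "pod": "api-2", "code": "500"}, 5.0),
    ("http_requests_total", {"namespace": "blog", "pod": "web-1", "code": "500"}, 3.0),
]

# Total server errors in the "shop" namespace, across all pods.
errors = select(samples, "http_requests_total", namespace="shop", code="500")
total_errors = sum(value for _, _, value in errors)
print(total_errors)
```

In PromQL, a comparable selection would be written as `sum(http_requests_total{namespace="shop", code="500"})`.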
Prometheus ships with only a basic expression browser rather than full dashboards, so you’ll need to use a visualization tool. Most users opt for Grafana.
You can easily run Prometheus on top of Kubernetes by using the Prometheus Operator.
Pros:
- Built-in monitoring and alerting
- Functional and reliable during outages
- Kubernetes-native, easy to use
- Integrates well with Grafana
- Large community
Cons:
- No anomaly detection
- No long-term storage
- No dashboard
- No authentication/authorization
- Challenges at scale
Grafana
Grafana is an open-source solution used for monitoring, metrics, data visualization, and analysis.
Compared to other visualization tools, Grafana stands out because it connects with a long list of databases. When used to monitor Kubernetes, Grafana usually sits on top of Prometheus, but it’s also popular in combination with InfluxDB or Graphite. You can build comprehensive monitoring dashboards with a wide variety of graphs, from heatmaps to line graphs, bar graphs, histograms or Geo maps. And, you can already find a lot of ready-to-use Kubernetes monitoring dashboards.
Grafana also features a built-in alerting system, along with filtering capabilities, annotations, data-source specific querying, authentication and authorization, cross-organizational collaboration, and many more.
[Figure: Grafana with Prometheus]
Grafana is easy to set up and use. It’s so popular in the Kubernetes community that some deployment configuration files even include a Grafana container by default.
Pros:
- Includes support for Elasticsearch and Prometheus
- Broad compatibility with various data sources
- Great reporting and visualization functions
- Active developer community
- Alerting capabilities
- Can query several entities at a time
Cons:
- Not customized for Kubernetes log management
Kubernetes Dashboard
Kubernetes Dashboard is a web-based UI add-on for Kubernetes clusters. It provides a simple way to manage, troubleshoot and monitor your environment. You can use the Kubernetes Dashboard to see basic memory and CPU usage metrics across all of your nodes and to monitor the health of workloads (pods, deployments, replica sets, cron jobs, etc.).
[Figure: Kubernetes Dashboard Pod Overview]
You can easily install the Kubernetes Dashboard with ready-to-use YAML files.
Pros:
- Easy to install
- Part of the Kubernetes ecosystem
Cons:
- Limited features
Jaeger
Jaeger is a free tracing tool used for monitoring and troubleshooting in complex distributed systems, including Kubernetes environments. It was released by Uber Technologies and open-sourced in 2016.
With Jaeger, users can perform root cause analysis, distributed transaction monitoring, distributed context propagation, service dependency analysis, and performance and latency optimization.
Jaeger features OpenTracing-based support for Java, Node.js, Python, Go, and C++, and works with various storage backends, including Cassandra, Elasticsearch, Kafka, and in-memory storage. Read more about how to use Jaeger as a distributed tracing system.
To deploy Jaeger, you can use the Jaeger Operator or a DaemonSet configuration.
Pros:
- Various instrumentation options
- Easy to deploy
- Modern user interface
Cons:
- Limited backend integration
Monitoring a distributed system is not easy, but having a good understanding of Kubernetes challenges when it comes to monitoring is the first step.
In this article, we’ve discussed why it’s important to monitor Kubernetes, the critical metrics you should keep track of, and the best monitoring tools available to help make your job easier.
Choosing a tool depends on your monitoring needs and use case. Open-source solutions have the advantage of being backed by large communities driven to improve existing solutions, but SaaS software is backed by experts offering support and, most importantly, it removes the overhead of managing the tools to monitor Kubernetes yourself. Learn more about what to look for in monitoring software from our guide on alerting and monitoring.
Using a tool like Sematext abstracts away all the infrastructure and management needed to monitor your cluster. All you need to do is run a cluster-wide agent and point it where it should send metrics and logs. That’s how easy it is!
This will free up your time to stay focused on producing business value, instead of wasting time on managing infrastructure.
We’ve just scratched the surface, but if you’re just beginning to design your Kubernetes monitoring strategy, it’s a good place to start.