The Complete Guide to Kubernetes Monitoring

Learn why you should monitor Kubernetes and the best practices for getting service discovery across all resources in your production environment, so you can ensure peak performance of your clusters and applications.

Monitoring is not easy with a monolithic architecture, and it has become even harder since microservices were introduced. Kubernetes solved many challenges that came with distributed environments, such as speed, scalability, and resilience. However, it also added complexity to logging and monitoring.

In our Kubernetes logging tutorial, we’ve already discussed the peculiarities of log management in Kubernetes, how to do it, and what best practices and tools can make your job easier.

This blog post is all about Kubernetes monitoring: why it’s important, the various monitoring approaches, and the best practices you can follow to check and ensure the health of your cluster.

What Is Kubernetes Monitoring

Kubernetes monitoring gives you insight into your cluster’s current health, including performance metrics, resource counts, and a top-level overview of what is happening inside your Kubernetes cluster. Crucially, you need to be alerted when errors occur, so you can quickly act on them and fix any issues that arise.

Why Monitor Kubernetes

Sitting between your bare metal or virtual infrastructure (whether it’s Google, Azure, or AWS) and the services that power your applications, Kubernetes adds yet another layer of complexity to an already complex infrastructure.

Similar to the move from monolith to a microservice architecture, adopting Kubernetes means that you have to monitor many smaller components. There are hosts, the Kubernetes platform itself, containers, and the containerized apps, all of which ought to be monitored.

Combined with microservices, you now have on your hands a distributed and dynamic environment that requires a dynamic and granular approach to performance monitoring. Therefore, you need a monitoring system that enables you to track each individual component, whether it’s the available resources in a Kubernetes deployment or the health of your deployed pods.

Monitoring Kubernetes also helps with cost control. With a full picture of resource usage and availability, you can make sure that pods, individual containers, and namespaces use the underlying resources efficiently.

How Is Monitoring Kubernetes Different

In some ways, monitoring Kubernetes is not that different from monitoring Docker. Similarly, running Kubernetes optimally involves some of the same challenges and best practices as Docker container management. But there are a few key differences between monitoring Kubernetes and monitoring legacy infrastructure.

Kubernetes sits between your bare metal or virtual infrastructure and the services that run your apps, which is why you need to monitor the health of the Kubernetes control plane. It also introduces new layers of infrastructure complexity: services can be distributed across multiple instances, and containers are ephemeral, moving across your infrastructure as needed.

That’s why monitoring the state of all resources is key to understanding whether Kubernetes is doing its job.

It’s hard to see what happens inside containers. Once a container dies, the data inside it is typically lost, and you often can’t see its logs after the fact, which makes troubleshooting much harder. Monitoring tools need to gather all the metrics and logs and store them in a safe, centralized location so you can access them at any point in time and troubleshoot issues.

What Do You Monitor: Key Kubernetes Metrics to Measure

Monitoring Kubernetes involves monitoring the cluster and all its nodes, as well as pods, deployments, and services.

Kubernetes Cluster & Node Metrics

You first need to monitor the health of your entire Kubernetes cluster. You’ll want to know how many resources the cluster uses, how many applications are running on each node, and whether your nodes are working properly and at what capacity. Disk and memory utilization, network bandwidth, the number of nodes available, and the number of running pods per node are just a few of the metrics you should measure.
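To make the “running pods per node” metric concrete, here’s a minimal shell sketch. It assumes you have kubectl access to the cluster; the pods_per_node function name is hypothetical, and the counting itself is plain sort | uniq -c.

```shell
# Count Running pods per node (hypothetical helper).
# Input: one node name per line on stdin.
# Output: "<node> <count>" lines, sorted by node name.
pods_per_node() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods --all-namespaces \
#     --field-selector=status.phase=Running \
#     -o custom-columns=NODE:.spec.nodeName --no-headers | pods_per_node
```

A dedicated monitoring agent tracks this continuously, but a one-off count like this is handy for spotting a node that is carrying far more pods than its peers.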

Kubernetes Deployment & Pod Metrics

Pod-level monitoring involves looking at the metrics from containers and applications. Deployment monitoring looks at pod health, crash loops, and resource usage.

Kubernetes Metrics

Kubernetes metrics help you ensure that all pods in a deployment are running and healthy. They provide information such as how many replicas of a pod a deployment currently has and how many it is expected to have. Other examples include the number of failed pods, pod restarts, and pod resource usage compared to requests and limits.
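A quick way to spot a deployment with fewer running replicas than expected is to compare the two numbers in the READY column of kubectl get deployments. This is an illustration rather than a replacement for a monitoring agent; deployments_not_ready is a hypothetical helper that assumes the default column layout (NAME, READY, UP-TO-DATE, AVAILABLE, AGE) for a single namespace, i.e. without --all-namespaces.

```shell
# List deployments with fewer ready replicas than desired (hypothetical helper).
# Input: `kubectl get deployments --no-headers` output for one namespace,
# i.e. columns NAME READY UP-TO-DATE AVAILABLE AGE, where READY is "ready/desired".
deployments_not_ready() {
  awk '{
    split($2, r, "/")              # r[1] = ready replicas, r[2] = desired replicas
    if (r[1] + 0 < r[2] + 0) print $1, $2
  }'
}

# Usage:
#   kubectl get deployments --no-headers | deployments_not_ready
```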

Container Metrics

Container metrics help you determine how close you are to the resource limits you’ve configured and also allow you to detect pods stuck in CrashLoopBackOff. Container CPU usage, container memory utilization, and network usage are just a few of the metrics that help you ensure the health of your containers.
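Here’s a small worked example of “how close am I to my limit” for CPU. It’s a sketch under a few assumptions: the usage value would come from kubectl top pod --containers (which needs the metrics-server add-on), the limit from the container’s resources.limits.cpu setting, and both helper function names are hypothetical.

```shell
# Convert a Kubernetes CPU quantity to millicores (hypothetical helper):
# "250m" -> 250, "0.5" -> 500, "2" -> 2000 (rounded to the nearest millicore).
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
  esac
}

# Print CPU usage as a whole-number percentage of the CPU limit (rounded down).
# Usage: cpu_pct_of_limit <usage> <limit>; the limit must be non-zero.
cpu_pct_of_limit() {
  local used limit
  used=$(to_millicores "$1")
  limit=$(to_millicores "$2")
  awk -v u="$used" -v l="$limit" 'BEGIN { printf "%d\n", u * 100 / l }'
}

# Example: a container using 250m of a 500m limit is at 50% of its limit.
#   cpu_pct_of_limit 250m 500m
```

Normalizing both values to millicores first avoids comparing "250m" against "1" directly, since Kubernetes accepts either notation for CPU.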

Application Metrics

Application metrics measure the performance and availability of applications running inside your Kubernetes pods and are usually exposed by the applications themselves. The available metrics depend on the business scope of each application, but among the most common application metrics you’ll find latency, responsiveness, uptime, and response times.
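Latency is commonly reported as a percentile (for example p95) rather than an average, because averages hide slow outliers. The sketch below computes a nearest-rank percentile from a list of response times; the function name, the integer percentile argument, and the one-value-per-line input format are assumptions for illustration only.

```shell
# Nearest-rank percentile of latency samples (hypothetical helper).
# Input: one numeric sample per line on stdin (e.g. response times in ms).
# Argument: an integer percentile between 0 and 100, e.g. 95.
latency_percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1                    # no samples
      rank = int((p * NR + 99) / 100)        # ceil(p/100 * N) using integers
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Usage: latency_percentile 95 < response_times_ms.txt
```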

If you’re interested in learning more about key resources you should monitor, we wrote this in-depth tutorial about Kubernetes metrics everyone needs to focus on to ensure the peak performance of their cluster.

Best Practices on How to Monitor Kubernetes

Even though you’re dealing with a complex and multilayered system, you should have a consistent monitoring strategy across the entire cluster. That means that metric collection should happen the same way and with the same reliability, no matter whether your nodes are deployed in the same location, in the cloud, across clouds, or in a hybrid cloud.

Fortunately, there is one standard approach you can take to collect and extract the metrics that help you assess the state of your Kubernetes clusters and all their resources in production.

Collecting metrics using DaemonSets

This method consists of running the metric-collection software, usually called an agent, as a DaemonSet. A DaemonSet ensures that every node in the cluster runs a copy of its pod: as nodes are added to the cluster, the pod is added to them, and as nodes are terminated, the pod is removed as well.

Remember: a DaemonSet makes sure that one pod with the agent is running on each Kubernetes node.

With a DaemonSet you can deploy an agent that monitors each node, and all the resources on that node, across the whole Kubernetes cluster. Many monitoring solutions use this method to deploy their agent to each node. Then, their agents collect and expose metrics.
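To see what an agent DaemonSet looks like, here’s a minimal sketch that prints a manifest you could pipe into kubectl apply -f -. The agent name and image are placeholders, and the read-only /proc host mount is an assumption about what a node-level agent might need; real monitoring agents ship their own, more complete manifests.

```shell
# Print a minimal DaemonSet manifest that runs one agent pod per node
# (hypothetical helper; the name and image you pass are placeholders).
# Usage: agent_daemonset <name> <image>
agent_daemonset() {
  cat <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: $1
spec:
  selector:
    matchLabels:
      app: $1
  template:
    metadata:
      labels:
        app: $1
    spec:
      containers:
        - name: $1
          image: $2
          volumeMounts:
            - name: host-proc
              mountPath: /host/proc
              readOnly: true
      volumes:
        # Assumption: the agent reads node-level stats from the host's /proc.
        - name: host-proc
          hostPath:
            path: /proc
EOF
}

# Usage (the image name is a placeholder):
#   agent_daemonset node-agent example.com/node-agent:1.0 | kubectl apply -f -
```

Note that the selector’s matchLabels must match the pod template’s labels; that is how the DaemonSet controller knows which pods it owns on each node.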

Here’s an example. You can use the Helm chart for Sematext Agent to deploy both metrics and logs collection and ship everything to Sematext Cloud.

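# Helm 3 syntax: the release name (st-agent) is positional; Helm 2 used --name.
# Replace the <...> placeholders with your Sematext App tokens, your region
# (US or EU), and the Sematext Agent chart reference.
helm install st-agent \
  --set infraToken=<INFRA_TOKEN> \
  --set containerToken=<CONTAINER_TOKEN> \
  --set logsToken=<LOGS_TOKEN> \
  --set region=<US/EU> \
  <SEMATEXT_AGENT_CHART>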

As simple as that, and you have full insight into your Kubernetes cluster and its resources.

Find out about other essential Kubernetes commands from our Kubernetes Tutorial or download our Kubernetes commands cheat sheet.

Detecting Pods in Crash Loops

A CrashLoopBackOff means that a pod’s container keeps starting, crashing, starting again, and crashing again, with Kubernetes waiting progressively longer between restarts.

Multiple reasons can lead to a crash loop, which makes the root cause tricky to identify. A few common culprits: the application inside the container crashes, the pod has been incorrectly configured, or an issue occurs when a new pod is deployed to Kubernetes. Another common case is an application that needs more CPU or memory than is allocated to the pod; for example, a container that exceeds its memory limit is OOM-killed and restarted.

Being alerted when a crash loop occurs can help you quickly narrow down the list of causes and take emergency measures to keep your application active.
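A simple way to turn crash loop detection into an alert is to scan pod statuses and fail when any pod reports CrashLoopBackOff. This is a minimal sketch, assuming the default kubectl get pods --no-headers column layout (NAME, READY, STATUS, ...) for a single namespace; crashlooping_pods is a hypothetical helper, and its non-zero exit code is meant to be wired into whatever alerting you already use.

```shell
# Print pods whose STATUS is CrashLoopBackOff (hypothetical helper).
# Input: `kubectl get pods --no-headers` output for one namespace
# (columns NAME READY STATUS RESTARTS AGE).
# Exit status: 1 if any crash-looping pod was found, 0 otherwise.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1; found = 1 }
       END { exit (found ? 1 : 0) }'
}

# Usage, e.g. from a cron job or CI check:
#   kubectl get pods --no-headers | crashlooping_pods || echo "crash loop detected"
```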

Job Failures

Kubernetes Jobs create one or more pods and keep retrying them until a specified number of them complete successfully, after which the Job is done. Jobs can sometimes fail, whether because nodes reboot, pods go into crash loops, or resources run out. Whatever the cause, you’d want to know about job failures as soon as they occur.

Job failures don’t necessarily mean that your application is inaccessible, but ignoring them could lead to more significant issues for your deployments down the line. That’s why it’s important to monitor them.
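As a starting point, you can list Jobs that have recorded failed pods. This sketch assumes the custom-columns output shown in its comment, and failed_jobs is a hypothetical helper. Keep in mind that .status.failed counts failed pod attempts, so a Job can show failures and still succeed on a retry; a Job that has given up for good carries a Failed condition in .status.conditions.

```shell
# List Jobs that have recorded at least one failed pod (hypothetical helper).
# Input: output of
#   kubectl get jobs --no-headers \
#     -o custom-columns=NAME:.metadata.name,FAILED:.status.failed
# where FAILED is "<none>" when no pod has failed.
failed_jobs() {
  awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1, "failed=" $2 }'
}
```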

Using Service Auto Discovery

Getting an overview of all the services running in your Kubernetes cluster is crucial in understanding the health of your cluster.

This is where service auto discovery comes into play.

Monitoring tools that automatically discover the services and logs in your Kubernetes cluster give you observability without having to configure each new component by hand, which helps keep costs to a minimum.

Kubernetes Monitoring Tools

There are some great Kubernetes monitoring tools out there that you can choose from. You may be fine starting with a built-in option or an open-source framework, but keep in mind that to reach the level of visibility of a fully-featured service like Sematext Monitoring you may have to add many additional components and end up with metrics upon metrics to monitor and tools upon tools to manage.

Monitoring Kubernetes with Sematext

Sematext supports Kubernetes monitoring and gives you a complete overview of the control plane. All Kubernetes resources and their health are displayed on out-of-the-box dashboards, so you don’t have to put in any extra work to get insight into the health of your cluster.

You get container and Kubernetes metrics along with labels and tags, which are exposed in the UI to allow slicing and dicing and building of custom dashboards on top of the ones you get by default.

Kubernetes logs, including Kubernetes audit logs, are automatically parsed and structured. Kubernetes labels and container tags are collected to enrich logs, enabling easier and faster troubleshooting.

You can also correlate logs with metrics for easier root-cause analysis. If you see a metric spike, correlate the exact time it happened with error logs from the same timeframe. Drill down into the logs to find the culprit behind the issue.
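Correlating a spike with logs boils down to filtering log entries by a time window around the spike. Here’s a minimal sketch of that idea, assuming a hypothetical plain-text log format of “epoch-seconds LEVEL message”; errors_near_spike is a hypothetical helper, and a log management tool does this filtering for you across structured, centralized logs.

```shell
# Print ERROR log lines within +/- <window> seconds of a metric spike
# (hypothetical helper and log format: "<epoch-seconds> <LEVEL> <message>").
# Usage: errors_near_spike <spike-epoch-seconds> <window-seconds> < app.log
errors_near_spike() {
  awk -v t="$1" -v w="$2" \
    '$2 == "ERROR" && $1 >= t - w && $1 <= t + w { print }'
}
```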

If you’re interested in seeing how Sematext stacks up against similar services, read our blog post about the best Kubernetes monitoring tools, where we review all types of solutions, from built-in to open-source to fully-featured paid options.


Monitoring a distributed system is not easy, but having a good understanding of Kubernetes challenges when it comes to monitoring is the first step.

In this article, we’ve discussed why it’s important to monitor Kubernetes and which key metrics you should keep track of to ensure optimum health and performance. Once you’ve set your eyes on the right metrics, you only need the best tool for the job. Which one you end up using, whether open source or an established Kubernetes monitoring SaaS, depends on your particular requirements, the level of expertise of your engineers, your budget, and even personal preferences.

We’ve just scratched the surface, but if you’re just beginning to design your Kubernetes monitoring strategy, it’s a good place to start. We suggest you get a good monitoring tool from the very beginning. Sematext is free for 14 days. Give it a try to see if it’s the right solution for you!
