Monitoring is not easy. Period.
In our guide to Kubernetes monitoring we explained why monitoring Kubernetes requires a different approach than monitoring traditional VMs.
In this blog post, we’ll go into more detail about the key Kubernetes metrics you have access to and how to make sense of them.
Kubernetes is the most popular container orchestrator currently available. It’s available as a service across all major cloud providers. Kubernetes is now a household name. It’s crucial you now learn how to monitor the health of your cluster and make sense of the metrics it exposes.
Kubernetes provides APIs for automation and cluster management, as well as APIs for collecting performance data. These metrics fall into two categories:
- Metrics about the Kubernetes cluster itself, its state, and the nodes in the cluster
- Metrics from Kubernetes deployments and pods
Before jumping into metrics, let’s break down the types of resources and objects that make up a Kubernetes cluster, so you know what there is to monitor.
You have two node types. Worker nodes, which run your containerized workloads, and one or more Control Plane nodes.
The Control Plane provides centralized APIs and internal services for cluster management. It also maintains a record of cluster state in an etcd key-value store.
Worker nodes are the host machines, typically VMs. Each node has a kubelet process that monitors the worker node and gives it instructions on how to behave. The kubelet is the connection between the Control Plane and the worker node. It tells the worker node’s runtime environment to create and manage pods to run the workloads.
Kubernetes pods are isolated, easily replicable objects made up of one or more containers that share storage and a network IP address. Each pod is often just a single container, but pods can also be used to bundle containers that must run together.
Now you understand all the moving parts of a Kubernetes cluster. Let’s jump into some metrics.
Kubernetes Cluster & Node Metrics
You first need to monitor the health of your entire Kubernetes cluster. It will help to know how many resources your entire cluster uses, how many applications are running on each node, and if your nodes are working properly and at what capacity.
Here are some of the most useful metrics for each:
- Node resource usage metrics, such as disk and memory utilization, CPU, and network bandwidth, help you decide whether to increase or decrease the number and size of the nodes in the cluster. Keeping an eye on memory and disk usage at the node level can provide important insight into your cluster’s performance and its ability to run workloads successfully. When a container exceeds its memory limit, it is terminated (OOM-killed); when it hits its CPU limit, it is throttled. If a node runs low on available memory or disk space, the kubelet flags the node as being under pressure and begins to reclaim resources, which can include evicting pods.
- The number of nodes available shows you how much capacity the cluster has to work with and, if you’re using a cloud provider, what you’re paying for.
- The number of running pods per node shows you whether your nodes are large enough and whether the remaining nodes could absorb the pod workload if one node fails. This is crucial if you’re using node affinity, which allows you to constrain which nodes your pods are eligible to be scheduled on, based on labels on the node.
- Memory and CPU requests and limits define the minimum and maximum resources that a node’s kubelet can allocate to containers. Allocatable memory reflects the amount of memory on a node that is available for pods. Specifically, it takes the overall capacity and subtracts memory requirements for OS and Kubernetes system processes to ensure they will not fight user pods for resources. These metrics will inform you if your nodes have enough capacity to meet the memory requirements of all current pods and whether the Control Plane is able to schedule new ones.
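To make the capacity check above more concrete, here is a minimal sketch in Python. It assumes you have already read a node's allocatable resources and the requests of the pods scheduled on it, for example from `kubectl get node -o json`. The function names and the sample values are hypothetical, and the parser only handles the binary memory suffixes `Ki`, `Mi`, and `Gi`.

```python
from __future__ import annotations

# Binary suffixes used in Kubernetes memory quantities.
# Assumption: decimal suffixes such as "M" or "G" are not handled here.
MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '4Gi' to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a plain number means bytes


def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def remaining(allocatable: dict, pod_requests: list[dict]) -> dict:
    """Allocatable resources minus the sum of all pod requests."""
    cpu = parse_cpu(allocatable["cpu"]) - sum(
        parse_cpu(p["cpu"]) for p in pod_requests
    )
    memory = parse_memory(allocatable["memory"]) - sum(
        parse_memory(p["memory"]) for p in pod_requests
    )
    return {"cpu_millicores": cpu, "memory_bytes": memory}


def fits(allocatable: dict, pod_requests: list[dict], new_pod: dict) -> bool:
    """True if the new pod's requests fit into what is left on the node."""
    left = remaining(allocatable, pod_requests)
    return (
        parse_cpu(new_pod["cpu"]) <= left["cpu_millicores"]
        and parse_memory(new_pod["memory"]) <= left["memory_bytes"]
    )


# Made-up sample data: one node and the requests of two pods running on it.
node = {"cpu": "2", "memory": "7Gi"}
pods = [{"cpu": "500m", "memory": "2Gi"}, {"cpu": "1", "memory": "4Gi"}]

left = remaining(node, pods)
print(left)  # {'cpu_millicores': 500, 'memory_bytes': 1073741824}
print(fits(node, pods, {"cpu": "250m", "memory": "512Mi"}))  # True
print(fits(node, pods, {"cpu": "250m", "memory": "2Gi"}))    # False
```

The scheduler makes its decision based on requests, not on actual usage, which is why this sketch compares requests against allocatable resources.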
Kubernetes Deployments & Pod Metrics
Pod-level monitoring involves looking at three types of metrics: Kubernetes metrics, container metrics, and application metrics.
Kubernetes metrics help you ensure all pods in a deployment are running and healthy. They provide information such as how many instances of a pod are currently running and how many were expected. If the number is too low, your cluster may have run out of resources. It’s also important to know how your deployment is progressing and to track network throughput and data transfer.
Here are some of the most important Kubernetes metrics you should keep track of:
- Current Deployment and DaemonSet metrics keep track of two important types of controllers in your Kubernetes cluster. Several similar but distinct metrics are available, depending on which type of controller manages the pods. Deployments create a specified number of pods, while DaemonSets ensure that a particular pod is running on every node.
- Missing and failed pods show if pods are running and how many pods are dying.
- Pod restarts show how many times pods restarted.
- Pods in the CrashLoopBackOff state can signal a few different issues, such as an application inside the container that keeps crashing or a faulty configuration that causes the pod to crash.
- Running vs. desired pods show how many instances of each service are actually ready and how many you expect to be ready.
- Pod resource usage vs. requests and limits shows whether pod limits are set and what the actual CPU and memory usage is.
- Available and unavailable pods are crucial to track as a pod may be running but not available, meaning it is not ready and able to accept traffic. If you see spikes in the number of unavailable pods, or pods that are consistently unavailable, it might indicate a problem with their configuration.
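Several of the metrics above can be read straight from the status of Deployment and Pod objects. The sketch below assumes you have those objects as Python dictionaries, for example parsed from `kubectl get ... -o json`. The field names (`spec.replicas`, `status.readyReplicas`, `status.containerStatuses`) come from the Kubernetes API, while the helper names and sample objects are made up for illustration.

```python
def deployment_gap(deployment: dict) -> int:
    """Desired replicas minus ready replicas; 0 means fully ready."""
    desired = deployment["spec"]["replicas"]
    # readyReplicas is omitted from the status when it is zero.
    ready = deployment["status"].get("readyReplicas", 0)
    return desired - ready


def pod_restarts(pod: dict) -> int:
    """Total restart count across all containers in the pod."""
    statuses = pod["status"].get("containerStatuses", [])
    return sum(cs.get("restartCount", 0) for cs in statuses)


def in_crash_loop(pod: dict) -> bool:
    """True if any container is waiting with reason CrashLoopBackOff."""
    for cs in pod["status"].get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            return True
    return False


# Made-up sample objects, trimmed to the fields used above.
deployment = {"spec": {"replicas": 3}, "status": {"readyReplicas": 2}}
crashing_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            }
        ]
    }
}
healthy_pod = {
    "status": {
        "containerStatuses": [
            {"name": "app", "restartCount": 0, "state": {"running": {}}}
        ]
    }
}

print(deployment_gap(deployment))   # 1
print(pod_restarts(crashing_pod))   # 7
print(in_crash_loop(crashing_pod))  # True
print(in_crash_loop(healthy_pod))   # False
```

A gap that stays above zero, or a restart count that keeps climbing, is usually a better alerting signal than a single snapshot.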
Container metrics help you determine how close you are to the resource limits you’ve configured and also allow you to detect pods stuck in a CrashLoopBackOff. You’re interested in monitoring metrics such as:
- Container CPU usage shows how much CPU your containers are using versus the limits you set.
- Container memory utilization shows how much memory your containers are using versus the limits you set.
- Network usage shows you the sent and received data packets and how much bandwidth you are using.
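As a rough illustration of "usage versus limits", the sketch below assumes CPU usage arrives as a cumulative counter of CPU seconds (this is how cAdvisor's `container_cpu_usage_seconds_total` behaves), so average usage is the difference between two samples divided by the elapsed time. The function names and sample numbers are hypothetical.

```python
def cpu_cores_used(start: tuple, end: tuple) -> float:
    """Average cores used between two (timestamp_s, cumulative_cpu_s) samples."""
    (t0, c0), (t1, c1) = start, end
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (c1 - c0) / (t1 - t0)


def percent_of_limit(usage: float, limit: float) -> float:
    """Usage expressed as a percentage of the configured limit."""
    return 100.0 * usage / limit


# Made-up samples: 30 CPU seconds consumed over a 60 second window.
cores = cpu_cores_used((0.0, 120.0), (60.0, 150.0))
print(cores)                         # 0.5
print(percent_of_limit(cores, 1.0))  # 50.0, against a limit of 1 core

# Memory is a gauge, so it can be compared to the limit directly.
used_bytes = 384 * 2**20   # 384 MiB in use
limit_bytes = 512 * 2**20  # 512 MiB limit
print(percent_of_limit(used_bytes, limit_bytes))  # 75.0
```

A container that sits close to 100% of its memory limit is at risk of being OOM-killed, while one close to its CPU limit will be throttled.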
Application metrics measure the performance and availability of applications running inside your Kubernetes pods and are usually exposed by the applications themselves. The available metrics depend on the business scope of each application. Below are some of the most common application metrics you should monitor:
- Application availability measures the uptime and response times of your application. This is crucial to measure for optimal performance and user experience.
- Application health and performance show performance issues, responsiveness, latency, and all the usual horrors you do not want your users to go through. It also surfaces any errors you need to fix in the application layer.
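How you compute these depends on what your application exposes. As a simple sketch, assume you have a list of health-check results and a list of response times in milliseconds. Availability is then the share of successful checks, and latency is commonly summarized with percentiles. The nearest-rank percentile method and all names below are choices made for this example.

```python
import math


def availability_percent(check_results: list) -> float:
    """Share of successful health checks, as a percentage."""
    return 100.0 * sum(check_results) / len(check_results)


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up sample data: 20 health checks and 10 response times (ms).
checks = [True] * 19 + [False]
latencies_ms = [120, 80, 95, 300, 110, 90, 85, 100, 105, 130]

print(availability_percent(checks))  # 95.0
print(percentile(latencies_ms, 50))  # 100
print(percentile(latencies_ms, 95))  # 300
```

Note how the single slow request barely moves the median but dominates the 95th percentile, which is why tail latency is worth tracking separately.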
Monitoring Kubernetes with Sematext
There are numerous Kubernetes monitoring tools you can choose from. You’ll be fine starting with running your own monitoring infra like Kubernetes Dashboards. However, keep in mind that to reach the level of visibility of a fully-featured service like Sematext Monitoring you may have to add many additional components and end up with metrics upon metrics to monitor and tools upon tools to manage.
Sematext supports Kubernetes Monitoring out-of-the-box. You get a complete overview of the control plane. All Kubernetes resources and their health are displayed on out-of-the-box dashboards.
You get container metrics and all key Kubernetes metrics along with their labels and tags. You can use these to filter and group data in the UI and to build custom dashboards on top of the ones you get by default.
Kubernetes logs, including Kubernetes audit logs, are automatically parsed and structured. Kubernetes labels and container tags are collected to enrich logs, enabling easier and faster troubleshooting. Kubernetes events are collected as well.
You can also correlate logs with metrics for easier root-cause analysis. If you see a metric spike, correlate the exact time it happened with error logs from the same timeframe. Drill down into the logs to find the culprit behind the issue. Learn how to tail Kubernetes logs with Sematext and kubectl.
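The idea behind this kind of correlation is simple, whatever tool performs it. The sketch below is a generic illustration, not Sematext's implementation: given the time of a metric spike, it picks the error logs that fall within a small window around it. The log structure, field names, and two-minute window are assumptions.

```python
from datetime import datetime, timedelta


def errors_near(spike_time, logs, window=timedelta(minutes=2)):
    """Return error-level log entries within +/- window of the spike."""
    return [
        entry
        for entry in logs
        if entry["level"] == "error"
        and abs(entry["time"] - spike_time) <= window
    ]


# Made-up sample data.
spike = datetime(2024, 1, 1, 12, 0, 0)
logs = [
    {"time": datetime(2024, 1, 1, 11, 59, 30), "level": "error",
     "message": "connection pool exhausted"},
    {"time": datetime(2024, 1, 1, 12, 0, 10), "level": "info",
     "message": "request served"},
    {"time": datetime(2024, 1, 1, 12, 10, 0), "level": "error",
     "message": "unrelated failure"},
]

for entry in errors_near(spike, logs):
    print(entry["time"], entry["message"])
# prints: 2024-01-01 11:59:30 connection pool exhausted
```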
Monitoring is hard. Kubernetes makes it even harder. But, with a good understanding of Kubernetes, its objects and metrics, it becomes much easier to overcome. Once you’ve learned these basics, you’ll be much more confident in monitoring your own Kubernetes cluster.
We’ve just scratched the surface, but if you’re just beginning to design your Kubernetes monitoring strategy, it’s a good place to start. We suggest you get a good monitoring tool from the very beginning. Sematext monitoring is free for 14 days. Give it a try to see if it’s the right solution for you!