
New Infrastructure & Kubernetes alerts

May 20, 2024

Monitoring your infrastructure just became a little easier. Now, when you create a new Infra App you will get 30+ default alert rules out of the box. These alerts are designed to keep you informed about critical issues that may impact your infrastructure. About half of them are aimed at your core infrastructure, while the other half is specifically for Kubernetes environments. We’ve done our best to configure these alert rules, but you may want to look over them and adjust them to fit your particular needs.

If you already have an Infra App, don’t worry: we’ve added all these default alert rules to your existing Infra App(s), too. However, we disabled them by default to avoid suddenly creating a ton of alert noise and alarming you. You can inspect them, adjust them, and enable them as you see fit. You will find them under Alerts > Alert Rules (Sematext Cloud US / Sematext Cloud EU). Simply search for “Automatic” to find all default alert rules there.

New default alert rules

Core Infrastructure Alerts

  • High average CPU usage: Monitor the average CPU usage to maintain system responsiveness and to avoid the CPU being maxed out for a long time without anyone noticing
  • High and Critical CPU iowait time: Identify high CPU I/O wait times to keep I/O operations efficient and prevent latency
  • High and Critical system CPU steal time: High steal time can cause significant problems, including slow I/O, slower processing and slower database queries
  • High system load: Get insights into high system load levels to maintain resource availability
  • Low and Critical System Available Memory: Detect low and critically low available system memory to prevent performance issues, crashes and out-of-memory failures
  • High and Critical system used swap: Monitor system swap usage to catch excessive swapping before it slows the system down
  • High and Critical disk space utilization: Receive notifications about high disk space utilization to prevent storage space issues and data loss
  • High and Critical system process space utilization: Monitor high process space utilization (PID limit) for efficient process management and system stability
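To make the iowait and steal metrics above more concrete, here is a minimal Python sketch of how such percentages can be derived from two samples of the aggregate `cpu` line in Linux `/proc/stat`, and then mapped to a High or Critical level. It is only an illustration, not how Sematext evaluates its rules: the sample counter values, the function names, and the threshold numbers are all assumptions made up for this example.

```python
# Field order of the aggregate "cpu" line in Linux /proc/stat.
# The guest counters that follow are already included in "user"/"nice",
# so only the first eight fields are used here.
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse a /proc/stat 'cpu' line into a dict of cumulative tick counters."""
    values = line.split()[1:1 + len(CPU_FIELDS)]
    return dict(zip(CPU_FIELDS, (int(v) for v in values)))


def cpu_percentages(before, after):
    """Share of time spent in each state between two samples, in percent."""
    deltas = {name: after[name] - before[name] for name in CPU_FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in CPU_FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def classify(value, high, critical):
    """Map a metric value to a level using two thresholds (hypothetical values)."""
    if value >= critical:
        return "critical"
    if value >= high:
        return "high"
    return "ok"


# Made-up samples; on a real host these would be read from /proc/stat
# a few seconds apart.
sample_1 = "cpu 1000 0 500 8000 300 0 0 200 0 0"
sample_2 = "cpu 1100 0 550 8700 400 0 0 250 0 0"

pct = cpu_percentages(parse_cpu_line(sample_1), parse_cpu_line(sample_2))
iowait_level = classify(pct["iowait"], high=10, critical=30)  # assumed thresholds
steal_level = classify(pct["steal"], high=5, critical=10)     # assumed thresholds
print(f"iowait={pct['iowait']:.1f}% ({iowait_level}), steal={pct['steal']:.1f}% ({steal_level})")
```

The actual thresholds and evaluation windows of the default rules are visible in each alert rule's settings, where you can adjust them.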

Kubernetes Alerts

  • High CPU limit usage: Receive notifications when CPU limit usage is exceeded, preventing CPU overloads
  • CPU limit usage reached: Detect when CPU utilization is high and close to the CPU limit. The system may throttle CPU usage once it goes over the limit, and we want to avoid that
  • High etcd leader change: Monitor leadership changes in your Kubernetes etcd cluster
  • Kubelet Volume Manager unavailable: Stay informed about issues with the Kubernetes kubelet volume manager
  • Kubelet Volume Manager “actual” value is zero: Receive alerts when the kubelet volume manager’s “actual” value is zero
  • API Server 4XX errors: Detect 4XX errors in your Kubernetes API server
  • API Server 5XX errors: Be aware of 5XX errors in your Kubernetes API server
  • Node under pressure: Receive alerts when Kubernetes nodes are under excessive load
  • Node CPU capacity anomalous: Be alerted to unusual CPU capacity issues on Kubernetes nodes
  • Node memory capacity anomalous: Stay informed about memory capacity issues on Kubernetes nodes
  • Missing replicas for deployments: Detect missing replicas for your Kubernetes deployments
  • Missing Pod replica in StatefulSet: Get notified when pod replicas are missing in Kubernetes StatefulSets
  • Pod status is failed: Receive alerts when pods in Kubernetes are in a failed state
  • Pod status is unknown: Be alerted to pods in an unknown state in Kubernetes
  • Pod status is pending: Monitor pending pods in Kubernetes
  • Pod restart amount reached: Detect when pods are restarted beyond a specified threshold
  • Pod status is “CrashLoopBackOff”: Be alerted to pods in a CrashLoopBackOff state in Kubernetes
  • Pod status is ImagePullBackOff: Receive notifications when pods are in an ImagePullBackOff state in Kubernetes
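For readers who want to see what the pod and replica conditions above look like in practice, the following Python sketch inspects Kubernetes objects as returned in JSON form (for example by `kubectl get pod -o json`). It relies on standard Kubernetes fields (`status.phase`, `containerStatuses[].state.waiting.reason`, `restartCount`, `spec.replicas`, `status.readyReplicas`). The function names, the `max_restarts` threshold, and the sample objects are hypothetical, and this is not a description of how the Sematext rules are implemented.

```python
def pod_problems(pod, max_restarts=5):
    """Return a list of problem labels for one pod object (a parsed JSON dict).

    max_restarts is a hypothetical threshold chosen for this example.
    """
    problems = []
    status = pod.get("status", {})

    # Pod-level phase: Failed, Unknown and Pending are the states listed above.
    phase = status.get("phase")
    if phase in ("Failed", "Unknown", "Pending"):
        problems.append(f"phase:{phase}")

    # Container-level state: back-off reasons show up under state.waiting.reason.
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        reason = waiting.get("reason")
        if reason in ("CrashLoopBackOff", "ImagePullBackOff"):
            problems.append(f"waiting:{reason}")
        if cs.get("restartCount", 0) >= max_restarts:
            problems.append("restarts")
    return problems


def missing_replicas(workload):
    """Desired minus ready replicas for a Deployment or StatefulSet object."""
    desired = workload.get("spec", {}).get("replicas", 1)  # Kubernetes defaults to 1
    ready = workload.get("status", {}).get("readyReplicas", 0)
    return max(desired - ready, 0)


# Made-up objects, trimmed to the fields the functions read.
crashing_pod = {
    "status": {
        "phase": "Running",
        "containerStatuses": [
            {"restartCount": 7, "state": {"waiting": {"reason": "CrashLoopBackOff"}}}
        ],
    }
}
deployment = {"spec": {"replicas": 3}, "status": {"readyReplicas": 2}}

print(pod_problems(crashing_pod))
print(missing_replicas(deployment))
```

Note that a pod stuck in CrashLoopBackOff usually still reports the `Running` phase, which is why the container-level check is separate from the phase check.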

With these new preconfigured alerts, you can proactively get notified when incidents affect your Kubernetes clusters and your infrastructure as a whole.

Learn more about Alerts in Sematext. For more information, please refer to our Infrastructure Monitoring docs.