share
share
share
Kubernetes Metrics
Kubernetes Metrics
Sematext Agent seamlessly integrates with Kubernetes and cooperates with its leader election mechanism to elect one instance of the agent that is responsible for querying the API server and gathering the metrics. The list of supported metrics is summarized in the next section.
Cluster Metrics
Name |
Type |
Unit |
Description |
kubernetes.cluster.pod.count |
gauge |
|
number of pods in the cluster |
kubernetes.cluster.deployment.count |
gauge |
|
number of deployments in the cluster |
kubernetes.cluster.node.count |
gauge |
|
number of node comprising the cluster |
Pod Metrics
Name |
Type |
Unit |
Description |
kubernetes.pod.restarts |
counter |
long |
number of pod restarts |
kubernetes.pod.container.count |
gauge |
long |
number of containers inside pod |
kubernetes.pod.count |
gauge |
long |
pod count which is always equal to one |
kubernetes.pod.count.succeeded |
gauge |
long |
equal to one if all containers inside pod have terminated in success |
kubernetes.pod.count.failed |
gauge |
long |
equal to one if all containers inside pod have terminated and at least one container has terminated in failure |
kubernetes.pod.count.unknown |
gauge |
long |
equal to one if pod state can't be obtained |
kubernetes.pod.count.pending |
gauge |
long |
equal to one if the pod has been accepted by the scheduler and his containers are waiting to be created |
kubernetes.pod.count.running |
gauge |
long |
equal to one if the pod has been scheduled on a node and at least one of his containers is running |
Deployment Metrics
Name |
Type |
Unit |
Label |
Description |
kubernetes.deployment.count |
gauge |
|
deployment count |
number of active deployments |
kubernetes.deployment.replicas |
gauge |
|
replica count |
number of active replicas |
kubernetes.deployment.replicas.avail |
gauge |
|
number of available replicas. Replicas are marked as available if they are passing the health check |
|
kubernetes.deployment.replicas.desired |
gauge |
|
number of desired replicas as defined in the deployment |
|
DaemonSet Metrics
Name |
Type |
Unit |
Description |
kubernetes.daemonset.number.available |
gauge |
long |
number of nodes that should be running the daemon pod |
kubernetes.daemonset.number.misscheduled |
gauge |
long |
number of nodes that are not supposed to run the daemon pod |
kubernetes.daemonset.number.ready |
gauge |
long |
number of nodes that have one or more of the daemon pods running and ready |
kubernetes.daemonset.number.unavailable |
gauge |
long |
number of nodes that have none of the daemon pods running and available |
kubernetes.daemonset.scheduled.updated |
gauge |
long |
number of nodes that are running updated daemon pod |
kubernetes.daemonset.scheduled.current |
gauge |
long |
number of nodes that are running at least 1 daemon pod |
kubernetes.daemonset.scheduled.desired |
gauge |
long |
number of nodes that should be running the daemon pod |
StatefulSet Metrics
Name |
Type |
Unit |
Description |
kubernetes.statefulset.desired.replicas |
gauge |
long |
number of desired replicas |
kubernetes.statefulset.replicas |
gauge |
long |
number of active replicas |
kubernetes.statefulset.replicas.current |
gauge |
long |
number of current replicas |
kubernetes.statefulset.replicas.updated |
gauge |
long |
number of updated replicas |
kubernetes.statefulset.replicas.ready |
gauge |
long |
number of Pods that have a Ready Condition |
kubernetes.statefulset.revision.update |
gauge |
long |
update revision |
kubernetes.statefulset.revision.current |
gauge |
long |
the current version of the StatefulSet used to generate Pods |
CronJob Metrics
Metric Name |
Type |
Unit |
Description |
kubernetes.cronjob.suspended |
|
boolean |
has suspended state or not |
kubernetes.cronjob.successful |
|
boolean |
has successful state or not |
kubernetes.cronjob.failed |
|
boolean |
has failed state or not |
kubernetes.cronjob.active |
|
number |
'1' if is active and '0' if not |
kubernetes.cronjob.timestamp |
|
timestamp |
last time scheduled timestamp |
kubernetes.cronjob.creation.timestamp |
|
timestamp |
time creation timestamp |
kubernetes.cronjob.total |
gauge |
|
total number of cron jobs |
kubernetes.cronjob.suspended.total |
gauge |
|
total number of suspended cron jobs |
kubernetes.cronjob.successful.total |
gauge |
|
total number of successful cron jobs |
kubernetes,cronjob.failed.total |
gauge |
|
total number of failed cron jobs |
Job Metrics
Metric Name |
Type |
Unit |
Description |
kubernetes.job.condition |
|
number |
is the job finish condition, completed: 2, failed: 0 or suspended: 1 |
kubernetes.job.executions |
|
number |
job number of executions |
kubernetes.job.failures |
|
number |
job number of failures |
kubernetes.job.creation.timestamp |
|
number |
time creation timestamp |
kubernetes.job.total |
gauge |
|
total number of jobs |
kubernetes.job.completed.total |
gauge |
|
total number of completed jobs |
kubernetes.job.failed.total |
gauge |
|
total number of failed jobs |
Storage Metrics
Metric Name |
Type |
Unit |
Description |
kubernetes.pvc.used |
gauge |
bytes |
number of used inodes in the volume |
kubernetes.pvc.available |
gauge |
bytes |
number of available bytes in the volume |
kubernetes.pvc.capacity |
gauge |
bytes |
capacity in bytes of the volume |
Kubelet Metrics
Runtime Operations
Metric Name |
Type |
Unit |
Description |
kubelet.runtime_operation.count |
gauge |
|
cumulative number of runtime operations by operation type |
kubelet.runtime_operation.errors |
gauge |
|
cumulative number of runtime operation errors by operation type |
kubelet.runtime_operation.total_num |
gauge |
|
total number of runtime operations |
kubelet.runtime_operation.duration |
gauge |
seconds |
duration of runtime operations |
kubelet.runtime_operation.p50latency |
gauge |
seconds |
p50 latency in seconds of runtime operations |
kubelet.runtime_operation.p75latency |
gauge |
seconds |
p75 latency in seconds of runtime operations |
kubelet.runtime_operation.p90latency |
gauge |
seconds |
p90 latency in seconds of runtime operations |
kubelet.runtime_operation.p95latency |
gauge |
seconds |
p95 latency in seconds of runtime operations |
kubelet.runtime_operation.p99latency |
gauge |
seconds |
p99 latency in seconds of runtime operations |
Kubelet
Metric Name |
Type |
Unit |
Description |
kubelet.pods.instances |
gauge |
|
pods Instances |
kubelet.pods.running |
gauge |
|
running Pods |
kubelet.pods.started |
gauge |
|
started Pods |
kubelet.pods.started_error |
gauge |
|
pods Started Errors |
kubelet.containers.created |
gauge |
|
containers with status Created |
kubelet.containers.exited |
gauge |
|
containers with status Exited |
kubelet.containers.running |
gauge |
|
containers with status Running |
kubelet.containers.unknown |
gauge |
|
containers with status Unknown |
Pod Start Duration
Metric Name |
Type |
Unit |
Description |
kubelet.pod_start.total_num |
gauge |
|
number of pod starts |
kubelet.pod_start.duration |
gauge |
seconds |
duration of a pod start in seconds |
kubelet.pod_start.p50latency |
gauge |
seconds |
p50 latency in seconds for a single pod to go from pending to running |
kubelet.pod_start.p75latency |
gauge |
seconds |
p75 latency in seconds for a single pod to go from pending to running |
kubelet.pod_start.p90latency |
gauge |
seconds |
p90 latency in seconds for a single pod to go from pending to running |
kubelet.pod_start.p95latency |
gauge |
seconds |
p95 latency in seconds for a single pod to go from pending to running |
kubelet.pod_start.p99latency |
gauge |
seconds |
p99 latency in seconds for a single pod to go from pending to running |
Pod Worker Duration
Metric Name |
Type |
Unit |
Description |
kubelet.pod_worker.total_num |
gauge |
|
pod worker counter |
kubelet.pod_worker.duration |
gauge |
seconds |
pod worker duration |
kubelet.pod_worker.p50latency |
gauge |
seconds |
p50 latency in seconds to sync a single pod |
kubelet.pod_worker.p75latency |
gauge |
seconds |
p75 latency in seconds to sync a single pod |
kubelet.pod_worker.p90latency |
gauge |
seconds |
p90 latency in seconds to sync a single pod |
kubelet.pod_worker.p95latency |
gauge |
seconds |
p95 latency in seconds to sync a single pod |
kubelet.pod_worker.p99latency |
gauge |
seconds |
p99 latency in seconds to sync a single pod |
Pod Worker Start Duration
Metric Name |
Type |
Unit |
Description |
kubelet.pod_worker_start.total_num |
gauge |
|
worker start counter |
kubelet.pod_worker_start.duration |
gauge |
seconds |
duration for starting a worker |
kubelet.pod_worker_start.p50latency |
gauge |
seconds |
p50 latency in seconds from seeing a pod to starting a worker |
kubelet.pod_worker_start.p75latency |
gauge |
seconds |
p75 latency in seconds from seeing a pod to starting a worker |
kubelet.pod_worker_start.p90latency |
gauge |
seconds |
p90 latency in seconds from seeing a pod to starting a worker |
kubelet.pod_worker_start.p95latency |
gauge |
seconds |
p95 latency in seconds from seeing a pod to starting a worker |
kubelet.pod_worker_start.p99latency |
gauge |
seconds |
p99 latency in seconds from seeing a pod to starting a worker |
Volume Manager
Metric Name |
Type |
Unit |
Description |
kubelet.volume_manager.count |
gauge |
|
total volumes managed by the volume manager |
kubelet.volume_manager.desired.count |
gauge |
|
total volumes desired by the volume manager |
Storage Operation
Metric Name |
Type |
Unit |
Description |
kubelet.storage.total_num |
gauge |
|
total number of storage operations |
kubelet.storage.duration |
gauge |
seconds |
duration of storage operations in seconds |
kubelet.storage.p50latency |
gauge |
seconds |
p50 latency to perform storage operations |
kubelet.storage.p75latency |
gauge |
seconds |
p75 latency to perform storage operations |
kubelet.storage.p90latency |
gauge |
seconds |
p90 latency to perform storage operations |
kubelet.storage.p95latency |
gauge |
seconds |
p95 latency to perform storage operations |
kubelet.storage.p99latency |
gauge |
seconds |
p99 latency to perform storage operations |
Cgroup Manager
Metric Name |
Type |
Unit |
Description |
kubelet.cgroup.total_num |
gauge |
|
total number of cgroup management operations by the kubelet |
kubelet.cgroup.duration |
gauge |
seconds |
duration of cgroup management by the kubelet in seconds |
kubelet.cgroup.p50latency |
gauge |
seconds |
p50 latency for cgroup manager operations |
kubelet.cgroup.p75latency |
gauge |
seconds |
p75 latency for cgroup manager operations |
kubelet.cgroup.p90latency |
gauge |
seconds |
p90 latency for cgroup manager operations |
kubelet.cgroup.p95latency |
gauge |
seconds |
p95 latency for cgroup manager operations |
kubelet.cgroup.p99latency |
gauge |
seconds |
p99 latency for cgroup manager operations |
PLEG Relist Interval
Metric Name |
Type |
Unit |
Description |
kubelet.pleg_relist_interval.total_num |
gauge |
|
total number of intervals between pod relisting operations |
kubelet.pleg_relist_interval.duration |
gauge |
seconds |
duration of intervals between pod relisting operations |
kubelet.pleg_relist_interval.p50latency |
gauge |
seconds |
p50 latency of intervals between pod relisting operations |
kubelet.pleg_relist_interval.p75latency |
gauge |
seconds |
p75 latency of intervals between pod relisting operations |
kubelet.pleg_relist_interval.p90latency |
gauge |
seconds |
p90 latency of intervals between pod relisting operations |
kubelet.pleg_relist_interval.p95latency |
gauge |
seconds |
p95 latency of intervals between pod relisting operations |
kubelet.pleg_relist_interval.p99latency |
gauge |
seconds |
p99 latency of intervals between pod relisting operations |
Docker Operations
Metric Name |
Type |
Unit |
Description |
kubelet.docker_operation.errors |
gauge |
|
number of Docker operation errors |
kubelet.docker_operation.operations |
gauge |
|
total number of Docker operations |
kubelet.docker_operation.total_num |
gauge |
|
number of Docker operations |
kubelet.docker_operation.duration |
gauge |
seconds |
duration of docker operations in seconds |
kubelet.docker_operation.p50latency |
gauge |
seconds |
p50 latency in seconds of Docker operations |
kubelet.docker_operation.p75latency |
gauge |
seconds |
p75 latency in seconds of Docker operations |
kubelet.docker_operation.p90latency |
gauge |
seconds |
p90 latency in seconds of Docker operations |
kubelet.docker_operation.p95latency |
gauge |
seconds |
p95 latency in seconds of Docker operations |
kubelet.docker_operation.p99latency |
gauge |
seconds |
p99 latency in seconds of Docker operations |
HTTP Inflight Request
Metric Name |
Type |
Unit |
Description |
kubelet.http_inflight_request.total_num |
gauge |
|
number of inflight requests |
HTTP Request Duration
Metric Name |
Type |
Unit |
Description |
kubelet.http_request.operations |
gauge |
|
total number of HTTP request operations |
kubelet.http_request.total_num |
gauge |
|
total number of HTTP requests |
kubelet.http_request.duration |
gauge |
seconds |
HTTP request duration |
kubelet.http_request.p50latency |
gauge |
seconds |
p50 HTTP request latency in seconds |
kubelet.http_request.p75latency |
gauge |
seconds |
p75 HTTP request latency in seconds |
kubelet.http_request.p90latency |
gauge |
seconds |
p90 HTTP request latency in seconds |
kubelet.http_request.p95latency |
gauge |
seconds |
p95 HTTP request latency in seconds |
kubelet.http_request.p99latency |
gauge |
seconds |
p99 HTTP request latency in seconds |
Network Plugin Operations
Metric Name |
Type |
Unit |
Description |
kubelet.network_plugin_operation.errors |
gauge |
|
network plugin operation errors |
kubelet.network_plugin_operation.operations |
gauge |
|
|
kubelet.network_plugin_operation.total_num |
gauge |
|
total number of network plugin operations |
kubelet.network_plugin_operation.duration |
gauge |
seconds |
network plugin operations duration |
kubelet.network_plugin_operation.p50latency |
gauge |
seconds |
p50 latency in seconds of network plugin operations |
kubelet.network_plugin_operation.p75latency |
gauge |
seconds |
p75 latency in seconds of network plugin operations |
kubelet.network_plugin_operation.p90latency |
gauge |
seconds |
p90 latency in seconds of network plugin operations |
kubelet.network_plugin_operation.p95latency |
gauge |
seconds |
p95 latency in seconds of network plugin operations |
kubelet.network_plugin_operation.p99latency |
gauge |
seconds |
p99 latency in seconds of network plugin operations |
PLEG Relist Duration
Metric Name |
Type |
Unit |
Description |
kubelet.pleg_relist.total_num |
gauge |
|
number of pod relisting performed by kubelet |
kubelet.pleg_relist.duration |
gauge |
seconds |
duration of pod relisting performed by kubelet in seconds |
kubelet.pleg_relist.p50duration |
gauge |
seconds |
p50 latency of pod relisting |
kubelet.pleg_relist.p75duration |
gauge |
seconds |
p75 latency of pod relisting |
kubelet.pleg_relist.p90duration |
gauge |
seconds |
p90 latency of pod relisting |
kubelet.pleg_relist.p95duration |
gauge |
seconds |
p95 latency of pod relisting |
kubelet.pleg_relist.p99duration |
gauge |
seconds |
p99 latency of pod relisting |
Started Containers
Metric Name |
Type |
Unit |
Description |
kubelet.started_containers.errors |
gauge |
|
number of started containers with errors |
kubelet.started_containers.total_num |
gauge |
|
total number of started containers |
Volume Stats Inode
Metric Name |
Type |
Unit |
Description |
kubelet.volume_stats_inode.free |
gauge |
|
number of free inodes in the volume |
kubelet.volume_stats_inode.used |
gauge |
|
number of used inodes in the volume |
kubelet.volume_stats_inode.maximum |
gauge |
|
maximum number of inodes in the volume |
PLEG Events
Metric Name |
Type |
Unit |
Description |
kubelet.pleg_events.discard |
gauge |
|
number of discarded events by kubelet's PLEG |
kubelet.pleg_events.last_seen |
|
timestamp |
last timestamp in which PLEG was observed |
API Server Metrics
Inflight Requests
Metric Name |
Type |
Unit |
Description |
apiserver.inflight_requests.total_num |
gauge |
|
number of in-flight requests being processed by API server |
Audit Events
Metric Name |
Type |
Unit |
Description |
apiserver.audit_events.total_num |
gauge |
|
number of audit events generated and sent to the audit backend |
Request Duration
Metric Name |
Type |
Unit |
Description |
apiserver.etcd_request_duration.p50latency |
gauge |
seconds |
p50 latency of etcd requests |
apiserver.etcd_request_duration.p75latency |
gauge |
seconds |
p75 latency of etcd requests |
apiserver.etcd_request_duration.p90latency |
gauge |
seconds |
p90 latency of etcd requests |
apiserver.etcd_request_duration.p95latency |
gauge |
seconds |
p95 latency of etcd requests |
apiserver.etcd_request_duration.p99latency |
gauge |
seconds |
p99 latency of etcd requests |
apiserver.etcd_request_duration.duration |
gauge |
seconds |
duration of etcd requests measured in seconds |
apiserver.etcd_request_duration.total_num |
gauge |
|
number of etcd requests |
Storage Objects
Metric Name |
Type |
Unit |
Description |
apiserver.storage_objects.total_num |
gauge |
|
number of stored objects at the time of last check split by kind |
Client Requests
Metric Name |
Type |
Unit |
Description |
apiserver.client_requests.total_num |
gauge |
|
number of HTTP requests |
Rejected Audit Requests
Metric Name |
Type |
Unit |
Description |
apiserver.rejected_audit_req.total_num |
gauge |
|
number of API server audit requests rejected |
Current Executing Requests
Metric Name |
Type |
Unit |
Description |
apiserver.current_executing_reqs.total_num |
gauge |
|
number of requests in regular execution phase in the API Priority and Fairness system |
Current Inqueue Requests
Metric Name |
Type |
Unit |
Description |
apiserver.current_inqueue_reqs.total_num |
gauge |
|
number of requests currently pending in queues of the API Priority and Fairness system |
Admission Duration
Metric Name |
|
Unit |
Description |
apiserver.admission_duration.p50latency |
gauge |
seconds |
p50 latency of admission controller processing |
apiserver.admission_duration.p75latency |
gauge |
seconds |
p75 latency of admission controller processing |
apiserver.admission_duration.p90latency |
gauge |
seconds |
p90 latency of admission controller processing |
apiserver.admission_duration.p95latency |
gauge |
seconds |
p95 latency of admission controller processing |
apiserver.admission_duration.p99latency |
gauge |
seconds |
p99 latency of admission controller processing |
apiserver.admission_duration.duration |
gauge |
seconds |
duration of admission controller processing in the API server |
apiserver.admission_duration.total_num |
gauge |
|
number of admission controller processing in the API server |
TLS Error
Metric Name |
Type |
Unit |
Description |
apiserver.tls_handshake_error.total_num |
gauge |
|
number of requests dropped with 'TLS handshake error from' error |
Authentication Duration
Metric Name |
Type |
Unit |
Description |
apiserver.auth_duration.p50latency |
gauge |
seconds |
p50 latency of authentication process |
apiserver.auth_duration.p75latency |
gauge |
seconds |
p75 latency of authentication process |
apiserver.auth_duration.p90latency |
gauge |
seconds |
p90 latency of authentication process |
apiserver.auth_duration.p95latency |
gauge |
seconds |
p95 latency of authentication process |
apiserver.auth_duration.p99latency |
gauge |
seconds |
p99 latency of authentication process |
apiserver.auth_duration.duration |
gauge |
seconds |
duration of authentication processes in seconds |
apiserver.auth_duration.total_num |
gauge |
|
number of authentication processes |
Authenticated User Requests
Metric Name |
Type |
Unit |
Description |
apiserver.auth_user_req.total_num |
gauge |
|
number of authenticated requests |
Client Certificate Expiry
Metric Name |
Type |
Unit |
Description |
apiserver.client_cert_expiry.p50latency |
gauge |
seconds |
p50 latency of expiration time |
apiserver.client_cert_expiry.p75latency |
gauge |
seconds |
p75 latency of expiration time |
apiserver.client_cert_expiry.p90latency |
gauge |
seconds |
p90 latency of expiration time |
apiserver.client_cert_expiry.p95latency |
gauge |
seconds |
p95 latency of expiration time |
apiserver.client_cert_expiry.p99latency |
gauge |
seconds |
p99 latency of expiration time |
apiserver.client_cert_expiry.duration |
gauge |
seconds |
expiration time in seconds of the client certificates used to authenticate with the API server |
apiserver.client_cert_expiry.total_num |
gauge |
|
number of clients certificates used to authenticate with the API server |
Key Geneneration Fails
Metric Name |
Type |
Unit |
Description |
apiserver.key_gen_fails.total_num |
gauge |
|
number of failed data encryption key(DEK) generation operations |
Key Generation Duration
Metric Name |
Type |
Unit |
Description |
apiserver.key_gen_duration.p50latency |
gauge |
seconds |
p50 latency of key generation for storing data |
apiserver.key_gen_duration.p75latency |
gauge |
seconds |
p75 latency of key generation for storing data |
apiserver.key_gen_duration.p90latency |
gauge |
seconds |
p90 latency of key generation for storing data |
apiserver.key_gen_duration.p95latency |
gauge |
seconds |
p95 latency of key generation for storing data |
apiserver.key_gen_duration.p99latency |
gauge |
seconds |
p99 latency of key generation for storing data |
apiserver.key_gen_duration.duration |
gauge |
seconds |
duration of key generation for storing data in the API server's storage in seconds |
apiserver.key_gen_duration.total_num |
gauge |
|
number of key generation for storing data in the API server's storage |
Requests Total
Metric Name |
Type |
Unit |
Description |
apiserver.request_total.total_num |
gauge |
|
number of API server requests |
Requests Duration
Metric Name |
Type |
Unit |
Description |
apiserver.request_duration.p50latency |
gauge |
seconds |
p50 latency of API server requests |
apiserver.request_duration.p75latency |
gauge |
seconds |
p75 latency of API server requests |
apiserver.request_duration.p90latency |
gauge |
seconds |
p90 latency of API server requests |
apiserver.request_duration.p95latency |
gauge |
seconds |
p95 latency of API server requests |
apiserver.request_duration.p99latency |
gauge |
seconds |
p99 latency of API server requests |
apiserver.request_duration.duration |
gauge |
seconds |
duration of API server requests in seconds |
apiserver.request_duration.total_num |
gauge |
|
number of API server requests |
Etcd Metrics
Generic
Metric Name |
Type |
Unit |
Description |
etcd.has_leader |
gauge |
|
whether or not a leader exists |
etcd.leader.changes_seen |
gauge |
|
number of leader changes seen |
etcd.proposals.applied |
gauge |
|
number of consensus proposals applied |
etcd.proposals.committed |
gauge |
|
number of consensus proposals committed |
etcd.proposals.failed |
gauge |
|
number of failed proposals seen |
etcd.proposals.pending |
gauge |
|
number of pending proposals to commit |
etcd.version |
|
version |
version of etcd running |
Commit
Metric Name |
Type |
Unit |
Description |
etcd.disk_backend_commit.p50latency |
gauge |
seconds |
p50 latency of commit operations |
etcd.disk_backend_commit.p75latency |
gauge |
seconds |
p75 latency of commit operations |
etcd.disk_backend_commit.p90latency |
gauge |
seconds |
p90 latency of commit operations |
etcd.disk_backend_commit.p95latency |
gauge |
seconds |
p95 latency of commit operations |
etcd.disk_backend_commit.p99latency |
gauge |
seconds |
p99 latency of commit operations |
etcd.disk_backend_commit.duration |
gauge |
seconds |
duration of commit operations for etcd's disk backend in seconds |
etcd.disk_backend_commit.total_num |
gauge |
|
number of commit operations for etcd's disk backend commits |
WAL
Metric Name |
Type |
Unit |
Description |
etcd.disk_wal_fsync.p50latency |
gauge |
seconds |
p50 latency of fsync operations |
etcd.disk_wal_fsync.p75latency |
gauge |
seconds |
p75 latency of fsync operations |
etcd.disk_wal_fsync.p90latency |
gauge |
seconds |
p90 latency of fsync operations |
etcd.disk_wal_fsync.p95latency |
gauge |
seconds |
p95 latency of fsync operations |
etcd.disk_wal_fsync.p99latency |
gauge |
seconds |
p99 latency of fsync operations |
etcd.disk_wal_fsync.total_num |
gauge |
|
number of fsync (flush to disk) operations for etcd's write-ahead log (WAL) on the disk backend |
etcd.disk_wal_fsync.duration |
gauge |
seconds |
duration of fsync (flush to disk) operations for etcd's write-ahead log (WAL) on the disk backend in seconds |
Backend Snapshots
Metric Name |
Type |
Unit |
Description |
etcd.disk_backend_snapshot.p50latency |
gauge |
seconds |
p50 latency of snapshot creation |
etcd.disk_backend_snapshot.p75latency |
gauge |
seconds |
p75 latency of snapshot creation |
etcd.disk_backend_snapshot.p90latency |
gauge |
seconds |
p90 latency of snapshot creation |
etcd.disk_backend_snapshot.p95latency |
gauge |
seconds |
p95 latency of snapshot creation |
etcd.disk_backend_snapshot.p99latency |
gauge |
seconds |
p99 latency of snapshot creation |
etcd.disk_backend_snapshot.total_num |
gauge |
|
number of snapshot creation for etcd's disk backend |
etcd.disk_backend_snapshot.duration |
gauge |
seconds |
duration of snapshot creation for etcd's disk backend in seconds |
GRPC Total
Metric Name |
Type |
Unit |
Description |
etcd.grpc.handled |
gauge |
|
number of RPCs completed on the server, regardless of success or failure |
MVCC
Metric Name |
Type |
Unit |
Description |
etcd.mvcc.keys |
gauge |
|
number of keys |
etcd.mvcc.events |
gauge |
|
number of events sent by this member |
etcd.mvcc.keys_bytes |
gauge |
|
number of MVCC keys involved in etcd debugging operations |
etcd.mvcc.watch_stream |
gauge |
|
number of watch streams |
etcd.mvcc.watcher |
gauge |
|
number of watchers |
etcd.mvcc.db_read |
gauge |
|
number of currently open read transactions |
etcd.mvcc.db_size |
gauge |
bytes |
size of the underlying database physically allocated in bytes |
etcd.mvcc.db_used |
gauge |
bytes |
size of the underlying database logically in use in bytes |
etcd.mvcc.puts |
gauge |
|
number of puts seen by this member |
etcd.mvcc.deletes |
gauge |
|
number of deletes seen by this member |
etcd.mvcc.slow_watches |
gauge |
|
number of unsynced slow watchers |
etcd.watch.requests |
gauge |
|
number of incoming watch requests (new or reestablished) |
etcd.store.watchers |
gauge |
|
number of currently active watchers |
Commit Rebalance
Metric Name |
Type |
Unit |
Description |
etcd.commit_rebalance_duration.p50latency |
gauge |
seconds |
p50 latency of commit and rebalance operations |
etcd.commit_rebalance_duration.p75latency |
gauge |
seconds |
p75 latency of commit and rebalance operations |
etcd.commit_rebalance_duration.p90latency |
gauge |
seconds |
p90 latency of commit and rebalance operations |
etcd.commit_rebalance_duration.p95latency |
gauge |
seconds |
p95 latency of commit and rebalance operations |
etcd.commit_rebalance_duration.p99latency |
gauge |
seconds |
p99 latency of commit and rebalance operations |
etcd.commit_rebalance_duration.total_num |
gauge |
|
number of commit and rebalance operations for etcd disk backend |
etcd.commit_rebalance_duration.duration |
gauge |
seconds |
duration of commit and rebalance operations for etcd disk backend in seconds |
Commit Write
Metric Name |
Type |
Unit |
Description |
etcd.commit_write.p50latency |
gauge |
seconds |
p50 latency of write operations during commit |
etcd.commit_write.p75latency |
gauge |
seconds |
p75 latency of write operations during commit |
etcd.commit_write.p90latency |
gauge |
seconds |
p90 latency of write operations during commit |
etcd.commit_write.p95latency |
gauge |
seconds |
p95 latency of write operations during commit |
etcd.commit_write.p99latency |
gauge |
seconds |
p99 latency of write operations during commit |
etcd.commit_write.total_num |
gauge |
|
number of write operations during commit in etcd disk backend |
etcd.commit_write.duration |
gauge |
seconds |
duration of write operations during commit in etcd disk backend |
DB Compaction Duration
Metric Name |
Type |
Unit |
Description |
etcd.mvcc_db_compaction.p50latency |
gauge |
seconds |
p50 latency of MVCC database compaction operations |
etcd.mvcc_db_compaction.p75latency |
gauge |
seconds |
p75 latency of MVCC database compaction operations |
etcd.mvcc_db_compaction.p90latency |
gauge |
seconds |
p90 latency of MVCC database compaction operations |
etcd.mvcc_db_compaction.p95latency |
gauge |
seconds |
p95 latency of MVCC database compaction operations |
etcd.mvcc_db_compaction.p99latency |
gauge |
seconds |
p99 latency of MVCC database compaction operations |
etcd.mvcc_db_compaction.total_num |
gauge |
|
number of MVCC database compaction operations |
etcd.mvcc_db_compaction.duration |
gauge |
seconds |
duration of MVCC database compaction operations in miliseconds |
Kube-proxy Metrics
Generic
Metric Name |
Type |
Unit |
Description |
kubeproxy.endpoint.changes |
gauge |
|
number of proxy rules endpoint changes |
kubeproxy.endpoint.changes_pending |
gauge |
|
number of proxy rules endpoint changes pending |
kubeproxy.restore.failures |
gauge |
|
number of proxy iptables restore failures |
kubeproxy.service.changes |
gauge |
|
number of proxy rules service changes |
kubeproxy.service.changes_pending |
gauge |
|
number of proxy rules service changes pending |
Client Requests
Metric Name |
Type |
Unit |
Description |
kubeproxy.rest_client.total_num |
gauge |
|
number of HTTP requests, partitioned by status code, method, and host |
Client Duration
Metric Name |
Type |
Unit |
Description |
kubeproxy.rest_client_duration.p50latency |
gauge |
seconds |
p50 latency of REST client requests |
kubeproxy.rest_client_duration.p75latency |
gauge |
seconds |
p75 latency of REST client requests |
kubeproxy.rest_client_duration.p90latency |
gauge |
seconds |
p90 latency of REST client requests |
kubeproxy.rest_client_duration.p95latency |
gauge |
seconds |
p95 latency of REST client requests |
kubeproxy.rest_client_duration.p99latency |
gauge |
seconds |
p99 latency of REST client requests |
kubeproxy.rest_client_duration.duration |
gauge |
seconds |
duration of REST client requests in seconds |
kubeproxy.rest_client_duration.total_num |
gauge |
|
number of REST client requests |
Sync Proxy Rules Duration
Metric Name |
Type |
Unit |
Description |
kubeproxy.sync_proxy_rules.p50latency |
gauge |
seconds |
p50 latency of proxy rules synchronization |
kubeproxy.sync_proxy_rules.p75latency |
gauge |
seconds |
p75 latency of proxy rules synchronization |
kubeproxy.sync_proxy_rules.p90latency |
gauge |
seconds |
p90 latency of proxy rules synchronization |
kubeproxy.sync_proxy_rules.p95latency |
gauge |
seconds |
p95 latency of proxy rules synchronization |
kubeproxy.sync_proxy_rules.p99latency |
gauge |
seconds |
p99 latency of proxy rules synchronization |
kubeproxy.sync_proxy_rules.duration |
gauge |
seconds |
duration of proxy rules synchronization in seconds |
kubeproxy.sync_proxy_rules.total_num |
gauge |
|
number of proxy rules synchronization |
CoreDNS Metrics
Generic
Metric Name |
Type |
Unit |
Description |
coredns.cache.entries |
gauge |
|
number of elements in the cache |
coredns.cache.hits |
gauge |
|
number of cache hits |
coredns.cache.misses |
gauge |
|
number of cache misses |
coredns.panics.total_num |
gauge |
|
number of panics |
coredns.failed_reloads.total_num |
gauge |
|
number of failed reload attempts |
coredns.healthcheck_broken.total_num |
gauge |
|
number of complete failures of the healthchecks |
coredns.forward_max_concurrent_rejects.total_num |
gauge |
|
number of queries rejected because the concurrent queries were at maximum |
Request Duration
Metric Name |
Type |
Unit |
Description |
coredns.request_duration.p50latency |
gauge |
seconds |
p50 latency of DNS requests handled |
coredns.request_duration.p75latency |
gauge |
seconds |
p75 latency of DNS requests handled |
coredns.request_duration.p90latency |
gauge |
seconds |
p90 latency of DNS requests handled |
coredns.request_duration.p95latency |
gauge |
seconds |
p95 latency of DNS requests handled |
coredns.request_duration.p99latency |
gauge |
seconds |
p99 latency of DNS requests handled |
coredns.request_duration.duration |
gauge |
seconds |
duration of DNS requests handled by CoreDNS in seconds |
coredns.request_duration.total_num |
gauge |
|
number of DNS requests handled by CoreDNS |
Requests
Metric Name |
|
Unit |
Description |
coredns.request_counter.total_num |
gauge |
|
number of requests handled by CoreDNS |
Response Status Codes
Metric Name |
|
Unit |
Description |
coredns.response_code.total_num |
gauge |
|
number of response status codes |
Forward Cache Hits
Metric Name |
Type |
Unit |
Description |
coredns.forward_cache.hits |
gauge |
|
number of connection cache hits per upstream and protocol |
Forward Cache Misses
Metric Name |
Type |
Unit |
Description |
coredns.forward_cache.misses |
gauge |
|
number of connection cache misses per upstream and protocol |
Forward Request Duration
Metric Name |
Type |
Unit |
Description |
coredns.forward_request.p50latency |
gauge |
seconds |
p50 latency of forwarded DNS requests handled |
coredns.forward_request.p75latency |
gauge |
seconds |
p75 latency of forwarded DNS requests handled |
coredns.forward_request.p90latency |
gauge |
seconds |
p90 latency of forwarded DNS requests handled |
coredns.forward_request.p95latency |
gauge |
seconds |
p95 latency of forwarded DNS requests handled |
coredns.forward_request.p99latency |
gauge |
seconds |
p99 latency of forwarded DNS requests hanndled |
coredns.forward_request.duration |
gauge |
seconds |
duration of forwarding DNS requests handled by CoreDNS in seconds |
coredns.forward_request.total_num |
gauge |
|
number of forwarding DNS requests handled by CoreDNS |
Scheduler Metrics
Preemption Attempts
Metric Name |
Type |
Unit |
Description |
scheduler.preemption_attempts.total_num |
gauge |
|
number of preemption attempts in the cluster |
Pending Pods
Metric Name |
Type |
Unit |
Description |
scheduler.pending_pods.total_num |
gauge |
|
number of pending pods, by the queue type |
Queue Incoming Pods
Metric Name |
Type |
Unit |
Description |
scheduler.incoming_pods.total_num |
gauge |
|
number of pods added to scheduling queues by event and queue type |
Scheduler Attempts
Metric Name |
Type |
Unit |
Description |
scheduler.schedule_attempts.total_num |
gauge |
|
number of attempts to schedule pods, by the result |
Go Routines
Metric Name |
Type |
Unit |
Description |
scheduler.go_routines.total_num |
gauge |
|
number of running goroutines split by the work they do such as binding |
Cache
Metric Name |
Type |
Unit |
Description |
scheduler.cache.total_num |
gauge |
|
number of nodes, pods, and assumed (bound) pods in the scheduler cache |
E2E Scheduling Duration
Metric Name |
Type |
Unit |
Description |
scheduler.e2e_duration.p50latency |
gauge |
seconds |
p50 latency of end-to-end scheduling operations |
scheduler.e2e_duration.p75latency |
gauge |
seconds |
p75 latency of end-to-end scheduling operations |
scheduler.e2e_duration.p90latency |
gauge |
seconds |
p90 latency of end-to-end scheduling operations |
scheduler.e2e_duration.p95latency |
gauge |
seconds |
p95 latency of end-to-end scheduling operations |
scheduler.e2e_duration.p99latency |
gauge |
seconds |
p99 latency of end-to-end scheduling operations |
scheduler.e2e_duration.total_num |
gauge |
|
number of end-to-end scheduling operations |
scheduler.e2e_duration.duration |
gauge |
seconds |
duration of end-to-end scheduling operations measured in seconds (scheduling algorithm + binding) |
Scheduling Algorithm Duration
Metric Name |
Type |
Unit |
Description |
scheduler.scheduling_algorithm.p50latency |
gauge |
seconds |
p50 latency of scheduling algorithm executions |
scheduler.scheduling_algorithm.p75latency |
gauge |
seconds |
p75 latency of scheduling algorithm executions |
scheduler.scheduling_algorithm.p90latency |
gauge |
seconds |
p90 latency of scheduling algorithm executions |
scheduler.scheduling_algorithm.p95latency |
gauge |
seconds |
p95 latency of scheduling algorithm executions |
scheduler.scheduling_algorithm.p99latency |
gauge |
seconds |
p99 latency of scheduling algorithm executions |
scheduler.scheduling_algorithm.total_num |
gauge |
|
number of scheduling algorithm execution |
scheduler.scheduling_algorithm.duration |
gauge |
seconds |
duration of scheduling algorithm execution in seconds |
Preemption Victims
Metric Name |
Type |
Unit |
Description |
scheduler.preemption_victims.p50latency |
gauge |
seconds |
p50 latency of preemption victims |
scheduler.preemption_victims.p75latency |
gauge |
seconds |
p75 latency of preemption victims |
scheduler.preemption_victims.p90latency |
gauge |
seconds |
p90 latency of preemption victims |
scheduler.preemption_victims.p95latency |
gauge |
seconds |
p95 latency of preemption victims |
scheduler.preemption_victims.p99latency |
gauge |
seconds |
p99 latency of preemption victims |
scheduler.preemption_victims.total_num |
gauge |
|
number of preemption victims identified by the scheduler for resource reclamation |
scheduler.preemption_victims.duration |
gauge |
seconds |
duration of preemption victims identified by the scheduler for resource reclamation in seconds |
Scheduling Duration
Metric Name |
Type |
Unit |
Description |
scheduler.scheduling_duration.p50latency |
gauge |
seconds |
p50 latency of pod scheduling operations |
scheduler.scheduling_duration.p75latency |
gauge |
seconds |
p75 latency of pod scheduling operations |
scheduler.scheduling_duration.p90latency |
gauge |
seconds |
p90 latency of pod scheduling operations |
scheduler.scheduling_duration.p95latency |
gauge |
seconds |
p95 latency of pod scheduling operations |
scheduler.scheduling_duration.p99latency |
gauge |
seconds |
p99 latency of pod scheduling operations |
scheduler.scheduling_duration.total_num |
gauge |
|
number of pod scheduling operations in the scheduler |
scheduler.scheduling_duration.duration |
gauge |
seconds |
duration of pod scheduling operations in the scheduler measured in seconds |
Framework Extension Duration
Metric Name |
Type |
Unit |
Description |
scheduler.framework_extension.p50latency |
gauge |
seconds |
p50 latency of framework extension point executions |
scheduler.framework_extension.p75latency |
gauge |
seconds |
p75 latency of framework extension point executions |
scheduler.framework_extension.p90latency |
gauge |
seconds |
p90 latency of framework extension point executions |
scheduler.framework_extension.p95latency |
gauge |
seconds |
p95 latency of framework extension point executions |
scheduler.framework_extension.p99latency |
gauge |
seconds |
p99 latency of framework extension point executions |
scheduler.framework_extension.total_num |
gauge |
|
number of framework extension point execution measured |
scheduler.framework_extension.duration |
gauge |
seconds |
duration of framework extension point execution measured in seconds |
Scheduler Attempts
Metric Name |
Type |
Unit |
Description |
scheduler.attempts.total_num |
gauge |
|
number of attempts to schedule pods |
Kube-controller Metrics
Workqueue Wait
Metric Name |
Type |
Unit |
Description |
kubecontroller.workqueue_wait.p50latency |
gauge |
seconds |
p50 wait time latency before item is processed |
kubecontroller.workqueue_wait.p75latency |
gauge |
seconds |
p75 wait time latency before item is processed |
kubecontroller.workqueue_wait.p90latency |
gauge |
seconds |
p90 wait time latency before item is processed |
kubecontroller.workqueue_wait.p95latency |
gauge |
seconds |
p95 wait time latency before item is processed |
kubecontroller.workqueue_wait.p99latency |
gauge |
seconds |
p99 wait time latency before item is processed |
kubecontroller.workqueue_wait.duration |
gauge |
seconds |
duration of waiting time before item is processed |
kubecontroller.workqueue_wait.total_num |
gauge |
|
number of items being processed |
Workqueue Process
Metric Name |
Type |
Unit |
Description |
kubecontroller.workqueue_process.p50latency |
gauge |
seconds |
p50 latency of item being processed by Kube-controller |
kubecontroller.workqueue_process.p75latency |
gauge |
seconds |
p75 latency of item being processed by Kube-controller |
kubecontroller.workqueue_process.p90latency |
gauge |
seconds |
p90 latency of item being processed by Kube-controller |
kubecontroller.workqueue_process.p95latency |
gauge |
seconds |
p95 latency of item being processed by Kube-controller |
kubecontroller.workqueue_process.p99latency |
gauge |
seconds |
p99 latency of item being processed by Kube-controller |
kubecontroller.workqueue_process.duration |
gauge |
seconds |
duration of processing an item by Kube-controller |
kubecontroller.workqueue_process.total_num |
gauge |
|
number of items being processed by Kube-controller |
Workqueue Depth
Metric Name |
Type |
Unit |
Description |
kubecontroller.workqueue_depth.total_num |
gauge |
|
number of items queued for processing |
Workqueue Unfinished Work
Metric Name |
Type |
Unit |
Description |
kubecontroller.workqueue_unfinished_work.total_num |
gauge |
|
number of items in the workqueue |
Workqueue Longest Running
Metric Name |
Type |
Unit |
Description |
kubecontroller.workqueue_longest_running.total_num |
gauge |
|
number of longest-running task processors in workqueue |
Registered Nodes
Metric Name |
Type |
Unit |
Description |
kubecontroller.registered_nodes.total_num |
gauge |
|
number of registered nodes in the cluster |
Healthy Nodes
Metric Name |
Type |
Unit |
Description |
kubecontroller.healthy_nodes.percentage |
gauge |
|
number of healthy nodes in a specific zone |
Unhealthy Nodes
Metric Name |
Type |
Unit |
Description |
kubecontroller.unhealthy_nodes.percentage |
gauge |
|
number of unhealthy nodes in a specific zone |
Workqueue Retries
Metric Name |
Type |
Unit |
Description |
kubecontroller.workqueue_retries.total_num |
gauge |
|
number of retries attempted by workqueue items |
Runtime Metrics
Metric Name |
Type |
Unit |
Description |
kubernetes.runtime.os_threads |
gauge |
|
number of operating system threads used |
kubernetes.runtime.goroutines |
gauge |
|
number of goroutines active |
kubernetes.runtime.resident_memory |
gauge |
bytes |
amount of memory for process code and data occupied |
kubernetes.runtime.virtual_memory |
gauge |
bytes |
amount of virtual memory used |
kubernetes.runtime.number_frees |
gauge |
|
number of allocated memory blocks that have been freed |
kubernetes.runtime.number_mallocs |
gauge |
|
number of memory allocations performed |
kubernetes.runtime.heap_obtained |
gauge |
bytes |
memory obtained from the operating system for the heap |
kubernetes.runtime.heap_used |
gauge |
bytes |
amount of heap memory in use |
kubernetes.runtime.heap_waiting |
gauge |
bytes |
amount of heap memory currently unused and can be potentially allocated |
kubernetes.runtime.number_heap_objects |
gauge |
|
number of allocated objects in the heap memory |
kubernetes.runtime.stack_obtained |
gauge |
bytes |
memory obtained from the operating system for the stack space |
kubernetes.runtime.stack_used |
gauge |
bytes |
amount of stack memory in use |
kubernetes.runtime.gc_duration |
gauge |
seconds |
duration of a garbage collection event |
kubernetes.runtime.gc_count |
gauge |
|
number of garbage collection events |