Sematext Core Container Monitoring
Monitor Docker and containerd containers, container orchestration systems like kubernetes, swarm and nomad, or cloud container orchestration services EKS, ECS, AKS, GKE with Sematext by installing simple Docker Certified Agents and using our interface that shows everything in one simple screen.
Create a Sematext Monitoring App¶
Creating a Sematext Monitoring App is as easy as choosing the Infra integration and giving the App a name.
Install the Sematext Agent¶
Sematext can easily monitor your containers with the Sematext Agent. Installing the Agent is as simple as running one command on each host.
See Container data in Sematext Monitoring¶
Sematext Agent collects a plethora of metrics about hosts (CPU, memory, disk, network, processes), containers (Docker, containerd) and orchestrator platforms and ships that to Sematext Cloud.
You can see host and container metrics or have a high-level overview of all your containers in Infrastructure reports.
Check out the Sematext Agent installation for containers guide for more info.
Container Alerting¶
To save you time Sematext automatically creates a set of default alert rules such as alerts for low disk space. You can create additional alerts on any metric.
There are 3 types of alerts in Sematext:
- Heartbeat alerts, which notify you when a server is down
- Threshold-based alerts that notify you when a metric value crosses a predefined threshold
- Alerts based on statistical anomaly detection that notify you when metric values suddenly change and deviate from the baseline
Container Events¶
Events reflect changes in your infrastructure, from node restarts to container deployments, or changes in running containers. Events can track every Docker command. Sematext Agent collects Events from the Docker Engine and Kubernetes API. Whenever something goes wrong in your container stack, you can correlate Logs or Metrics with the time of Docker events!
Here's the list of Docker container events Sematext collects:
Container lifecycle events¶
- Create – when a container is created
- Start – when a container starts
- Restart – when a container gets restarted
- Stop – when a container stops
- Oom – when a container runs out of memory
- Pause – when a container gets paused
- Unpause – when a container continues to run after a pause
- Die – when the main process in a container dies
- Kill – when the container gets killed
- Destroy – when a container gets destroyed
Container runtime events¶
- Commit – when changes to the container filesystem are committed. Modifying deployed containers in production is not a common practice, therefore the commit could - indicate a “hack” and should be watched carefully.
- Copy – when files are copied from/to a container. Could indicate a potential data leak.
- Attach – when a process connects to container console – somebody is reading your container logs
- Detach – when a process disconnects from container console streams
- Exec – when a command is executed in container console, very helpful to investigate in potential hacker attacks
- Export – when a container gets exported
- Health_status – when health_status is checked
- Rename – when a container gets renamed
- Resize – when a container gets resized
- Top – when somebody list top processes in a container
- Update – when a container is updated e.g. with new labels
Container image events¶
- Delete – when an image gets deleted
- Import – when an image gets imported
- Load – when an image is loaded
- Pull – when an image is pulled from a registry
- Push – when an image is pushed to a registry
- Save – when an image is saved
- Tag – when an image is tagged with labels
- Untag – when an image tag is removed
Container plugin events¶
- Enable – when a plugin gets enabled
- Disable – when a plugin gets disabled
- Install – when a plugin gets installed
- Remove – when a plugin gets removed
Container volume events¶
- Create – when a volume is created
- Destroy – when a volume gets destroyed
- Mount – when a volume is mounted to a container
- Unmount – when a volume is removed from a container
Container network events¶
- Create – when a network is created
- Connect – when a container connects to a network
- Remove – when the network is removed
- Destroy – when a network is destroyed
- Disconnect – when a container disconnects from a network
Container daemon events¶
- Reload
Container services, nodes, secrets, and config events¶
- Create – on the creation of a resource
- Remove – on the removal of a resource
- Update – on the creation of a resource
Container Metrics Overview¶
The following information is collected and transmitted to Sematext.
Type | Description |
Operating System Metrics | Host machine metrics
|
Container Metrics/Stats |
|
Events | |
Agent Startup Event | server-info – created by spm-agent framework with node.js and OS version info on startup. Please note the agent is implemented in node.js. |
Docker-info – Docker Version, API Version, Kernel Version on startup | |
Docker Events | Container Lifecycle Events| create, exec_create, destroy, export, ... |
Container Runtime Events | die, exec_start, kill, pause, restart, start, stop, unpause, ... |
Docker Logs | |
Default Fields |
|
Log formats (detection and log parsers) |
|
JSON, Plain Text |
Supported Platforms
- Docker Engine >= 17.0.0
- Platforms using Docker:
- Docker Data Center
- Docker Enterprise
- Kubernetes
- AWS ECS, AWS EKS
- AKS
- GKE
- Red Hat OpenShift
- Nomad
- Docker Swarm
- Mesos
- Rancher
Container Metrics Fields¶
Name | Type | Unit | Numeric Type | Label | Description |
---|---|---|---|---|---|
container.memory.usage | gauge | bytes | long | memory | container memory usage in bytes |
container.memory.fail.count | counter | long | memory | the number of times that memory cgroup limit was exceeded | |
container.memory.limit | gauge | bytes | long | memory | the max allowed memory limit for the container cgroup |
container.memory.limit.soft | gauge | bytes | long | soft memory limit | soft memory limit represents the initial memory reservation for the container |
container.memory.rss | gauge | bytes | long | RSS memory | number of bytes of anonymous (file unmapped memory) and swap cache memory |
container.cache.usage | gauge | bytes | long | cache memory | number of bytes of page cache memory |
container.memory.pages.in | counter | long | memory pages in | memory pages in,description=the number of events each time the page is accounted to the cgroup | |
container.memory.pages.out | counter | long | memory pages out | memory pages out,description=the number of events each time a page is unaccounted from the cgroup | |
container.memory.pages.fault | counter | long | memory page faults | the number of page faults accounted to the cgroup | |
container.memory.pages.fault.major | counter | long | major memory page faults | the number of major page faults accounted to the cgroup | |
container.swap.size | counter | bytes | long | swap | the number of bytes of swap usage |
container.swap.limit | gauge | bytes | long | swap limit | the swap memory usage limit |
container.io.read | gauge | long | disk read | the number of bytes read from the disk | |
container.io.read.time | gauge | ns | long | disk read time | the total amount of time (in nanoseconds) between request dispatch and request completion |
container.io.read.wait.time | counter | ns | long | disk read wait time | total amount of time the IO operations for this cgroup spent waiting in the scheduler queues |
container.io.write | counter | bytes | long | disk write | the number of bytes written to the disk |
container.io.write.time | counter | ns | long | disk write time | the total amount of time (in nanoseconds) between request dispatch and request completion |
container.io.write.wait.time | counter | ns | long | disk write wait time | total amount of time the IO operations for this cgroup spent waiting in the scheduler queues |
container.io.weight | gauge | ns | long | disk io weight | specifies the relative proportion of block I/O access ranging from 100 to 1000 |
container.cpu.percent | gauge | % | double | CPU usage | container CPU usage |
container.cpu.throttled.time | counter | microseconds | long | CPU throttled time | the total amount of time that processes have been throttled in the container cgroup |
container.cpu.shares | gauge | ns | long | CPU shares | represents the weight of the cgroup that translates into the amount of CPU it is expected to get. Upon cgroup creation each group gets assigned a default of 1024 |
container.cpu.quota | gauge | microseconds | long | CPU quota | enforces a hard limit to the CPU time allocated to processes |
container.cpu.period | gauge | microseconds | long | CPU period | is the time window expressed in microseconds that represents the period for which processes are allowed to run under specific quota |
container.network.rx.bytes | counter | bytes | long | network received | received amount of bytes on the network interface |
container.network.rx.packets | counter | long | network packets received | received amount of packets on the network interface | |
container.network.rx.errors | counter | long | network rx errors | received amount of errors on the network interface | |
container.network.rx.dropped | counter | long | network packets rx dropped | amount of dropped inbound packets on the network interface | |
container.network.tx | counter | long | network transmitted | transmitted amount of bytes on the network interface | |
container.network.tx.bytes | counter | bytes | long | network received | transmitted amount of bytes on the network interface |
container.network.tx.packets | counter | long | network packets transmitted | transmitted amount of packets on the network interface | |
container.network.tx.errors | counter | long | network tx errors | transmitted amount of errors on the network interface | |
container.network.tx.dropped | counter | long | network packets tx dropped | amount of dropped outbound packets on the network interface |