
Full Guide to Linux Disk IO Monitoring, Alerting and Tuning

February 5, 2025


Disk IO (Input/Output) is a core aspect of system performance. Whether you’re managing a database, a web application, or a cloud server, how efficiently your system reads and writes data affects everything from response times to stability.

Unlike high CPU usage or memory bottlenecks that often manifest immediately, disk IO issues tend to creep up silently—until they slow down critical processes. A sluggish database query, an application taking too long to load, or a system hanging under load can often be traced back to disk performance.

This guide walks through setting up disk IO monitoring on Linux, covering both built-in tools and more advanced solutions. By the end, you’ll have a clear understanding of how to monitor, alert on, and optimize disk performance to keep your systems running smoothly.

Understanding Disk IO in Linux

Disk IO refers to the read and write operations between RAM and storage devices (HDD, SSD, or network storage). When applications request data, the system either retrieves it from memory (fast) or from disk (slower). Multiple processes competing for disk access can lead to contention and performance degradation.

Key Metrics to Monitor

  1. Throughput – Measures data transfer speed (MB/s, GB/s).
  2. IOPS – Tracks how many individual disk operations occur per second.
  3. Latency – The time it takes for a read/write operation to complete (ms).
  4. Disk Utilization – The percentage of time the disk is actively processing requests.

Fun fact: in AWS, the IOPS you get can be tied to the size of the disk; gp2 EBS volumes, for example, get a baseline of 3 IOPS per GiB provisioned.
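If you’re curious where tools like iostat get these numbers, they all come from the kernel’s /proc/diskstats counters. Here’s a minimal sketch that computes IOPS and throughput for one device straight from those counters (the device name is an assumption, and the field positions follow the standard layout in the kernel’s iostats documentation; sectors in this file are always 512 bytes):

```shell
#!/bin/sh
# Derive r/s, w/s and MB/s for one device from /proc/diskstats.
# Field layout (kernel Documentation/admin-guide/iostats.rst):
#   $4 = reads completed, $6 = sectors read,
#   $8 = writes completed, $10 = sectors written
DEV=${1:-sda}   # device name is an assumption; pass yours as the first argument
T=1             # sampling interval in seconds

snap() { awk -v d="$DEV" '$3 == d {print $4, $6, $8, $10}' /proc/diskstats; }

s1=$(snap); sleep "$T"; s2=$(snap)

# Subtract the two samples and divide by the interval to get rates
echo "$s1 $s2" | awk -v t="$T" '{
  printf "r/s %.1f  w/s %.1f  rMB/s %.2f  wMB/s %.2f\n",
    ($5 - $1) / t, ($7 - $3) / t,
    ($6 - $2) * 512 / 1048576 / t, ($8 - $4) * 512 / 1048576 / t
}'
```

Latency and utilization come from the time-spent fields further right in the same file, which is exactly what iostat’s await and %util are computed from.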

Built-in Linux Tools for Disk IO Monitoring

If you are a console lover, this section is for you. We’re covering 5 powerful tools for monitoring disk IO. My favorite is dstat, but all these tools help track read/write speeds, disk utilization, and IOPS in real-time, making them essential for performance analysis and troubleshooting.

1. iostat – General Disk Performance Overview

iostat is one of the most effective tools for tracking disk IO performance.

Installation:
Most Linux distributions don’t include iostat by default. Why not!? Anyway, install it using:

sudo apt install sysstat   # Debian/Ubuntu
sudo yum install sysstat   # RHEL/CentOS
sudo dnf install sysstat   # Fedora

Basic Usage:

iostat -x 1
  • -x provides extended statistics (including utilization and queue depth).
  • 1 updates the stats every second.

Key Metrics in Output:

That -x output really is extended, but here are the key metrics in all that output that you want to pay extra attention to when troubleshooting disk IO.

  • r/s, w/s (Reads/Writes per second): How many read/write operations happen each second.
  • rMB/s, wMB/s (Read/Write throughput): Amount of data read/written per second in MB.
  • await (Average IO wait time in ms): High values indicate slow disk response times.
  • %util (Disk utilization): Percentage of time the disk is busy. If this is consistently above 80-90%, the disk may be a bottleneck.

If you are new to disk IO performance, the table below should point you in the right direction.

Symptom                        Possible Cause
-----------------------------  --------------------------------------------------
High await (above 20 ms)       Slow storage device or IO bottleneck
Low r/s, w/s but high %util    Disk is struggling with large requests
High avgqu-sz                  IO requests are piling up in the queue
High wrqm/s but low w/s        Writes are waiting too long before being committed

If %util is near 100% and await is high, the storage system is overloaded and may need tuning or hardware upgrades.

2. iotop – Process-Based IO Monitoring

iotop is a real-time disk monitoring tool that works similarly to top, but specifically for tracking disk read and write activity by the process. It will help you figure out which of your applications or services are generating the most IO load.

Installation:

sudo apt install iotop   # Debian/Ubuntu
sudo yum install iotop   # RHEL/CentOS

Basic Usage:

sudo iotop

It looks like top, surprise, surprise 🙂 

Understanding iotop Output

  • DISK READ/DISK WRITE – This shows how much data each process is reading and writing per second.
  • SWAPIN % – Indicates if the process is using swap space (low values are good).
  • IO> – The percentage of time a process is waiting for IO (higher means the process is disk-bound).
  • COMMAND – Displays the exact process using disk resources.

A high IO> value (above 80%) means the process is IO-limited: it spends most of its time waiting to read or write data on disk. Never a good thing. Not only will this application be slow, it may also slow down other applications using the same disk.

Detecting Performance Issues Using iotop

Symptom                                    Possible Cause
-----------------------------------------  --------------------------------------------------------
Process with high IO> but low CPU usage    IO bottleneck slowing down the app
mysqld consuming most disk reads/writes    Database queries might need optimization
High disk writes from rsync or logrotate   Excessive logging or backups impacting performance
nginx showing unexpected high reads        Serving large static files from disk instead of caching
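Under the hood, iotop reads per-process counters from /proc/<pid>/io. If iotop isn’t available, here’s a rough sketch of the same idea: rank processes by cumulative bytes written. Note these are totals since each process started, not per-second rates, and you’ll need root to see other users’ processes:

```shell
#!/bin/sh
# Top 5 writers by cumulative write_bytes, straight from /proc/<pid>/io
for p in /proc/[0-9]*; do
  [ -r "$p/io" ] || continue
  wb=$(awk '/^write_bytes/ {print $2}' "$p/io" 2>/dev/null)
  [ -n "$wb" ] || continue
  printf '%s %s %s\n' "$wb" "${p#/proc/}" "$(cat "$p/comm" 2>/dev/null)"
done | sort -rn | head -5   # columns: bytes written, PID, command
```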

 

3. vmstat – System-Wide Performance Metrics

vmstat (Virtual Memory Statistics) is a versatile tool for monitoring overall system performance, including disk IO, CPU, memory, and processes. While it doesn’t provide per-process details like iotop, it offers a quick snapshot of your system’s health.

Basic Usage:

vmstat

Example Output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
1  0  70400 137776  41652 369940    1    2   185   295   60  113  1  0 98  0  0

Relevant Columns:

  • r and b (Run/Block Processes):
    • r shows the number of processes waiting for CPU time.
    • b shows the number of processes blocked, often due to disk IO.
  • bi and bo (Block Input/Output):
    • bi measures data read from the disk (blocks per second).
    • bo measures data written to the disk (blocks per second).
    • Consistently high bo values may indicate excessive write activity.
  • wa (IO Wait):
    • Percentage of CPU time spent waiting for IO operations to complete.
    • High wa values (e.g., above 20%) suggest the system is IO-bound.
  • id (CPU Idle):
    • Percentage of CPU time spent idle. If id is low and wa is high, it’s a clear sign of IO bottlenecks. In plain English, this means that applications running on the host are waiting for the disk while not doing much.

Analyzing Performance with vmstat

Symptom               Possible Cause
--------------------  ------------------------------------------------------------------
High b values         Processes are blocked, likely due to disk IO contention
High bo but low bi    Write-heavy workload, possibly from logs or backups
High wa               Disk IO bottleneck; the storage device may be too slow or overloaded
High bi and low bo    Read-heavy workload, common in database queries or file access
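To turn the wa column into a single number instead of eyeballing the stream, you can average it with awk. The one-liner below assumes wa is the 16th column, which holds for recent procps versions, but count against the header on your system since the layout has shifted between releases:

```shell
# Average IO wait over five 1-second samples (skip the two header lines)
vmstat 1 5 | awk 'NR > 2 {sum += $16; n++} END {if (n) printf "avg wa: %.1f%%\n", sum / n}'
```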

4. dstat – Customizable Performance Monitoring

dstat is a powerful and flexible tool that combines features from iostat, vmstat, and netstat, which is why it’s my tool of choice when I’m working in the terminal. It provides real-time statistics for disk IO, network activity, CPU, memory, and more in an easy-to-read format. 

Installation:

sudo apt install dstat   # Debian/Ubuntu
sudo yum install dstat   # RHEL/CentOS

Basic Usage:

dstat

Example Output:

--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw
  1   0  99   0   0| 577k  923k|   0     0 |1408B 3786B| 101   188
  0   0 100   0   0|   0     0 |  70B  246B|   0     0 |  57    83
  0   0 100   0   0|   0     0 |  70B  134B|   0     0 |  41    58
  0   0 100   0   0|   0     0 |  70B  110B|   0     0 |  42    69
  0   0 100   0   0|   0     0 | 164B  208B|   0     0 |  51    76
  0   0 100   0   0|   0     0 |  70B  118B|   0     0 |  37    59
  0   0 100   0   0|   0     0 |  70B  110B|   0     0 |  39    69

Analyzing Disk IO Metrics with dstat

Metric      What It Tells You                 Action to Take
----------  --------------------------------  ----------------------------------------------------------------
read/writ   Real-time read/write throughput   High values may indicate heavy IO load
util        Disk utilization percentage       Consistently above 80% may indicate a bottleneck
tps         Transactions per second           Low TPS but high utilization may suggest inefficient IO patterns

Note that util and tps come from dstat’s --disk-util and --disk-tps plugins rather than the default view.

 

5. sar – Historical Disk IO Monitoring

sar (System Activity Reporter) is ideal for capturing and analyzing historical performance data. When you need to diagnose disk IO issues that occurred in the past or during specific time windows, this is the tool to reach for. Note that, unlike the tools above, which are pure command-line tools, sar relies on a service that runs continuously to collect data.

sar is part of the sysstat package and can record various system metrics, including disk IO, at regular intervals.

Installing and Enabling sar

To use sar, install the sysstat package:

sudo apt install sysstat   # Debian/Ubuntu 
sudo yum install sysstat   # RHEL/CentOS

Once installed, enable the sysstat service to start collecting data:

sudo systemctl enable --now sysstat

By default, sar collects system metrics every 10 minutes and stores them in /var/log/sysstat/. This interval can be adjusted in the configuration file located at /etc/sysstat/sysstat.
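On systemd-based distributions, recent sysstat packages drive collection with a timer unit rather than cron, so the interval is changed with a drop-in override. A sketch assuming the sysstat-collect.timer unit name used by current packages, switching collection to every 2 minutes:

```
# Created via: sudo systemctl edit sysstat-collect.timer
[Timer]
OnCalendar=
OnCalendar=*:00/2
```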

Basic Usage:

To view current disk IO metrics:

sar -d 1 5
  • -d specifies disk activity.
  • 1 5 collects data every 1 second for 5 iterations.

Example Output:

vagrant@vagrant:~$ sar -d 1 5
Linux 5.4.0-89-generic (vagrant) 01/28/25 _aarch64_ (2 CPU)
22:20:55          DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
22:20:56       dev7-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
22:20:56       dev7-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
22:20:56       dev7-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
22:20:56       dev7-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
22:20:56       dev7-4      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
22:20:56       dev7-5      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
22:20:56       dev7-6      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
22:20:56       dev7-7      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
22:20:56     dev259-0      2.00      0.00     32.00      0.00     16.00      0.00      0.50      0.40
22:20:56     dev253-0      8.00      0.00     32.00      0.00      4.00      0.00      0.00      0.40
22:20:56       dev7-8      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
22:20:56       dev7-9      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

Key columns include:

  • tps – Transactions per second.
  • rkB/s & wkB/s – Kilobytes read and written per second (older sysstat releases report sectors instead, as rd_sec/s and wr_sec/s).
  • await – Average time (ms) for disk IO operations to complete.
  • %util – Disk utilization percentage.

Setting Up Alerts for Disk IO Issues

Monitoring disk IO metrics manually or periodically is helpful, but in production environments, automation is key. Setting up alerts ensures that you’re notified the moment disk performance issues occur, allowing for proactive troubleshooting before users or applications are affected.

This section covers how to automate disk IO monitoring and configure alerts using scripts, system tools, and external monitoring solutions.

1. Using Shell Scripts for Custom Alerts

You can write a shell script to monitor key disk IO metrics (e.g., from iostat) and trigger alerts when thresholds are exceeded.

Example: Monitoring Disk Utilization

Here’s a basic script that checks if disk utilization (%util) exceeds 80%:

#!/bin/bash

# Threshold for disk utilization
THRESHOLD=80

# Check disk utilization (%util is the last column of iostat -dx output).
# Use the second report: the first one shows averages since boot.
UTIL=$(iostat -dx 1 2 | awk '$1 == "sda" {u = $NF} END {print u}')

# Compare utilization with threshold
if (( $(echo "$UTIL > $THRESHOLD" | bc -l) )); then
  echo "Disk utilization is high: ${UTIL}% on /dev/sda" | mail -s "Disk Alert" admin@example.com
fi

Steps to Deploy

1. Save the script as disk_alert.sh and make it executable:

chmod +x disk_alert.sh

2. Schedule the script to run periodically using cron:

crontab -e

Add a line to run the script every 5 minutes:

*/5 * * * * /path/to/disk_alert.sh
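The same pattern works for latency. Here’s a hedged variant alerting on read latency; it locates the r_await column by name, since its position differs across sysstat versions (newer releases split await into r_await and w_await), and like the script above it assumes a working mail setup:

```shell
#!/bin/sh
# Alert when average read latency on sda exceeds 20 ms.
LIMIT=20

# Use the second iostat report (the first averages since boot), and find
# the r_await column from the header instead of hardcoding its position.
AWAIT=$(iostat -dx 1 2 | awk '
  {for (i = 1; i <= NF; i++) if ($i == "r_await") col = i}
  $1 == "sda" && col {val = $col}
  END {print val}')

if [ -n "$AWAIT" ] && awk -v a="$AWAIT" -v l="$LIMIT" 'BEGIN {exit !(a > l)}'; then
  echo "High read latency: ${AWAIT} ms on /dev/sda" | mail -s "Disk latency alert" admin@example.com
fi
```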

2. Cloud Solutions for Disk IO Monitoring and Alerting

While one could use the above approach to set up alerting, I wouldn’t recommend doing that in production. It’s a seriously poor man’s approach to monitoring, one that would quickly drive anyone crazy. There are several cloud-based solutions I would recommend instead, for which monitoring and alerting are the core functionality. For example, in Sematext you will see charts like these out of the box:

Note the I/O Read/Write chart. That’s the visual version of the read/write metrics the Linux tools above printed to the terminal. Of course, alerting is built into Sematext and easy to set up – note the little bell icon in the screenshot above. You can use it to set up anomaly detection and get alerted about unusual spikes or dips in read or write performance.

Yes, I’ve purposely generated a very “messy” chart with too many data series to show you that even in such situations you can pick out insights about strange or high disk IO.  You can see here that some set of hosts perform a ton of disk writes every night between XXX and XXX. If my job is to run such infrastructure, I’ll want to know about this, I’ll want to dig into what is happening there, ensure there is enough disk IO capacity, and so on.

Here is another example, a more distilled view of disk IO performance:

If your infrastructure is hosted in the cloud, you can also use platform-native monitoring services with built-in alerting features. AWS comes with CloudWatch, Google Cloud has the Google Cloud Operations Suite, and Azure has Azure Monitor.

Tuning Disk IO for Better Performance

Monitoring disk IO is only half the battle—optimizing and tuning disk performance ensures that your system runs efficiently. Below are several strategies to improve disk IO performance, reduce bottlenecks, and maximize throughput.

1. Optimize File System and Mount Options

Using the right file system and mount options can significantly improve performance.

  • Use a modern file system:
    • ext4 is optimized for general-purpose use.
    • XFS is ideal for large-scale and high-performance workloads.
    • btrfs provides advanced features like snapshotting and data integrity.
  • Enable write-back caching:
mount -o remount,noatime,commit=60 /dev/sda1 /mnt
  • noatime: Prevents unnecessary metadata writes when files are accessed.
  • commit=60: Reduces the frequency of metadata commits to disk (default is 5s).
  • Tune journal settings (ext4 only; XFS tunes its journal via mkfs and mount options instead):
tune2fs -o journal_data_writeback /dev/sda1
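The remount above lasts only until reboot; to make noatime and the longer commit interval persistent, put them in /etc/fstab. The UUID and mount point below are placeholders for your own volume:

```
# /etc/fstab – example entry with the options from above
UUID=xxxx-xxxx  /data  ext4  defaults,noatime,commit=60  0  2
```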

2. Adjust IO Scheduler for Workload-Specific Optimization

Linux offers different IO schedulers that impact how disk requests are handled. Choosing the right one depends on your workload.

  • Check current scheduler:
cat /sys/block/sda/queue/scheduler
  • Change scheduler (temporary):
echo "none" > /sys/block/sda/queue/scheduler
  • Make it persistent (GRUB method, for kernels before 5.0; newer blk-mq kernels ignore the elevator= parameter):

Edit /etc/default/grub and modify the kernel parameters:

GRUB_CMDLINE_LINUX_DEFAULT="elevator=none"

Run: 

sudo update-grub
sudo reboot

Scheduler choices:

  • none: Best for SSDs and NVMe drives.
  • mq-deadline: Good for databases and mixed workloads.
  • bfq: Ideal for desktop users to ensure responsive performance.
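Since the elevator= kernel parameter was removed in Linux 5.0 along with the legacy block layer, the usual way to make a scheduler choice persistent on modern kernels is a udev rule. A sketch assuming SATA/SCSI device naming:

```
# /etc/udev/rules.d/60-ioscheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"
```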

3. Increase Read/Write Buffers

Tuning kernel parameters can help improve disk throughput, especially in write-heavy workloads.

Adjust disk readahead (improves sequential reads):

blockdev --setra 4096 /dev/sda

Verify with:

blockdev --getra /dev/sda

Increase dirty writeback timers (delays syncing dirty pages to disk):

sysctl -w vm.dirty_ratio=40
sysctl -w vm.dirty_background_ratio=10
  • vm.dirty_ratio=40: Allows up to 40% of RAM to be used for dirty pages before forcing a flush to disk.
  • vm.dirty_background_ratio=10: When dirty pages exceed 10% of RAM, the background flush daemon starts writing data.
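sysctl -w changes are lost on reboot; to persist them, drop the values into a file under /etc/sysctl.d (the file name here is arbitrary):

```
# /etc/sysctl.d/99-disk-writeback.conf – apply now with: sudo sysctl --system
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
```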

4. Enable TRIM for SSDs (Improves Performance and Longevity)

If using SSDs, enabling TRIM ensures efficient space reclamation.

  • Check if TRIM is supported:
lsblk --discard
  • Enable TRIM manually:
fstrim -av
  • Enable periodic TRIM (on systemd distributions):
sudo systemctl enable --now fstrim.timer

5. Reduce Swap Usage

Excessive swap usage can degrade disk IO performance. If you have enough RAM, consider reducing swappiness.

  • Check current swappiness:
cat /proc/sys/vm/swappiness
  • Reduce swappiness (recommended for servers):
sysctl -w vm.swappiness=10
6. Monitor and Reduce Unnecessary IO

Identify processes generating excessive IO and optimize them.

  • Find IO-intensive processes:
iotop -o
  • Limit IO priority per process (ionice; class 3 is the idle class, so the process only gets disk time when nothing else needs it):
    ionice -c3 -p <PID>
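ionice works per process; on cgroup-v2 systems you can also weight an entire service’s share of disk time through systemd’s IOWeight directive. A sketch with a hypothetical unit name:

```
# /etc/systemd/system/myapp.service.d/io.conf ("myapp" is hypothetical)
# Default weight is 100; lower means less disk time under contention
[Service]
IOWeight=50
```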

Conclusion

Like maxed-out CPU, maxed-out disk reads or writes can really degrade your users’ experience with your product and negatively impact revenue.  And who wants that!? If you are running a small operation, the Linux tools we covered here –  iostat, iotop, vmstat, dstat, and sar – will help you keep an eye on your disk utilization.

If you are running a proper production system, use a proper monitoring solution, be it Sematext or Datadog, or something else.

 
