Disk IO (Input/Output) is a core aspect of system performance. Whether you’re managing a database, a web application, or a cloud server, how efficiently your system reads and writes data affects everything from response times to stability.
Unlike high CPU usage or memory bottlenecks that often manifest immediately, disk IO issues tend to creep up silently—until they slow down critical processes. A sluggish database query, an application taking too long to load, or a system hanging under load can often be traced back to disk performance.
This guide walks through setting up disk IO monitoring on Linux, covering both built-in tools and more advanced solutions. By the end, you’ll have a clear understanding of how to monitor, alert on, and optimize disk performance to keep your systems running smoothly.
Understanding Disk IO in Linux
Disk IO refers to the read and write operations between RAM and storage devices (HDD, SSD, or network storage). When applications request data, the system either retrieves it from memory (fast) or from disk (slower). Multiple processes competing for disk access can lead to contention and performance degradation.
Key Metrics to Monitor
- Throughput – Measures data transfer speed (MB/s, GB/s).
- IOPS – Tracks how many individual disk operations occur per second.
- Latency – The time it takes for a read/write operation to complete (ms).
- Disk Utilization – The percentage of time the disk is actively processing requests.
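All of these metrics ultimately come from the kernel's `/proc/diskstats` counters, which is where tools like iostat read them. As an illustration (not something the tools below require), here's a minimal shell sketch that samples those counters twice, one second apart, and derives IOPS and throughput for the first listed device:

```shell
#!/bin/sh
# Derive IOPS and throughput for one device straight from /proc/diskstats.
# Field positions per the kernel docs: $3 device name, $4 reads completed,
# $6 sectors read, $8 writes completed, $10 sectors written.
DEV=${1:-$(awk 'NR == 1 {print $3}' /proc/diskstats)}   # default: first listed device

snapshot() { awk -v d="$DEV" '$3 == d {print $4, $6, $8, $10}' /proc/diskstats; }

set -- $(snapshot); R1=$1 RS1=$2 W1=$3 WS1=$4
sleep 1
set -- $(snapshot); R2=$1 RS2=$2 W2=$3 WS2=$4

echo "read IOPS:   $((R2 - R1))"
echo "write IOPS:  $((W2 - W1))"
# Sectors in /proc/diskstats are always 512-byte units, so sectors / 2 = KB
echo "read KB/s:   $(( (RS2 - RS1) / 2 ))"
echo "write KB/s:  $(( (WS2 - WS1) / 2 ))"
```

This is a sketch for understanding, not a monitoring tool; the tools covered next do the same math with proper per-interval averaging.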
Fun fact: in AWS, the IOPS you get is tied to the size of the disk (gp2 EBS volumes, for example, get a baseline of 3 IOPS per GiB).
Built-in Linux Tools for Disk IO Monitoring
If you are a console lover, this section is for you. We’re covering 5 powerful tools for monitoring disk IO. My favorite is dstat, but all these tools help track read/write speeds, disk utilization, and IOPS in real time, making them essential for performance analysis and troubleshooting.
1. iostat – General Disk Performance Overview
iostat is one of the most effective tools for tracking disk IO performance.
Installation:
Most Linux distributions don’t include iostat by default. Why not!? Anyway, install it using:
sudo apt install sysstat   # Debian/Ubuntu
sudo yum install sysstat   # RHEL/CentOS
sudo dnf install sysstat   # Fedora
Basic Usage:
iostat -x 1
- -x provides extended statistics (including utilization and queue depth).
- 1 updates the stats every second.
Key Metrics in Output:
That -x output really is extended, so here are the values you’ll want to pay extra attention to when troubleshooting disk IO:
- r/s, w/s (Reads/Writes per second): How many read/write operations happen each second.
- rMB/s, wMB/s (Read/Write throughput): Amount of data read/written per second in MB.
- await (Average IO wait time in ms): High values indicate slow disk response times.
- %util (Disk utilization): Percentage of time the disk is busy. If this is consistently above 80-90%, the disk may be a bottleneck.
If you are new to disk IO performance, the table below should point you in the right direction.
| Symptom | Possible Cause |
| --- | --- |
| High await (above 20ms) | Slow storage device or IO bottleneck |
| Low r/s, w/s but high %util | Disk is struggling with large requests |
| High avgqu-sz | IO requests are piling up in the queue |
| High wrqm/s but low w/s | Writes are waiting too long before being committed |
If %util is near 100% and await is high, the storage system is overloaded and may need tuning or hardware upgrades.
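The %util threshold from the table is easy to watch live with a one-liner. In this sketch the device name (sda), the 90% threshold, and the sample count are all examples you’d adjust:

```shell
# Flag any iostat sample where sda's %util (the last column of iostat -x
# output) climbs above 90. Device name and threshold are examples.
if command -v iostat >/dev/null; then
  iostat -dx 1 3 | awk '$1 == "sda" && $NF + 0 > 90 {print "WARN: sda at " $NF "% util"}'
fi
```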
2. iotop – Process-Based IO Monitoring
iotop is a real-time disk monitoring tool that works similarly to top, but specifically for tracking disk read and write activity by process. It will help you figure out which of your applications or services are generating the most IO load.
Installation:
sudo apt install iotop   # Debian/Ubuntu
sudo yum install iotop   # RHEL/CentOS
Basic Usage:
sudo iotop
Example Output:
It looks like top, surprise, surprise 🙂
Understanding iotop Output
- DISK READ/DISK WRITE – This shows how much data each process is reading and writing per second.
- SWAPIN % – Indicates if the process is using swap space (low values are good).
- IO> – The percentage of time a process is waiting for IO (higher means the process is disk-bound).
- COMMAND – Displays the exact process using disk resources.
A high IO> value (above 80%) means the process is IO-bound: it spends most of its time waiting to read data from the disk. Never a good thing. And not only is this application going to be slow; it may also slow down other applications that are using the same disk.
Detecting Performance Issues Using iotop
| Symptom | Possible Cause |
| --- | --- |
| Process with high IO> but low CPU usage | IO bottleneck slowing down the app |
| mysqld consuming most disk reads/writes | Database queries might need optimization |
| High disk writes from rsync or logrotate | Excessive logging or backups impacting performance |
| nginx showing unexpected high reads | Serving large static files from disk instead of caching |
3. vmstat – System-Wide Performance Metrics
vmstat (Virtual Memory Statistics) is a versatile tool for monitoring overall system performance, including disk IO, CPU, memory, and processes. While it doesn’t provide per-process details like iotop, it offers a quick snapshot of your system’s health.
Basic Usage:
vmstat
Example Output:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  70400 137776  41652 369940    1    2   185   295   60  113  1  0 98  0  0
Relevant Columns:
- r and b (Run/Block Processes):
- r shows the number of processes waiting for CPU time.
- b shows the number of processes blocked, often due to disk IO.
- bi and bo (Block Input/Output):
- bi measures data read from the disk (blocks per second).
- bo measures data written to the disk (blocks per second).
- Consistently high bo values may indicate excessive write activity.
- wa (IO Wait):
- Percentage of CPU time spent waiting for IO operations to complete.
- High wa values (e.g., above 20%) suggest the system is IO-bound.
- id (CPU Idle):
- Percentage of CPU time spent idle. If id is low and wa is high, it’s a clear sign of IO bottlenecks. In plain English, this means that applications running on the host are waiting for the disk while not doing much.
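Under the hood, vmstat derives wa from the iowait counter on the cpu line of /proc/stat. If vmstat isn’t handy, a rough sketch of the same calculation looks like this:

```shell
#!/bin/sh
# Approximate vmstat's "wa" column straight from /proc/stat. On the
# aggregate "cpu" line, iowait is the 5th counter after the "cpu" label
# (user nice system idle iowait ...).
sample() { head -1 /proc/stat; }

set -- $(sample); shift          # drop the "cpu" label
IO1=$5; T1=0
for v in "$@"; do T1=$((T1 + v)); done

sleep 1

set -- $(sample); shift
IO2=$5; T2=0
for v in "$@"; do T2=$((T2 + v)); done

echo "IO wait: $(( 100 * (IO2 - IO1) / (T2 - T1) ))%"
```

This integer math is cruder than vmstat’s, but it shows where the number comes from.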
Analyzing Performance with vmstat
| Symptom | Possible Cause |
| --- | --- |
| High b values | Processes are blocked, likely due to disk IO contention. |
| High bo but low bi | Write-heavy workload, possibly from logs or backups. |
| High wa | Disk IO bottleneck; the storage device may be too slow or overloaded. |
| High bi and low bo | Read-heavy workload, common in database queries or file access. |
4. dstat – Customizable Performance Monitoring
dstat is a powerful and flexible tool that combines features from iostat, vmstat, and netstat, which is why it’s my tool of choice when I’m working in the terminal. It provides real-time statistics for disk IO, network activity, CPU, memory, and more in an easy-to-read format.
Installation:
sudo apt install dstat   # Debian/Ubuntu
sudo yum install dstat   # RHEL/CentOS
Basic Usage:
dstat
Example Output:
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw
  1   0  99   0   0| 577k  923k|   0     0 |1408B 3786B| 101   188
  0   0 100   0   0|   0     0 |  70B  246B|   0     0 |  57    83
  0   0 100   0   0|   0     0 |  70B  134B|   0     0 |  41    58
  0   0 100   0   0|   0     0 |  70B  110B|   0     0 |  42    69
  0   0 100   0   0|   0     0 | 164B  208B|   0     0 |  51    76
  0   0 100   0   0|   0     0 |  70B  118B|   0     0 |  37    59
  0   0 100   0   0|   0     0 |  70B  110B|   0     0 |  39    69
Analyzing Disk IO Metrics with dstat
| Metric | What It Tells You | Action to Take |
| --- | --- | --- |
| read/writ | Real-time read/write throughput | High values may indicate heavy IO load. |
| util | Disk utilization percentage | Consistently above 80% may indicate a bottleneck. |
| tps | Transactions per second | Low TPS but high utilization may suggest inefficient IO patterns. |
5. sar – Historical Disk IO Monitoring
sar (System Activity Reporter) is ideal for capturing and analyzing historical performance data. When you need to diagnose disk IO issues that occurred in the past or during specific time windows, this is the tool to reach for. Note that, unlike the tools above, which are pure command-line utilities, sar relies on a service that runs in the background to collect data.
sar is part of the sysstat package and can record various system metrics, including disk IO, at regular intervals.
Installing and Enabling sar
To use sar, install the sysstat package:
sudo apt install sysstat   # Debian/Ubuntu
sudo yum install sysstat   # RHEL/CentOS
Once installed, enable the sysstat service to start collecting data:
sudo systemctl enable --now sysstat
By default, sar collects system metrics every 10 minutes and stores them in /var/log/sysstat/. This interval can be adjusted in the configuration file located at /etc/sysstat/sysstat.
Basic Usage:
To view current disk IO metrics:
sar -d 1 5
- -d specifies disk activity.
- 1 5 collects data every 1 second for 5 iterations.
Example Output:
vagrant@vagrant:~$ sar -d 1 5
Linux 5.4.0-89-generic (vagrant)  01/28/25  _aarch64_  (2 CPU)

22:20:55   DEV        tps   rkB/s   wkB/s   dkB/s  areq-sz  aqu-sz  await  %util
22:20:56   dev7-0    0.00    0.00    0.00    0.00     0.00    0.00   0.00   0.00
22:20:56   dev7-1    0.00    0.00    0.00    0.00     0.00    0.00   0.00   0.00
22:20:56   dev7-2    0.00    0.00    0.00    0.00     0.00    0.00   0.00   0.00
22:20:56   dev7-3    0.00    0.00    0.00    0.00     0.00    0.00   0.00   0.00
22:20:56   dev7-4    0.00    0.00    0.00    0.00     0.00    0.00   0.00   0.00
22:20:56   dev7-5    0.00    0.00    0.00    0.00     0.00    0.00   0.00   0.00
22:20:56   dev7-6    0.00    0.00    0.00    0.00     0.00    0.00   0.00   0.00
22:20:56   dev7-7    0.00    0.00    0.00    0.00     0.00    0.00   0.00   0.00
22:20:56   dev259-0  2.00    0.00   32.00    0.00    16.00    0.00   0.50   0.40
22:20:56   dev253-0  8.00    0.00   32.00    0.00     4.00    0.00   0.00   0.40
22:20:56   dev7-8    0.00    0.00    0.00    0.00     0.00    0.00   0.00   0.00
22:20:56   dev7-9    0.00    0.00    0.00    0.00     0.00    0.00   0.00   0.00
Key columns include:
- tps – Transactions per second.
- rkB/s & wkB/s – Kilobytes read and written per second.
- await – Average time (ms) for disk IO operations to complete.
- %util – Disk utilization percentage.
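Once the collector has been running for a while, you can look back in time. sar names its data files after the day of the month (sa01 through sa31), -f selects the file, and -s/-e bound the time window. The Debian-style path and the times below are examples:

```shell
# Disk activity recorded between 09:00 and 10:00 today. sar keeps one data
# file per day of the month; on Debian/Ubuntu they live under /var/log/sysstat/
# (on RHEL-family systems the directory is /var/log/sa/).
FILE=/var/log/sysstat/sa$(date +%d)
if command -v sar >/dev/null && [ -f "$FILE" ]; then
  sar -d -f "$FILE" -s 09:00:00 -e 10:00:00
fi
```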
Setting Up Alerts for Disk IO Issues
Monitoring disk IO metrics manually or periodically is helpful, but in production environments, automation is key. Setting up alerts ensures that you’re notified the moment disk performance issues occur, allowing for proactive troubleshooting before users or applications are affected.
This section covers how to automate disk IO monitoring and configure alerts using scripts, system tools, and external monitoring solutions.
1. Using Shell Scripts for Custom Alerts
You can write a shell script to monitor key disk IO metrics (e.g., from iostat) and trigger alerts when thresholds are exceeded.
Example: Monitoring Disk Utilization
Here’s a basic script that checks if disk utilization (%util) exceeds 80%:
#!/bin/bash

# Threshold for disk utilization
THRESHOLD=80

# Check disk utilization using iostat
UTIL=$(iostat -dx 1 2 | grep 'sda' | tail -1 | awk '{print $NF}')

# Compare utilization with threshold
if (( $(echo "$UTIL > $THRESHOLD" | bc -l) )); then
    echo "Disk utilization is high: ${UTIL}% on /dev/sda" | mail -s "Disk Alert" admin@example.com
fi
Steps to Deploy
1. Save the script as disk_alert.sh and make it executable:
chmod +x disk_alert.sh
2. Schedule the script to run periodically using cron:
crontab -e
Add a line to run the script every 5 minutes:
*/5 * * * * /path/to/disk_alert.sh
2. Cloud Solutions for Disk IO Monitoring and Alerting
While you could use the above approach to set up alerting, I wouldn’t recommend doing that in production to anyone. That’d be a seriously poor man’s approach to monitoring, one that would quickly drive anyone crazy. There are several cloud-based solutions I’d recommend using instead; monitoring and alerting are their core functionality. For example, in Sematext you will see charts like these out of the box:
Note the I/O Read/Write chart. That’s the visual version of the read/write metrics the Linux tools above print to the terminal. Alerting is built into Sematext, of course, and you can easily set it up; note the little bell icon in the screenshot above. You can use it to set up anomaly detection and get alerted about unusual spikes or dips in read or write performance.
Yes, I’ve purposely generated a very “messy” chart with too many data series to show you that even in such situations you can pick out insights about strange or high disk IO. You can see here that some set of hosts performs a ton of disk writes every night between XXX and XXX. If my job were to run this infrastructure, I’d want to know about it, dig into what is happening, ensure there is enough disk IO capacity, and so on.
Here is another example, a more distilled view of disk IO performance:
If your infrastructure is hosted in the cloud, you can also use platform-native monitoring services with built-in alerting features. AWS comes with CloudWatch, Google Cloud has the Google Cloud Operations Suite, and Azure has Azure Monitor.
Tuning Disk IO for Better Performance
Monitoring disk IO is only half the battle—optimizing and tuning disk performance ensures that your system runs efficiently. Below are several strategies to improve disk IO performance, reduce bottlenecks, and maximize throughput.
1. Optimize File System and Mount Options
Using the right file system and mount options can significantly improve performance.
- Use a modern file system:
- ext4 is optimized for general-purpose use.
- XFS is ideal for large-scale and high-performance workloads.
- btrfs provides advanced features like snapshotting and data integrity.
- Reduce unnecessary writes with mount options:
mount -o remount,noatime,commit=60 /dev/sda1 /mnt
- noatime: Prevents unnecessary metadata writes when files are accessed.
- commit=60: Reduces the frequency of metadata commits to disk (default is 5s).
- Tune journal settings (ext4 only; XFS manages its journal automatically):
tune2fs -o journal_data_writeback /dev/sda1
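The remount command above only lasts until the next reboot. To make noatime and the longer commit interval permanent, put them in /etc/fstab; the device, mount point, and file system below are just examples:

```shell
# /etc/fstab entry -- noatime and commit=60 applied at every boot
# (device, mount point, and file system are examples)
/dev/sda1  /mnt  ext4  defaults,noatime,commit=60  0  2
```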
2. Adjust IO Scheduler for Workload-Specific Optimization
Linux offers different IO schedulers that impact how disk requests are handled. Choosing the right one depends on your workload.
- Check current scheduler:
cat /sys/block/sda/queue/scheduler
- Change scheduler (temporary; the redirect needs a root shell):
echo "none" > /sys/block/sda/queue/scheduler
- Make it persistent (GRUB method):
Edit /etc/default/grub and modify the kernel parameters:
GRUB_CMDLINE_LINUX_DEFAULT="elevator=none"
Run:
sudo update-grub
sudo reboot
Scheduler choices:
- none: Best for SSDs and NVMe drives.
- mq-deadline: Good for databases and mixed workloads.
- bfq: Ideal for desktop users to ensure responsive performance.
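One caveat with the GRUB method: on recent kernels (5.0+) the legacy elevator= parameter may no longer take effect for multi-queue devices. A udev rule is an alternative worth considering, since it persists the choice per device type; the rule file name and device matches below are examples:

```shell
# /etc/udev/rules.d/60-ioscheduler.rules (file name is an example)
# Persist "none" for NVMe devices and "mq-deadline" for SATA/virtio-style disks
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"
```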
3. Increase Read/Write Buffers
Tuning kernel parameters can help improve disk throughput, especially in write-heavy workloads.
Adjust disk readahead (improves sequential reads):
blockdev --setra 4096 /dev/sda
Verify with:
blockdev --getra /dev/sda
Increase dirty writeback timers (delays syncing dirty pages to disk):
sysctl -w vm.dirty_ratio=40
sysctl -w vm.dirty_background_ratio=10
- vm.dirty_ratio=40: Allows up to 40% of RAM to be used for dirty pages before forcing a flush to disk.
- vm.dirty_background_ratio=10: When dirty pages exceed 10% of RAM, the background flush daemon starts writing data.
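Like all sysctl -w changes, these are lost at reboot. To keep them, drop the settings into a file under /etc/sysctl.d/ (the file name below is an example) and apply with sudo sysctl --system:

```shell
# /etc/sysctl.d/90-disk-writeback.conf (file name is an example)
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
```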
4. Enable TRIM for SSDs (Improves Performance and Longevity)
If using SSDs, enabling TRIM ensures efficient space reclamation.
- Check if TRIM is supported:
lsblk --discard
- Enable TRIM manually:
fstrim -av
5. Reduce Swap Usage
Excessive swap usage can degrade disk IO performance. If you have enough RAM, consider reducing swappiness.
- Check current swappiness:
cat /proc/sys/vm/swappiness
- Reduce swappiness (recommended for servers):
sysctl -w vm.swappiness=10
6. Monitor and Reduce Unnecessary IO
Identify processes generating excessive IO and optimize them.
- Find IO-intensive processes:
iotop -o
- Limit IO usage per process (ionice):
ionice -c3 -p <PID>
Conclusion
Like maxed-out CPU, maxed-out disk reads or writes can really degrade your users’ experience with your product and negatively impact revenue. And who wants that!? If you are running a small operation, the Linux tools we covered here – iostat, iotop, vmstat, dstat, and sar – will help you keep an eye on your disk utilization.
If you are running a proper production system, use a proper monitoring solution, be it Sematext or Datadog, or something else.