Server uptime monitoring is critical for ensuring the reliability and availability of your infrastructure and services. By keeping track of server uptime, you may be able to identify and address potential issues before they impact your end-users.
Why just “may be able to”? Because “it depends”.
It depends on whether your infrastructure/applications/deployments are built with redundancy in mind. Even if you have a redundant setup, it depends whether it actually works.
Defining Server Uptime and Website Uptime
Server uptime refers to the total time a server has been operational and accessible without interruptions. It’s a measure of the server’s reliability and stability. In contrast, website uptime denotes the period during which a website is available and functional for users.
While these concepts are related, they aren’t identical. For example, a server (or a VM) can be running and have high server uptime even if the application or website it hosts is down due to application errors, database issues, or other problems.
Conversely, a website might be up and running on a backup server even if the primary server experiences downtime. Therefore, monitoring both server and website uptime is essential for a comprehensive understanding of system health.
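To make the distinction concrete, here is a minimal sketch of a website-level check that is independent of whether the host itself is running. The function names `site_state` and `check_site` are made up for this illustration, and treating any 2xx or 3xx response as "up" is an assumption you may want to tighten for your own application.

```shell
#!/bin/bash
# site_state HTTP_CODE -> "up" for 2xx/3xx responses, "down" otherwise.
# curl reports 000 when it could not get an HTTP response at all.
site_state() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) echo "up" ;;
    *) echo "down" ;;
  esac
}

# check_site URL: fetch the URL (10 second timeout) and classify the result.
check_site() {
  local code
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$1")
  site_state "$code"
}

site_state 200   # prints: up
site_state 503   # prints: down (the server answers, but the site is broken)
```

Running something like `check_site https://example.com/` from a machine other than the server being checked is what tells you the website is reachable; the server's own uptime counter cannot.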
Here’s a table defining Reliability, Availability, and Uptime in the context of server monitoring:
Term | Definition |
---|---|
Reliability | The ability of a server to operate continuously without failure over a given period. A highly reliable server experiences minimal unexpected failures. |
Availability | The percentage of time a server is operational and accessible to users. It is calculated as (Uptime / (Uptime + Downtime)) × 100%. High availability systems aim for minimal downtime. |
Uptime | The total time a server has been running without interruption. It is often expressed in percentages (e.g., "Five Nines" or 99.999% uptime) and is a key metric in Service Level Agreements (SLAs). |
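The availability formula in the table is easy to try out. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, both helpers rely on `awk` for floating-point math, and they assume the total period is not zero.

```shell
#!/bin/bash
# availability_pct UPTIME DOWNTIME -> availability as a percentage.
# Both arguments must use the same unit (seconds, minutes, ...).
availability_pct() {
  awk -v up="$1" -v down="$2" 'BEGIN { printf "%.3f\n", up / (up + down) * 100 }'
}

# allowed_downtime_min TARGET_PCT PERIOD_DAYS -> downtime budget in minutes.
allowed_downtime_min() {
  awk -v pct="$1" -v days="$2" 'BEGIN { printf "%.2f\n", (100 - pct) / 100 * days * 24 * 60 }'
}

# 30 minutes of downtime in a 30-day month (43,200 minutes in total)
availability_pct 43170 30          # prints: 99.931

# Yearly downtime budget for "five nines"
allowed_downtime_min 99.999 365    # prints: 5.26
```

In other words, a 99.999% target leaves a little over five minutes of downtime per year, which is why it shows up in SLAs far more often than in practice.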
Checking Linux Server Uptime
On Linux systems, you can determine the server’s uptime using the uptime command in the terminal:
    $ uptime
    14:33:11 up 10 days, 2:33, 1 user, load average: 0.00, 0.01, 0.05
This output indicates the current time (14:33:11), how long the system has been running (10 days, 2:33), the number of users logged in (1 user), and the system’s load averages over the past 1, 5, and 15 minutes (0.00, 0.01, 0.05).
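If you need the uptime in a script rather than on screen, parsing the human-readable line is fragile. On Linux, the first field of `/proc/uptime` holds the seconds since boot, and a small helper can format it. This is a sketch under that assumption; `format_uptime` is a name made up for this example, and it always prints "days" even for a single day.

```shell
#!/bin/bash
# format_uptime SECONDS -> "N days, H:MM", similar to the uptime column above.
format_uptime() {
  local secs=${1%.*}                       # drop any fractional part
  local days=$(( secs / 86400 ))
  local hours=$(( secs % 86400 / 3600 ))
  local mins=$(( secs % 3600 / 60 ))
  printf '%d days, %d:%02d\n' "$days" "$hours" "$mins"
}

# On Linux, the first field of /proc/uptime is the number of seconds since boot.
if [ -r /proc/uptime ]; then
  read -r up_seconds _ < /proc/uptime
  format_uptime "$up_seconds"
fi

format_uptime 873180   # prints: 10 days, 2:33
```

On systems with procps-ng, `uptime -p` and `uptime -s` should also give you a prettier duration and the boot timestamp, respectively.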
Checking Windows Server Uptime
On Windows systems, several methods can determine server uptime:
Using Task Manager
- Press Ctrl + Shift + Esc to open Task Manager.
- Navigate to the “Performance” tab.
- Under the “CPU” section, find the “Uptime” value displayed in the format D:HH:MM:SS.
Using Command Prompt with systeminfo
- Open Command Prompt.
- Execute:
systeminfo | findstr /C:"System Boot Time"
- This command retrieves the system’s boot time, from which uptime can be calculated.
Using Command Prompt with net statistics
- Open Command Prompt.
- Execute:
net statistics server | findstr /C:"Statistics since"
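- This command displays the date and time at which statistics collection began. That is normally when the Server service started during boot, so it is a close approximation of the last boot time, from which uptime can be calculated.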
Sending Uptime Notifications via Email
While this is certainly not suitable for a serious production setup, just for illustration purposes, let’s write a simple script to receive email notifications of your Linux server uptime. Here’s an example script that captures the uptime command’s output and sends it via email using mail (ensure you have a mail transfer agent like sendmail or postfix configured):
    #!/bin/bash

    # Capture uptime
    UPTIME_INFO=$(uptime)

    # Email details
    TO_ADDRESS="admin@example.com"
    SUBJECT="Server Uptime Report for $(hostname)"
    BODY="Current uptime: $UPTIME_INFO"

    # Send email
    echo "$BODY" | mail -s "$SUBJECT" "$TO_ADDRESS"
Save this script (e.g., as send_uptime_email.sh), make it executable (chmod +x send_uptime_email.sh), and schedule it using cron to run at desired intervals.
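For the scheduling part, one possible approach is sketched below. It is only an illustration: the daily 08:00 schedule and the `/usr/local/bin` path are assumptions, and `add_cron_entry` is a helper invented for this example so that running the install step twice does not add a duplicate line.

```shell
#!/bin/bash
# add_cron_entry ENTRY: read a crontab on stdin and print it with ENTRY
# appended, unless an identical line is already present.
add_cron_entry() {
  local entry=$1 current
  current=$(cat)
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -qxF -- "$entry"; then
    printf '%s\n' "$entry"
  fi
}

# Hypothetical schedule and path: run the report every day at 08:00.
ENTRY='0 8 * * * /usr/local/bin/send_uptime_email.sh'

# To install for the current user (not executed here):
#   crontab -l 2>/dev/null | add_cron_entry "$ENTRY" | crontab -

# Starting from an empty crontab, the result is just the new entry.
printf '' | add_cron_entry "$ENTRY"
```

You can of course also run `crontab -e` and paste the entry by hand; the helper only matters if you provision servers with scripts.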
Again, not something you would want for a production setup, but more on that further below.
Setting Up Server Startup and Shutdown Alerts
This actually is good to have set up on your production servers.
To be notified when your server (or a VM) starts or shuts down, you can add scripts to specific initialization and shutdown sequences. Since we make use of this on our own servers, let me show you how exactly we send events about servers starting or stopping to Sematext Cloud.
The following works on systemd-based systems (e.g., Ubuntu 16.04 and later), where we can create a systemd service unit that triggers on startup and shutdown. This is how you would do that:
1. Create a script (e.g., /usr/local/bin/notify.sh) that sends a notification:
    #!/bin/bash
    # Usage: notify.sh START|STOP
    EVENT_TYPE=$1
    APP_TOKEN="YOUR_SEMATEXT_APP_TOKEN"
    HOSTNAME=$(hostname)
    TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

    # Send the event; shell variables are spliced into the JSON body
    curl -X POST "https://event-receiver.sematext.com/$APP_TOKEN/event" \
      -H "Content-Type: application/json" \
      -d '{
        "type": "infra",
        "message": "'"$HOSTNAME is about to $EVENT_TYPE"'",
        "os.host": "'"$HOSTNAME"'",
        "timestamp": "'"$TIMESTAMP"'"
      }'
Of course, the payload of the curl call will be slightly different if you are sending events elsewhere and not to Sematext.
2. Make the script executable:
chmod +x /usr/local/bin/notify.sh
3. Create a systemd service unit file (e.g., /etc/systemd/system/notify.service):
    [Unit]
    Description=Notify Sematext of Server Events
    DefaultDependencies=no
    After=network.target

    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/notify.sh START
    ExecStop=/usr/local/bin/notify.sh STOP
    RemainAfterExit=true

    [Install]
    WantedBy=multi-user.target
4. Enable and start the service:
    systemctl enable notify.service
    systemctl start notify.service
This setup ensures that Sematext receives events whenever the server starts or stops.
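Splicing shell variables into a JSON string by hand, as the notification script in step 1 does, is easy to get wrong. One way to reduce that risk is to build the payload in a small function and inspect it before handing it to curl. The sketch below is an illustration only: `json_escape` and `build_event` are hypothetical helpers, they escape just backslashes and double quotes, and the field names simply mirror the ones used in the script above.

```shell
#!/bin/bash
# json_escape STRING -> STRING with backslashes and double quotes escaped.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# build_event HOST EVENT_TYPE TIMESTAMP -> JSON event payload on one line.
build_event() {
  local host=$1 event=$2 ts=$3
  printf '{"type":"infra","message":"%s","os.host":"%s","timestamp":"%s"}\n' \
    "$(json_escape "$host is about to $event")" \
    "$(json_escape "$host")" \
    "$(json_escape "$ts")"
}

# Print the payload for a hypothetical host instead of sending it.
build_event web01 STOP 2024-01-01T00:00:00Z
```

Printing the payload first, and only then passing it to `curl -d`, makes quoting mistakes visible before they turn into silently rejected events.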
But this is not going to give you full server monitoring that you might be after. These are just (hopefully) rare events that inform you when a server was started or shut down.
Before we move on from these events, note that events are handy for other things too, such as marking application builds or deployments. With deployments marked on a timeline, you can correlate a performance degradation or the appearance of new errors with a specific deployment. Here’s how to send events to Sematext.
Monitoring Server Uptime with Agents and Heartbeat Alerts
While the above server start/shutdown notifications are really useful, effective server monitoring typically involves server monitoring agents. To make things less abstract and theoretical, let’s walk through how one would use Sematext Agent to monitor servers and server uptime. This, or something very similar, can be done with other server monitoring solutions.
1. Install the Agent
Of course, you will not be installing while reading this article, so this step is here just so that the sequence of steps is clear. The agent installation docs are here in case you want to see what this step entails, but generally it takes only minutes of copy-pasting a couple of commands.
2. Monitoring Server Metrics
Agents generally collect core infrastructure metrics – CPU, disk, network, memory, and such. Some, like Sematext agent, will also automatically monitor all your Kubernetes infrastructure, your processes, collect information about packages installed on your servers (good for finding vulnerable versions of packages), discover services, logs, etc.
3. Configure Heartbeat Monitoring
The agent periodically sends a “heartbeat”. This heartbeat means that the agent is sending a “hey I’m still alive and kicking” sort of message, and if the agent is alive and kicking, then we know that the server it’s running on is also alive. As long as this message is regularly being received by Sematext, we know that the server is up and running. When that message stops arriving then we know something’s up (err, down). That is where heartbeat alerts come in. Heartbeat alert rules are created in Sematext for you by default, so there is nothing extra you need to do, although you can customize a few things. For example, you can see some of these heartbeat alert rules in this screenshot below.
You can disable them if you don’t want them, and you can also make them more or less sensitive. That is, you can configure these heartbeat alert rules to alert you if the heartbeat has not been received in 2, 3, or some other number of minutes. You can learn more in Heartbeat Alerts docs, but this is what that looks like – note the “No data received by Sematext in the last 5 minutes” part in the screenshot below. That’s what we’re talking about here.
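Conceptually, a heartbeat alert rule boils down to comparing the time of the last received heartbeat with a threshold. The sketch below illustrates that idea only; it is not how Sematext implements it, and `heartbeat_state` is a name made up for this example.

```shell
#!/bin/bash
# heartbeat_state LAST_SEEN NOW THRESHOLD_MIN
# LAST_SEEN and NOW are Unix timestamps in seconds.
# Prints ALERT once the silence has lasted THRESHOLD_MIN minutes or more.
heartbeat_state() {
  local last_seen=$1 now=$2 threshold_min=$3
  if (( now - last_seen >= threshold_min * 60 )); then
    echo "ALERT"
  else
    echo "OK"
  fi
}

heartbeat_state 1000 1240 5   # 4 minutes of silence, prints: OK
heartbeat_state 1000 1300 5   # 5 minutes of silence, prints: ALERT
```

A lower threshold alerts sooner but is more likely to fire on a brief network hiccup; a higher one is quieter but delays the notification by that many minutes.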
4. Configuring Alert Notifications
The last piece of the puzzle is the actual alert notifications. All server monitoring solutions offer integrations for alert notifications. Some call them channels, others call them hooks. Typically, you can choose between PagerDuty, Slack, and other such systems. At Sematext we use Slack, where we’ve created a #heartbeat-alerts channel for this type of alert. We found it useful to have heartbeat alerts in a channel separate from other types of alerts. Here is what a heartbeat alert notification looks like in Slack:
In this case, the host logs-es-client02.eu.sematext.com went missing at 08:14. Remember what we said about redundancy at the beginning of this article? This is a good example of that: the process or maybe even the whole server could have died at 08:14, but we didn’t care, at least not enough to feel like we had to jump and fight a production fire. Why? Because we have redundancy, we knew the service was still functional, plus we have a system that “self heals” in cases like this.
What’s more, there is also the notion of a “server coming back up” notification, which gives us extra peace of mind that this server did come back up and was back in business. For that, there is an optional “back to normal” notification, and in Slack it looks like this:
This notification is optional. You can disable it using the “Alert me when data starts coming to ….” toggle that you can see in the screenshot below:
5. Digging into Server Downtime
Just knowing that something went down is nice, but you also want to figure out what happened: why did the server or service go down? Sometimes information about that comes from the server or application logs. But in this case, the clue came from another type of event. As you can see above, alert notifications contain links. As you might imagine, these links let you drill in and start troubleshooting, starting from a screen like this:
Here, you can see our logs-es-client02.eu.sematext.com disappearing. By the way, see anything incongruent in this screenshot? Note the time on the chart: 12:04. We saw that the heartbeat alert fired at 08:14, so both the hour and the minutes are off. Why? The hours are off because Slack showed the time in my local time zone, while in Sematext we use a different time zone. And why :04 and not :14? Because this particular heartbeat alert was configured to fire if heartbeat metrics have not been received for 10 minutes. So things went down at :04, but the notification was sent 10 minutes later, at :14.
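The time arithmetic here is simple enough to check in a few lines. The sketch below is purely illustrative: `shift_minutes` is a helper invented for this example, it works on plain HH:MM values and wraps around midnight, and the 4-hour offset is inferred from the 08 versus 12 difference described above rather than from any stated time zone.

```shell
#!/bin/bash
# shift_minutes HH:MM DELTA -> HH:MM shifted by DELTA minutes (may be negative),
# wrapping around midnight.
shift_minutes() {
  local hh=${1%%:*} mm=${1##*:} delta=$2
  # 10# forces base 10 so that values like "08" are not read as octal.
  local total=$(( ((10#$hh * 60 + 10#$mm + delta) % 1440 + 1440) % 1440 ))
  printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

# Alert fired at 08:14 with a 10-minute threshold: last heartbeat around 08:04.
shift_minutes 08:14 -10   # prints: 08:04

# The same moment with a 4-hour time zone difference, as seen on the chart.
shift_minutes 08:04 240   # prints: 12:04
```

The practical takeaway is that the time in the notification is always later than the actual failure by at least the configured threshold, so look that far back when you start digging.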
Now, what happened to this server? From here we’ll dig into that host’s details by clicking on the host name shown at the bottom of the screen. The next screen displays various details about our logs-es-client02.eu.sematext.com server. There we see the Events tab, and look what we found:
The agent captures various types of Linux events, including OOM Killer events. So in this case the server didn’t actually shut down, but the service that it runs, and that we are monitoring, did. The service recovered and, thanks to redundancy, nobody had to get out of bed, but this is an example of an “early warning sign” that we may want to act on: this server may be short on memory.
Conclusion
Monitoring server uptime and overall server health and performance falls in the server monitoring 101 bucket. While you might be thinking only about monitoring server uptime, that’s only one thin slice of monitoring. Everyone should – and can – have full-blown server monitoring. As I hope you saw in this article, it is super simple to set up and, depending on which monitoring solution you choose, can be very affordable. As far as costs are concerned, I know Sematext’s pricing starts at $2.8/month for a server, which I believe might be the most affordable on the market at the time of this writing.
That said, while you want to monitor your servers, including knowing when they start up and especially when they shut down, servers (and any containers and pods running on them) are often treated as pets, and they should not be. Ultimately we want to treat them as cattle and monitor the health, availability, and performance of services. But as you saw in the last step above, being aware of these low-level signals is also very valuable. If we don’t address the memory issue, this low-level signal could blow up one day and take down the whole service.