PostgreSQL is a popular open-source, object-relational database. As with any other data storage solution, capturing metrics is crucial for making sure your database is reliable, available, and performing optimally. This will help you dig deeper into database performance problems, do performance tuning, optimize queries and indexes, and make partitioning decisions. But that’s not all. You’ll also be able to set up alerts and plan for failures or upgrades.
In this post, I’ll introduce you to the key PostgreSQL metrics you should track, as well as the best monitoring tools available for measuring them and optimizing database performance.
What Should You Monitor in PostgreSQL: Key Performance Metrics to Track
There are two main types of monitoring in PostgreSQL:
- System resource monitoring: Looks at the infrastructure usage your PostgreSQL runs on, exposing metrics like CPU and memory usage
- Database monitoring: Ensures your database is healthy and running optimally
In both cases, it’s essential to monitor the metrics I’ll discuss below and to look for abnormal behavior. Usually, if one of these metrics is affected, others may be affected too. Monitoring all these metrics will help you with root-cause analysis if abnormalities are detected.
System Resource Monitoring
A healthy operating system running on top of infrastructure is critical for maintaining the stability of your database. System-level monitoring detects resource usage spikes to give you an indication of the future state of your system so you can prevent incidents.
There is a wide range of metrics for monitoring PostgreSQL. Here are a few you should consider when monitoring your system resources.
You should monitor CPU usage at all times. When your database is running complex queries and doing batch updates, the CPU is hit the hardest. You need to identify when the CPU is approaching its limit and make sure you’re notified if something out of the ordinary is happening in your system. I suggest you start by identifying the typical CPU usage behavior and then setting up alerts if the CPU usage percentage increases to around 85% or more.
Of course, this number can change based on the workloads you run and the number of CPUs you have allocated to the database. If CPU usage hits 100%, your database performance is most likely degrading at an alarming pace. Sometimes, the VM (or whatever system your database is running on) will be unresponsive.
High CPU usage can also be the result of other processes running in your system that are unrelated to your PostgreSQL database, such as systemd processes, http servers, and more. Still, the information gathered from this metric should help you figure out where to start looking to fix the issue.
Memory and swap usage should be at the top of your list of PostgreSQL metrics to monitor. The reason is simple: If you run out of memory, your database can crash. This means that you should monitor and note your database’s average memory utilization and set up alerts for anomalies, like sudden spikes that reach roughly 85% of memory usage.
Note that there is a difference between used memory and cached memory. You don’t have to account for cached memory because it usually gets freed up when new applications need memory. So even if overall memory usage seems full, you should be covered if a significant chunk is cached/buffered.
High memory usage may be directly related to your database configuration, such as shared_buffers and work_mem, which determine how much memory PostgreSQL can use in your operating system.
When considering storage, monitoring disk latency should be one of the top priorities. A slow disk means a slow database, and this requires immediate action. Disk read/write latency and disk read/write throughput allow you to see if anything out of the ordinary is happening with your disks. Ideally, lower latency should result in higher throughput.
Disk storage builds up over time, so when your disks are completely filled, you’re out of luck. I recommend setting a warning alert at 85% disk usage and a critical alert at 90% disk usage. But, of course, these numbers can change according to your needs.
It’s a good idea to use Logical Volume Manager (LVM) instead of direct mounts, since this allows you to expand volumes on the Linux system quite easily. Just look at what happened to me when I was working with PostgreSQL. At one point, there were tablespaces mounted to different disks, and the PostgreSQL WAL location was mounted into the OS root file system. This was a clustered PostgreSQL. Unfortunately, there was a network failure, and the PostgreSQL clusters couldn’t keep the sync on. This caused the WAL logs to start piling up.
Can you imagine what would have happened if we hadn’t already set up proper alerts? Disaster! The whole system would’ve gone down. If you don’t want to find yourself in such a situation, set up alerts for all the volume mounts you have on your VMs and make a plan to increase those volumes when the alerts get fired.
Network is another metric that can affect your PostgreSQL database or the applications connected to it. A network failure can be disastrous in a replicating system, resulting in logs filling up your storage. This can even occur when there is high latency between your database servers, if the database is running in clustered mode. This could cause your database to crash with an out-of-space error.
If your application is experiencing network issues and receives an error that the database isn’t available, your network should be one of the first places to look. Bad network configurations and hardware failures can lead to network-level issues.
Now that I’ve covered operating-system monitoring, I’ll go over how to monitor PostgreSQL performance and optimize it by collecting database-level metrics.
As you probably know, databases can use a lot of system resources due to poor configuration or inefficient SQL queries. System resource monitoring gives you an indication of when to vertically scale and increase resources to improve database performance. But, more often than not, that’s not what you need. This is where certain database configurations and query optimizations come into play, helping you improve performance without investing in more system resources.
Here are the database-level metrics you should be monitoring.
Certain queries can really hinder database performance. That’s why you need to look for long-running or slow PostgreSQL queries, as well as understand how they are executed and how they affect database performance. Using PostgreSQL’s inbuilt statistic collector, you can extract important information, such as the number of rows fetched, usage of scan vs. index in the query execution, and buffer hits in each query you execute.
There are several metrics that enable database administrators to identify slow-performing queries by singling out those with the highest mean time to execute and maximum time taken to execute. These metrics include Nth latency percentile, number of times executed, maximum/minimum time taken for a given query, and the number of rows affected.
Queries can be slow for many reasons. The most common are bad PostgreSQL configurations, dead tuples, and badly constructed queries with missing indices. One typical way to analyze queries is to run the EXPLAIN command observing the query plan, which enables you to look at how indexes are used and see if you need to change anything in the query.
You can also enable logging with your PostgreSQL configuration. This will log the query execution time for each query in the form of a log, which you can either analyze manually or with a log analysis tool like Sematext or pgBadger (read on for more info).
Adding the PostgreSQL extension pg_stat_statements will help you identify long-running queries and the queries that take the most time to execute. It will also show you the number of times each query has been run, the maximum and minimum time taken, and the mean time and number of rows that were affected. You can easily query the stats table to get this information. To achieve the best performance possible, you can then dig deeper to work on query optimization, set up indexes, and so on.
At the end of the day, slow database queries can pose a problem for all associated systems. However, keep in mind that extensions like pg_stat_statements need to be enabled manually and will add additional overhead to your database query performance.
When applications are connected to databases, they open connections with the databases. These are called active sessions. Databases generally have a maximum number of connections configured by default. For PostgreSQL, it’s 100. You need to monitor these connections and change the parameter max_connections accordingly. Otherwise, your application may scale, but it will eventually fail, since the database will reject the connection request. To avoid this, I recommend creating two alerts: one for 80% of connections used and another to detect anomalies or sudden spikes in the active session count.
When running production systems, you often need to switch to clustering architectures that ensure high availability. These systems can be active-active or active-passive, and can help with availability and performance by distributing certain workloads among the replicated instances.
For example, in a replicated system, you can use a primary node (master) for data writes and passive nodes (slaves) for running analytical queries and read queries without affecting the performance of the primary node. If an active node fails for some reason, one of the passives can be elected as the master, ensuring database availability. You can easily query replication metrics by accessing the pg_stat_replication table in PostgreSQL. I recommend running PostgreSQL with high availability when you’re running production systems.
You can derive quite a few metrics from log monitoring and log analysis, such as the number of log lines containing warn or error messages. This can give you an early indication that there’s something wrong with the database, which should prompt your DBAs to further analyze the situation and take a look at the logs. This can help you respond to errors thrown by the database before they lead to more serious problems.
From PostgreSQL logs, you can extract metrics like connection_login, which provides information about each connection and disconnection, and log_min_duration_statement, which helps analyze slow queries.
The Best PostgreSQL Monitoring Tools
Now that I’ve discussed the key metrics you need to measure, let’s look at the best PostgreSQL monitoring tools currently available to help you do just that.
PostgreSQL is supported by plenty of monitoring solutions that cater to different requirements. Most of these solutions monitor system-level resources and database-level metrics, offering meaningful visualizations and alerting to help you monitor your database.
Below, I’ll review some PostgreSQL monitoring tools to help you better understand your options.
Sematext Monitoring is a monitoring tool with support for monitoring PostgreSQL databases. It focuses on logs, infrastructure (both on-premises and cloud), tracing, and performance monitoring—not only for PostgreSQL, but for many other relational and non-relational databases, including MySQL, MariaDB Server, Redis, MongoDB, Apache Cassandra, and Apache HBase.
Sematext features an easy-to-set-up PostgreSQL agent, which can be installed directly via the UI or manually via terminal or automation. Beyond that, you can configure Sematext to ship all of the PostgreSQL metrics you need to monitor to the Sematext control plane in order to set up visualizations, optimizations, insights, and alerts. One cool feature Sematext offers is a built-in Sematext PostgreSQL Log integration, which allows you to identify slow queries, errors, and warnings, giving you deeper insights when doing root-cause analysis.
- Database performance monitoring in real time
- Slow query analysis
- Log analysis
- Supports both system metrics and database performance metrics
- Correlation between metrics, enabling database administrators to drill down easily and identify problems faster
- Doesn’t support custom PostgreSQL metrics that are generated by extensions like pg_stats_statments or pgstattuple
- Limited support for transaction tracing
Sematext offers an affordable pricing model, where you’re charged per node/agent in your PostgreSQL cluster. There is also a 14-day free trial with no limitations. So you can explore its features and functionalities.
For a quick look at how Sematext Synthetic Monitoring works, check out the short video below:
2. Prometheus and Grafana
Prometheus is an open-source, cloud-native monitoring tool that enables you to monitor databases, VMs, and basically anything else. At the heart of Prometheus, there’s a time-series database that scrapes data from a bunch of exporters that you define. For system-level resource monitoring, the most popular exporter is node-exporter for Prometheus, which exports all of the resource-monitoring metrics you need to properly monitor your database servers.
With Prometheus Alertmanager, you can define alerts for metric thresholds. What’s more, you can use Grafana to set up dashboards with Prometheus and observe the patterns and behaviors of the metrics you collected. For PostgreSQL, you can also use a PostgreSQL exporter to export metrics such as active sessions, database locks, and replication.
- Automatic service discovery
- Widely available community support
- Supports both system metrics and database performance metrics
- Flexibility to configure custom metrics that are not supported by default in PostgreSQL exporter
- Complex and time consuming to manage Prometheus instances; operational overhead
- Need to manually configure and maintain Prometheus exporters
- No log analysis features
Prometheus and Grafana are free and open source.
pganalyze is a PostgreSQL monitoring tool that enables you to run query optimization and analysis, easily monitor your running queries in real time, collect query plans, and optimize queries. It also gives you tuning recommendations for your PostgreSQL cluster.
Thanks to the comprehensive query performance data provided by pganalyze, you can rapidly identify the underlying cause of an issue, solve the problem, and check that the user deployed solution is functioning as anticipated. This is a great option if you need to do query optimization.
- Built specifically for Postgres
- Allows you to analyze query executions in detail
- Offers log analysis with pganalyze log insights
- Gives automatic performance and query optimization suggestions
- Doesn’t support system-level metrics
- Not a full-fledged monitoring tool (no support for application performance, log traces, etc.)
- No support for alerts
pganalyze is priced per database. The Production plan supports up to three database servers, and the Scale plan supports up to six database servers. For anything more than that, you’ll need the Enterprise plan (get a price quote from the sales team). There is also a 14-day free trial.
pgDash is another PostgreSQL monitoring software that comprehensively visualizes the gathered metrics and provides diagnostic information to help you debug and monitor the database state in real time. It also allows you to capture metrics related to both the system and database and to set up meaningful alerts when necessary.
pgDash’s basic alerting feature is great for setting up real-time alerts that are easy to interpret and use. It comes with a number of notification methods, including email, Slack, PagerDuty, VictorOps, xMatters, and Webhooks. When significant changes are made to your PostgreSQL databases, such as addition/deletion of users, tables, or indexes—or when sudden increases or reductions in table size are detected—pgDash will notify you through notifications from Change Alerts.
- Built specifically for Postgres
- Offers query analysis and optimization recommendations
- Can run as a self-hosted instance
- Has an alert-management feature
- Comparatively less intuitive user experience
- Supports only database-level metrics
- No correlation of metrics
pgDash is priced per database. It has three plans: Basic (supports up to two databases with basic alerting), Pro (supports up to five databases with advanced alerting features), and Enterprise (supports more than 20 databases with advanced insights and integrations). There is also a 14-day free trial.
5. Datadog APM
Datadog APM provides tracing support for PostgreSQL and monitors system-level metrics. You can also use it to monitor application performance and enable end-to-end tracing with your database queries and other service calls. It gives you code-level visibility, with an instant root-cause analysis feature.
Datadog APM is a great tool to integrate with your applications that open connections with your databases. It gives you end-to-end tracing capabilities that enable you to pinpoint which application/microservice is performing unoptimized queries. You can then fix them accordingly. Datadog also has an easily installed agent for PostgreSQL, which collects data from PostgreSQL and ships it to the Datadog control plane for visualizations.
- End-to-end application tracing support with query analysis
- Supports system-level and database-level metrics
- Log analysis
- Correlation of metrics, enabling users to drill down and identify problems quickly
- No self-hosted solution
- Advanced query optimization and recommendations not supported since Datadog APM is not specifically built for Postgres
- Limited reporting and analytics capabilities. If you need very sophisticated graphing procedures, you may need to use the data in another business intelligence tool.
Datadog APM offers flexible pricing plans, typically based on the number of agents/hosts deployed in your infrastructure. You need to pay for each feature separately. It’s free for 14 days, and you can monitor as many servers as you want.
6. AppOptics APM
Recently acquired by Solarwinds, AppOptics APM is another solution you can use for PostgreSQL monitoring. Similar to Datadog APM, it provides integrations for multiple databases with tracing ability, enabling you to identify and optimize applications and queries. With the AppOptics PostgreSQL plugin, you can easily look at both system-level and query-level metrics, network latency, and more—all with user-friendly dashboards and intuitive visualizations.
AppOptics APM is an easy-to-use solution with a small learning curve. It helps you quickly discover the underlying causes of performance problems and, at the same time, troubleshoot faster. That’s because your availability and performance statistics are collated in a single place.
- End-to-end application tracing support with database
- Slow query analysis
- Supports both system-level and database-level metrics
- Easy to set up and get started
- No self-hosted version
- Generic and limited reporting filters in the dashboard
- Identity federation with support for LDAP is not available
AppOptics APM is priced per host, per month. There is a 30-day free trial, so you can play around with the various features.
AppDynamics is a monitoring tool focused on application performance monitoring and infrastructure monitoring. It gathers plenty of metrics from your infrastructure and system and provides you with insights, dashboards, and recommendations to improve performance. AppDynamics offers PostgreSQL support so you can deep-dive into PostgreSQL system resource usage, drill into queries and observe the database instance objects they affect, and troubleshoot issues in real time.
Monitoring capabilities are mostly out of the box, including, but not limited to, alerts. One cool feature is application flow mapping, where AppDynamics learns your application’s behavior and establishes baselines automatically, leading to smart alerts on anomaly detection.
- Correlation of application performance metrics with tracking and database metrics
- Supports both system-level and database-level metrics
- Query performance optimization suggestions and drill-down reports
- AI-powered anomaly detection and alerts
- Self-hosted version not available
- Not a lot of tutorials and documentation
- Complex UI; need training to use product
AppDynamics pricing model charges you per CPU core. There is a 15-day free trial.
8. ManageEngine Applications Manager
ManageEngine Applications Manager is an application performance monitoring system with PostgreSQL support. It can monitor all required system-level metrics and, on the database level, captures query performance, active sessions, buffer cache, and more.
Applications Manager has an AI-assisted smart alert feature, which can detect anomalies in your metrics right away. It also automatically discovers database calls from your application and analyzes the performance trends of your database settings by generating attribute-by-attribute performance reports.
- Supports end-to-end application tracing with database calls
- Supports slow query analysis with SQL drill-down reports
- Supports both database-level and system-level metrics
- Easy to use
- No SaaS version
- No log analysis feature
- General solution for application performance management that doesn’t specifically target PostgreSQL
Pricing is based on the number of agents you deploy and the number of users who will use the system. You have to get an exact price quote from the sales team. The Free tier supports up to five applications or servers for monitoring.
ClusterControl by Severalnines enables you to monitor PostgreSQL, as well as set up, deploy, manage, and replicate PostgreSQL instances with a centralized control plane. Its sophisticated automation features make it simple to set up PostgreSQL streaming replication, so you can add or promote replication slaves from scratch or from pre-configured ones.
ClusterControl also helps you set up alerts and backups, run routine patches, scale your database, and manage and optimize database-level performance. If you are managing your own PostgreSQL servers on-premises, ClusterControl is an excellent choice.
- Enables you to create and manage PostgreSQL clusters via a simple Ul
- Database-level and system-level metric support
- Allows you to instantly deploy a battle-tested, production-ready database with a few clicks
- Query performance analysis
- Cluster management tool, not a monitoring tool
- No log analysis features
- No correlation of metrics for a better debugging experience
ClusterControl offers a community edition with limited features, but to take advantage of its full capabilities, you need to purchase the Enterprise plan.
The final tool on the list is Nagios, an open-source network and infrastructure monitoring solution. With Nagios, you can monitor system-level performance and then use a plugin for PostgreSQL to export PostgreSQL-level metrics to the platform and set up alerts. Using PostgreSQL extensions like pg_stat_statements and pgstattuple, it’s easy to run a few scripts to extract metrics out of PostgreSQL, push to Nagios, then set up alerts according to your needs.
In addition to PostgreSQL, Nagios constantly monitors key system resources and can monitor other popular network protocols, such as SMTP, POP3, SSH, and HTTP. Nagios can also provide passive controls, which are implemented in external apps that are linked to it.
- Supports system-level metrics
- Option to run entire system in self-hosted mode in a private network
- SSO support with LDAP and SAML support
- Third-party integrations via open API
- General monitoring solution, not specifically built for PostgreSQL
- No query analysis or optimization recommendations
- Considerable management and maintenance overhead
Nagios Core is free and open source. Nagios XI, the enterprise version, is built on top of the open-source version. It offers additional features and less management overhead. Nagios XI is free for monitoring small environments. Once you exceed seven monitoring nodes, you have to purchase an annual license that includes support and Nagios server maintenance.
What Tool Should You Choose to Monitor PostgreSQL?
To optimize database performance and ensure that your database is healthy, it’s crucial to monitor system-level and database-level metrics. In theory, this all sounds good, but when it comes to implementation, collecting the metrics and setting up meaningful visualizations and alerts can be tricky. That’s where the tools I reviewed in this article come into play. If you’re going with an open-source solution, you’ll need to do a lot on your own. Paid solutions, on the other hand, really take some of the headaches out of setting up and managing monitoring and alerting.
Here’s my final recommendation. If you’re looking for open-source PostgreSQL monitoring tools, go with Prometheus and its exporters. But, if you need a solution that allows you to monitor your entire infrastructure and the systems running on it—including PostgreSQL—choose a comprehensive monitoring solution like Sematext Monitoring, which provides monitoring support for network, cloud infrastructure, logs, databases, and application performance.
To see what Sematext can offer you, check out our free trial.
You might also be interested in:
Nilesh is an accomplished Software Architect with over eight years of industry experience. Skilled in developing full-stack applications, cloud computing, DevOps, and SRE. Expert in all things related to Kubernetes and the Cloud Native stack. He is passionate about assisting the community by sharing information via media articles and speaking at community events. Nilesh is now working as an architect on the development of an iPaaS solution based on Kubernetes.