HAProxy is one of the most popular load balancers and reverse proxies around. When you use it for these purposes, it’s especially important to monitor both availability and performance, since both affect your SLIs and SLOs.
In this post, we’ll talk about the main HAProxy metrics you should monitor and the best monitoring tools you can use to measure them.
What Is HAProxy?
HAProxy is a high-availability server used for load balancing and as a proxy for TCP and HTTP applications. It can do TLS offloading, header-based and route-based routing, and health checks for backends. It supports protocols like WebSocket, gRPC, HTTP, and more.
HAProxy is also one of the fastest proxies out there, due to its event-driven architecture. It reacts quickly to I/O events, and its symmetric multiprocessing and lightweight multithreading provide a task scheduler that focuses on high throughput and low latency. It also supports access control lists (ACLs), giving you control over who can access which resources, based on the requester’s IP, URL parameters, headers, and more.
When you’re working in reverse-proxy mode, HAProxy offers many features for free that other well-known proxies, such as NGINX, don’t, including TCP and TCP SSL proxying.
Enabling the HAProxy Status Page to Collect Performance Metrics
In its default configuration, HAProxy doesn’t expose metrics. To start monitoring HAProxy, you first have to enable access to HAProxy metrics so that they can be read by other applications. There are two ways to do this: the HTTP interface and UNIX socket interface.
Enabling the HTTP Interface
To enable the HAProxy stats page, add the snippet below to the end of your HAProxy config file, which you can generally find at /etc/haproxy/haproxy.cfg.
listen stats                            # Define a listen section called "stats"
    bind :9000                          # Listen on port 9000
    mode http
    stats enable                        # Enable stats page
    stats hide-version                  # Hide HAProxy version
    stats realm Haproxy\ Statistics     # Title text for popup window
    stats uri /haproxy_stats            # Stats URI
    stats auth Username:Password        # Authentication credentials
In the last line of the code above, you’ll see that you can set your username and password to access the page. This setting requires you to restart the HAProxy service. After that, you will be able to access your stats page at http://haproxy_dns_or_ip:9000/haproxy_stats.
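The same page can also be read by a program. As a minimal sketch, assuming the listen section shown earlier (port 9000, the /haproxy_stats URI, and basic authentication), appending ;csv to the stats URI returns the statistics as CSV. The function names, host name, and credentials here are placeholders for illustration, not part of HAProxy.

```python
import base64
import csv
import io
import urllib.request


def parse_stats_csv(text):
    """Turn HAProxy's CSV stats export into a list of dicts, one per row."""
    # The export's header line starts with "# "; strip it so that
    # DictReader sees plain column names such as pxname and svname.
    cleaned = text.lstrip("# ").strip()
    return list(csv.DictReader(io.StringIO(cleaned)))


def fetch_stats(url, username, password, timeout=5):
    """Download the CSV export from the stats page using HTTP basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_stats_csv(response.read().decode("utf-8"))


# Example call (placeholder host and credentials):
# rows = fetch_stats(
#     "http://haproxy_dns_or_ip:9000/haproxy_stats;csv", "Username", "Password"
# )
# for row in rows:
#     print(row["pxname"], row["svname"], row["scur"])
```

Each row describes one frontend, backend, or server, identified by the pxname and svname columns, so a monitoring agent can pick out the metrics it needs by name.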
Enabling the UNIX Socket Interface
Enable the UNIX socket interface by adding the below code in haproxy.cfg.
global
    stats socket /run/haproxy/haproxy.sock mode 660 level admin
With this socket, you can access the HAProxy stats using UNIX sockets. But first, you need to understand how socket communication works. Try connecting to this HAProxy UNIX socket with Netcat.
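If Netcat is not available, the same exchange can be sketched in a few lines of Python. This assumes the socket path from the config above and that the calling user has permission to open it. The function names are made up for this example, while show stat and show info are standard commands of the stats socket.

```python
import socket


def run_command(sock, command):
    """Send one command on a connected socket and read until it is closed."""
    sock.sendall(command.encode("ascii") + b"\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty read: HAProxy closed the connection
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def query_haproxy(command, path="/run/haproxy/haproxy.sock", timeout=5):
    """Open the stats socket, run a single command, and return the raw text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        return run_command(sock, command)


# Example calls (need a running HAProxy with the socket enabled):
# print(query_haproxy("show info"))   # process-level information
# print(query_haproxy("show stat"))   # per-proxy statistics in CSV format
```

In its default non-interactive mode, the socket accepts one command per connection and then closes it, which is why the read loop simply runs until the connection ends.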
You can use HTTP mode when extracting metrics over the network, and socket mode if your agent is running on the same host as HAProxy.
Important HAProxy Metrics to Monitor
HAProxy can do health checks for your backend endpoints and expose the metrics for them. You can enable HTTP health checks by adding option httpchk to your backend config and the check keyword to each server line. It’s important to monitor frontend, backend, and system metrics.
Frontend Monitoring Metrics
You can use frontend metrics to track the connection from the client to HAProxy. A problem that shows up in frontend metrics means that clients are having trouble connecting to HAProxy.
Make sure to monitor the following:
Request Rate [req_rate]
The frontend request rate represents the rate at which requests are coming in. It indicates sudden traffic increases or decreases, which can help you make scaling decisions.
Thresholds: If you see an increase in the request rate, you may have to upscale your HAProxy cluster, as more people may be trying to access your applications.
Session Rate [rate]
This is the rate at which HAProxy creates the connections between the frontend and the client. One session can be mapped to a unique client. These are TCP sessions created by each client, over which multiple HTTP requests can be sent.
Thresholds: A sudden increase in session rate can very quickly put a heavy load on your system and bring it down. An increase in request rate (req_rate) with no increase in session rate could mean that there are a lot of retries.
Error Requests [ereq]
This indicates the number of errors in HTTP requests. It can be caused by many factors, including the client terminating the connection before sending the request, a client timeout, a read error from the client, and a request that was sent to a honeypot server in case of an attack.
Thresholds: An increase in HAProxy error requests means that the clients cannot connect to your servers. You may have to check for client timeouts and resource usage in your HAProxy cluster.
Denied Requests [dreq]
This refers to requests from the client that HAProxy denies because of permission issues, for example when a request matches a deny rule in an ACL or the ACLs are misconfigured.
Thresholds: If you’re getting a lot of denied requests, either your client has the wrong credentials or someone is trying to guess the credentials with brute force or another method.
HTTP Response Codes 4xx and 5xx [hrsp_4xx, hrsp_5xx]
These metrics count the HTTP responses with 4xx and 5xx status codes that HAProxy returns to clients.
Thresholds: Any abnormal increase in these values indicates a problem with the application. A 4xx code is a sign that the client is sending something wrong, while a 5xx code shows that the backend server is crashing while serving the request. A lot of 408 codes means that there is a request timeout.
Network I/O [bin, bout]
These are network-in and network-out metrics that indicate traffic volume, which you must track to predict how much traffic you will get at what time.
Thresholds: Any sudden increase in network I/O can be dangerous and choke your network. Any decrease in network I/O means total traffic has decreased or the nature of the traffic has changed. A historical trend can help you upscale and downscale your servers.
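Counters such as ereq, dreq, hrsp_5xx, bin, and bout are cumulative totals, so thresholds are usually applied to the difference between two samples rather than to the raw values. The sketch below assumes each sample is a dict of strings, as produced by reading the CSV export, and also uses the req_tot column (total HTTP requests). The function names and the 60-second interval in the usage note are illustrative choices.

```python
def counter_delta(previous, current, field):
    """Growth of a cumulative counter between two samples of the same row."""
    before = int(previous.get(field) or 0)
    after = int(current.get(field) or 0)
    # A smaller value means HAProxy restarted or the counters were cleared,
    # so everything counted so far happened in this interval.
    return after if after < before else after - before


def frontend_summary(previous, current, interval_seconds):
    """Summarise frontend activity over one polling interval."""
    requests = counter_delta(previous, current, "req_tot")
    errors_5xx = counter_delta(previous, current, "hrsp_5xx")
    return {
        "requests_per_second": requests / interval_seconds,
        "error_5xx_ratio": errors_5xx / requests if requests else 0.0,
        "denied_requests": counter_delta(previous, current, "dreq"),
        "bytes_in_per_second":
            counter_delta(previous, current, "bin") / interval_seconds,
        "bytes_out_per_second":
            counter_delta(previous, current, "bout") / interval_seconds,
    }
```

Calling frontend_summary(previous, current, 60) on two samples taken a minute apart gives per-second rates and a 5xx ratio that can be compared against whatever alert thresholds suit your traffic.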
Backend Monitoring Metrics
Backend metrics are used to track the connection from your HAProxy to the backend servers. They give you information that will help you check if HAProxy can talk to the backend properly and that responses from backends are up to the performance mark.
Here are some important HAProxy backend metrics to monitor:
Response Time [rtime]
This is the average backend response time for the last 1,024 requests. A higher rtime means that your backend is taking more time than it should to respond, indicating that there may be a performance issue or degradation.
Thresholds: A response time greater than 200–500ms indicates a poorly performing server. Your customers will feel the lag if your backend calls are taking 500ms. Note that the rtime metric will remain zero if you are not using HTTP.
Error Connections [econ]
This points to a backend connection error, meaning that HAProxy cannot connect to the backend services and your service could be down. This metric also includes backend errors, such as a backend without an active connection. To identify the cause of an increase in backend connection errors, correlate this metric with eresp and response codes from your backend and frontend.
Thresholds: If your monitoring tools alert you to a sudden increase in econ, check for connection issues between your HAProxy and backend server. This number should be minimal.
Denied Responses [dresp]
Denied responses are permission errors from the backend to HAProxy when HAProxy tries to connect to the backend. In most cases, the permission errors will come between the frontend and HAProxy.
Thresholds: An abnormal number of denied responses means that there are credential problems in your HAProxy. Backend responses that are restricted because of an ACL will return HTTP code 502. If your application ACL is configured properly, these values should be equal to zero.
Error Response [eresp]
This is the error response generated by your backend. Error responses consist of data transfer errors, ACL errors, socket connection errors, and more. This metric helps you understand why you are seeing backend errors. Combining this with the backend error response rate helps identify the exact problem.
Thresholds: Any increase in these numbers means that errors have increased and you need to check your backends. If the error response count is high and denied responses are increasing as well, the errors originate in the ACLs and your client is likely not configured properly.
Average Time Spent in Queue [qtime]
This is the average time that requests spent waiting in the queue, calculated over the last 1,024 requests.
Thresholds: The qtime should be as low as possible for better performance. A higher qtime means that your requests are waiting longer before a backend server can process them. Watch out for any gradual or sudden increases and take scale-up actions on such events.
Number of Requests in Queue [qcur]
As the name suggests, this is the number of requests that are waiting in the queue. When the maximum connection limit is reached, HAProxy will start queuing the request in the kernel socket queue. The requests will stay there until the queue timeout is reached.
Thresholds: High qcur values mean that your HAProxy is reaching its saturation point for handling requests and your backend needs to start serving requests faster. A high qcur can result in a high qtime. This is because the higher the number of requests in the queue, the more time each request will have to spend there.
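The backend thresholds above can be turned into a simple check. This is only a sketch: it assumes a backend row represented as a dict of strings (as read from the CSV export), uses the upper 500ms bound mentioned for rtime, and picks an arbitrary 100ms limit for qtime that you would tune for your own services. The function name is hypothetical.

```python
def check_backend(row, rtime_warn_ms=500, qtime_warn_ms=100):
    """Return warnings for one BACKEND row of the stats export."""

    def as_int(field):
        # Empty or missing fields are treated as zero.
        return int(row.get(field) or 0)

    warnings = []
    rtime, qtime, qcur = as_int("rtime"), as_int("qtime"), as_int("qcur")
    if rtime > rtime_warn_ms:
        warnings.append(f"rtime is {rtime} ms (limit {rtime_warn_ms} ms)")
    if qtime > qtime_warn_ms:
        warnings.append(f"qtime is {qtime} ms (limit {qtime_warn_ms} ms)")
    if qcur > 0:
        warnings.append(f"{qcur} requests are waiting in the queue")
    return warnings
```

An empty list means the backend is within the chosen limits; anything else can be forwarded to your alerting channel.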
System Monitoring Metrics
It’s also important to monitor your system resources, as poor management can induce errors or latencies in the HAProxy servers.
There are several system metrics you should monitor, including:
Load average is the average load that your system is running under. Put simply, it’s the number of processes waiting for CPU time. Load average is reported as 1-minute, 5-minute, and 15-minute averages.
Thresholds: A load average in the range of 1.5–2 times the number of CPUs is fine. If the value exceeds that limit, there are a lot of processes that are not getting CPU time. If your CPU is under load, you may need to scale your machine vertically or add more HAProxy servers to your cluster.
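As a rough sketch of this rule of thumb, the check below compares the 5-minute load average with twice the CPU count. The function names and the choice of the 5-minute value are assumptions made for illustration, and os.getloadavg is only available on Unix-like systems.

```python
import os


def load_status(load_average, cpu_count, factor=2.0):
    """Classify a load average against factor times the number of CPUs."""
    limit = cpu_count * factor
    return "overloaded" if load_average > limit else "ok"


def current_load_status():
    """Check the 5-minute load average of the local machine (Unix only)."""
    one_minute, five_minute, fifteen_minute = os.getloadavg()
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    return load_status(five_minute, cpus)
```

Using the 5-minute average smooths out short spikes; the 1-minute value reacts faster but tends to be noisier.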
Memory is the amount of RAM used by your application or servers.
Thresholds: High memory usage impacts your system performance by forcing the system to swap or throw an out-of-memory (OOM) error.
Network I/O is the rate at which data is sent and received through your system’s network stack.
Thresholds: Network I/O should never reach the limit of the network bandwidth your machine supports.
Disk usage is the amount of disk space consumed on your system.
Thresholds: If storage reaches 100%, a lot of processes running in your system will stop. You should set up an alert to notify you when storage reaches around 90%.
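A disk alert of this kind can be sketched with Python’s standard library. The 90% threshold comes from the guidance above, while the function names and the default path are placeholders.

```python
import shutil


def disk_used_percent(total_bytes, used_bytes):
    """Percentage of the disk that is in use."""
    return 100.0 * used_bytes / total_bytes


def disk_alert(path="/", threshold_percent=90.0):
    """Return (should_alert, used_percent) for the filesystem holding path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    percent = disk_used_percent(usage.total, usage.used)
    return percent >= threshold_percent, percent
```

Running the check against the partition that holds HAProxy’s logs is a sensible starting point, since log growth is a common reason for a disk to fill up.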
Top HAProxy Monitoring Tools
Now that we’ve identified the most important HAProxy metrics to measure, let’s take a look at the best HAProxy monitoring tools you can use to track them.
1. Sematext Monitoring
Sematext Monitoring is a SaaS solution that offers a suite of comprehensive HAProxy monitoring tools to ensure optimum system performance. It provides HAProxy frontend, backend, and health check metrics, including request rate, errors, denied requests, and HTTP error codes.
Sematext’s anomaly-detection feature enables you to figure out the cause of any sudden change in HAProxy metrics. It offers integration with containerized environments and a number of notification channels, including e-mail, Slack, PagerDuty, and more. The tool allows you to correlate HAProxy statistics with other components of your infrastructure, including servers and databases, giving you a holistic view of your IT environment.
Pros
- Out-of-the-box, pre-built dashboard for HAProxy
- Auto-discovery of HAProxy
- Support for HAProxy logs and metrics correlation
- Support for Kubernetes on AWS, Azure, and Amazon ECS
Cons
- No self-hosted solution
- No annual pricing model, although you can create a bundle to save money
Pricing
The standard rate for monitoring is $3.60 per agent, per host, per month. The Pro tier costs $5.76 per agent, per host, per month. There is also a 14-day free trial, and you can use five hosts with 30 minutes of retention for free.
2. Prometheus and Grafana
Prometheus is a time-series database that is often coupled with Grafana, a visualization tool, to create a formidable open-source monitoring solution for various systems, including HAProxy. Prometheus’ major advantage is the ecosystem and tools around it. There is also a huge Prometheus community, and most tools, HAProxy included, can export metrics in a format that Prometheus supports.
Prometheus gives you HAProxy metrics for frontend, backend, and health checks, while Grafana has a pre-built dashboard that you can use to plot them. While there are exporters for HAProxy that can expose metrics, HAProxy also has native support for Prometheus metrics. With Prometheus, you can extract metrics for any infrastructure component that has support for an exporter or write an exporter yourself.
Pros
- Powerful query language (PromQL); alerts can be defined on these queries with Alertmanager
- Pre-built and custom dashboards available in the open-source community
- Service discovery mechanism to identify HAProxy boxes
Cons
- Difficult to manage multiple components
- No support for HAProxy log correlation, anomaly detection, etc.
Pricing
Prometheus + Grafana is free and open source, but you do have to pay for the machines that you will run them on, as well as learning, implementation, and maintenance costs.
3. Dynatrace
Dynatrace is infrastructure monitoring software that includes an integration plugin for monitoring the health and performance of different tools, including HAProxy. Dynatrace can monitor your infrastructure, network, and processes. It also has a mobile app, which is unique among monitoring tools and great for visibility, even if you are on the move.
Dynatrace supports all HAProxy metrics, including request rate, error connections, and HTTP status codes. It features HAProxy anomaly detection, which will alert you if an unexpected event occurs, as well as dashboards for viewing the metrics and alerts. It can send alerts to e-mail, Slack, PagerDuty, etc. and has support for containers. You can also host Dynatrace on premises if you don’t want a hosted solution.
Pros
- Supports HTTP, as well as Socket Mode for metrics collection
- Pre-built dashboard for HAProxy metrics
- Auto-discovery for HAProxy boxes
Cons
- Supports containers, but lacks documentation for Kubernetes-based deployments
- No documentation around correlation of metrics and logs for HAProxy
Pricing
Infrastructure monitoring costs $21 per month, per host for 8 GB of data. Full-stack monitoring costs $69 per month, per host for 8 GB of data. There is also a 15-day free trial, as well as a per-host pricing model.
Want to see how Sematext stacks up? Check out our page on Sematext vs Dynatrace.
4. Datadog
Datadog is a SaaS-based application and infrastructure monitoring solution that can extract metrics from an HAProxy Prometheus endpoint, HTTP interface, or socket interface. It can store, visualize, and correlate various HAProxy metrics, such as request rates, error connections, status codes, and denied requests, along with HAProxy logs.
Datadog supports monitoring HAProxy deployed on Kubernetes, Amazon ECS, physical hosts, or Docker environments. You will get alerts based on custom metrics and can forward the alerts to e-mail, Slack, PagerDuty, and other platforms.
Pros
- Auto-discovery of HAProxy boxes
- HAProxy metrics and logs correlation
- Bird’s-eye view of all HAProxy boxes in one place
Cons
- Comparatively expensive
- Multiple steps in integration with HAProxy (you must first make the changes in the agent and then enable it on the dashboards)
Pricing
The Pro tier costs $15 per host, per month. Datadog also offers a 14-day free trial.
Want to see how Sematext stacks up? Check out our page on Sematext vs Datadog.
5. Zabbix
Zabbix is an open-source network and infrastructure monitoring tool with support for HAProxy monitoring. It is very customizable, allowing you to write your own scripts to perform monitoring tasks in a more detailed and granular way. The solution enables you to measure HAProxy metrics exposed over HTTP, as well as over the socket interface. All metrics can be consumed and used to plot dashboards.
Because Zabbix is an open-source solution, you have to run and manage it yourself. You may see the same limitations that you do with Prometheus and Grafana when running it at high scale, and you’ll need to decide which dashboard to use, as there are many options.
Pros
- Support for trend prediction, which helps you decide when to scale your infrastructure
- Third-party templates and dashboards for HAProxy
- Many options to customize using scripts
Cons
- No native support for containerized environments (Kubernetes, Amazon ECS, etc.)
- No logs and metrics correlation
Pricing
Zabbix is free, but you do need time to learn, deploy, and maintain it. You’ll also need to pay for the machines that you will deploy it on.
6. ManageEngine
ManageEngine is an enterprise tool for managing your infrastructure. It offers a wide variety of services, including support for monitoring HAProxy performance metrics and ticket systems to track issues. It can also facilitate automatic patch management. You can use ManageEngine to track and correlate backend and frontend HAProxy metrics, such as econ, request rate, and HTTP status codes.
ManageEngine allows you to set alerts for anomalies and send notifications to e-mail and many other channels. For a bird’s-eye view of your infrastructure, you can aggregate the health metrics.
Pros
- Supports customizable dashboards
- Hotspot detection
- Great integration support with REST APIs
Cons
- No support for metrics correlation with logs
- Supports Kubernetes, but offers no specific monitoring for HAProxy when it is used as a Kubernetes ingress controller
Pricing
ManageEngine costs $395 per year for ten service monitors. The enterprise version starts at $9,595 per year for 250 monitors and increases as you add more monitors.
7. Site24x7
Site24x7 is a SaaS tool with advanced capabilities for HAProxy metrics monitoring using Socket Mode or HTTP mode. It also offers helpful features like web-page diagnostics, SSL/TLS certificate monitoring, and FTP, POP, SMTP, and IMAP monitoring. You can use Site24x7 to extract and plot a dashboard for metrics including request rate, average response time, HTTP status code, and more.
Site24x7 can put alerts on different metrics and send these notifications to channels like e-mail, PagerDuty, and so on. You can also use it to monitor your network and other infrastructure components.
Pros
- Pre-built dashboard and support for custom dashboards
- Auto-discovery of HAProxy boxes
- Support for HAProxy log management and monitoring
- Longest trial period, compared to other tools in this list
Cons
- No documentation on support for containerized environments like Kubernetes or Amazon ECS
- Need to install Python for agents to work
Pricing
Site24x7 costs $10 for ten servers. You can add more servers to your package in multiples of 10, 50, and 500. There is also a 30-day free trial.
Want to see how Sematext stacks up? Check out our page on Sematext vs Site24x7.
Which HAProxy Monitoring Solution Is Best for You?
All of the solutions discussed in this article support almost all of the metrics provided by HAProxy. The differentiators are factors like cost, discovery mechanism, and support for containerized environments. When choosing an HAProxy monitoring tool, one of the main features to look for is log correlation with metrics. This gives you the power to troubleshoot issues and come up with solutions faster.
If you’re looking for an open-source solution, Prometheus + Grafana is a great choice. However, both Prometheus + Grafana and Zabbix lack features like anomaly detection, as well as logs and metrics correlation. Datadog and Dynatrace offer the most features, but are very costly enterprise solutions.
If you want an all-around solution to monitor your HAProxy infrastructure and your applications, Sematext Monitoring works best. You can easily integrate with Sematext Logs to get advanced HAProxy monitoring and logging capabilities at a competitive price, and to stay ahead of issues to provide a great user experience. Start your free 14-day trial today.
Gaurav has been involved with systems and infrastructure for almost 6 years now. He has expertise in designing underlying infrastructure and observability for large-scale software. He has worked on Docker, Kubernetes, Prometheus, Mesos, Marathon, Redis, Chef, and many more infrastructure tools. He is currently working on Kubernetes operators for running and monitoring stateful services on Kubernetes. He also likes to write about and guide people in the DevOps and SRE space through his initiatives Learnsteps and Letusdevops.