A large amount of data requires special tools. Apache Cassandra is one of those databases that can handle a large amount of data spread among many commodity servers, providing high availability and fault tolerance without a single point of failure. Developed under the umbrella of the Apache Software Foundation, it is free of charge and gives you full visibility into the code base. With the possibility of spreading the data across multiple data centers and asynchronous, masterless replication, we get a workhorse that will handle whatever we throw at it.
However, handling a large amount of data across data centers in different geographical locations requires good monitoring. Monitoring that will provide holistic visibility into the cluster, the reads and writes, compactions, and caches. Monitoring in which you won’t have to use multiple tools to get Cassandra metrics, operating system metrics, and JVM metrics. One that makes it easy to ship not only the metrics themselves but also logs. Finally, one that allows you to take action on the data, providing real-time alerting and notification mechanisms.
Keeping the above in mind, let’s have a look at the critical Cassandra metrics you should measure and the best monitoring tools available.
Key Cassandra Performance Metrics
Most complex distributed systems provide a set of metrics that you should monitor and alert on to ensure that your system is healthy and working well. Apache Cassandra is no different. It provides a plethora of metrics, which we can divide into three categories:
- Dedicated Apache Cassandra metrics that describe how the system and its parts perform.
- Java Virtual Machine metrics that tell you about the execution environment on which Apache Cassandra is running.
- Operating system metrics describing the metrics related to the bare metal servers, virtual machines, or containers, depending on the environment that you are using.
Dedicated Cassandra Metrics
The first thing that you should take care of for good Apache Cassandra monitoring are the metrics that the distributed data store is exposing via the JMX interface. There are a lot of metrics exposed in the JMX and having visibility into most of them is a good idea. You never know what can be useful when troubleshooting. Still, let’s look at some of the key Cassandra performance metrics.
One of the most important metrics is the number of nodes that are currently available and connected to form a cluster. The ability to store the data and respond to queries is directly related to the availability of nodes.
Compaction is the operation of merging multiple smaller instances of SSTable into one bigger SSTable. Because of that, it can be very expensive and resource consuming. Having visibility into compaction performance is critical for long term observability – the Cassandra monitoring solution of your choice really needs to provide the number of compactions and the number of compacted bytes.
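As a rough illustration of tracking compaction activity, the sketch below turns two readings of cumulative counters into per-second rates. The `CompactionSample` type and its field names are hypothetical. The assumption is that your collector reads running totals of completed compactions and compacted bytes over JMX at a regular interval.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompactionSample:
    """One reading of cumulative compaction counters (hypothetical shape)."""
    timestamp_s: float     # when the reading was taken, in seconds
    completed_tasks: int   # total compactions finished since node start
    bytes_compacted: int   # total bytes compacted since node start


def compaction_rates(prev: CompactionSample, curr: CompactionSample) -> Tuple[float, float]:
    """Return (compactions per second, bytes compacted per second).

    Cumulative counters start again from zero when a node restarts, so a
    negative delta is treated as a reset and the current value is used.
    """
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")

    def delta(before: int, after: int) -> int:
        return after - before if after >= before else after

    tasks = delta(prev.completed_tasks, curr.completed_tasks)
    compacted = delta(prev.bytes_compacted, curr.bytes_compacted)
    return tasks / elapsed, compacted / elapsed


# Made-up readings taken 60 seconds apart.
earlier = CompactionSample(timestamp_s=0.0, completed_tasks=120, bytes_compacted=4_000_000_000)
later = CompactionSample(timestamp_s=60.0, completed_tasks=126, bytes_compacted=4_600_000_000)
tasks_per_s, bytes_per_s = compaction_rates(earlier, later)
print(f"{tasks_per_s:.2f} compactions/s, {bytes_per_s / 1_000_000:.1f} MB/s")
```

Graphing rates rather than raw totals makes it easier to spot periods where compaction falls behind the write load.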
Read and Write
The next set of metrics is dedicated to clients and the read and write side of the operations. The number of reads that are happening in a given period, the request latency, and the number of timeouts and failures should be measured. Your observability solution should be able to provide not only the top-level view but also allow for slicing and dicing through the data showing you the aggregated view, per node view, per keyspace view, and per table view. The same goes for write operations. You should be able to see the number of write requests happening and write latency. Local writes and reads may also be important when troubleshooting. Keep those metrics in mind when choosing your monitoring solution.
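To make the "slicing and dicing" idea concrete, here is a minimal sketch that aggregates the same read statistics into cluster-wide, per-node, and per-table views. The `ReadStats` shape, the field names, and the numbers are all made up for illustration. A real monitoring platform would do this grouping for you.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class ReadStats(NamedTuple):
    """Read counters for one table on one node over one interval (hypothetical shape)."""
    node: str
    keyspace: str
    table: str
    reads: int                # read requests completed in the interval
    total_latency_ms: float   # summed latency of those reads


def mean_latency_by(stats: Iterable[ReadStats], *dimensions: str) -> Dict[Tuple[str, ...], float]:
    """Mean read latency in ms, grouped by the given fields.

    Sums are aggregated first and divided once, so busy tables weigh
    more than idle ones (averaging per-table averages would not do that).
    """
    reads: Dict[Tuple[str, ...], int] = defaultdict(int)
    latency: Dict[Tuple[str, ...], float] = defaultdict(float)
    for s in stats:
        key = tuple(getattr(s, d) for d in dimensions)
        reads[key] += s.reads
        latency[key] += s.total_latency_ms
    return {k: latency[k] / reads[k] for k in reads if reads[k] > 0}


# Made-up numbers, purely for illustration.
interval = [
    ReadStats("node1", "shop", "orders", reads=900, total_latency_ms=1800.0),
    ReadStats("node1", "shop", "carts", reads=100, total_latency_ms=1200.0),
    ReadStats("node2", "shop", "orders", reads=500, total_latency_ms=1500.0),
]

print(mean_latency_by(interval))                       # cluster-wide view
print(mean_latency_by(interval, "node"))               # per-node view
print(mean_latency_by(interval, "keyspace", "table"))  # per-table view
```

In this sample both nodes average the same latency, while the per-table view shows that the `carts` table is several times slower than `orders`. That is the kind of detail a top-level graph alone would hide.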
Table metrics are also very important. Partition size is crucial when it comes to performance. Partitions are used as the unit of data storage, replication, and retrieval, so their size directly dictates the performance of your Cassandra tables. When it comes to table metrics, tombstone-related metrics are also important. Because SSTables are immutable, deleting data produces tombstones, which can be physically removed only during compactions. Because of that, you should keep an eye on how they affect your disk space.
As we mentioned earlier, other Apache Cassandra metrics can be helpful and you should consider monitoring them. Those include, but are not limited to:
- Caches – key cache and row cache requests and the hit ratio
- Bloom filter space used and false positives ratio
- Threadpool metrics
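The cache hit ratio from the list above is simply hits divided by requests. The sketch below shows one way to compute it. The function name and the sample counter values are illustrative assumptions, not part of any Cassandra API.

```python
from typing import Optional


def hit_ratio(hits: int, requests: int) -> Optional[float]:
    """Fraction of cache requests served from the cache.

    Returns None when there were no requests, so an idle cache is not
    mistaken for one that misses every lookup.
    """
    if hits < 0 or requests < 0 or hits > requests:
        raise ValueError("expected 0 <= hits <= requests")
    if requests == 0:
        return None
    return hits / requests


# Made-up counter values for one sampling interval.
caches = {"key cache": (9_400, 10_000), "row cache": (0, 0)}
for name, (hits, requests) in caches.items():
    value = hit_ratio(hits, requests)
    print(name, "idle" if value is None else f"{value:.1%}")
```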
Java Virtual Machine Metrics
Apache Cassandra is a JVM-based application, which comes with all the usual JVM pros and cons. From the developer perspective, memory management is easier and requires less hassle: you just use an object and forget about it, letting the JVM do the cleaning up. But that means that something has to clean up all the unused objects in memory. This is where the Java garbage collector (GC) comes in, along with the metrics that come with it. A proper Cassandra monitoring tool should provide metrics that let you check and troubleshoot issues with the JVM, such as JVM memory utilization and garbage collection count and time.
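Garbage collection time is usually reported as a running total, so it has to be turned into a rate before it is useful for alerting. The sketch below assumes your collector reads the cumulative collection time in milliseconds (the JVM exposes this per collector through its garbage collector MXBeans) at a fixed interval. The function name and the 10% threshold are hypothetical.

```python
def gc_overhead_percent(prev_gc_time_ms: int, curr_gc_time_ms: int, interval_ms: int) -> float:
    """Percentage of the sampling interval attributed to garbage collection."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    spent = curr_gc_time_ms - prev_gc_time_ms
    if spent < 0:
        # The counter went backwards, which means the JVM restarted;
        # everything in the current reading happened since the restart.
        spent = curr_gc_time_ms
    return 100.0 * spent / interval_ms


# Made-up readings taken 60 seconds apart.
overhead = gc_overhead_percent(prev_gc_time_ms=12_000, curr_gc_time_ms=13_500, interval_ms=60_000)
print(f"GC overhead: {overhead:.1f}%")

ALERT_THRESHOLD_PERCENT = 10.0  # hypothetical value, tune for your workload
if overhead > ALERT_THRESHOLD_PERCENT:
    print("GC is taking a large share of the time, check heap usage")
```

Note that for concurrent collectors the reported time is not purely stop-the-world pause time, so treat the percentage as an indicator rather than an exact measure of application stalls.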
Operating System Metrics
Cassandra is architected and developed to write a lot of data to storage in a very short time, so you need to keep an eye on disk usage. Also account for compactions. Compaction is the process of merging multiple SSTables into a single, larger one that contains all the data from the smaller tables. Keep in mind that until the compaction process ends, the total disk space used may be double what it was before the compaction started. Because of that, you should consider leaving about 50% of the disk space free to account for compactions and, of course, set up appropriate alerts.
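The 50% rule of thumb above can be expressed as a simple check. This is a minimal sketch: the function name is hypothetical, and the path passed to `shutil.disk_usage` is a placeholder that you would replace with the volume holding your Cassandra data directory.

```python
import shutil


def has_compaction_headroom(free_bytes: int, total_bytes: int, min_free_fraction: float = 0.5) -> bool:
    """True when at least min_free_fraction of the volume is still free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes >= min_free_fraction


# "." is a placeholder; point this at the Cassandra data volume instead.
usage = shutil.disk_usage(".")
if has_compaction_headroom(usage.free, usage.total):
    print("enough free space for compactions")
else:
    print(f"only {usage.free / usage.total:.0%} free, compactions may run out of space")
```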
The second operating system metric that you should keep an eye on is CPU utilization. The more spare CPU cycles you have the more data you can process on a single node and the more queries that node can handle. What’s more, the CPU cycles are also needed for garbage collection done by the JVM.
Top 10 Tools to Monitor Apache Cassandra
Now that we know which metrics we should be measuring and alerting on, let’s look at some of the best Cassandra monitoring solutions that provide such features.
1. Sematext
Sematext Cloud and its on-premise version, Sematext Enterprise, is a full-stack monitoring solution that is easy to set up and that gives you in-depth visibility into your IT infrastructure. Apache Cassandra monitoring is no exception. The Sematext platform provides full visibility into your Cassandra metrics, JVM metrics, and operating system metrics. It doesn’t matter what platform you are running on, whether it is bare metal, a virtual machine, or an orchestrated environment. Combine that with powerful alerting, anomaly detection, and support for a large number of notification destinations, and you get a powerful observability platform for monitoring your Apache Cassandra clusters.
- The auto-discovery of services enables hands-off auto-monitoring for your Apache Cassandra clusters.
- Comprehensive support for microservices and containerized environments – support for Kubernetes, Docker, and Docker Swarm with the ability to observe Apache Cassandra running in them, too; a collection of their metrics, logs, and events.
- Network, database, processes, and inventory monitoring in addition to Apache Cassandra metrics providing full visibility into the environment.
- Alerting with anomaly detection and support for external notification services like PagerDuty, OpsGenie, VictorOps, WebHooks, etc.
- Powerful dashboarding capabilities for graphing virtually any data shipped to Sematext.
- Scheduled reporting.
- Full-blown log management solution with filtering, full-text search, alerting, scheduled reporting, AWS S3, Google Cloud, Azure, IBM Cloud, DigitalOcean, and MinIO archiving integrations, Elasticsearch-compatible API, and Syslog support giving you the possibility of shipping not only Apache Cassandra metrics, but logs as well.
To start monitoring, the only thing you need to do is install the agent in your environment by following the on-screen instructions, no matter if it is bare metal, a virtual machine, or a container. You get metrics for Apache Cassandra and for whatever it is running on, for example the container that Apache Cassandra is running in. After that, you end up with out-of-the-box reports for each crucial group of metrics. Quick and easy, more or less a no-brainer.
The pricing depends on the number of monitored hosts or containers. Metrics monitoring is flexible and metered by the hour which makes it suitable for dynamic environments that scale up and down. You only pay for what you’ve used. Logs pricing depends on the volume and retention with a per-GB volume discount as your log volume or data retention goes up. A limited free plan is available for both metrics and logs with paid plans starting at $19/month for metrics and $50/month for logs.
2. New Relic Cassandra Monitoring
New Relic is an end-to-end observability solution with an Apache Cassandra integration, available in a software-as-a-service model. Its monitoring capabilities include crucial database, JVM, and operating system metrics along with inventory data for full visibility into the cluster and the execution environment. Combined with alerts and the New Relic Query Language, the platform provides what you need to keep everything Apache Cassandra related under control.
- Application Performance Monitoring with dashboarding and full Apache Cassandra support.
- Log centralization and analysis allowing you to send not only Apache Cassandra metrics, but logs as well.
- Integrated alerting with anomaly detection.
- Rich and powerful query language – NRQL.
- Integration with most known cloud providers such as AWS, Azure, and Google Cloud Platform makes it easy to monitor Apache Cassandra on common platforms.
The installation is quite simple and requires you to follow the instructions for the environment in which Apache Cassandra is running. If you are running an orchestrated environment you will end up modifying a few YAML files, but it is not a very complicated step. If New Relic becomes your platform of choice, just follow the documentation carefully and you shouldn’t have any issues.
User- and data-based pricing is available. The user type (Standard, Pro, Enterprise) determines the available features and the amount of data you can send without an additional fee. For example, the Standard plan includes up to 5 users with full access and an unlimited number of basic users without access to features such as APM, infrastructure monitoring, Synthetics, Serverless monitoring, and more.
3. Datadog Apache Cassandra Monitor
Datadog is a full-stack monitoring tool for your Apache Cassandra clusters that also offers an extended set of features for monitoring your infrastructure, applications, containers, network, and logs. After adjusting the agent configuration you will be able to use the relevant database, JVM, and operating system metrics for graphing and correlations.
- Apache Cassandra Application performance monitoring with a large number of additional integrations available and distributed tracing support.
- Logs centralization and analysis allowing you to ship Cassandra logs and correlate them with metrics.
- Network and host monitoring.
- Alerting with machine learning capabilities.
- Collaboration tools for team-based discussions simplifying the handling of multiple large clusters.
- API that lets you work with the data, tags, and dashboards.
All that is needed is installing the Datadog agent. If you already have the agent installed you are good to go; if you don’t, just download the agent package and follow the instructions. The only thing to keep in mind is that the default limit is 350 metrics per monitored instance, so when you need to collect more metrics you may have to either contact Datadog support or limit the JMX metrics collection in the Datadog JMX agent configuration. You can also automatically gather Apache Cassandra logs if you want to have full visibility into the distributed database.
Feature, host, and volume-based pricing combined: each part of the solution is priced differently and can be billed annually or on-demand. The on-demand billing makes the solution about 17 to 20% more expensive than the annual pricing at the time of this writing.
4. ManageEngine Applications Manager
ManageEngine Applications Manager is a single, integrated application performance monitoring solution for infrastructure and applications, including Apache Cassandra. It supports various infrastructure architectures such as bare metal, virtual machines, and containers, and includes a wide variety of application integrations. All of that combined with actionable alarms and powerful reporting gives a proper observability solution for your Apache Cassandra and the whole infrastructure.
- Apache Cassandra application performance monitoring with a large number of additional integrations available.
- Alerting engine with notifications support.
- Docker and Kubernetes integrations making it easy to monitor Cassandra running in different environments.
- Support for Apache Cassandra metrics when running on common cloud providers like Amazon Web Services, Microsoft Azure, Google Cloud Platform, and OpenStack.
Adding monitoring of Apache Cassandra requires creating a monitoring link of the Cassandra type and providing the location where the database is running and the JMX port that exposes the metrics. We can choose to monitor a single instance of our database or, by default, discover all the nodes forming the cluster, and tell the platform which monitoring group the data should be assigned to. Once that is done, the metrics should be present in the dedicated section of the platform. You can learn which metrics are monitored by looking at the official documentation.
ManageEngine Applications Manager comes in two versions, Professional and Enterprise. The pricing depends on the selected version, the number of monitors, and the number of users that will be using the product. A free version is also available.
5. AppDynamics APM
AppDynamics is focused on large enterprises. Available in both software as a service and an on-premise model AppDynamics provides the ability to connect Apache Cassandra application performance metrics with infrastructure data, alerting, and business-level metrics. A combination of these allows you to monitor not only Apache Cassandra, but the whole stack that runs your services. It gives you insights into your environment – from top-level transactions that are understood by the business executives to the code-level information useful for DevOps and developers.
- Apache Cassandra monitoring with visibility from top-level transactions to the backend calls.
- Infrastructure monitoring with network components, databases, and servers visibility providing information about status, utilization, and flow between each element.
- Machine Learning supported anomaly detection and root cause analysis features.
- Alerting with email templating and period digest capabilities.
To enable Apache Cassandra monitoring in your already installed AppDynamics Machine Agent or the Server Infrastructure & Monitoring you will need an extension. It requires a few manual steps but is pretty well documented in the AppDynamics documentation. After all is done, you get a nice view of your Apache Cassandra metrics along with all other already monitored metrics. You can also follow the in-UI steps with easy installation and configuration instructions. One thing to keep in mind, though not a blocker: the UI didn’t load correctly for me when using Mozilla Firefox, though Chrome worked without any issues.
Agent- and feature-based pricing is used, which makes the pricing not very transparent. The amount of money you will pay for the solution depends on what features you need and want to use from the platform. For example, visibility into the CPU, memory, and disk metrics, which are crucial for Apache Cassandra, requires the APM Advanced plan.
6. Dynatrace
Dynatrace is a full-stack observability solution with support for Apache Cassandra. It provides a user-friendly approach to monitoring your database, infrastructure, and logs. Available in both software-as-a-service and on-premise models, it will fulfill most of your monitoring needs when it comes to Apache Cassandra performance monitoring, its logs, and the infrastructure it is running on.
- Apache Cassandra performance monitoring with dashboarding and additional integrations for commonly used tools and code-level tracing.
- First-class Log analysis support with automatic detection of the common system and application log types.
- Diagnostic tools allow taking memory dumps, exceptions, and CPU analysis helpful in Apache Cassandra troubleshooting.
- Docker, Kubernetes, and OpenShift integrations making it easy to monitor Cassandra running in different environments.
- Support for Apache Cassandra metrics when running on common cloud providers like Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
Once you have the Dynatrace OneAgent installed on a host or multiple hosts, enabling monitoring of Apache Cassandra JMX metrics is as simple as turning on one of the switches in the appropriate settings section. After that, all your Apache Cassandra clusters will be monitored and you’ll be able to see all the relevant metrics in the dedicated technology
Pricing is organized around features. The application performance monitoring pricing is tied to hosts and the amount of memory available on a host. Each 16GB is a host unit and the price is calculated based on the number of host units in an hour. The logs part of the solution is calculated based on the volume.
7. SolarWinds Server & Application Monitor
SolarWinds Server & Application Monitor provides monitoring of Apache Cassandra clusters for those of us who would like to run an on-premise monitoring solution on Microsoft Windows. It provides visibility into Apache Cassandra metrics along with Windows and Linux metrics depending on the environment of your choice. With alerting supported out-of-the-box and dashboarding capabilities the solution is a good candidate for monitoring not only Apache Cassandra but the whole environment.
- Microsoft Windows-based platform for your environment including Apache Cassandra
- Out of the box alerting for keeping an eye on your Cassandra metrics
- Easy integration with Microsoft Windows-based services, like Active Directory or IIS
Subscription and perpetual licensing are available, starting at $1,275, with a 30-day free trial.
8. Prometheus & Grafana
Prometheus and Grafana are a powerful open-source combo that pairs great flexibility with a backend for monitoring metrics that are not limited to Apache Cassandra. A powerful query language allows retrieving the data and powers the built-in alerting functionality. With a plethora of integration configurations available for both Prometheus and Grafana, it is easy to build an observability platform for the whole environment. However, keep in mind that an initial setup and configuration step (as well as ongoing upgrades, maintenance, etc.) will be required and may be time-consuming and thus costly, especially if you are not familiar with the tools.
- Datastore implementing a highly dimensional data store allowing you to gather every Apache Cassandra metric that you can think of.
- A large number of available integrations maintained by the creators and community helping in setting up monitoring for Apache Cassandra and more.
- Out of the box alerting based on powerful Prometheus query language to alert on virtually any metrics related to Apache Cassandra.
- Efficient time-series storage.
- Great visualizations flexibility allowing you to graph data that is of interest.
Installing Prometheus and Grafana was straightforward: I downloaded the Prometheus and Grafana packages and followed that with a simple configuration. The only things needed in addition were the exporter configuration and the Java Agent jar, but those are publicly available. Once that is done you can use the pre-created Grafana dashboards, so you can go straight to the metrics and explore them in an organized fashion.
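Exporters for Prometheus serve metrics over HTTP in a plain-text exposition format that Prometheus scrapes. To give a feel for what that data looks like, the sketch below parses a small sample of that format. The metric names in `SAMPLE` are hypothetical (the real names depend on your exporter configuration), and the parser is deliberately simplified: it skips comment lines, ignores optional timestamps, and does not unescape label values.

```python
import re
from typing import Dict, Iterator, Tuple

# name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric name, labels, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


# Hypothetical metric names, for illustration only.
SAMPLE = """\
# HELP cassandra_table_read_latency_count Hypothetical read counter
# TYPE cassandra_table_read_latency_count counter
cassandra_table_read_latency_count{keyspace="shop",table="orders"} 900
cassandra_table_read_latency_count{keyspace="shop",table="carts"} 100
jvm_gc_collection_seconds_count{gc="G1 Young Generation"} 42
"""

for name, labels, value in parse_samples(SAMPLE):
    print(name, labels, value)
```

In practice Prometheus does this parsing for you. The point is only that every sample is a name, a set of labels such as keyspace and table, and a numeric value, which is what makes the per-keyspace and per-table graphs in Grafana possible.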
Free, but requires you to pay the price of maintenance and storage for your metrics.
9. Zabbix
Zabbix is an open-source tool capable of real-time monitoring of Apache Cassandra with a proper extension template. It’s well known and will not only give you database monitoring, but is also suited as a monitoring solution for both large-scale enterprises and small companies. If you are looking for a solution that is well supported, free of charge, and backed by a large community, you should look at Zabbix. Its multi-system agents allow you to gather key performance indicators across your environment and create alerts on them. With the template-based setup and auto-discovery you can speed up even the largest setups.
- Problem detection and prediction mechanism with flexible thresholds and severity levels defining their importance helping
- Multi-lingual, multi-tenant, flexible UI with dashboarding capabilities and geolocation support for large organizations with data centers spread around the world perfectly suitable for multiple Apache Cassandra clusters spread around multiple datacenters.
- Support for adjustable notifications with out of the box support for email, SMS, Slack, Hipchat and XMPP and escalation workflow.
- Template-based host management and auto-discovery for monitoring large environments and suitable for dynamically changing Apache Cassandra environments.
When installing and configuring Zabbix I was reminded of the good old times of installing and running Apache with PHP and MySQL. I had to take care of each piece of required software and include additional templates to fetch metrics for Cassandra. Keep that in mind when choosing Zabbix as the solution.
Zabbix is open-source and free. However, you can subscribe to support, consultancy, and training if you would like to quickly and efficiently extend your knowledge of the platform.
10. Graphite & Grafana
Graphite and Grafana is another open-source monitoring combo providing great flexibility with a backend for measuring metrics from Apache Cassandra and more. You can use the out-of-the-box Graphite UI, but if you prefer, Grafana can be your UI of choice and give you even more possibilities and graphing options. Similar to Prometheus and Grafana, it is easy to build an observability platform for the whole environment with multiple solutions supporting Graphite as the metrics storage. Keep in mind that an initial setup and configuration step will be required and may be time-consuming.
- Efficient time-based datastore allowing you to gather every Apache Cassandra metric that you can think of.
- A large number of available integrations helping in setting up monitoring for Apache Cassandra and more.
- Alerting based on Grafana functionality.
- Great visualizations flexibility allowing you to graph data that is of interest.
Similar to Zabbix, Graphite required a bit of work when it came to installation and configuration, but once I got it to work everything I needed was there. The additional plus is that Grafana supports Graphite as a source of metrics, so I got not only very efficient metrics storage, but also modern and flexible software for building the dashboards that I needed.
Free, but requires you to pay the price of maintenance and storage for your metrics.
Get Started with Cassandra Monitoring
Apache Cassandra isn’t a simple system. It is a complicated distributed database capable of handling large quantities of data without having a single point of failure and being fault-tolerant. It is designed to spread data across multiple data centers spread across different geographical locations. Such software requires special treatment and you need a good observability solution to be able to monitor its activity.
You want to be sure you have each crucial metric under control. Your platform needs to provide out-of-the-box alerting that works in real time, so that you are notified about issues as soon as they start happening, before your users notice the degradation of the service. Finally, choose a tool that will provide monitoring not only for Apache Cassandra but for your whole infrastructure and software stack, and not only for metrics, but for logs, real user monitoring, and so on. You want to be able to correlate everything in one place, so you should keep that in mind when choosing the best Cassandra monitoring tool for your organization.