Redis is an open-source, BSD 3-Clause licensed, highly efficient in-memory data store. It is widely used in the industry because of its incredible performance and ease of use. It can easily be used as a distributed, in-memory key-value store, cache, or message broker. It supports a wide range of data structures, such as strings, hashes, lists, sets, and sorted sets, which makes it highly versatile.
Redis was architected and developed with speed in mind and designed to keep all the data in memory. Because of that, it is crucial to have visibility into the memory-related metrics to ensure that Redis has enough memory to work and operate with the best performance. You also need to pay attention to all the other metrics apart from memory to get the best out of Redis.
In this article we will look into the key Redis metrics that you should measure to ensure the best performance of this distributed, in-memory data store.
Key Redis Metrics to Monitor
No matter how well the application is designed and implemented, you will hit its limits sooner or later. You need to know that you are approaching those limits before your applications or users start noticing performance degradation or even failures. To do that, you need to understand the key metrics to monitor and how they affect your Redis instances and clusters.
Memory is one of the most important Redis metrics to keep an eye on. Redis uses memory to store data and process requests. To achieve the best performance, you should make sure you have enough memory to keep all data. Otherwise, the operating system will start swapping, meaning that some of the data Redis holds in memory will be written to the swap space on disk. This is a major performance bottleneck because even the fastest disks are orders of magnitude slower than memory, which is why swapping severely degrades Redis performance.
When it comes to memory, monitoring the single, average memory utilization is not enough to understand the state of your Redis. You should divide the memory metrics and keep an eye on the following:
- Used memory – the total amount of memory allocated by Redis through its memory allocator. Keeping this value safely below the memory available to the instance will prevent out-of-memory errors, so pay special attention to this metric.
- Peak used memory – the maximum amount of memory consumed by Redis so far, showing you how much memory your Redis instance needs to operate under peak load.
- RSS (Resident Set Size) – the memory allocated to Redis by the operating system.
- Memory fragmentation – the ratio of the RSS memory to the used memory, indicating fragmentation. A ratio above 1.5 signals excessive memory fragmentation, which makes it harder and harder to find contiguous space in memory where new data can be created. You can get around it by restarting your Redis instance. The ideal scenario is having the memory fragmentation ratio equal to 1 or slightly higher. A ratio below 1 means that Redis needs more memory than the operating system keeps resident, which usually points to swapping.
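The memory metrics above can be read from the output of the Redis INFO command. The following is a minimal sketch, not a definitive implementation: it parses raw INFO text and derives the fragmentation ratio from the `used_memory` and `used_memory_rss` fields. The helper names `parse_info` and `memory_report` and the sample values are made up for illustration.

```python
def parse_info(raw: str) -> dict[str, str]:
    """Parse the text returned by the Redis INFO command into a dict."""
    fields: dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_report(info: dict[str, str]) -> dict[str, float]:
    """Summarize the memory metrics discussed above (hypothetical helper)."""
    used = int(info["used_memory"])
    rss = int(info["used_memory_rss"])
    return {
        "used_bytes": used,
        "peak_bytes": int(info["used_memory_peak"]),
        "rss_bytes": rss,
        # Fragmentation is the RSS divided by the used memory.
        "fragmentation": rss / used,
    }


# Illustrative sample only; real values come from `redis-cli INFO memory`.
SAMPLE = """# Memory
used_memory:1000000
used_memory_peak:1200000
used_memory_rss:1600000
"""

report = memory_report(parse_info(SAMPLE))
if report["fragmentation"] > 1.5:
    print("excessive fragmentation:", report["fragmentation"])
elif report["fragmentation"] < 1.0:
    print("possible swapping:", report["fragmentation"])
```

With the sample values the ratio is above the 1.5 threshold mentioned in the list, so the first branch is taken.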
Each request processed by Redis requires CPU cycles to complete. That’s why you need to monitor the overall CPU utilization of your Redis instances.
The more spare CPU cycles you have in a given environment, the more requests your Redis server can handle. The CPU usage can be expressed by a single number that shows the average in a given time period, but you can also break it down into areas such as:
- user – the percentage of total CPU processing power spent on user-space execution, like applications,
- system – the percentage of total CPU processing power spent on operating system related execution,
- wait – the percentage of time spent waiting for resources, like disk or network,
- and more.
The user part of the CPU usage will show what your Redis process needs. You should avoid situations where the CPU is constantly at 100% utilization, as this means the server is overloaded, which will increase your request processing time.
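As an illustration, the CPU time consumed by the Redis process itself can be estimated from two INFO samples. The sketch below assumes the INFO output is already available as a dictionary (clients such as redis-py return it that way) and relies on the `used_cpu_user` and `used_cpu_sys` fields, which are cumulative seconds of CPU time. The function name `cpu_percent` and the sample numbers are hypothetical. Note that INFO does not report wait time; that part has to come from operating system metrics.

```python
from typing import Any, Mapping


def cpu_percent(
    prev: Mapping[str, Any], curr: Mapping[str, Any], elapsed_seconds: float
) -> dict[str, float]:
    """Convert two cumulative CPU samples into utilization percentages."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # used_cpu_user / used_cpu_sys are cumulative seconds, so the difference
    # between two samples is the CPU time spent during the interval.
    user = float(curr["used_cpu_user"]) - float(prev["used_cpu_user"])
    system = float(curr["used_cpu_sys"]) - float(prev["used_cpu_sys"])
    user_pct = 100.0 * user / elapsed_seconds
    system_pct = 100.0 * system / elapsed_seconds
    return {"user": user_pct, "system": system_pct, "total": user_pct + system_pct}


# Illustrative samples taken 10 seconds apart.
before = {"used_cpu_user": "10.0", "used_cpu_sys": "5.0"}
after = {"used_cpu_user": "13.0", "used_cpu_sys": "6.0"}

usage = cpu_percent(before, after, elapsed_seconds=10.0)
print(usage)
```

The percentages are relative to a single CPU core, which fits the way Redis executes commands.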
One of the metrics illustrating the load on the Redis instances is the number of connections handled by Redis at a given time. There are three types of connections you should monitor to ensure the performance of your Redis database:
- Active connections – the number of connections in use by the clients that are currently connected to the Redis instance. Make sure the number of allowed connections is at least as high as the peak number of active connections.
- Blocked connections – the number of connections waiting for Redis to finish some kind of internal blocking operation. The increasing number of blocked connections may indicate long-running operations and clients waiting for the results.
- Rejected connections – the number of connections that clients tried to establish but could not, because the database had reached the limit on the number of active connections it can handle, which, by default, is 10,000. The number of rejected connections is one metric that points to your Redis instance being overloaded or not well configured.
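The three connection metrics map to the `connected_clients`, `blocked_clients`, and `rejected_connections` fields of the INFO output. The following sketch assumes INFO is available as a dictionary and takes the connection limit as a parameter, defaulting to the 10,000 mentioned above. The function name `connection_report` and the sample values are illustrative assumptions.

```python
from typing import Any, Mapping


def connection_report(
    info: Mapping[str, Any], max_clients: int = 10_000
) -> dict[str, float]:
    """Summarize connection metrics from a dict-shaped INFO output."""
    connected = int(info["connected_clients"])
    return {
        "connected": connected,
        "blocked": int(info["blocked_clients"]),
        # Cumulative count of connections refused because of the limit.
        "rejected": int(info["rejected_connections"]),
        # Fraction of the allowed connections currently in use.
        "usage": connected / max_clients,
    }


# Illustrative sample values.
sample = {
    "connected_clients": "9000",
    "blocked_clients": "3",
    "rejected_connections": "12",
}

conn = connection_report(sample)
if conn["rejected"] > 0 or conn["usage"] > 0.8:
    print("connection limit is close or already reached:", conn)
```

The 0.8 usage threshold is an arbitrary example; pick one that leaves enough headroom for your own traffic peaks.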
The number of operations is an important monitoring metric that shows how many operations Redis performed in a given time – for example, per second. The higher the number, the more resources Redis needs to execute the operations in a timely manner. A low number of operations per second may mean that you’re either not using Redis much or that your Redis is overloaded and struggling to execute those operations.
A keyspace in Redis is an internal dictionary that Redis uses to store and manage all its keys. If you are using a single Redis instance, then it will hold the whole keyspace on that single node. If you are using a cluster of Redis instances, your keyspace will be divided between multiple Redis instances.
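One way to obtain this number is to sample the cumulative `total_commands_processed` counter from INFO twice and divide the difference by the elapsed time (INFO also exposes a ready-made `instantaneous_ops_per_sec` field). A minimal sketch, with a hypothetical function name and made-up sample values:

```python
def ops_per_second(prev_total: int, curr_total: int, elapsed_seconds: float) -> float:
    """Average operations per second between two counter samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if curr_total < prev_total:
        # The counter went backwards, for example after a restart,
        # so only the current value is meaningful for this interval.
        return curr_total / elapsed_seconds
    return (curr_total - prev_total) / elapsed_seconds


# Illustrative samples of total_commands_processed taken 60 seconds apart.
rate = ops_per_second(prev_total=120_000, curr_total=150_000, elapsed_seconds=60.0)
print(rate)
```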
There are two additional concepts that you need to keep in mind when it comes to critical Redis monitoring metrics: hits and misses. When using Redis as a general-purpose cache, you can think about them in terms of your cache hit ratio. The hit ratio is the number of hits divided by the total number of key lookups, that is, hits plus misses. The cache miss ratio is the number of misses divided by that same total.
When your application retrieves the data from Redis, it uses a key to reference that data. The data itself may already be available in Redis and returned immediately, which means that the request will be a hit. On the other hand, if the data is not present, the request will be a miss, and your application will have to retrieve the data from another data store – the source of truth for the data.
Hits and misses are the reason keyspace metrics are extremely useful when using Redis as a general-purpose cache. They tell you what percentage of requests is actually served from Redis, which helps you understand whether you’re benefiting from the speed of the in-memory store or not. If your miss ratio is high, it means that you’re not. As a general rule, you want to maximize the hit percentage for your keys, though it is highly dependent on the use case.
Each Redis key can have a Time To Live (TTL) assigned. This value tells Redis when to remove the data from its memory. It is up to the application using Redis to define the validity of the given key and its associated data. If the application does not define this value, the key never expires, which causes stale data to pile up in Redis’ memory until it is deleted explicitly or evicted because the memory limit has been reached.
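The hit ratio can be computed from the `keyspace_hits` and `keyspace_misses` counters in the INFO output. The sketch below assumes INFO is available as a dictionary; the function name `hit_ratio` and the sample numbers are illustrative.

```python
from typing import Any, Mapping, Optional


def hit_ratio(info: Mapping[str, Any]) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    hits = int(info["keyspace_hits"])
    misses = int(info["keyspace_misses"])
    lookups = hits + misses
    if lookups == 0:
        # No key lookups yet, so the ratio is undefined.
        return None
    return hits / lookups


# Illustrative sample: 900 hits and 100 misses.
ratio = hit_ratio({"keyspace_hits": "900", "keyspace_misses": "100"})
print(ratio)
```

The miss ratio is simply one minus this value. Because both counters are cumulative, comparing two samples over time gives a better picture of the current behavior than a single reading.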
The more evictions or expirations you see, the higher the chance that the next request may not find a given key in the keyspace. Your application will need to fetch the requested data from the original data source, which can increase latency. However, keeping the data for a longer period of time may result in it not being up to date if the application doesn’t care about proper data refresh. You should keep these in mind when designing your application.
Another key Redis metric you should monitor is latency – the average time the Redis server needs to respond to your application requests. Of course, the lower, the better. Keeping an eye on request latency is the most direct and easy way to track changes in Redis performance over time.
Keep in mind that Redis executes commands on a single thread, handling requests one after another. If a single request is slow, all the requests queued behind it have to wait, affecting the performance of the whole operation.
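Latency can also be measured from the client side by timing a cheap round trip many times. The following sketch times an arbitrary callable and reports the average, the 99th percentile, and the maximum. The `fake_ping` function is a stand-in that only sleeps; in a real setup it would be replaced by an actual request such as a PING sent through your Redis client.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(
    operation: Callable[[], object], samples: int = 100
) -> dict[str, float]:
    """Time `operation` repeatedly and summarize the latency in milliseconds."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    # Index of the 99th percentile, clamped to the last element.
    p99_index = min(len(timings) - 1, int(len(timings) * 0.99))
    return {
        "avg_ms": statistics.fmean(timings),
        "p99_ms": timings[p99_index],
        "max_ms": timings[-1],
    }


def fake_ping() -> None:
    """Stand-in for a real Redis round trip (assumption for this sketch)."""
    time.sleep(0.001)


stats = measure_latency_ms(fake_ping, samples=20)
print(stats)
```

Tracking the high percentiles alongside the average helps because a single slow command delays everything queued behind it, which an average alone can hide.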
When using replication in Redis, you may need to enable Redis persistence, which means you will need to pay attention to metrics related to it – the last save time and the number of changes since the last save time. These metrics are usually irrelevant when using Redis as a cache, but they may be crucial for cases where data is not volatile. Long times between writes and a large number of changes between writes to disk may expose your system to data loss. These persistence metrics give you an idea of how much data may be lost when something tragic happens between the writes to persistence.
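For RDB snapshots, these two metrics correspond to the `rdb_last_save_time` (a Unix timestamp) and `rdb_changes_since_last_save` fields of the INFO output. The sketch below assumes INFO is available as a dictionary. The function name and both thresholds are hypothetical and should be tuned to how much data loss you can tolerate.

```python
import time
from typing import Any, Mapping, Optional


def persistence_risk(
    info: Mapping[str, Any],
    now: Optional[float] = None,
    max_age_seconds: float = 900.0,  # hypothetical threshold
    max_changes: int = 10_000,       # hypothetical threshold
) -> dict[str, Any]:
    """Estimate how much data could be lost if Redis stopped right now."""
    current = time.time() if now is None else now
    # rdb_last_save_time is the Unix timestamp of the last successful save.
    age = current - int(info["rdb_last_save_time"])
    changes = int(info["rdb_changes_since_last_save"])
    return {
        "seconds_since_save": age,
        "unsaved_changes": changes,
        "at_risk": age > max_age_seconds or changes > max_changes,
    }


# Illustrative sample: last save 1,000 seconds before "now".
sample = {"rdb_last_save_time": "1700000000", "rdb_changes_since_last_save": "250"}
risk = persistence_risk(sample, now=1_700_001_000)
print(risk)
```

Passing `now` explicitly keeps the example deterministic; in normal use it can be left out so the current time is used.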
Monitoring Redis Metrics with Sematext
Sematext Monitoring is a full-stack observability platform providing a view into all the necessary metrics for efficient Redis monitoring. Pre-built and ready-to-use dashboards enable instant tracking with many options to customize them to your needs. The exposed Redis metrics combined with the operating system metrics provide the necessary metrics for complete visibility into your environment. The threshold and anomaly-based alerts provide passive monitoring, ensuring you’ll never miss any issues in your infrastructure.
To learn more about what Sematext Monitoring can do for you, start the 14-day free trial to test it out yourself.
In this article we’ve looked at the most important Redis performance metrics you should monitor over extended periods of time to ensure that your Redis instances and clusters are healthy and running at peak levels. A good monitoring solution will help you stay up to date with these metrics, how they change, and the trends and anomalies associated with them. To help you get started, we’ve rounded up the best ones in our Redis monitoring tools blog post, having reviewed both open-source and commercial solutions such as Sematext Redis Monitoring.