Redis compression strategies: How we achieved 85% data reduction

Redis (Remote Dictionary Server) is a remarkably fast NoSQL key-value data structure server written in ANSI C. Because Redis keeps its data in memory, data structures can be stored and recalled much more nimbly than with a traditional on-disk database. In addition to its speed, Redis easily supports a wide range of abstract data types, making it incredibly flexible for almost any situation.
At LogicMonitor, we focus on large volumes of time series data, monitoring customer devices at regular intervals for our agentless application. Recently, we’ve integrated machine learning to improve anomaly detection by developing a stateless microservice that interprets data streams and provides meaningful analytics. While essential, this process generates significant data that must be stored and accessed efficiently, which is where Redis comes in.
While Redis excels in speed and flexibility, it does not natively compress the data it stores. By default, Redis stores data in its raw format (uncompressed), which is highly efficient for snappy retrieval and processing but can lead to increased memory usage, especially when dealing with large volumes of data, such as time series or large datasets in JSON format.
Given Redis’s lack of built-in compression, manual compression techniques become essential for optimizing memory usage and reducing storage footprint. These techniques allow you to compress data before storing it in Redis, thereby significantly reducing the amount of memory consumed and improving the efficiency of data storage. This is particularly critical in environments like ours, where high data throughput and storage efficiency are paramount.
By applying manual compression strategies, such as those discussed later in this article, you can leverage the speed of Redis while mitigating the memory consumption typically associated with large-scale data storage.
To persist machine learning training data for our various algorithms across several pods in our environment, we maintain a model in JSON format. This model is cached in centralized Redis clusters for high availability. Below is a simplified example of this model.
{
  "resource-id-1": {
    "model-version": string,
    "config-version": int,
    "last-timestamp": int,
    "algorithm-1-training-data": {
      "version": string,
      "training-data-value": List[float],
      "training-data-timestamp": List[float],
      "Parameter-1": float,
      "Parameter-2": float
    },
    "algorithm-2-training-data": {
      "version": string,
      "training-data-value": List[float],
      "training-data-timestamp": List[float],
      "Parameter-1": float,
      "Parameter-2": float
    },
    ...
  }
}
By default, Redis does not compress the values it stores, so any size reduction has to be performed on the application side first. We found that the storage and network throughput savings achieved through any of our manual compression options were worth the additional processing time.
At first, we elected to utilize LZ4 because of its speed and ease of integration with the existing codebase. The introduction of lossless compression immediately reduced the overall storage size from approximately 39,739 bytes to a much more manageable 16,208 bytes (60% savings). Most of these savings can be attributed to the statistical redundancy in the key-values of our model.
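As a rough illustration of this first approach (the integration in our codebase is more involved), here is a minimal sketch of compressing the serialized model with LZ4 before writing it to Redis, using the lz4 and redis-py packages; the key name and payload below are hypothetical:

import json

import lz4.frame
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed local Redis instance
model = {"resource-id-1": {"model-version": "v1", "last-timestamp": 1700000000}}

# Serialize to JSON, then apply LZ4 lossless compression before storing.
raw = json.dumps(model).encode("utf-8")
r.set("training-model:resource-id-1", lz4.frame.compress(raw))

# On the read path, decompress after fetching the value back.
restored = json.loads(lz4.frame.decompress(r.get("training-model:resource-id-1")))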
As our project grew in complexity, we needed to include additional, longer lists of float values. Because of this, our training data model more than doubled in size and now consisted mostly of seemingly random numerical data, which lessened its statistical redundancy and minimized compression savings. Each of these float values carried large decimal precision (due to the double-precision default in Python), which translated to considerably more bytes when set to Redis as a string. Our second strategy was to reduce decimal precision by rounding, which we evaluated with two options; the results are shown below.
Option 1 Result: ~80,000 bytes reduced to ~36,000 bytes (55% savings)
Option 2 Result: ~80,000 bytes reduced to ~35,000 bytes (56% savings)
Option 1 was selected because it saved approximately the same amount of memory as Option 2 without the added complexity and further loss of decimal precision. We also verified that the impact of the lost precision on the algorithm was insignificant.
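The two options themselves aren’t reproduced here, but conceptually the winning approach amounts to rounding every float in the training-data lists to a fixed precision before JSON serialization. A minimal sketch, assuming four decimal places (the precision we actually chose may differ):

def round_training_lists(model: dict, precision: int = 4) -> dict:
    """Round the float lists in each algorithm section so the JSON string shrinks."""
    for resource in model.values():
        for section in resource.values():
            if isinstance(section, dict):
                for field in ("training-data-value", "training-data-timestamp"):
                    if field in section:
                        section[field] = [round(v, precision) for v in section[field]]
    return model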
While rounding dramatically reduced our Redis storage, consumption was still high and we lost a margin of accuracy in our algorithm’s calculations.
Before being converted to lists, the float value arrays exist as NumPy N-dimensional arrays (ndarray). Each of these arrays was scaled down to float16 values and compressed with Blosc using its internal Zlib codec. Blosc can use multithreading and splits the data into smaller blocks before compressing, making it much faster than traditional compression methods.
The resulting compressed array is then encoded with Base64 and decoded to an 8-bit Unicode Transformation Format (UTF-8) string before being dumped to JSON. The resulting Redis storage size was reduced to 11,383 bytes (from ~80,000 bytes), a dramatic improvement over either compressing exclusively with LZ4 or rounding float decimals.
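A condensed sketch of that pipeline, assuming the blosc and numpy packages (the array contents and field name are illustrative):

import base64
import json

import blosc
import numpy as np

values = np.random.rand(5000)  # stand-in for one training-data ndarray

# Scale down to float16, then compress with Blosc using its internal zlib codec.
compressed = blosc.compress(values.astype(np.float16).tobytes(), typesize=2, cname="zlib")

# Base64-encode and decode to a UTF-8 string so the value can live inside JSON.
payload = json.dumps({"training-data-value": base64.b64encode(compressed).decode("utf-8")})

# Reverse the steps on the read path.
decoded = base64.b64decode(json.loads(payload)["training-data-value"])
restored = np.frombuffer(blosc.decompress(decoded), dtype=np.float16)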
Ultimately, compression with encoding was the strategy included in the final iteration of our anomaly detection program. Combining the two strategies saved 85% of memory consumption.
During compression testing, we used some of our out-of-the-box Redis modules to monitor performance for all Get and Set transactions. Comparing transmissions to Redis with and without prior compression demonstrated substantial improvements in elapsed time.
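Those modules are internal to our deployment, but the same effect can be sanity-checked on the application side by timing round trips with and without compression. A rough sketch, assuming a local Redis instance and an arbitrary payload:

import json
import time

import lz4.frame
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed local Redis instance
payload = json.dumps({"training-data-value": list(range(20000))}).encode("utf-8")

def time_round_trip(key: str, value: bytes) -> float:
    """Time one Set followed by one Get for the given value."""
    start = time.perf_counter()
    r.set(key, value)
    r.get(key)
    return time.perf_counter() - start

print("raw:", time_round_trip("model:raw", payload))
print("lz4:", time_round_trip("model:lz4", lz4.frame.compress(payload)))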
While compression offers substantial benefits in reducing Redis’s storage footprint, it is not without its challenges. Implementing compression in Redis can introduce potential pitfalls that, if not managed carefully, could impact performance and data integrity. Below, we outline some of these challenges and provide strategies to mitigate them.
Compression, by its nature, requires additional CPU resources to encode and decode data. During periods of peak load, this can lead to increased latency and reduced throughput, particularly if the compression algorithm used is resource-intensive.
How to avoid it: Prefer a lightweight codec (LZ4-class algorithms cost far less CPU than higher Zlib levels), compress only values large enough to benefit, and benchmark CPU usage and latency under peak load before rolling compression out. The size-threshold idea is sketched below.
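One hedged illustration of that threshold check (the cutoff value here is arbitrary and should come from your own benchmarks):

import lz4.frame

MIN_COMPRESS_BYTES = 1024  # illustrative cutoff; tune it from real measurements

def maybe_compress(value: bytes) -> tuple[bytes, bool]:
    """Compress only payloads large enough for the CPU cost to pay off."""
    if len(value) < MIN_COMPRESS_BYTES:
        return value, False
    return lz4.frame.compress(value), True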
Compressed data, especially when using more complex encoding methods, can be more susceptible to corruption during compression, transmission, or decompression. If a single bit in a compressed block is altered, it could render the entire block unreadable, leading to data loss.
How to avoid it: Store an integrity check, such as a CRC32 or stronger checksum, alongside the compressed payload and verify it before use, so a corrupted block is detected and can be re-fetched or rebuilt instead of silently returning bad data. A small example is sketched below.
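A minimal sketch of that idea, using CRC32 purely as an example integrity check on top of LZ4:

import zlib

import lz4.frame

def pack_with_checksum(raw: bytes) -> bytes:
    """Prefix the compressed payload with a CRC32 of the original data."""
    return zlib.crc32(raw).to_bytes(4, "big") + lz4.frame.compress(raw)

def unpack_with_checksum(blob: bytes) -> bytes:
    """Decompress and verify; raise if the payload fails its integrity check."""
    expected = int.from_bytes(blob[:4], "big")
    raw = lz4.frame.decompress(blob[4:])
    if zlib.crc32(raw) != expected:
        raise ValueError("compressed payload failed integrity check")
    return raw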
As your Redis database grows, managing compressed data can become increasingly complex. The added layer of compression requires careful scaling strategies to ensure that data retrieval times remain consistent and that the system remains responsive.
How to avoid it: Standardize and version the compression format (for example, in the key name or payload metadata), monitor retrieval latency as the dataset grows, and plan for horizontal scaling, such as Redis Cluster, so decompression load stays spread across clients and nodes.