Redis Compression Benchmarking

At LogicMonitor, we deal primarily with large quantities of time series data. Customer devices are monitored at regular intervals and data points are provided to our agentless application to be processed and interpreted. Recently, we’ve endeavored to expand the presence of machine learning in our application to enhance anomaly detection. This involved developing a stateless microservice that could quickly interpret the continuous stream of data point values and return meaningful anomaly analytics, detect potential seasonality, and suppress alert noise.  

The intention of this blog, however, isn't to discuss the anomaly detection microservice, but rather to explore our team's reasoning for including Redis as a central fixture. It also describes the various compression strategies we employed to reduce our Redis storage footprint, along with the advantages and shortcomings of each.

What Is Redis?

Redis (Remote Dictionary Server) is a remarkably fast NoSQL key-value data structure server, originally developed by Salvatore Sanfilippo and written in ANSI C. Redis keeps its data structure stores in memory, meaning they can be saved and recalled much more nimbly than with a traditional disk-backed database. In addition to its speed, Redis supports a wide range of abstract data types, making it incredibly flexible for almost any situation.

Example State Model

To persist machine learning training data for our various algorithms across several pods in our environment, we maintain a model in JSON format.  This model is cached in centralized Redis clusters for high availability.  Below is a simplified example of this model.

{
  "resource-id-1": {
    "model-version": string,
    "config-version": int,
    "last-timestamp": int,
    "algorithm-1-training-data": {
      "version": string,
      "training-data-value": List[float],
      "training-data-timestamp": List[float],
      "parameter-1": float,
      "parameter-2": float
    },
    "algorithm-2-training-data": {
      "version": string,
      "training-data-value": List[float],
      "training-data-timestamp": List[float],
      "parameter-1": float,
      "parameter-2": float
    },
    ...
  }
}
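
As a minimal sketch of how such a model is cached and recalled (using the redis-py client; the host name, key, and field values are placeholders, not our actual configuration):

import json
import redis

# Connect to the centralized Redis cluster (host/port are illustrative).
client = redis.Redis(host="redis-cluster.internal", port=6379)

# A trimmed-down stand-in for the training-data model above.
model = {
    "resource-id-1": {
        "model-version": "v1",
        "config-version": 3,
        "last-timestamp": 1650000000
    }
}

# Serialize the model to JSON and cache it under its resource key.
client.set("resource-id-1", json.dumps(model))

# Any pod can later recall and deserialize the same model.
cached = json.loads(client.get("resource-id-1"))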

Compression Strategies

LZ4 Compression

By default, Redis does not compress the values it stores, so any size reduction has to be performed on the application side first. We found that the storage and network throughput savings achieved through any of our manual compression options were worth the additional processing time.

At first, we elected to utilize LZ4 because of its speed and ease of integration with the existing codebase. The introduction of lossless compression immediately reduced the overall storage size from approximately 39,739 bytes to a much more manageable 16,208 bytes (roughly 60% savings). Most of these savings can be attributed to the statistical redundancy in the keys and values of our model.
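
A sketch of this step with the lz4 Python package (frame format; the key name and payload are illustrative):

import json
import lz4.frame
import redis

client = redis.Redis(host="redis-cluster.internal", port=6379)
payload = json.dumps({"resource-id-1": {"model-version": "v1"}}).encode("utf-8")

# Losslessly compress the serialized model before setting it in Redis.
client.set("resource-id-1", lz4.frame.compress(payload))

# On read, decompress and deserialize to recover the original model.
model = json.loads(lz4.frame.decompress(client.get("resource-id-1")))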

Reducing Noise Before Compression

As our project grew in complexity, we needed to include additional, longer lists of float values. Because of this, our training data model more than doubled in size and now mostly consisted of seemingly random numerical data, which lessened its statistical redundancy and minimized compression savings. Each of these float values carried large decimal precision (due to the double-precision default in Python), which translated to considerably more bytes when set to Redis as a string. Our second strategy was therefore to round the decimal precision, for which we evaluated two options, detailed below.
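
To illustrate the cost (a quick, self-contained example): a double-precision float can serialize to nearly 20 characters, while its rounded counterpart takes only a handful of bytes.

import json

# Double-precision floats serialize with up to 17 significant digits.
print(json.dumps(0.1 + 0.2))             # 0.30000000000000004 (19 bytes)

# Rounding to 4 decimals of precision shrinks the string dramatically.
print(json.dumps(round(0.1 + 0.2, 4)))   # 0.3 (3 bytes)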

Rounding Option 1 (All Values)

  • Round all float values to 4 decimals of precision.

Rounding Option 2 (Based on 5th Quantile Value)

  • Round float values to 0 decimals if the value is above 100 (e.g., 54,012.43 → 54,012)
  • Round float values to 1 decimal if the value is between 1 and 99 (e.g., 12.43 → 12.4)
  • Round float values to 4 decimals if the value is less than 1 (e.g., 0.4312444 → 0.4312)

Option 1 Result:  ~80,000 bytes reduced to ~36,000 bytes (55% savings)

Option 2 Result:  ~80,000 bytes reduced to ~35,000 bytes (56% savings)

Option 1 was selected because it saved approximately the same amount of memory without the added complexity or the more aggressive loss of decimal precision. We also verified that the lost precision had an insignificant impact on the algorithm.
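
Both options amount to simple per-value rounding; a sketch of each (function names are ours, with thresholds taken from the lists above):

def round_all(values, precision=4):
    # Option 1: round every float to 4 decimals of precision.
    return [round(v, precision) for v in values]

def round_by_magnitude(values):
    # Option 2: scale the retained precision to each value's magnitude.
    rounded = []
    for v in values:
        if v > 100:
            rounded.append(round(v))      # 54012.43 -> 54012
        elif v >= 1:
            rounded.append(round(v, 1))   # 12.43 -> 12.4
        else:
            rounded.append(round(v, 4))   # 0.4312444 -> 0.4312
    return rounded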

Compression With Encoding – 85% Savings!

While rounding dramatically reduced our Redis storage, consumption was still high and we had sacrificed a margin of accuracy in our algorithm's calculations.

Before being converted to lists, the float value arrays exist as NumPy N-dimensional arrays (ndarray). Each of these arrays was downcast to float16 values and compressed with Blosc using its internal Zlib codec. Blosc splits the data into smaller blocks that it can compress with multiple threads, making it much faster than traditional compression methods.

The resulting compressed array is then encoded with Base64 and decoded to an 8-bit Unicode Transformation Format (UTF-8) string before being dumped to JSON. The resulting Redis storage size was reduced to 11,383 bytes (from ~80,000 bytes), a dramatic improvement over what was achieved by either compressing exclusively with LZ4 or rounding float decimals.
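
A sketch of the encode/decode round trip (assuming the blosc and numpy packages; the array contents and JSON key are illustrative):

import base64
import json
import blosc
import numpy as np

values = np.array([54012.43, 12.43, 0.4312444], dtype=np.float64)

# Downcast to float16 before compressing.
small = values.astype(np.float16)

# Compress with Blosc's internal Zlib codec (typesize matches float16).
packed = blosc.compress(small.tobytes(), typesize=2, cname="zlib")

# Base64-encode so the compressed bytes survive as a UTF-8 JSON string.
encoded = base64.b64encode(packed).decode("utf-8")
payload = json.dumps({"training-data-value": encoded})

# Reverse each step to recover the (float16-precision) array.
raw = blosc.decompress(base64.b64decode(json.loads(payload)["training-data-value"]))
restored = np.frombuffer(raw, dtype=np.float16)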

Ultimately, compression with encoding was the strategy included in the final iteration of our anomaly detection program. Combining the two strategies reduced memory consumption by 85%.

Monitoring While Benchmarking

During compression testing, we utilized some of our out-of-the-box Redis monitoring modules to track performance for all Get and Set transactions. Comparing transmissions to Redis with and without prior compression demonstrated substantial improvements in elapsed time.

Figure: Redis Set Elapsed Time over the past 24 hours, viewed in a LogicMonitor dashboard.

About LogicMonitor

LogicMonitor is the only fully automated, cloud-based infrastructure monitoring platform for enterprise IT and managed service providers. Gain full-stack visibility for networks, cloud, servers, and more within one unified view.