FEATURE AVAILABILITY: The dynamic thresholds feature is available to users of LogicMonitor Enterprise.
Dynamic thresholds represent the bounds of an expected data range for a particular datapoint. Unlike static datapoint thresholds, which are assigned manually, dynamic thresholds are calculated by anomaly detection algorithms and continuously trained on a datapoint’s recent historical values.
When dynamic thresholds are enabled for a datapoint, alerts are dynamically generated when these thresholds are exceeded. In other words, alerts are generated when anomalous values are detected.
Dynamic thresholds detect the following types of data patterns:
Because dynamic thresholds (and their resulting alerts) are automatically and algorithmically determined based on the history of a datapoint, they are well suited for datapoints where static thresholds are hard to identify (such as when monitoring number of connections, latency, and so on) or where acceptable datapoint values aren’t necessarily uniform across an environment.
For example, consider an organization that has optimized its infrastructure so that some of its servers are intentionally highly utilized at 90% CPU. This utilization rate runs afoul of LogicMonitor’s default static CPU thresholds, which typically consider ~80% CPU (or greater) to be an alert condition. The organization could take the time to customize the static thresholds in place for its highly utilized servers to avoid unwanted alert noise or, alternatively, it could globally enable dynamic thresholds for the CPU metric. With dynamic thresholds enabled, alerting occurs only when anomalous values are detected, allowing differing consumption patterns to coexist across servers.
For situations like this one, in which it is more meaningful to determine whether a returned metric is anomalous, dynamic thresholds are especially valuable. Not only do they trigger more accurate alerts, but in many cases they also catch issues sooner. In addition, administrative effort is reduced considerably because dynamic thresholds require neither manual upfront configuration nor ongoing tuning.
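To make the 90% CPU example concrete, the following Python sketch computes an expected range from a short window of recent values, flags values that fall outside it, and reports the deviation from the nearest bound. The band here is simply the mean plus or minus k sample standard deviations; this is an illustrative stand-in, not the anomaly detection algorithm LogicMonitor uses, and the function names, sample readings, and the k=3 default are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class ExpectedRange:
    lower: float
    upper: float


def expected_range(history: list[float], k: float = 3.0) -> ExpectedRange:
    """Illustrative band: mean +/- k sample standard deviations of recent values.

    Stand-in technique only; not LogicMonitor's anomaly detection algorithm.
    Requires at least two historical values.
    """
    center = mean(history)
    spread = stdev(history)
    return ExpectedRange(center - k * spread, center + k * spread)


def deviation(value: float, band: ExpectedRange) -> float:
    """Distance by which a value lies outside the band (0.0 if inside it)."""
    if value > band.upper:
        return value - band.upper
    if value < band.lower:
        return band.lower - value
    return 0.0


def is_anomalous(value: float, band: ExpectedRange) -> bool:
    """A value is anomalous when it falls outside the expected range."""
    return deviation(value, band) > 0.0


# Hypothetical recent CPU readings (%) for an intentionally busy server.
recent_cpu = [88, 90, 91, 89, 90, 92, 90]
band = expected_range(recent_cpu)
for reading in (90, 99, 60):
    print(reading, is_anomalous(reading, band), round(deviation(reading, band), 2))
```

For a server that consistently runs near 90% CPU, a reading of 90 sits inside the band and raises nothing, even though a static threshold at ~80% would alert on it; readings of 99 or 60 fall outside the band and would be treated as anomalous.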
Dynamic thresholds require a minimum of 5 hours of training data for DataSources with polling intervals of 15 minutes or less. As more data is collected, the algorithm is continuously refined, using up to 15 days of recent historical data to inform its expected data range calculations.
Daily and weekly trends also factor into dynamic threshold calculations. For example, a load balancer with high traffic volumes Monday through Friday, but significantly decreased volumes on Saturdays and Sundays, will have expected data ranges that adjust accordingly between the workweek and weekends. Similarly, dynamic thresholds would also take into account high volumes of traffic in the morning as compared to the evening. A minimum of 2.5 days of training data is required to detect daily trends and a minimum of 9 days of data is required to detect weekly trends.
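The training-data minimums above can be summarized in a small Python sketch. It assumes a polling interval of 15 minutes or less (the only case for which this article gives an exact startup minimum) and uses hypothetical function and field names; it only restates the documented figures and does not model how LogicMonitor weights the data it trains on.

```python
from dataclasses import dataclass

# Minimums stated in this article for DataSources polled every 15 minutes or less.
MIN_HOURS_TO_OPERATE = 5
MIN_DAYS_FOR_DAILY_TRENDS = 2.5
MIN_DAYS_FOR_WEEKLY_TRENDS = 9
MAX_TRAINING_DAYS = 15  # at most this much recent history informs the range


@dataclass
class TrainingStatus:
    operational: bool       # dynamic thresholds can generate alerts
    daily_trends: bool      # enough history to detect daily patterns
    weekly_trends: bool     # enough history to detect weekly patterns
    days_considered: float  # history actually used, capped at 15 days


def training_status(hours_of_data: float, poll_interval_minutes: float) -> TrainingStatus:
    if poll_interval_minutes > 15:
        # The article only says the minimum is longer than 5 hours in this case.
        raise ValueError("startup minimum for intervals over 15 minutes is not modeled")
    days = hours_of_data / 24
    return TrainingStatus(
        operational=hours_of_data >= MIN_HOURS_TO_OPERATE,
        daily_trends=days >= MIN_DAYS_FOR_DAILY_TRENDS,
        weekly_trends=days >= MIN_DAYS_FOR_WEEKLY_TRENDS,
        days_considered=min(days, MAX_TRAINING_DAYS),
    )


# A DataSource polled every 5 minutes, three days after dynamic thresholds were enabled.
print(training_status(hours_of_data=72, poll_interval_minutes=5))
```

Three days of data is enough to operate and to detect daily trends, but weekly trends only come into play once 9 days of history have accumulated.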
As discussed in the previous section, dynamic thresholds require a minimum of 5 hours of training data before they can operate. This means the feature must be enabled for at least 5 hours (or longer, if the DataSource polls less frequently than every 15 minutes) before it becomes operational. During this startup period, alerts continue to be routed as normal based on static datapoint thresholds (assuming static thresholds are in place).
Similar to static datapoint thresholds, there are multiple levels at which dynamic thresholds can be enabled:
Dynamic thresholds cascade down from the global DataSource level. However, if other dynamic threshold configurations are encountered at deeper levels in the Resources tree, those deeper configurations will override those found at higher levels. For example, dynamic thresholds set at the resource group level will override those set at the global DataSource level. Similarly, dynamic thresholds set at the instance level will override those set at the resource group level.
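The override order can be sketched in Python as a lookup from the deepest level upward. Configurations are represented as hypothetical dictionaries, and each level's configuration is treated as a single unit; this article does not say whether individual settings from different levels are merged, so that is an assumption of the sketch.

```python
from typing import Optional, Sequence

# Hypothetical shape for a dynamic threshold configuration at one level.
Config = dict


def resolve_dynamic_thresholds(levels: Sequence[Optional[Config]]) -> Optional[Config]:
    """Return the configuration in effect for one datapoint on one instance.

    `levels` is ordered from shallowest to deepest, for example:
    [global DataSource, parent resource group, child resource group, instance].
    None means nothing is configured at that level. The deepest configured level wins.
    """
    for config in reversed(levels):
        if config is not None:
            return config
    return None


global_cfg = {"level": "global", "severities": ["warning", "error"]}
group_cfg = {"level": "group", "severities": ["warning"]}

# The instance has no setting of its own, so the group setting overrides the global one.
print(resolve_dynamic_thresholds([global_cfg, group_cfg, None]))
```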
The following table illustrates which set of threshold configurations will be used when evaluating a datapoint. When interpreting the table, assume the following conditions to be true:
Note: The number of dynamic thresholds allowed per portal is limited to eight times the number of monitored resources your account permits. This is an aggregate limit, enforced at the account level rather than per resource. For example, if your account permits monitoring of up to 100 total resources (this includes all monitored devices, cloud and Kubernetes resources, and services), then 800 total dynamic thresholds are permitted across your portal, applied in any manner you see fit. The total counts every place a dynamic threshold is potentially evaluated: a dynamic threshold that is configured only once but inherited by multiple instances contributes once for each of those instances. For visibility into current usage and limit, navigate to Settings | Account Information.
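The limit arithmetic can be checked with a short Python sketch. The function names and the example portal (a global threshold inherited by 300 instances plus a resource group threshold on a different datapoint evaluated on 40 instances) are hypothetical; the authoritative figures are the ones shown under Settings | Account Information.

```python
PER_RESOURCE_ALLOWANCE = 8  # dynamic thresholds allowed per permitted resource


def portal_limit(permitted_resources: int) -> int:
    """Aggregate, account-level limit on dynamic threshold evaluations."""
    return PER_RESOURCE_ALLOWANCE * permitted_resources


def portal_usage(instances_evaluating: dict[str, int]) -> int:
    """Each configured threshold counts once per instance that evaluates it."""
    return sum(instances_evaluating.values())


# Hypothetical portal with two dynamic threshold configurations on different datapoints.
usage = portal_usage({"global CPU threshold": 300, "group latency threshold": 40})
limit = portal_limit(100)
print(f"{usage} of {limit} used; within limit: {usage <= limit}")
```

Although only two thresholds were configured, they account for 340 of the 800 permitted evaluations because each inheriting instance is counted.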
As a general rule, enabling dynamic thresholds globally is recommended when a majority of the instances in your infrastructure will benefit. Global-level enablement for dynamic thresholds takes place in the DataSource definition.
To enable dynamic thresholds for a datapoint at the global level:
Note: The DataSource editor available from the Exchange page does not currently support dynamic threshold configuration.
Note: You may also see static thresholds in place for the same datapoint. Static thresholds and dynamic thresholds can be used in conjunction with one another, as discussed in Assigning Both Static and Dynamic Thresholds to a Datapoint.
By default, LogicMonitor attempts to auto-determine the right range sensitivity for each severity, but you are able to make adjustments to the following advanced configurations:
Enabling dynamic thresholds at the instance or resource group level takes place on the Resources page. As highlighted next, there are different entry points, depending upon whether you are enabling dynamic thresholds for a single instance on a single resource, multiple instances on a single resource, or all instances in a resource group.
To enable dynamic thresholds for a datapoint at the resource group or instance level:
Note: Anomaly detection graphs are based on aggregated (resampled) data, so their renderings of graph coordinates and expected range bands may differ slightly from the expected range used to generate the alert. This visual discrepancy potentially increases as the graph’s time range increases because more resampling is required to fit the graph onto a finite screen size. For this reason, we recommend setting shorter time ranges when possible to minimize the amount of resampling required. For more information on anomaly detection graphs, see Anomaly Detection Visualization.
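The following Python sketch illustrates why resampling can hide an excursion on a graph. It assumes the graph averages consecutive raw samples into buckets and holds the upper bound of the expected range constant; LogicMonitor's actual aggregation method is not described here, and all values are hypothetical.

```python
def resample_mean(values: list[float], bucket_size: int) -> list[float]:
    """Average consecutive, non-overlapping buckets of raw samples into one point each."""
    resampled = []
    for start in range(0, len(values), bucket_size):
        bucket = values[start:start + bucket_size]
        resampled.append(sum(bucket) / len(bucket))
    return resampled


UPPER_BOUND = 80.0  # hypothetical upper edge of the expected range, held constant

raw = [50.0] * 9 + [95.0] + [50.0] * 10     # one short spike among 20 raw samples
graph = resample_mean(raw, bucket_size=10)  # coarser points, as for a longer time range

print(max(raw) > UPPER_BOUND)    # the raw spike crosses the bound
print(max(graph) > UPPER_BOUND)  # the averaged point stays inside it
print(graph)
```

The raw value of 95 would exceed the bound and trigger an alert, yet its bucket averages to 54.5, so the plotted point appears well inside the band. Shorter time ranges mean smaller buckets and less of this smoothing.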
Dynamic thresholds (and their resulting alerts) are automatically and algorithmically determined based on the history of a datapoint. There is no requirement to additionally establish static thresholds for a datapoint for which dynamic thresholds are enabled.
However, the assignment of both static and dynamic thresholds for a single datapoint is possible—and desirable in the right use case. When both types of thresholds are set for a datapoint, alerting behavior becomes more flexible, supporting either—or both—of the following behaviors per datapoint:
As a result of these two sets of behaviors (which can work independently or in cooperation with one another), dynamic thresholds can be used to:
A good strategy for optimizing both sets of behaviors (trigger and suppression) is to enable dynamic thresholds for warning and/or error severity level alerts for a datapoint, while enabling static thresholds only for critical severity level alerts. This ensures that if static thresholds aren’t tuned well for less-than-critical alerts, alert noise is reduced and dynamic thresholds catch issues that aren’t caught by static thresholds. And for values that represent critical conditions, the static threshold will, without question, result in an alert.
This is useful when there are different use cases for suppression and alert generation. For example, a percentage-based metric for which there is a very clear good and bad range may benefit more from suppression when it is not outside its expected range than from alert generation when it is deemed anomalous.
Note: When static and dynamic thresholds are both enabled for the same datapoint and alerts are triggered by both, the highest severity alert always takes precedence. If these alerts are triggered for the same severity, the alert triggered by the static threshold always takes precedence.
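The precedence rule in this note can be expressed as a Python sketch that picks the alert in effect for one datapoint. The severity names are LogicMonitor's three alert levels; the Alert type and its "static"/"dynamic" source labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

SEVERITY_RANK = {"warning": 1, "error": 2, "critical": 3}


@dataclass
class Alert:
    severity: str  # "warning", "error", or "critical"
    source: str    # "static" or "dynamic" (hypothetical labels)


def alert_in_effect(alerts: Sequence[Alert]) -> Optional[Alert]:
    """Highest severity wins; at equal severity, a static-threshold alert wins."""
    if not alerts:
        return None
    return max(alerts, key=lambda a: (SEVERITY_RANK[a.severity], a.source == "static"))


print(alert_in_effect([Alert("error", "dynamic"), Alert("warning", "static")]))
print(alert_in_effect([Alert("error", "dynamic"), Alert("error", "static")]))
```

Under the strategy of reserving static thresholds for critical severity, a critical static alert therefore always outranks any warning or error raised by dynamic thresholds.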
For more information on static thresholds, see Tuning Static Thresholds for Datapoints.
Alerts that have been generated by dynamic thresholds (or alerts whose notification deliveries have been suppressed by dynamic thresholds) display as usual in the LogicMonitor interface.
Alerts generated by dynamic thresholds will provide deviation details (expected range and deviation from expected range) in the alert description and graph, both found in the Overview tab.
When viewing alerts from the Alerts page or from the Alerts tab on the Resources page (or when configuring the Alert List widget or Alerts report), you can use the Anomaly filter to limit alert display to only those alerts that were triggered by dynamic thresholds. For more information on alert filters, see Managing Alerts from the Alerts Page.
Note: By default, the alert table does not display alerts that have been cleared. To see a historical account of all alerts triggered by dynamic thresholds, enable the cleared filter in conjunction with the anomaly filter.
The Alerts Thresholds report provides visibility into the datapoint thresholds set across your LogicMonitor platform. It reports on the thresholds in effect across multiple resources, including detailing thresholds that have been overridden and highlighting resources for which alerting has been disabled. To learn more about this report, see Alert Thresholds Report.
Note: If you’d like to see the resources/instances across your portal for which custom dynamic thresholds have been set, run this report with the Only show custom thresholds option checked.
Next are some best practices to keep in mind when enabling dynamic thresholds: