Tuning Alert Thresholds for Datapoints
Receiving too many meaningless LogicMonitor alert notifications can ultimately lead to people ignoring important alerts. On the other hand, not receiving a key alert could result in service downtime or even an outage. The key to avoiding both of these undesirable situations is to tune datapoint alert thresholds for your unique environment.
LogicMonitor does much of the work for you by setting default alert thresholds based on published documentation and KPIs, industry best practices, research, years of experience, and customer feedback. This means that, for the majority of resources you monitor via LogicMonitor's DataSources, alerts are triggered right out of the box.
However, it is impossible to set datapoint alert thresholds that suit every use case. To ensure that your alerting implementation is sufficient, without being noisy, you can continuously refine the default datapoint alert thresholds for the DataSources you are using.
Note: Other datapoint settings (in addition to alert thresholds) that can impact alert noise include alert trigger intervals (i.e. how many consecutive polling cycles a threshold must be exceeded in order for an alert to trigger), alert clear intervals (i.e. how many consecutive polling cycles datapoint values must remain below threshold before an alert clears), and No Data alert behavior (i.e. should the absence of expected data trigger an alert?). As discussed in Creating a DataSource, these settings are configured from the global DataSource definition.
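To make the effect of trigger and clear intervals concrete, here is a minimal Python sketch that simulates them over a series of polled values. It is an illustration only, not LogicMonitor's implementation: the function name `alert_states`, its parameters, and the use of a simple "greater than" comparison are all assumptions made for the example.

```python
from typing import Iterable, List


def alert_states(values: Iterable[float], threshold: float,
                 trigger_interval: int, clear_interval: int) -> List[bool]:
    """Return, for each polling cycle, whether the alert is active.

    Hypothetical model: the alert triggers after `trigger_interval`
    consecutive polls above the threshold, and clears after
    `clear_interval` consecutive polls at or below it.
    """
    active = False
    over = 0   # consecutive polls above the threshold
    under = 0  # consecutive polls at or below the threshold
    states = []
    for value in values:
        if value > threshold:
            over += 1
            under = 0
        else:
            under += 1
            over = 0
        if not active and over >= trigger_interval:
            active = True
        elif active and under >= clear_interval:
            active = False
        states.append(active)
    return states


# A short spike (two polls) does not trigger; a sustained one (three polls) does.
polled = [95, 95, 80, 95, 95, 95, 80, 80, 95]
states = alert_states(polled, threshold=90, trigger_interval=3, clear_interval=2)
```

In this model, raising the trigger interval suppresses alerts for brief spikes, while raising the clear interval keeps an alert open through brief dips, which reduces repeated trigger/clear notifications for a flapping value.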
Determining the Level at Which Datapoint Alert Thresholds Should Be Adjusted
Before you adjust datapoint alert thresholds, it's important to first determine at which level they should be adjusted:
- Global level. Thresholds adjusted at the global level cascade down to every instance (across all resources) to which the DataSource is applied.
- Resource group level. Thresholds adjusted at the resource group level cascade down to all instances for all resources in the resource group (and its subgroups).
- Instance level. Thresholds adjusted at the instance level can be configured to apply to a single instance on a single resource, multiple instances on a single resource, or all instances on a single resource.
For example, if you want the adjusted thresholds to apply to all relevant resources in your network infrastructure (i.e. every single instance to which the DataSource could possibly be applied), you would adjust the datapoint alert thresholds at the global level, in the DataSource definition itself. Global tuning is recommended when a majority of the instances in your infrastructure will benefit from the tuning. Alternatively, if the adjusted thresholds are only applicable to a single instance of a single resource, you would adjust the thresholds at the instance level.
Datapoint alert thresholds cascade down from the global level. However, if alternate threshold configurations are encountered at deeper levels in the Resources tree, those deeper configurations will override those found at a higher level. For example, datapoint alert thresholds set at the instance level will override those set at the resource group level, and those set at the group level override those set at the global level.
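The override rule can be sketched in a few lines of Python. This is a simplified illustration, not LogicMonitor code: it assumes exactly three levels (nested resource groups are ignored), and the names `LEVELS` and `effective_threshold` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Levels ordered from most specific to least specific (assumed simplification).
LEVELS = ("instance", "resource_group", "global")


def effective_threshold(
    configured: Dict[str, Optional[float]]
) -> Optional[Tuple[str, float]]:
    """Return (level, value) for the most specific level that has a threshold.

    `configured` maps a level name to its threshold value, or to None
    (or omits the key) when nothing is set at that level.
    """
    for level in LEVELS:
        value = configured.get(level)
        if value is not None:
            return level, value
    return None  # no threshold set anywhere


# A group-level threshold overrides the global one...
group_wins = effective_threshold({"global": 90, "resource_group": 75})
# ...and an instance-level threshold overrides both.
instance_wins = effective_threshold(
    {"global": 90, "resource_group": 75, "instance": 50}
)
```

The lookup stops at the first level that defines a threshold, which mirrors the description above: deeper configurations take precedence and broader ones act only as fallbacks.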
Tuning at the Global Level
Global-level tuning for datapoint alert thresholds takes place in the DataSource definition. The DataSource definition can be accessed by navigating to Settings | DataSources or by clicking the Edit Global Definition hyperlink that is available when viewing DataSource or instance data from the Resources tree.
From the edit view of the DataSource definition, you can view and edit all datapoints associated with the DataSource. The most efficient way to edit datapoint alert thresholds is through the wizard. To open it, click the manage icon to the left of a datapoint and then, in the dialog that appears, click the Wizard button next to the Alert threshold field. For more information on configuring the alert threshold wizard, see the Using the Alert Threshold Wizard section of this support article.
Tuning at the Resource Group Level
Group-level tuning for datapoint alert thresholds takes place on the Resources page. Navigate to the resource group in the Resources tree and open its Alert Tuning tab. Expand the DataSource to which the datapoint belongs and, from the datapoint list that appears, click the pencil icon for the desired datapoint to open its alert threshold wizard. For more information on configuring the alert threshold wizard, see the Using the Alert Threshold Wizard section of this support article.
Tuning at the Instance Level
Instance-level tuning for datapoint alert thresholds takes place on the Resources page. As discussed next, there are different entry points for instance-level tuning, depending upon whether the owning DataSource is a single- or multi-instance DataSource and whether multiple instances, when present, are organized into instance groups. For more information on the differences between single-instance and multi-instance DataSources, see Device DataSources & Instances Overview.
Tuning a Datapoint Alert Threshold for a Single Instance
To tune the datapoint alert threshold for a single-instance DataSource (and thus a single instance), navigate to the DataSource in the Resources tree. (The resource that the DataSource is nested under matters as you are only updating the instance that pertains to that particular resource.) Open the Alert Tuning tab, find the datapoint whose alert threshold you would like to edit, and click the pencil icon found in the "Effective Threshold" column. This opens the alert threshold wizard which is discussed in detail in the Using the Alert Threshold Wizard section of this support article.
To tune the datapoint alert threshold for a single instance of a multi-instance DataSource, navigate to the instance in the Resources tree, open its Alert Tuning tab, find the datapoint whose alert threshold you would like to edit, and click the pencil icon found in the "Effective Threshold" column. This opens the alert threshold wizard which is discussed in detail in the Using the Alert Threshold Wizard section of this support article.
Tuning a Datapoint Alert Threshold for Multiple Instances at Once
In addition to tuning datapoint alert thresholds for a single instance, you can also tune thresholds for multiple instances at once. This saves time if your end goal is to tune thresholds for all instances (or a subset of instances, called an instance group) found on a resource. For more information on instance groups, see Instance Grouping.
To tune multiple instances at once, use the Resources tree to navigate to either the multi-instance DataSource (assuming you want to tune all instances on the resource at once) or one of its instance groups (assuming you want to tune only a subset of the instances). Open the Alert Tuning tab, find the datapoint whose alert threshold you would like to edit for all instances across the resource or instance group, and click the pencil icon found in the "Effective Threshold" column. This opens the alert threshold wizard which is discussed in detail in the Using the Alert Threshold Wizard section of this support article.
Using the Alert Threshold Wizard
The most thorough method for adding or adjusting datapoint alert thresholds is to use the alert threshold wizard. As described in the previous section of this support article, there are many ways to arrive at this wizard, depending upon the level at which you would like your adjustments to apply.
Upon opening the alert threshold wizard, you will see any thresholds currently set for the datapoint, in top to bottom order, according to hierarchical level. For example, if you are viewing thresholds set for a datapoint for a single instance, you would see the instance-level thresholds first (if any), followed by the resource group thresholds (if any), followed by the global thresholds. LogicMonitor evaluates the thresholds in the order in which they display, meaning that instance-level thresholds override resource-group-level thresholds, which override global thresholds, as discussed in the previous Determining the Level at Which Datapoint Alert Thresholds Should Be Adjusted section of this support article.
In a case like this, the effective datapoint threshold for the instance is the threshold listed first. Thresholds may also be set for the resource group that the instance is a member of, as well as for the global DataSource that applies to the instance. But because those thresholds are set at a broader level, it is the instance-level threshold that LogicMonitor evaluates to determine whether the instance is in an alert condition.
You can only edit or add thresholds at the level from which you arrived at the alert threshold wizard. For example, if you opened the alert threshold wizard from a global DataSource definition, then only global thresholds are available for adding/editing. If you arrived at the alert threshold wizard from a resource group's configurations, then only resource group thresholds are available for adding/editing.
Once you've arrived at the alert threshold wizard, follow these steps to add or edit the threshold for a datapoint:
- If you are editing an existing threshold, click the arrow to its left to expand its settings. If you are adding a new threshold (either at a different level or at the same level but for a different time frame), click the plus sign icon.
- If you would like the threshold to be effective for a time frame that is narrower than the default "All Day" time frame, open the dropdown to select a start time. Then choose an end time from the second dropdown that appears.
Note: Multiple sets of thresholds can only exist at the same level if they specify different time frames.
- Select a comparison method. The following methods are available:
  - Value. Compares the datapoint value against a threshold.
  - Delta. Compares the delta between the current and previous datapoint values against a threshold.
  - NaNDelta. Operates the same as delta, but treats NaN values as 0.
  - Absolute value. Compares the absolute value of the datapoint against a threshold.
  - Absolute delta. Compares the absolute value of the delta between the current and previous datapoint values against a threshold.
  - Absolute NaNDelta. Operates the same as absolute delta, but treats NaN values as 0.
  - Absolute delta%. Compares the absolute value of the percent change between the current and previous datapoint values against a threshold.
- Select a comparison operator.
- For one or more of the severity levels, specify the value that will trigger that alert severity.
Note: If you add the same threshold value to more than one severity level, the higher severity level takes precedence. For example, if you set both the warning and error severity level thresholds at 100, then a datapoint value of 100 will trigger an error alert.
- Click the Save & Close button.
- You are prompted to enter a note describing the purpose behind the change. As discussed in the Alert Threshold History Log and Reporting section of this support article, this note is added to the threshold history log.
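The comparison methods and the severity precedence rule described in the steps above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not LogicMonitor's implementation: the method keys (`"nan_delta"`, `"abs_delta_pct"`, and so on), the operator set, and the function names are hypothetical. The sketch also assumes that "treats NaN values as 0" applies to the input values, that a percent change from a previous value of 0 is undefined (NaN), and that the severities, from highest to lowest, are critical, error, and warning.

```python
import math
import operator
from typing import Dict, Optional


def _nan_to_zero(x: float) -> float:
    return 0.0 if math.isnan(x) else x


def compared_value(method: str, current: float, previous: float) -> float:
    """Compute the number that is compared against the threshold."""
    if method == "value":
        return current
    if method == "delta":
        return current - previous
    if method == "nan_delta":  # assumption: NaN inputs are replaced by 0
        return _nan_to_zero(current) - _nan_to_zero(previous)
    if method == "abs_value":
        return abs(current)
    if method == "abs_delta":
        return abs(current - previous)
    if method == "abs_nan_delta":
        return abs(_nan_to_zero(current) - _nan_to_zero(previous))
    if method == "abs_delta_pct":
        if previous == 0:  # assumption: percent change is undefined here
            return math.nan
        return abs((current - previous) / previous) * 100
    raise ValueError(f"unknown comparison method: {method}")


# Assumed operator set, for illustration only.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
             "<=": operator.le, "=": operator.eq, "!=": operator.ne}

# Checked from highest to lowest, so the higher severity wins on a tie.
SEVERITIES = ("critical", "error", "warning")


def severity(value: float, op: str, levels: Dict[str, float]) -> Optional[str]:
    """Return the highest severity whose threshold `value` satisfies."""
    compare = OPERATORS[op]
    for name in SEVERITIES:
        if name in levels and compare(value, levels[name]):
            return name
    return None


# Warning and error both set to 100: a value of 100 raises an error alert.
tied = severity(100, ">=", {"warning": 100, "error": 100})
```

Because a comparison against NaN is always false in Python, a NaN result from `compared_value` never matches a severity in this sketch; how missing data is handled in practice is governed by the No Data alert behavior mentioned earlier.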
Alert Threshold History Log and Reporting
Threshold History Log
Each time an alert threshold is added or edited from the wizard, a prompt displays requesting details about the change. The information entered into this prompt, along with a timestamp and username, is appended to the Threshold History log, which displays at the bottom of the wizard.
Note: If making alert threshold edits at the global DataSource definition level, the DataSource itself must ultimately be saved (requiring you to save at the wizard level, the datapoint dialog level, and ultimately the DataSource level) in order for the Threshold History log to be updated.
Alert Thresholds Report
The Alert Thresholds report provides visibility into alert thresholds across your LogicMonitor platform. It reports on the thresholds in effect across multiple resources, including thresholds that have been overridden and resources for which alerting has been disabled. To learn more about this report, see Alert Thresholds Report.
The following examples show alert threshold configurations and how they apply in the LogicMonitor platform.
Example 1: Global Datapoint Threshold
There are two sets of alert thresholds set at the global level for this datapoint. Each set is active during a different time window of the day. Notice how the threshold labels align with the hourly timeline that runs along the bottom of the wizard.
Example 2: Resource Group Datapoint Threshold
Building upon the previous example (example 1), there are three sets of alert thresholds set at the "Windows Server" resource group level for this datapoint. If CPU metrics for any devices in this particular resource group reach 75 between the hours of 5am and 11pm PDT, a warning alert will trigger. For the remaining hours in the day, CPU metrics will need to reach 90 (as defined at the global level) in order for a warning alert to trigger.
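The time-of-day behavior in this example can be modeled with a short Python sketch. It is a simplified illustration rather than LogicMonitor's evaluation logic: the `ThresholdSet` class and `threshold_in_effect` function are hypothetical, a set whose start and end times are equal stands in for "All Day", and the global level is reduced to a single all-day warning threshold of 90.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass
class ThresholdSet:
    level: str
    start: time   # inclusive
    end: time     # exclusive; start == end is treated as "All Day"
    warning: float

    def covers(self, t: time) -> bool:
        if self.start == self.end:
            return True
        if self.start < self.end:
            return self.start <= t < self.end
        # Window wraps past midnight, e.g. 23:00 to 05:00.
        return t >= self.start or t < self.end


def threshold_in_effect(sets: List[ThresholdSet], t: time) -> Optional[ThresholdSet]:
    """Return the first set, in most-specific-first order, that covers `t`."""
    for threshold_set in sets:
        if threshold_set.covers(t):
            return threshold_set
    return None


# Listed in the order the wizard displays them: most specific level first.
sets = [
    ThresholdSet("resource_group", time(5, 0), time(23, 0), warning=75),
    ThresholdSet("global", time(0, 0), time(0, 0), warning=90),
]

afternoon = threshold_in_effect(sets, time(14, 0))   # group set, warning at 75
late_night = threshold_in_effect(sets, time(23, 30))  # global set, warning at 90
```

Inside the 5am to 11pm window the group-level set is found first and applies; outside it, no group-level set covers the time, so the lookup falls through to the global set.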
Example 3: Instance Datapoint Threshold
Building upon the previous two examples (examples 1 and 2), there are four sets of alert thresholds set at the "WinCPU" instance level for this datapoint. If CPU metrics for this particular instance reach 50 at any point during the day, a warning alert will trigger. Other alert thresholds are also listed for this instance: those of the resource group the instance is a member of and those of the DataSource that applies to the instance. However, because the instance-level threshold is set for "All Day", it is the only one that is ever evaluated.