IN THIS ARTICLE:
- Introduction to Alert Rules
- Adding or Editing an Alert Rule
- Alert Rule Strategies
- Example Alert Rule Setup
Introduction to Alert Rules
An incoming alert is filtered through all rules, in priority order (starting with the lowest number), until it matches a rule's filters based on alert level, resource attributes (name, group, or property), and LogicModule/datapoint attributes. Once a match occurs, rule processing stops and the alert is routed to the specified escalation chain. The alert then proceeds through the stages of the escalation chain until it is acknowledged or cleared.
If an alert does not match an alert rule, the alert will not be routed and will reside in the LogicMonitor interface only.
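The first-match behavior described above can be sketched in a few lines of Python. This is only an illustration, not LogicMonitor's implementation: the `AlertRule` class, the `route` function, and the alert dictionary keys are hypothetical, and Python's `fnmatchcase` stands in for the product's glob matching, whose exact semantics may differ.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class AlertRule:
    """Hypothetical, simplified alert rule (only two of the real filters)."""
    name: str
    priority: int          # lower number = evaluated first
    levels: frozenset      # severity levels this rule accepts
    group_glob: str        # glob applied to the resource's group
    chain: str             # escalation chain to route to


def route(alert: dict, rules: list) -> Optional[str]:
    """Return the escalation chain of the first matching rule, or None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if alert["level"] in rule.levels and fnmatchcase(alert["group"], rule.group_glob):
            return rule.chain  # processing stops at the first match
    return None  # unmatched alerts are not routed


rules = [
    AlertRule("general", 200, frozenset({"Error", "Critical"}), "*", "NOC"),
    AlertRule("db", 100, frozenset({"Critical"}), "prod/db*", "DB Admins"),
]
print(route({"level": "Critical", "group": "prod/db01"}, rules))
```

In this sketch the "db" rule wins for the critical alert even though the "general" rule would also match, because it has the lower priority number.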
Adding or Editing an Alert Rule
You can add, edit, or delete alert rules from Settings | Alert Rules in your account. Several settings, discussed next, must be configured for each alert rule. These settings determine which alerts the alert rule will apply to, as well as how the alert will be routed and managed once the alert rule is applied.
In the Name field, enter a name for the alert rule. This name must be unique among all other alert rules.
In the Alert Priority field, enter a numeric priority value to determine the order in which the alert rule will be evaluated (as compared to all other alert rules). A value of "1" represents the highest priority; values can consist of up to ten digits.
Once a triggered alert matches an alert rule, no further alert rules will be evaluated. We recommend numbering your original set of alert priorities in intervals (e.g. intervals of 10, 20, 50, etc.) so that new alert rules can easily be inserted without having to renumber existing alert rules.
Note: This field defaults to "100" when creating a new rule, but do not leave this default value in place or you will end up with the undesirable situation of multiple rules having the same priority. If two or more alert rules have the same priority, the selection of which rule will be applied is nondeterministic.
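If you keep an inventory of your rules outside the product, a small check like the one below can flag duplicate priorities and suggest evenly spaced values. It is a sketch under assumptions: the function names are hypothetical, and the rules are represented as plain (name, priority) pairs that you would have to supply yourself.

```python
from collections import defaultdict


def duplicate_priorities(rules):
    """Return {priority: [rule names]} for priorities shared by 2+ rules."""
    by_priority = defaultdict(list)
    for name, priority in rules:
        by_priority[priority].append(name)
    return {p: names for p, names in by_priority.items() if len(names) > 1}


def spaced_priorities(count, step=10):
    """Suggest `count` priorities spaced `step` apart: step, 2*step, ..."""
    return [step * (i + 1) for i in range(count)]


# Two rules left at the default of 100 are reported as a conflict.
print(duplicate_priorities([("a", 100), ("b", 100), ("c", 10)]))
print(spaced_priorities(3, step=50))
```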
The levels available from the Level field's drop-down menu correspond to the severity levels you assigned when creating alert conditions. Only alerts with severity levels that correspond to the level specified here will match this alert rule.
In the Group field, specify one or more device/website group(s) the resource or website must belong to in order for an alert to match this rule. Glob expressions and wildcard matching are supported in this field. If you do not want to filter by group, you may leave this field blank.
In the Resource/Website field, specify one or more resources (e.g. device, cloud resource) and/or website(s) the alert must be triggered by in order for this alert rule to match. Note that group-level cluster alerts use a pseudo-device "cluster." Glob expressions and wildcard matching are supported here. If you do not want to filter by resource or website, you may leave this list empty.
Resource Property Filters
Click on the + icon located under the "Resource Property Filters" heading to specify one or more property values that a resource or website must possess in order for this alert rule to match. Property names must be entered in their entirety with appropriate case sensitivity; property values support Glob expressions and wildcard matching.
As you enter a property name, LogicMonitor will attempt to auto-complete matching results for you. However, due to current limitations, only device properties will show in search results. This does not mean that the property filter cannot be applied to websites, only that website property names must be manually entered in their entireties.
Note: You may only specify a maximum of five property filters; multiple property filters are joined by an AND logical operator.
Note: If you choose to filter alert rules by one or more properties, but have left the Group and/or Resource/Website fields blank, a wildcard value (*) will be automatically appended to those fields to indicate that alerts on any resources with matching properties will be considered by this rule.
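The AND semantics of property filters can be illustrated as follows. This is a minimal sketch, not the product's code: `properties_match` and the property names `env` and `location` are hypothetical, and treating value globs as case-sensitive (via `fnmatchcase`) is an assumption, since the text above only states that property names are case-sensitive.

```python
from fnmatch import fnmatchcase

MAX_PROPERTY_FILTERS = 5  # limit stated in the note above


def properties_match(resource_props, filters):
    """True only if every filter matches (filters are ANDed together)."""
    if len(filters) > MAX_PROPERTY_FILTERS:
        raise ValueError("at most five property filters are allowed")
    return all(
        # The property name must match exactly, including case;
        # the value is compared against a glob pattern.
        name in resource_props and fnmatchcase(str(resource_props[name]), pattern)
        for name, pattern in filters.items()
    )


props = {"env": "prod-east", "location": "Austin"}
print(properties_match(props, {"env": "prod-*", "location": "Aus*"}))
```

A filter on `Location` (capital L) would not match the `location` property in this sketch, mirroring the case sensitivity of property names.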
In the LogicModule field, specify which LogicModule(s) the alert must be triggered by in order for this alert rule to match. Glob expressions and wildcard matching are supported in this field. If you do not want to filter by LogicModule, leave the default wildcard (i.e. asterisk) in place to indicate all LogicModules.
In the Instance field, specify which instance(s) the alert must be triggered by in order for this alert rule to match. Glob expressions and wildcard matching are supported in this field. If you do not want to filter by instance, enter an asterisk (*) into the field in order to indicate all instances.
If you are using a glob expression to match instances, be aware that instances are identified by DataSource-Instance name, not by instance name alone. For example, if an instance of the Interfaces (64 bit) DataSource appears in the Resources tree as "enp2s0", in an alert rule it is identified as "snmp64_If-enp2s0". Therefore, you would need to use the glob expression "*enp*", not "enp*", to match such instances. If you have any doubt about what the DataSource-Instance name is for a given instance, select the instance in the Resources tree and look at the title of the details pane.
Note: Any instance filters established here are ignored by EventSource and JobMonitor alerts, as they are not tied to instances, only devices.
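The difference between "*enp*" and "enp*" in the instance example above can be reproduced with Python's standard `fnmatch` module. The helper name `instance_matches` is hypothetical, and `fnmatchcase` is used only as an approximation of the product's glob matching.

```python
from fnmatch import fnmatchcase


def instance_matches(datasource, instance, pattern):
    """Match a glob against the DataSource-Instance name, not the bare instance."""
    full_name = f"{datasource}-{instance}"  # e.g. "snmp64_If-enp2s0"
    return fnmatchcase(full_name, pattern)


print(instance_matches("snmp64_If", "enp2s0", "*enp*"))  # leading * absorbs the prefix
print(instance_matches("snmp64_If", "enp2s0", "enp*"))   # fails: name starts with "snmp64_If-"
```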
In the Datapoint field, specify which datapoint(s) the alert must be triggered by in order for this alert rule to match. Glob expressions and wildcard matching are supported in this field. If you do not want to filter by datapoint, leave the default wildcard (i.e. asterisk) in place to indicate all datapoints.
Note: Any datapoint filters established here are ignored by EventSource and JobMonitor alerts, as these types of alerts are not triggered by datapoint conditions.
Send notification when alerts clear
If the Send notification when alerts clear option is checked, anyone who received the initial alert notification will also receive a clear notification.
Send status notifications for Acknowledge or SDT
If the Send status notifications for Acknowledge or SDT option is checked, anyone who received the initial alert notification will also receive notification of an acknowledgment or scheduled down time.
Escalation Interval (min)
After an alert notification is sent to a stage within an escalation chain, the value entered into the Escalation Interval (min) field defines the amount of time that should elapse before an alert will be escalated to the next stage. Escalation stops when the alert is acknowledged or cleared.
If there is only one stage in the escalation chain, the alert notification will be repeatedly sent to that stage until it is acknowledged or cleared. Similarly, if the notification reaches the end of a multi-stage escalation chain, the alert will be repeatedly sent to the final stage until it is acknowledged or cleared.
Note: With an escalation interval of "0", the alert notification is routed to the first stage of the escalation chain just once and is not resent unless you manually escalate the alert.
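The escalation behavior described in this section can be sketched as a simple timeline for an alert that is never acknowledged or cleared. The function name `notification_schedule` and its parameters are hypothetical, and the sketch assumes that repeats at the final stage occur at the same escalation interval.

```python
def notification_schedule(num_stages, interval_min, horizon_min):
    """Return (minute, stage) pairs for notifications sent up to horizon_min,
    assuming the alert is neither acknowledged nor cleared in that time."""
    if num_stages < 1:
        raise ValueError("an escalation chain needs at least one stage")
    if interval_min == 0:
        return [(0, 1)]  # sent once to the first stage, never resent
    schedule = []
    minute, stage = 0, 1
    while minute <= horizon_min:
        schedule.append((minute, stage))
        stage = min(stage + 1, num_stages)  # stay on the final stage once reached
        minute += interval_min
    return schedule


# Three stages, 15-minute interval, first hour of an unacknowledged alert.
print(notification_schedule(3, 15, 60))
```

With these inputs, stages one and two are each notified once, and stage three then receives the notification at every subsequent interval.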
From the Escalation Chain field's drop-down menu, select the escalation chain to which alerts matching this rule will be sent. (For more information on escalation chains, see Escalation Chains).
Alert Rule Strategies
Configure your alert rules to suit your environment. When adding or editing alert rules, we recommend that you keep the following guidelines in mind:
1. You don't have to route all alerts
Alert routing works well for alerts that require immediate attention, such as critical (and maybe error) level alerts. Some alerts don't require immediate attention, such as warning alerts, and as such are better viewed in reports. Instead of having "catch all" rules that are sending every single alert to a certain group of people, you can set up your alert rules such that only specific alerts are going to be routed. Make sure, however, that you’re not ignoring any alerts; you should still review alerts that are not being routed in a report.
If you do want to route all alerts, consider creating a rule with an escalation interval of "0" just for warning alerts. This will filter out all warning alerts and send them only once to the first stage of the specified escalation chain. Then you can set up other rules that route only error and critical alerts to people within your organization.
2. Give specific rules higher priorities
You should organize your rules so that specific rules have higher priorities (lower numbers), and general rules have lower priorities (higher numbers). This way, any alerts that don't meet the criteria of your specific rules will be caught by the general rules.
3. Index your priorities
For a large deployment, you can index your priorities to correspond to groups within your organization. For example, you could set dev related rules to have priorities in the 100s, production related rules to have priorities in the 200s, and QA related rules to have priorities in the 300s. If you make sure that every rule is set to only match alerts from a specific device group, each department can work within their range of priorities to customize routing for their group – without affecting alerts for any other department.
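The priority bands from the example above (dev in the 100s, production in the 200s, QA in the 300s) could be recorded and checked with a short helper. This is a sketch for your own bookkeeping; `PRIORITY_RANGES` and `owner_of` are hypothetical names and nothing here is a LogicMonitor feature.

```python
# Priority bands taken from the example in the text above.
PRIORITY_RANGES = {
    "dev": range(100, 200),
    "production": range(200, 300),
    "qa": range(300, 400),
}


def owner_of(priority):
    """Return the team whose band contains this priority, or None."""
    for team, band in PRIORITY_RANGES.items():
        if priority in band:
            return team
    return None


print(owner_of(250))
```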
4. Leave large gaps between priority levels
Leave large gaps between priority levels. If your priority levels are too close together, you will have to edit the priorities of existing rules to make room for new rules.
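To see why gaps matter, consider picking a priority for a new rule that must sit between two existing ones. The helper below is a hypothetical sketch that returns the midpoint, or None when the neighbors are adjacent and renumbering would be required.

```python
def priority_between(lower, upper):
    """Return an unused integer priority strictly between lower and upper,
    or None if there is no room (existing rules would need renumbering)."""
    if upper - lower < 2:
        return None
    return (lower + upper) // 2


print(priority_between(100, 200))  # plenty of room
print(priority_between(10, 11))    # no room: renumbering needed
```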
5. Test alert rules
It is generally a good idea to test alert delivery to ensure that your alert rules are routing alerts as intended.
Example Alert Rule Setup
The following screenshot depicts a sample set of alert rules. Using the settings displayed here for each alert rule, we're going to walk you through how they work in combination with one another.
The highest priority rule shown in the above screenshot is the "Production Warning Alerts" rule. It filters out all alerts with a severity level of "Warn" in the child groups under production. This rule will post alert notifications to a HipChat room every 30 minutes, until the alert is acknowledged or cleared.
The second highest priority rule in this account is named "Production Database Alerts." It will only match alerts on DataSources that have the term "SQL" somewhere within the name for devices in the child groups under the servers group. Since the "Warn" alerts are filtered out by the first rule, this rule is only going to pick up database alerts with severity levels of "Error" or "Critical." The stage one recipients of the "DB Admins" escalation chain will be notified first, and if they do not acknowledge the alert and it doesn't clear within 15 minutes, it will be sent to the stage two recipients. Again, after 15 minutes, if nobody has acknowledged the alert and it hasn't cleared, the alert escalates to stage three. But since there aren't any recipients in stage three, nobody will be notified. Because alert notifications are sent repeatedly to the final stage until the alert is acknowledged or cleared, an empty last stage effectively ensures that nobody will be notified after the alert escalates past stage two.
Note: If you're sending alert notifications to a ticketing system in one of your stages, ensure that you have either a zero resend interval or a subsequent stage with a different delivery method. Otherwise, if you are sending the same alert more than once to the ticketing system, you will end up with multiple tickets for the same condition.
The next highest priority rule is the "Production Server Alerts" rule. It will match any alerts with a severity level of "Error" or "Critical" for any resource in any child group under the servers group. Since this rule has a lower priority (i.e. higher number) than the "Production Database Alerts" rule, any error or critical alerts that don't originate from a SQL DataSource are going to match this rule instead. In other words, this alert rule will catch the overflow—anything that is not going to the database team will be picked up by the server team.
The fourth and final rule shown in the above screenshot will route all alerts with a severity level of "Error" or "Critical" for resources in the child groups of the network group. Again, all alerts with a severity level of "Warn" have already been filtered out so this rule is going to catch error and critical alerts that aren't already being routed to the database or server teams.
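The interaction of the four rules described in this example can be approximated in Python. Everything specific in this sketch is an assumption made for illustration: the priority numbers, the group paths (including placing the servers and network groups under production), the DataSource names, and the placeholder name for the fourth rule, which the text does not give. `fnmatchcase` again stands in for the product's glob matching.

```python
from fnmatch import fnmatchcase

ALL_LEVELS = {"Warn", "Error", "Critical"}

# (priority, rule name, levels, group glob, DataSource glob); values are hypothetical.
RULES = [
    (10, "Production Warning Alerts", {"Warn"}, "production/*", "*"),
    (20, "Production Database Alerts", ALL_LEVELS, "production/servers/*", "*SQL*"),
    (30, "Production Server Alerts", {"Error", "Critical"}, "production/servers/*", "*"),
    (40, "fourth rule (network)", {"Error", "Critical"}, "production/network/*", "*"),
]


def first_match(level, group, datasource):
    """Return the name of the first rule, in priority order, that matches."""
    for _, name, levels, group_glob, ds_glob in sorted(RULES, key=lambda r: r[0]):
        if (level in levels
                and fnmatchcase(group, group_glob)
                and fnmatchcase(datasource, ds_glob)):
            return name
    return None  # not routed


# A "Warn" alert on a SQL DataSource is claimed by the warning rule first.
print(first_match("Warn", "production/servers/db", "Example_SQL_Health"))
print(first_match("Critical", "production/servers/db", "Example_SQL_Health"))
print(first_match("Error", "production/servers/web", "Example_Ping"))
```

Under these assumptions, the database rule only ever sees error and critical alerts, and the server rule receives whatever the database rule does not claim, which matches the walkthrough above.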