Guidelines for Responding to Alert Notifications
Identifying the Issue
Before you can respond to an alert notification and take appropriate steps to resolve it, you must first identify the issue causing the alert. If your LogicMonitor dashboards and alerting are set up strategically, you'll likely know where to find the problem because you know what checks are being run and how they work.
The alert notifications you receive don't always give you the whole picture. Treat an alert notification as an indication that you need to log into your LogicMonitor account and investigate both the component of your application that triggered the alert and the health of the components that didn't. Dashboards are a great place to start: they provide an overview of your application, and if they are set up well you'll be able to find the issue quickly.
Think about the alert notifications you receive in the context of your application, and use the structure of your application to decide where to go to solve the problem. One way to do this is to design your dashboards to reflect your application: if your dashboards clearly show the status and data for every monitored component of your infrastructure, you can locate the failing component directly, regardless of the dependencies between components.
Once you locate the component(s) causing the problem, you can drill down and start designing a solution.
Responding to an Alert Notification
You can respond to an alert notification from within your LogicMonitor account, or from the email or text message that notified you. While the response method is a matter of personal preference, there are guidelines for the type of response you should provide, depending on whether you can resolve the issue at hand.
LogicMonitor supports three response types: Acknowledge, Schedule Down Time (SDT), and Escalate. The appropriate use of each is discussed in the following sections.
Acknowledge
You should acknowledge an alert when you believe that you can resolve the problem. Resolving the problem means fixing whatever is causing it and then, if necessary, taking action to ensure that the alert does not recur. Actions to suppress further alerts include:
- Adjusting alert thresholds.
- Disabling alerting or data collection.
- Scheduling down time to cover periods of expected recurring maintenance.
- Eliminating the alerting DataSource instance from discovery, or creating a cloned DataSource to discover/classify it with different alert thresholds. If there is a set of instances that should not be discovered (e.g., NFS-mounted filesystems on Linux hosts) or that require different thresholds than other instances (e.g., QA VIPs on load balancers), you can achieve this with Active Discovery filtering.
Note: Acknowledgement of an alert carries over to the same event if its severity drops but does not clear. For example, if you acknowledge a disk usage alert at error level and free up enough space that the alert drops to warning, the warning alert is already considered acknowledged. However, acknowledging an alert at a lower level does not affect escalation if severity increases: acknowledging a disk usage error alert will not stop the escalation of the critical alert if the drive continues to fill.
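For teams that automate their alert workflow, acknowledgement can also be performed through LogicMonitor's REST API rather than the UI. The sketch below only constructs the request; the `/alert/alerts/{id}/ack` path and `ackComment` field are assumptions based on LogicMonitor's REST API documentation, and actually sending the request (with LMv1 or bearer-token authentication against your account's base URL) is left to your own HTTP client.

```python
def build_ack_request(alert_id: str, comment: str) -> tuple[str, dict]:
    """Build the URL path and JSON body to acknowledge one alert.

    Assumptions: LogicMonitor's REST API exposes an acknowledgement
    endpoint at POST /alert/alerts/{id}/ack that accepts an
    {"ackComment": ...} body. Verify against your account's API version.
    """
    path = f"/alert/alerts/{alert_id}/ack"
    body = {"ackComment": comment}
    return path, body


# Example: acknowledge a disk usage alert after freeing space.
path, body = build_ack_request("LMD123", "Freed disk space on /var")
```

Keeping the comment meaningful (what you fixed and why it won't recur) makes the acknowledgement useful to the next responder who views the alert's history.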
Schedule Down Time (SDT)
When you schedule down time, you are suppressing all alert notifications for the designated instance, DataSource, or device for the duration of the configured SDT period. This is in contrast to acknowledging an alert, which only suppresses further notifications of that particular alert.
You should place an instance, DataSource, or device in SDT if:
- You forgot to proactively schedule down time.
- A solution is in the works, but you don't want to continue receiving notifications while the issue is being addressed. For example, if you are receiving an alert that a server is out of memory, and you have more memory currently on order, you may initiate an SDT response type to avoid being repeatedly notified of the alert.
Note: When responding with SDT to an EventSource alert, the entire EventSource is placed into SDT, not just the individual event ID associated with the alert condition.
Escalate
You should escalate the alert to the next person in the escalation chain if an SDT isn't appropriate and you don't know what the issue is, don't have time to identify and/or resolve it, or don't know how to resolve it. Note that the alert will still escalate even if the escalation interval for the matching alert rule is set to zero.