As we’ve often preached, too many alerts are just as bad as missing alerts. You don’t want your team to become so inured to alerts that they fail to act on the ones that indicate outages.
For those of you using Campfire as your team collaboration tool, you now have another way to help manage your infrastructure and server monitoring alerts, and ensure every alert is reacted to appropriately. LogicMonitor now integrates with Campfire using the Campfire API.
How does this integration help you react to alerts appropriately, and ensure your teams don’t suffer from alert overload?
One practice we often recommend is to route warning alerts only to a chat room. Warning-level alerts can be more numerous than other levels, and they often need no immediate action. If they are sent to email, they clutter inboxes and make it more likely that someone misses (or ignores) the important alerts that can indicate a service-impacting issue. Routing all alerts to a chat room, where the on-call engineers will be hanging out, means that the more significant alerts sent to email and pagers are rarer, and more “special”. More severe alerts (Errors, Criticals) are routed not just to the alert room, email, and pagers, but also to the room where all engineers generally hang out and discuss issues, so it’s easy for anyone to jump on an issue and start discussions with everyone already up to speed on the alerts. Campfire and other chat tools also make it easy to view alert history, simply by scrolling back through the room.
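To make the routing practice above concrete, here is a minimal sketch of the severity-to-destination mapping in Python. It is an illustration only, not LogicMonitor's configuration or the Campfire API: the `Severity` enum, the destination names, and the `destinations_for` function are all hypothetical names chosen for this example.

```python
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    """Alert levels, ordered so that comparisons reflect severity."""
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


# Hypothetical destination labels; a real setup would map these to
# actual chat rooms, mailing lists, and paging services.
ALERT_ROOM = "chat:alert-room"
ENGINEERING_ROOM = "chat:engineering-room"
EMAIL = "email:on-call"
PAGER = "pager:on-call"


def destinations_for(severity: Severity) -> List[str]:
    """Return where an alert of the given severity should be delivered.

    Every alert goes to the alert room. Errors and Criticals also go to
    email, pagers, and the room where all engineers hang out.
    """
    targets = [ALERT_ROOM]
    if severity >= Severity.ERROR:
        targets += [EMAIL, PAGER, ENGINEERING_ROOM]
    return targets


if __name__ == "__main__":
    for level in Severity:
        print(level.name, "->", destinations_for(level))
```

With this mapping, a warning lands only in the alert room, so anything that reaches email or a pager is by construction an Error or a Critical.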
Of course, no matter how they are treated, warning alerts should be examined to determine whether they are indicators of Bad Things to Come if the issue is left unchecked. (Or, if they are noise, the thresholds should be tuned or the alerting disabled.)
LogicMonitor will be rolling out further integrations in the near term (PagerDuty, AutoTask, and ServiceNow), and we already have HipChat and Connectwise integrations. Campfire is a welcome addition for our users.