Alert Rate Limiting

What is alert throttling?

Alert throttling, also known as rate limiting, sets the maximum number of alerts that can be delivered to any given stage of an escalation chain within a defined period. It is configured on the escalation chain with two fields: Rate Limit Alerts, the maximum number of alerts that can be delivered, and Rate Limit Period, the time window during which that number of alerts can be delivered.

If the number of alerts delivered to the chain’s initial stage exceeds the rate limit, then a throttle message is sent to the individuals assigned to that stage. The message states that the number of alerts has exceeded the throttling level. From this point forward, alerts will be escalated to subsequent stages in accordance with your chain’s configuration. Throttle messages, however, will not be escalated and will continue to be sent to the first stage.
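The behavior described above can be sketched as a small model. This is only an illustration, not the product's implementation: the class name StageRateLimiter, its field names, and the choice of a sliding time window are all assumptions made for the example. How often a throttle message is repeated is not specified here, so the sketch only reports whether each alert would be delivered or throttled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StageRateLimiter:
    """Hypothetical model of a rate limit on one escalation chain stage."""

    rate_limit_alerts: int       # corresponds to the Rate Limit Alerts field
    rate_limit_period_min: int   # corresponds to the Rate Limit Period field (minutes)
    _sent: deque = field(default_factory=deque)  # minutes at which alerts were delivered

    def deliver(self, now_min: float) -> str:
        # Assumption: a sliding window. Forget deliveries older than the period.
        while self._sent and now_min - self._sent[0] >= self.rate_limit_period_min:
            self._sent.popleft()
        if len(self._sent) < self.rate_limit_alerts:
            self._sent.append(now_min)
            return "alert"       # delivered normally
        return "throttled"       # limit exceeded; a throttle message is sent instead


# Default-style configuration: 10 alerts per 20 minutes, one alert per minute.
limiter = StageRateLimiter(rate_limit_alerts=10, rate_limit_period_min=20)
outcomes = [limiter.deliver(minute) for minute in range(12)]
print(outcomes.count("alert"), outcomes.count("throttled"))
```

With one alert arriving per minute, the first ten are delivered and the remaining ones in that window are throttled until older deliveries age out of the period.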

Where is this enabled?

Rate limiting is defined per escalation chain. To configure it, navigate to Settings -> Escalation Chains -> Manage.

What does alert throttling not do?

A common misconception is that rate limits affect instance-level alerts only. In other words, if you were to set a rate limit of 1 for a rate limit period of 30 minutes (1/30), you would expect to receive only 1 alert every half hour about the offending instance. A WinCPU- alert for high usage, for example, would be expected to generate an alert only twice in one hour.

What this really means is that alerting has been limited for all hosts currently using that alert rule/escalation chain combination. Depending on the alert rule definition, this may include multiple datasources. Using the above example, you'd get one real alert, and every alert generated afterwards within the rate limit period would trigger rate limiting. As a result, those alerts would not be delivered.
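To make the effect of the 1/30 example concrete, here is a minimal simulation. It is a sketch under stated assumptions: the function simulate, the host names, the datasource name ExampleDisk-, and the sliding-window logic are all hypothetical and chosen only for illustration.

```python
def simulate(alerts, limit, period_min):
    """Split alerts into delivered and throttled for one shared rate limit.

    alerts: list of (minute, host, datasource) tuples, sorted by minute.
    The limit is shared by every host matched by the alert rule, which is
    the point of this example. The window model is an assumption.
    """
    delivered, throttled = [], []
    window = []  # minutes of alerts delivered within the current period
    for minute, host, datasource in alerts:
        window = [m for m in window if minute - m < period_min]
        if len(window) < limit:
            window.append(minute)
            delivered.append((minute, host, datasource))
        else:
            throttled.append((minute, host, datasource))
    return delivered, throttled


# Hypothetical alerts from different hosts routed to the same escalation chain.
alerts = [
    (0, "web01", "WinCPU-"),
    (5, "db01", "ExampleDisk-"),
    (12, "web02", "WinCPU-"),
    (31, "db01", "WinCPU-"),
]
delivered, throttled = simulate(alerts, limit=1, period_min=30)
print("delivered:", delivered)
print("throttled:", throttled)
```

In this run only the alerts at minutes 0 and 31 are delivered. The alerts from db01 and web02 at minutes 5 and 12 are throttled even though they concern different hosts and, in one case, a different datasource.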

In this case, consider instead adjusting the trigger interval for an individual datapoint rather than limiting all alerts for hosts.

Another way to reduce the number of messages received is to adjust the escalation interval defined on the Alert Rule. For example, increasing it to 60 minutes would allow more time to pass before escalating to the next stage, without limiting the flow of other alerts.

Example configurations

Rate Limit Period (mins) | Rate Limit (alerts) | Result
20                       | 10                  | Maximum of 10 alerts allowed every 20 minutes. This is the default and our suggested best practice.
30                       | 1                   | Maximum of 1 alert allowed every 30 minutes. As in the example above, this is usually not ideal.
60                       | 5                   | Maximum of 5 alerts allowed every 60 minutes.