Is Linux disk utilization on your monitoring dashboard?

Recently we rolled out a new release of LogicMonitor. Among the many improvements and fixes that users saw, there were also some backend changes to the Linux systems that store monitoring data. The rollout went smooth, no alerts were triggered - but it was pretty easy to see that something had changed:

Duration: 2 minutes

Published: August 7, 2011

Is Linux disk utilization on your monitoring dashboard?

No alerts triggered, as the drive utilization was just below a warning threshold, and the IO completion time, while higher, was still plenty fast (especially on reads, which is what we alert on – writes are done asynchronously in the background by our app, so don’t impact user experience – but read time can.)

So while there is no immediate issue here, the change (a more aggressive cache management algorithm) clearly impacts our scalability for a given set of hardware. So it’s something we’re going to look at again.

The point I want to bring out, though, is that you need to eyeball graphs, regularly. Especially after releases. The fact that alerts did not trigger does not mean all is well – it just may mean you don’t have alerts on the right things. Create dashboards with your key metrics plotted (whether they be OS, application or database metrics), scan them regularly, and you’ll be able to easily spot anomalies. And then decide how to address them. (We create several dashboards, focusing on different classes of systems -high level, databases, storage, etc)

As a side note: why doesn’t LogicMonitor have automated anomaly detection that can “eyeball” such metrics for you and look for “unusual” changes? Well – because it’s hard. 🙂 Actually, it’s not that hard to make an anomaly detector – what is hard is making one that can sort through and detect meaningful changes from noise, and sending too many anomaly notifications. This is an area we’re still working on.

Blogs

Check out our latest resources

See only what you need, right when you need it. Immediate actionable alerts with our dynamic topology and out-of-the-box AIOps capabilities.

View all blog posts

best practices

Platform

Infrastructure

AIOps & Edwin AI

Cloud & Multi-Cloud

Digital Experience

Logs

Solutions

Business Outcome

Role

Industry

Resources

By Resources

By Topic

Learn the Platform

Company

About Us

Is Linux disk utilization on your monitoring dashboard?

In this article

SUBSCRIBE

Subscribe to the Monthly Newsletter

SUBSCRIBE

In this article

Check out our latest resources

Why CPU load should not (usually) be a critical alert.

Active/Active or N+1?

Are free monitoring tools like Nagios really free?