NetDoktor.de GmbH Gets a Virtual SysAdmin

Dan Ackerson, CTO of NetDoktor, had recently ordered a new round of servers for NetDoktor and was positively dreading having to set up Nagios and Munin on them. This is where the fact that he is a “born & raised” developer really shines through: configuring Nagios was simply beyond Dan and his staff. No matter how much time they spent digesting the Nagios documentation, they just could not get all the pieces moving correctly. Dan tried to bolt Munin on top of Nagios and eventually had to walk away in frustration. There had to be a better way…

The Competition

Dan had actually outsourced the initial Nagios setup, so of course he toyed with the idea of having the same company do it again on the new servers. But then, why should he have to do this at all? Weren’t there “all-in-one” products that could do this? Dan looked around, starting with Centreon and then Zenoss, but did not understand SNMP well enough to set up one monitor to cover his disparate network of servers.

“Arrgggh – weren’t there companies out there that did this sort of thing? Why do I have to reinvent the wheel here?”

LogicMonitor to the Rescue

Over the weekend, Dan dug around a little on the LogicMonitor website and figured he would invest time in a trial evaluation. On Monday he was contacted by Panoramic Data, LogicMonitor’s European partner, and scheduled a desktop session for that same evening. Dan went through the process of installing a lightweight Java agent on a test server and configured SNMP properly. LogicMonitor provided guidance through the trial setup process, and within a few minutes all the basic datapoints were being gathered and graphed. All well and good, but was it worth the cost?

Soon after the trial installation, Dan started to get interesting alert emails, including warnings about excess TCP retransmissions and MySQL query cache prunes:

 www2 has 21.42 query cache prunes per second due to low memory. The Query cache has a hit
 ratio of greater than 50% - so it is likely to benefit more from increasing the cache
 size to alleviate this memory pressure.
 This state started at 2010-07-23 20:36:35 CEST and has been going on for 0h 12m.

“Wow” – not just a plain alert because a given threshold was tripped, but background information and even a suggestion on how to fix it! Suddenly, the light went on – LogicMonitor! Dan now had a virtual system administrator watching his servers and pointing him and his team in the right direction when it noticed something strange. Just yesterday, Dan created his first custom datasources with graphs to keep track of website response time for different pages:

[Image: LogicMonitor makes it easy to add custom monitoring]

It took about 10 minutes to set up a couple of these graphs, the first step toward easily gathering and plotting relevant business data. Possible next steps would be tracking how many people have logged in within the past 5 minutes, or how many people have posted articles or questions. And the “holy grail”: how do response times (and load) compare when plotted against these other metrics?
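
For readers who want a feel for the raw measurement behind a page response-time graph, here is a minimal Python sketch. It is not LogicMonitor's datasource mechanism, which this article does not describe; the function names fetch_page and measure_response_times are made up for illustration, and the code simply times how long each page takes to download.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_page(url: str, timeout: float = 10.0) -> bytes:
    """Default fetcher (illustrative): download a page and return its body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def measure_response_times(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_page,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Return the time taken to fetch each URL, in milliseconds.

    The fetcher and the clock are parameters so the timing logic can be
    exercised without touching the network. Errors raised by the fetcher
    are not handled here and will propagate to the caller.
    """
    timings: Dict[str, float] = {}
    for url in urls:
        start = clock()
        fetch(url)  # the body is discarded; only the elapsed time matters
        timings[url] = (clock() - start) * 1000.0
    return timings
```

Calling measure_response_times(["http://www.netdoktor.de/"]) on a schedule and recording the returned value for each page would give one datapoint per page per run, which is the kind of series a response-time graph plots.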

Naturally, Dan had second thoughts about pushing something as core to the business as server metrics out to a SaaS provider. But are the metrics themselves the core value? Or does the real value lie in the visualization and the business decisions made after comprehending these metrics? Dan confirmed he is going with the latter and is extremely happy not to have to worry about how to set up his own metrics system.

About NetDoktor.de GmbH

NetDoktor.de is based in Munich, Germany, and provides independent and comprehensive information on the subject of health and medicine. To learn more, please visit: http://www.netdoktor.de