Simple ways to be confident in automated server and application deployments.

Sample SAT question: xUnit is to Continuous Integration as what is to automated server deployments?

We’ve been going through a lot of growth here at LogicMonitor. Growth means firing up new servers to handle more customers, but we have also been adding a variety of new services: proxies that let our customers route around Internet issues that BGP doesn’t catch, servers that test the performance and reachability of customers’ sites from various locations, and so on. All of which means spinning up new servers, often many at a time, across development, QA, and staging environments.

As old hands at running datacenter operations, we have long adhered to the tenet of not trusting people – including ourselves. People make mistakes, and they can’t remember what they did to make things work. So all our servers and applications are deployed by automated tools. We happen to use Puppet, but collectively we’ve worked with CFEngine, Chef, and even RightScripts.

So bringing up a new server is no problem for us: it’s scripted, repeatable, and takes no time. But what about splitting the functions of what was one server across several? And how do we know that newly deployed servers are set up correctly once there have been changes and updates?

As we explained to a relatively new member of the ever-growing LogicMonitor team, who came from more of a development background than an ops one: server monitoring is very analogous to the automated tests of a build in a continuous integration environment.

Just as continuous integration requires self-testing code to check a large part of the codebase for bugs, any automated server deployment system should use monitoring to verify that applications deployed successfully and are functioning as they should. If you spin up a new server for a particular class of application, everything should work correctly, since it’s all automated by Puppet. But it sure is nice to have automated validation that this is true – especially after any changes to the Puppet manifests.

What is needed for this?

  • Ideally, practice test-driven deployment: write the monitoring tests first, so you’ll know when your application deployment scripts pass (e.g. look for Tomcat with JMX on a given port; check that the available heap is at least 4 GB; check that URL X responds to a request in a specific way, and so on). See the first sketch after this list.
  • Integrate your server deployments with your monitoring (here’s a video on how you can configure monitoring directly from within Puppet). Whenever a server boots, have it register itself with the monitoring system to ensure it is monitored. Bonus points if it also tags or groups itself into the right application categories and the right stage (dev, QA, production, etc.), so that it acquires the correct thresholds and escalation policies. See the second sketch below.
  • If you ever have a performance or availability issue in your infrastructure or applications, never regard the issue as closed unless your monitoring detected it and alerted you appropriately. This way you build up an ever-growing set of tests in your monitoring that gives more and more complete coverage.
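
To make the first point concrete, here is a minimal sketch of what such monitoring tests might look like, written xUnit-style in Python. The host, ports, and health endpoint are hypothetical stand-ins, and the heap check assumes a Jolokia HTTP-to-JMX bridge is available on the server; your environment will almost certainly differ.

```python
import json
import socket
import unittest
import urllib.request

HOST = "app01.example.com"  # hypothetical new server
JMX_PORT = 9010             # hypothetical JMX port for Tomcat
HEALTH_URL = f"http://{HOST}:8080/health"  # hypothetical health endpoint
# Assumes Jolokia (an HTTP bridge to JMX) is running on the server:
JOLOKIA_URL = f"http://{HOST}:8778/jolokia/read/java.lang:type=Memory/HeapMemoryUsage"

class DeploymentTests(unittest.TestCase):
    def test_jmx_port_is_listening(self):
        # Tomcat should be up, with JMX exposed on the expected port.
        with socket.create_connection((HOST, JMX_PORT), timeout=5):
            pass  # connecting successfully is the test

    def test_heap_is_at_least_4gb(self):
        # Read the max heap size over the (assumed) Jolokia bridge.
        with urllib.request.urlopen(JOLOKIA_URL, timeout=5) as resp:
            usage = json.load(resp)["value"]
        self.assertGreaterEqual(usage["max"], 4 * 1024**3)

    def test_url_responds_correctly(self):
        # The application should answer a known request in a known way.
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            self.assertEqual(resp.status, 200)
            self.assertIn(b"OK", resp.read())

if __name__ == "__main__":
    unittest.main()
```

If these tests pass against a freshly built server, your Puppet manifests did their job; if they fail after a manifest change, you know before your customers do.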
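And here is a minimal sketch of boot-time self-registration, per the second point. The monitoring endpoint, token, and payload fields are all hypothetical – this is not LogicMonitor’s actual API, and the video linked above shows the real Puppet integration – but the shape of the idea is the same: the server announces itself and its groupings as soon as it boots.

```python
import json
import socket
import urllib.request

MONITORING_API = "https://monitoring.example.com/api/devices"  # hypothetical endpoint
API_TOKEN = "changeme"                                         # hypothetical credential

def register_self(application, stage):
    """Register this host with monitoring, tagged by application class and
    deployment stage, so it inherits that group's thresholds and
    escalation policies."""
    payload = {
        "hostname": socket.getfqdn(),
        "groups": [f"app/{application}", f"stage/{stage}"],
    }
    req = urllib.request.Request(
        MONITORING_API,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status == 201  # created

if __name__ == "__main__":
    # Run from an init script (or a Puppet exec) at first boot:
    register_self(application="tomcat-proxy", stage="QA")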

That’s it. Simple principles that can greatly increase your confidence in automated server and application deployments.

The answer to the opening question? Good monitoring.