Is Gmail actually down or is it just me? How about HipChat and Salesforce? These are questions that I know my systems admin is tired of hearing. Our company relies on many cloud service providers to support our daily operations. So when my colleagues and I can’t access HipChat for on-the-fly communication or Salesforce to track client information, our TechOps team is often left scrambling to identify the source of the problem before they are bombarded with complaints.
As cloud service providers become increasingly ubiquitous and integral parts of companies’ systems, we at LogicMonitor wanted to alleviate some of the aggravation associated with spotty service availability. To do this, we began offering Internal Service Checks to quickly identify performance issues in companies’ applications and websites. These service checks work as follows: the Collectors already installed in each of your offices or datacenters act as automated end users by regularly attempting to access a designated site or application using authentication credentials (i.e., username, password, and endpoint domain) you provide. If a Collector is unable to access the service, you are immediately notified, ideally before any of your actual end users become aware that something is wrong.
It seems these Internal Service Checks would be perfectly suited to proactively address all our application availability struggles, right? Well, almost. Our out-of-the-box configuration for Internal Service Checks is great for websites and applications that use Basic or NTLM authentication. Many of our services, however, use a more complex form-based method that requires the presence of dynamic tokens. This presented the LogicMonitor team with a problem: how do we run an automated service check in which one of the authentication measures is a dynamic value created at the moment one issues a validation request? In keeping with LogicMonitor’s one-size-fits-none tradition, we wanted to provide a highly customizable framework for service checks that could be tailored to each of your internal applications.
We met this challenge head-on with our newly released Script Internal Service Checks, which allow you to write Groovy-based Request/Response scripts that can capture the value of dynamic tokens and pass them into authentication. Using a routinely scheduled Script Internal Service Check, our TechOps team no longer has to worry about being caught off guard by availability issues and, when they occur, can proactively post service notices to my colleagues and me.
Using HipChat as an example, let’s dive into the specific structure of our scripted service checks. The HipChat service check consists of two steps, each containing a Request and Response script, with the latter telling LogicMonitor how to parse the HTTP response gathered by the former. This basic format is typical of any Script Internal Service Check that will be run in our platform.
Step One, Request:
import static com.logicmonitor.service.groovyapi.PostDataType.*;
import static com.logicmonitor.service.groovyapi.StatusCode.*;
import com.logicmonitor.service.groovyapi.StatusCode;
import com.logicmonitor.service.groovyapi.AuthType;
import com.logicmonitor.service.groovyapi.LMRequest;

LMRequest request = new LMRequest();
request.useHttp1_1()
       .followRedirect(true)
       .needFullpageLoad(false)
       .get();

return LMHttpClient.request(request);
The purpose of Step One’s Request script (shown above) is quite simple: use the Groovy API to load the full HTTP response of https://hipchat.com/sign_in. The HTTP response will include our XSRF token in a format resembling the following:
<form action="https://www.hipchat.com/sign_in" method="post" name="signin" class="aui top-label aui-form-supersize signupForm">
  <input type="hidden" name="xsrf_token" value="1008991133" />
In the Response script that follows, we capture the token’s value using the setContext() API command.
Step One, Response:
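import static com.logicmonitor.service.groovyapi.StatusCode.*;
import com.logicmonitor.service.groovyapi.StatusCode;

StatusCode status = STATUS_OK;
String body = LMResponse.getBody();

// Match everything between "xsrf_token" and the closing "/>" of the input tag
result = (body =~ /xsrf_token(.+)\/\>/);
// Keep only the captured group of the first match (empty string if nothing matched)
result = result.find() ? result.group(1) : '';

// Strip quotes, spaces, and the attribute name to leave the bare token value
result = result.replace('"','');
result = result.replace(' ','');
result = result.replace('value=','');

// Store the token so the next step can read it
LMResponse.setContext("token", result);

return status;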
The setContext() command creates a key-value pair in which the key is “token” and the value is the contents of the result variable, which is derived from the regular-expression match (body =~ /xsrf_token(.+)\/\>/) against the HTTP response body.
Before we can enter the token into our authentication request, we have to reformat it using a series of result.replace commands in order to remove unnecessary spaces and characters surrounding the XSRF token. This will allow us to isolate the token and pass its value into authentication.
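As an alternative to the chain of replace calls, the capture group can target the value attribute directly. The following is a minimal, standalone Groovy sketch rather than part of the LogicMonitor API: the extractXsrfToken helper and the sample markup are illustrative assumptions, and the pattern assumes the name attribute is immediately followed by the value attribute, as in the form snippet shown earlier.

```groovy
// Hypothetical helper: pulls the xsrf_token value out of an HTML body.
// Assumes name="xsrf_token" is immediately followed by value="...".
String extractXsrfToken(String body) {
    def matcher = (body =~ /name="xsrf_token"\s+value="([^"]+)"/)
    // find() advances to the first match; group(1) is the text inside value="..."
    return matcher.find() ? matcher.group(1) : null
}

// Sample markup modeled on the sign-in form shown earlier
def sampleHtml = '<input type="hidden" name="xsrf_token" value="1008991133" />'
println extractXsrfToken(sampleHtml)
```

Returning null when no token is found leaves it to the calling script to decide how to report the failure.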
After Step One has collected and properly parsed the HTTP response from HipChat’s login page, we can begin the authentication process. This brings us to Step Two.
Step Two, Request:
import static com.logicmonitor.service.groovyapi.PostDataType.*;
import static com.logicmonitor.service.groovyapi.StatusCode.*;
import com.logicmonitor.service.groovyapi.StatusCode;
import com.logicmonitor.service.groovyapi.AuthType;
import com.logicmonitor.service.groovyapi.LMRequest;

password = '##hipchat.pass##';
token = LMHttpClient.getContext("token");

LMRequest request = new LMRequest();
request.useHttp1_1()
       .followRedirect(true)
       .needFullpageLoad(false)
       .addHeader("Content-Type", "application/x-www-form-urlencoded")
       .post(X_WWW_FORM_URLENCODED, "signin=Log+in&xsrf_token=" + token + "&password=" + password + "&email=sarah.tery%40logicmonitor.com");

return LMHttpClient.request(request);
The primary purpose of Step Two’s Request script is to log in to HipChat using our three authentication measures: username, password, and XSRF token. As a best practice, always store passwords as device- or service-level properties so that you can dynamically populate them into scripts using LogicMonitor tokens. Accordingly, we defined our HipChat password as the value of our hipchat.pass service property and referenced it in the script as ##hipchat.pass##. Then, to retrieve the XSRF token stored in the previous step, we use a getContext() command with the “token” key established in Step One’s setContext() command.
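The hand-off between the two steps can be pictured with a plain map. This is only a sketch of the idea: StepContext is a hypothetical stand-in written for illustration, not the LogicMonitor context implementation, and the case-sensitive key lookup it shows is a property of this stand-in.

```groovy
// Hypothetical stand-in for the context shared between steps (not the LogicMonitor API)
class StepContext {
    private final Map<String, String> values = [:]

    void setContext(String key, String value) { values[key] = value }

    // Returns null when nothing was stored under the given key
    String getContext(String key) { values[key] }
}

def context = new StepContext()

// Step One, Response: store the parsed token under the key "token"
context.setContext('token', '1008991133')

// Step Two, Request: read it back using exactly the same key
def token = context.getContext('token')
println "xsrf_token=${token}"
```

In this stand-in, looking up “Token” instead of “token” returns null, which is why it is safest to use the identical key string in both steps.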
With our password and token set, we can post these values to hipchat.com/login_password, with our username (sarah.tery@logicmonitor.com) included as a URL-encoded argument.
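Because the request body is sent as application/x-www-form-urlencoded, any value containing characters such as @ or & has to be encoded first. A minimal standalone sketch using the JDK’s java.net.URLEncoder follows; the buildFormBody helper and the sample password are hypothetical, and the field names are taken from the Step Two Request script.

```groovy
import java.net.URLEncoder

// Hypothetical helper: joins form fields into a URL-encoded request body
String buildFormBody(Map<String, String> fields) {
    return fields.collect { name, value ->
        URLEncoder.encode(name, 'UTF-8') + '=' + URLEncoder.encode(value, 'UTF-8')
    }.join('&')
}

def body = buildFormBody([
    signin    : 'Log in',
    xsrf_token: '1008991133',
    password  : 'p@ss&word',                    // made-up value for illustration
    email     : 'sarah.tery@logicmonitor.com'
])
println body
```

Encoding each value this way turns the space in “Log in” into “+” and the “@” in the email address into “%40”, matching the literal strings in the script, and it keeps a password containing “&” from being split into a separate field.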
At this point, we have successfully authenticated into HipChat! The HTTP response from our login page will be evaluated by Step Two’s Response stage.
Step Two, Response:
In our final stage, we moved away from scripting and took advantage of our standard Settings configuration to evaluate the previous HTTP response. It’s worth noting that you do not have to use a script for each of your Service Check’s stages if our Settings configuration suffices; you are free to mix and match!
We configured this step to identify the presence of “Welcome” in the HTTP response’s body. This indicates that we successfully authenticated into HipChat and, thus, the application is available.
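For readers who prefer to picture this stage in code, the check reduces to a substring test on the response body. The sketch below is plain Groovy with a hypothetical containsMarker helper; it does not use the LogicMonitor API and is not how the Settings configuration is implemented.

```groovy
// Hypothetical check: true when the response body contains the expected marker text
boolean containsMarker(String body, String marker = 'Welcome') {
    return body != null && body.contains(marker)
}

println containsMarker('<h1>Welcome, Sarah</h1>')    // body of a successful login
println containsMarker('<h1>Log in to HipChat</h1>') // made-up body of a failed login
```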
In our test run shown below, you can see the HTTP response contains the string <h1>Welcome, Sarah</h1>. HipChat is up and running! In less ideal scenarios, when an issue is detected, an alert would be displayed on our TechOps team’s dedicated Services Dashboard.
Using a script to execute your service checks means we can monitor any of your cloud-based applications, regardless of their authentication measure. At LogicMonitor, this has resulted in dramatic improvements for our time-to-resolution and internal processes when addressing our applications’ availability. As always, we would love to hear how you are using Internal Service Checks to save time and resources, so be sure to click the “Feedback” button in your LM account to share your thoughts and use cases.