Custom Application Monitoring for Go

Sometimes you just have to do it yourself. When managing proprietary applications, there is no suite of utilities available – you will likely have to implement your own solutions. Custom solutions, however, still require monitoring. This blog will demonstrate how TechOps at LogicMonitor has implemented custom monitoring for some of our internal tools.

Back in May, we announced the release of our proprietary Time Series Database (TSDB) for storing your data. We didn’t announce all of the internal tools required to support our new TSDB. Development provided a great TSDB, but not the tools needed to run it in production, so we on the TechOps team set out to build some of the operational infrastructure to back up user data from the new TSDB architecture.

Once we had a working system in place to back up all of your TSDB data, we needed to monitor that system. Without going into detail, I’ll say that our TSDB backup system has several components that need to be monitored, including a metadata database, external storage, backup agents, and a centralized backup scheduler, along with performance metrics such as network throughput, CPU utilization, memory utilization, and disk performance.

Most of this stuff is a piece of cake to monitor right out of the box using LogicMonitor, but our bespoke backup agents and scheduler were both written in Go, the efficient and scalable open source programming language from Google, and didn’t inherently lend themselves to monitoring. The rest of this blog will cover just how simple it was to expose custom metrics from these Go applications and monitor them with LogicMonitor.

Overview

Go makes it remarkably easy to create a simple HTTP server inside any application. This functionality, combined with LogicMonitor’s Webpage data collection method, allowed us to quickly expose custom metrics from our code and start using that data to monitor our infrastructure.

Exposing Application Metrics

The most difficult part of the process is determining exactly which metrics you want to expose from your code. Since every system and application is different, I’ll start with the basics and provide an example of using the Go runtime library to expose performance metrics about the Go runtime itself:

import (
 "encoding/json"
 "log"
 "net/http"
 "runtime"
)

func Performance(w http.ResponseWriter, req *http.Request) {
 // float64 represents these counters exactly up to 2^53;
 // float32 would start rounding them above 2^24 (about 16.7 million)
 results := make(map[string]float64)

 // get number of Goroutines
 // https://golang.org/pkg/runtime/#NumGoroutine
 numRoutines := runtime.NumGoroutine()
 results["GoRoutines"] = float64(numRoutines)

 // get memory stats
 // https://golang.org/pkg/runtime/#MemStats
 var memStats runtime.MemStats
 runtime.ReadMemStats(&memStats)

 // bytes allocated and not yet freed
 results["MemAlloc"] = float64(memStats.Alloc)

 // number of frees
 results["MemFrees"] = float64(memStats.Frees)

 // bytes allocated and not yet freed (same value as MemAlloc)
 results["MemHeapAlloc"] = float64(memStats.HeapAlloc)

 // bytes in idle spans
 results["MemHeapIdle"] = float64(memStats.HeapIdle)

 // bytes in in-use spans
 results["MemHeapInUse"] = float64(memStats.HeapInuse)

 // total number of allocated objects
 results["MemHeapObjects"] = float64(memStats.HeapObjects)

 // bytes of heap memory obtained from the system
 results["MemHeapSys"] = float64(memStats.HeapSys)

 // number of mallocs
 results["MemMallocs"] = float64(memStats.Mallocs)

 // total number of completed garbage collections
 results["MemNumGc"] = float64(memStats.NumGC)

 // total time (in nanoseconds) that the garbage collector has paused the program
 results["MemPauseTotalNs"] = float64(memStats.PauseTotalNs)

 // total bytes obtained from system
 results["MemSys"] = float64(memStats.Sys)

 resp, err := json.Marshal(results)
 if err != nil {
  log.Printf("error: couldn't marshal performance metrics to json: %v", err)
  w.WriteHeader(http.StatusInternalServerError)
 } else {
  w.Write(resp)
 }
}


This block of code will return a JSON string containing metrics about the Go runtime, exemplified by the following:

{
 "GoRoutines": 56,
 "MemAlloc": 6986048,
 "MemFrees": 950790800,
 "MemHeapAlloc": 6986048,
 "MemHeapIdle": 34209790,
 "MemHeapInUse": 13205504,
 "MemHeapObjects": 33145,
 "MemHeapSys": 47415296,
 "MemMallocs": 950823940,
 "MemNumGc": 142465,
 "MemPauseTotalNs": 40569120000,
 "MemSys": 52869370
}


The metrics you’re able to expose are only limited by your imagination. You’re able to write whatever code you need to grab a given metric from your application, convert it to JSON, and respond. Here’s a basic outline:

import (
 "encoding/json"
 "log"
 "net/http"
)

func FooMetric(w http.ResponseWriter, req *http.Request) {
 // Calculate or retrieve your datapoint here
 results := getFooMetric()

 resp, err := json.Marshal(results)
 if err != nil {
  log.Printf("error: couldn't marshal foo metric to json: %v", err)
  w.WriteHeader(http.StatusInternalServerError)
 } else {
  w.Write(resp)
 }
}


Serving Application Metrics

Once you’ve assembled a few functions to compile the metrics you need, it’s trivial to spin up a simple HTTP server and expose the metrics from your app:

import (
 "log"
 "net/http"
 "time"
)

// Address the stats server listens on
const statsPort = ":8080"

func StartServer() error {
 // https://golang.org/pkg/net/http
 // Create a multiplexer to handle request routing
 h := http.NewServeMux()

 // Add resource handlers to route requests
 // The first argument, 'pattern', is used to match a request path
 // and forward that request to the function specified by the
 // second argument, 'handler'
 // https://golang.org/pkg/net/http/#HandleFunc
 //
 // In this case, we're mapping the URL
 // http://localhost:8080/stats/performance
 // to the function we created above named Performance and mapping
 // http://localhost:8080/stats/foo to the function FooMetric
 h.HandleFunc("/stats/performance", Performance)
 h.HandleFunc("/stats/foo", FooMetric)

 // Create the HTTP server, passing in the desired listen address,
 // our multiplexer, and some timeout configurations
 // https://golang.org/pkg/net/http/#Server
 srv := &http.Server{
  Addr: statsPort,
  Handler: h,
  ReadTimeout: 10 * time.Second,
  WriteTimeout: 10 * time.Second,
 }

 // Start the HTTP server
 // ListenAndServe blocks until the server stops with an error,
 // at which point log.Fatal logs it and exits the process
 // https://golang.org/pkg/net/http/#Server.ListenAndServe
 log.Printf("info: Stats server started on localhost%s", statsPort)
 log.Fatal(srv.ListenAndServe())
 return nil
}


Bonus

Go also makes it extremely easy to output a real-time stack trace (which isn’t particularly relevant to monitoring an application with LogicMonitor, but is too useful not to share), as shown in the following:

import (
 "net/http"
 "runtime"
)

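func Stacktrace(
 w http.ResponseWriter,
 req *http.Request,
) {
 buf := make([]byte, 1<<16)
 // runtime.Stack returns the number of bytes written into buf;
 // write only that portion so the response has no trailing zero bytes
 n := runtime.Stack(buf, true)
 w.Write(buf[:n])
}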


That’s it! Now you can serve this function from your HTTP server and view a stack trace in your browser. Just add h.HandleFunc("/stacktrace", Stacktrace) to your HTTP server handlers.

Putting It All Together

In order to facilitate inserting our code into your existing application, I’ll put everything together in a struct and then demonstrate how to include this HTTP server in your application’s startup.

Here’s our completed struct:

package app

 import (
  "encoding/json"
  "log"
  "net/http"
  "runtime"
  "time"
 )

 const statsPort = ":8080"

 type StatsServer struct {}

 func (s *StatsServer) Performance(
  w http.ResponseWriter, 
  req *http.Request,
 ) {
  // float64 represents these counters exactly up to 2^53;
  // float32 would start rounding them above 2^24 (about 16.7 million)
  results := make(map[string]float64)

  // get number of Goroutines
  // https://golang.org/pkg/runtime/#NumGoroutine
  numRoutines := runtime.NumGoroutine()
  results["GoRoutines"] = float64(numRoutines)

  // get memory stats
  // https://golang.org/pkg/runtime/#MemStats
  var memStats runtime.MemStats
  runtime.ReadMemStats(&memStats)

  // bytes allocated and not yet freed
  results["MemAlloc"] = float64(memStats.Alloc)

  // number of frees
  results["MemFrees"] = float64(memStats.Frees)

  // bytes allocated and not yet freed (same value as MemAlloc)
  results["MemHeapAlloc"] = float64(memStats.HeapAlloc)

  // bytes in idle spans
  results["MemHeapIdle"] = float64(memStats.HeapIdle)

  // bytes in in-use spans
  results["MemHeapInUse"] = float64(memStats.HeapInuse)

  // total number of allocated objects
  results["MemHeapObjects"] = float64(memStats.HeapObjects)

  // bytes of heap memory obtained from the system
  results["MemHeapSys"] = float64(memStats.HeapSys)

  // number of mallocs
  results["MemMallocs"] = float64(memStats.Mallocs)

  // total number of completed garbage collections
  results["MemNumGc"] = float64(memStats.NumGC)

  // total time (in nanoseconds) that the garbage collector has paused the program
  results["MemPauseTotalNs"] = float64(memStats.PauseTotalNs)

  // total bytes obtained from system
  results["MemSys"] = float64(memStats.Sys)

  resp, err := json.Marshal(results)
  if err != nil {
   log.Printf("error: couldn't marshal performance metrics to json: %v", err)
   w.WriteHeader(http.StatusInternalServerError)
  } else {
   w.Write(resp)
  }
 }

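 func (s *StatsServer) Stacktrace(
  w http.ResponseWriter,
  req *http.Request,
 ) {
  buf := make([]byte, 1<<16)
  // runtime.Stack returns the number of bytes written into buf;
  // write only that portion so the response has no trailing zero bytes
  n := runtime.Stack(buf, true)

  w.Write(buf[:n])
 }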

 func (s *StatsServer) StartServer() error {
  // https://golang.org/pkg/net/http 

  // Create a multiplexer to handle request routing
  h := http.NewServeMux()

  // Add resource handlers to route requests
  // The first argument, 'pattern', is used to match a request path 
  // and forward that request to the function specified by the
  // second argument, 'handler'
  // https://golang.org/pkg/net/http/#HandleFunc 
  //
  // In this case, we're mapping the URL
  // http://localhost:8080/stats/performance
  // to the Performance method defined above and mapping
  // http://localhost:8080/stacktrace to the Stacktrace method
  h.HandleFunc("/stats/performance", s.Performance)
  h.HandleFunc("/stacktrace", s.Stacktrace)

  // Create the HTTP server, passing in desired listener port, 
  // our multiplexer, and some timeout configurations
  // https://golang.org/pkg/net/http/#Server 
  srv := &http.Server{
   Addr: statsPort,
   Handler: h,
   ReadTimeout: 10 * time.Second,
   WriteTimeout: 10 * time.Second,
  }

  // Start the HTTP server
  // https://golang.org/pkg/net/http/#Server.ListenAndServe 
  log.Printf("info: Stats server started on localhost" + statsPort)
  log.Fatal(srv.ListenAndServe())
  return nil
 }


Now, including the HTTP server in our app is as simple as adding the lines below to the application’s startup function.

// initialize monitoring metric server
// NOTE: We must start the server in a goroutine, or ListenAndServe
// will block further code execution
go func() {
 stats := StatsServer{}
 stats.StartServer()
}()


You can now view your metrics by sending HTTP requests to the application.

> curl http://localhost:8080/stats/performance | jq .
{
 "GoRoutines": 56,
 "MemAlloc": 6986048,
 "MemFrees": 950790800,
 "MemHeapAlloc": 6986048,
 "MemHeapIdle": 34209790,
 "MemHeapInUse": 13205504,
 "MemHeapObjects": 33145,
 "MemHeapSys": 47415296,
 "MemMallocs": 950823940,
 "MemNumGc": 142465,
 "MemPauseTotalNs": 40569120000,
 "MemSys": 52869370
}


Now that your app is serving data, it’s time to start monitoring.

Bringing Everything Together Inside LogicMonitor

Consuming your exposed metrics is incredibly simple using a Webpage Datasource.

For example, here’s how we configured the datasource to monitor the performance of our TSDB backup scheduler:

blog_1

The value system.hostname in the Applies-To is configured to match any and all of the servers where you may be running this application.

blog_2

Notice that we’re using the resource endpoint configured in our Go HTTP server.

blog_3

For the final step, here’s an example of adding a JSON datapoint to your datasource:

blog_4

Make sure to update the JSON Path field to match the path to a given datapoint within the JSON response.

That’s it! LogicMonitor is now collecting data from the application. Metrics can be viewed by navigating to the monitored device and locating the datasource.

blog_5

Notice the datapoints corresponding to the runtime metrics exposed in the Go code.

blog_6

Now we can be confident that if, for any reason, backups of customer data are not completing in a timely manner, our Operations team will know about it right away. After all, even if you are not instrumenting your custom applications, there are still two monitoring systems that will tell you about issues – your customers and your boss. We want to make sure that we know of, and can address, any issue before those two systems trigger.