Using Open Source Tools to Push Metrics into LogicMonitor

Ever walk into a corner market, push on the door and find it won’t open? You look down at the handle and are reminded by a sign on the door that you have to “pull” to open it?

The LogicMonitor platform uses an agentless Collector to pull metrics from thousands of devices and resources into a unified monitoring view (no agents required). We currently offer more than 2,000 LogicModules out-of-the-box that gather metrics from all kinds of systems using many different protocols. But what about metrics that we can’t pick up or ‘pull’ using the Collector?

Sometimes you want to push metrics into monitoring with solutions like StatsD, Cisco Telemetry, or even generic metrics for solutions like serverless applications. That comes with some challenges. For example, how do you know which device those metrics are associated with? How do you determine if it’s multi-instance or not? How do we handle multiple metric formats so users don’t have to adapt to our LM-specific format in order to send us metrics?

The LogicMonitor Integrations team was recently challenged to come up with a way to consume push metrics using industry-standard open-source solutions. Rather than forking an existing solution and customizing it, or taking months to develop a full-blown set of LogicMonitor-specific push metrics designs, we asked: what solutions already on the market would allow us to capture push metrics today?

The good news: we have something to share and it’s pretty cool.

OpenMetrics + Groovy

The solution we built uses a Groovy Script DataSource to gather OpenMetrics format (aka Prometheus exposition format) metrics using a Collector. OpenMetrics has been adopted by the Cloud Native Computing Foundation, and is in use by a large list of projects.

We wanted to get hands-on in this blog, so we’re going to look specifically at using the solution with the open-source server agent, Telegraf. Telegraf has many different input plugins that allow it to receive metrics (i.e. allow metrics to be pushed to it).  Those metrics can then be aggregated (if desired) and presented in other formats (also provided as plugins from Telegraf).

For the lab example described below, we are pushing some generic InfluxDB metrics into Telegraf and then extracting them with the Prometheus Client plugin. The Prometheus Client plugin on Telegraf is actually a web endpoint that we can then use to collect the data with a LogicMonitor Collector.
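
To make the format concrete, here is a minimal Python sketch of reading exposition-format text of the kind the Prometheus Client endpoint serves. It is not the Groovy DataSource's implementation, only an illustration. The function name `parse_exposition` and the sample text are our own, and the exact metric names and labels Telegraf emits may differ.

```python
import re

# Matches sample lines such as: foo_custom_value{src="hostname"} 42
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (name, labels, value) tuples; comments and bad lines are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # blank line, or a HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        try:
            value = float(match.group('value'))
        except ValueError:
            continue
        # Escaped characters inside label values are left as-is in this sketch.
        labels = dict(LABEL_RE.findall(match.group('labels') or ''))
        samples.append((match.group('name'), labels, value))
    return samples


# Illustrative text only; real output from Telegraf may look different.
example = '''# HELP foo_custom_value Telegraf collected metric
# TYPE foo_custom_value untyped
foo_custom_value{src="hostname"} 42
'''
print(parse_exposition(example))
```

Each sample line carries a metric name, optional labels, and a numeric value, which is what lets a poller map individual lines to datapoints.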

We also took it a step further and made the lab environment easy for you to set up and test in your environment by using Docker to streamline the setup.

Pre-requisites

  • Install Docker on a machine in your environment
  • Have admin access to your LogicMonitor portal


Get the Files

Download the telegraf.conf file and import the two LogicModules below with their respective locators. You can also find them in the DataSource repository through the standard upgrade flow.

File Descriptions:

  • telegraf.conf: A pre-configured Telegraf config file with the InfluxDB input plugin listening on port 8186, the StatsD input plugin accepting data on port 8125, and the Prometheus Client output exposing metrics at the HTTP endpoint http://<telegrafhost>:9273/metrics.  The BasicStats aggregator is also enabled by default to compute the count, min, max, mean, s2, sum and stdDev of all custom metrics on a 1-minute interval.
  • OpenMetrics_Template – GMTRD4: This DataSource polls a Prometheus metrics endpoint and pulls configured metrics into LogicMonitor datapoints.
  • OpenMetrics_Template_MultiInstance – NXE4GM: A secondary version of the OpenMetrics_Template DataSource that supports Multi-Instance output and uses the BatchScript collection method to reduce calls against the Telegraf endpoint (it is not used in the example lab, but you may need it in future use cases).
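
To give a feel for what the BasicStats aggregator reports, here is a small Python sketch that computes the same seven summary values over one window of raw samples. It is an illustration, not Telegraf's code. We assume s2 is the sample variance and that the standard deviation is its square root, and the helper name `basic_stats` is our own.

```python
import math
import statistics


def basic_stats(values):
    """Summarize one aggregation window of raw metric values."""
    # Assumption: s2 is the sample variance (n - 1 denominator).
    # For a single value this sketch simply reports 0.0.
    s2 = statistics.variance(values) if len(values) > 1 else 0.0
    return {
        'count': len(values),
        'min': min(values),
        'max': max(values),
        'mean': statistics.fmean(values),
        's2': s2,
        'sum': sum(values),
        'stdev': math.sqrt(s2),
    }


# Four hypothetical foo_custom values received within one minute.
window = [10, 20, 30, 40]
print(basic_stats(window))
```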


Run the Telegraf Container

Navigate to the folder where you saved the telegraf.conf file.

Use the Docker CLI command below to start the Docker container.  

Also, make a note of the IP address of the device where this is running. We’ll need it in a later step.

Mac/Linux Terminal:

docker run \
  --name telegraf \
  -p 8186:8186 \
  -p 9273:9273 \
  -p 8125:8125 \
  -p 8125:8125/udp \
  -v "$(pwd)/telegraf.conf:/etc/telegraf/telegraf.conf:ro" \
  telegraf

Windows Powershell Terminal:

docker run `
  --name telegraf `
  -p 8186:8186 `
  -p 9273:9273 `
  -p 8125:8125 `
  -p 8125:8125/udp `
  -v "${PWD}\telegraf.conf:/etc/telegraf/telegraf.conf:ro" `
  telegraf


Confirm the Telegraf Config

The following commands will send a metric called “foo_custom” with a value of a random number between 1 and 100 every second for 86400 seconds.

To test the Telegraf container, open a second terminal window and run the following Metric Generator command to send it some mock data:

Mac/Linux Terminal: 

for x in {1..86400}; do
  i=$(( (RANDOM % 100) + 1 ))
  echo "$x - $i"
  curl --silent -i -XPOST 'http://influxdb:passw0rd@localhost:8186/write' --data-binary "foo_custom,src=hostname value=$i" > /dev/null
  sleep 1
done

Windows PowerShell:

# Build the basic-auth header once, before the loop
$credPair = "influxdb:passw0rd"
$encodedCredentials = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($credPair))
$headers = @{ Authorization = "Basic $encodedCredentials" }
for ($x = 1; $x -le 86400; $x++) {
    # -Maximum is exclusive, so this yields a value from 1 to 100
    $i = Get-Random -Minimum 1 -Maximum 101
    Write-Output "$x - $i"
    $responseData = Invoke-WebRequest -Uri 'http://localhost:8186/write' -Method POST -Headers $headers -Body "foo_custom,src=hostname value=$i"
    Start-Sleep -Seconds 1
}
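
If you would rather use one script on any platform, the same generator can be sketched in Python with only the standard library. This is a rough equivalent of the loops above, not an official tool. It assumes Telegraf is reachable at localhost:8186 with the lab credentials, and the names `build_request` and `run` are our own.

```python
import base64
import random
import time
import urllib.request

WRITE_URL = 'http://localhost:8186/write'  # assumed: Telegraf runs on this machine
USERNAME, PASSWORD = 'influxdb', 'passw0rd'  # lab credentials used above


def build_request(value, url=WRITE_URL):
    """Build an authenticated POST carrying one InfluxDB line-protocol point."""
    token = base64.b64encode(f'{USERNAME}:{PASSWORD}'.encode('ascii')).decode('ascii')
    body = f'foo_custom,src=hostname value={value}'.encode('ascii')
    return urllib.request.Request(
        url,
        data=body,
        method='POST',
        headers={'Authorization': f'Basic {token}'},
    )


def run(iterations=86400, delay=1.0):
    """Send one random value per interval, like the shell loops above."""
    for x in range(1, iterations + 1):
        value = random.randint(1, 100)  # inclusive on both ends
        print(f'{x} - {value}')
        with urllib.request.urlopen(build_request(value), timeout=5):
            pass  # the response body is not needed
        time.sleep(delay)


# Show what would be sent, without touching the network.
request = build_request(42)
print(request.get_method(), request.full_url, request.data)
# Call run() once the Telegraf container is up to start sending.
```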

With that command running, navigate to the following URL from the machine where the Telegraf container is running:

http://localhost:9273/metrics
User Name: prometheus
Password: passw0rd

(If you want to see it from somewhere else, replace localhost with the IP address of the Telegraf container host.)

At the top of the page, there should be some metrics for `foo_custom`.  The aggregation metrics will come in after the first 60 seconds.
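
The same check can be scripted instead of done in a browser. The Python sketch below fetches the endpoint with HTTP basic auth and keeps only the lines for one metric. The helper names `fetch_metrics` and `lines_for` are our own, and the sample text is illustrative rather than captured output.

```python
import base64
import urllib.request


def fetch_metrics(url, user, password):
    """GET the endpoint with HTTP basic auth and return the body as text."""
    token = base64.b64encode(f'{user}:{password}'.encode('ascii')).decode('ascii')
    request = urllib.request.Request(url, headers={'Authorization': f'Basic {token}'})
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().decode('utf-8')


def lines_for(text, prefix):
    """Keep sample lines whose metric name starts with prefix (comments drop out)."""
    return [line for line in text.splitlines() if line.startswith(prefix)]


# Illustrative text only; real output from Telegraf may look different.
sample = '''# TYPE foo_custom_value untyped
foo_custom_value{src="hostname"} 42
other_metric 1
'''
print(lines_for(sample, 'foo_custom'))

# With the container running, the live check would be:
# body = fetch_metrics('http://localhost:9273/metrics', 'prometheus', 'passw0rd')
# print('\n'.join(lines_for(body, 'foo_custom')))
```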

A Telegraf collected metric

If you see those metrics then things are working!  Let’s set up LogicMonitor to gather those metrics. Leave the Metric Generator command running and let’s monitor it with the DataSources.


Import OpenMetrics_Template DataSource to gather metrics

  • Open LogicMonitor and navigate to Settings | DataSources
  • Click Add | From LogicModule Exchange and import the OpenMetrics_Template module with the locator code GMTRD4:

Importing from an XML file in LogicMonitor


Gathering OpenMetrics from Telegraf

If the device where telegraf is running is not already in LogicMonitor, add a new Device using Expert Mode:

  • Navigate to Resources | Add | One Device | Expert
  • Use the following value to add a new Device to LogicMonitor
    • IP Address/DNS Name: <telegraf host IP from previous step>
    • Name: telegraf
    • Collector: Select a Collector that has the ability to reach the telegraf host.

Add the following Custom Properties to the new (or existing) telegraf host device:

    • openmetrics.client=<telegraf host IP from previous step>
    • openmetrics.user=prometheus
    • openmetrics.pass=passw0rd  (Password with a zero for ‘o’)

The imported DataSource should match the ‘telegraf’ device that was added and start gathering metrics. 

Monitoring Prometheus_Groovy as a resource within LogicMonitor

Sending metrics from other systems

To start pushing metrics from your code, you can use one of the many client-side libraries available. Here is a Python example using a StatsD library (conveniently called “statsd”):

import statsd
c = statsd.StatsClient('my.telegraf.agent.ip', 8125)
c.incr('user.logins,service=payroll,region=us-west')  # Increment counter, with service and region tags
c.timing('process_stats.timed', 320)  # Record a 320ms 'process_stats.timed'.
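
If adding a library is not an option, the StatsD wire format is simple enough to send by hand over UDP. The sketch below builds the same two metrics as plain datagrams. It assumes the tag style used above (tags appended to the metric name), and the helper names `statsd_packet` and `send` are our own.

```python
import socket


def statsd_packet(name, value, metric_type, tags=None):
    """Format one StatsD datagram as name[,key=value...]:value|type."""
    tag_part = ''.join(f',{key}={val}' for key, val in (tags or {}).items())
    return f'{name}{tag_part}:{value}|{metric_type}'.encode('ascii')


def send(packet, host='my.telegraf.agent.ip', port=8125):
    """Fire-and-forget one datagram; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


# Counter increment with two tags, then a 320 ms timing.
print(statsd_packet('user.logins', 1, 'c', {'service': 'payroll', 'region': 'us-west'}))
# prints b'user.logins,service=payroll,region=us-west:1|c'
print(statsd_packet('process_stats.timed', 320, 'ms'))
# prints b'process_stats.timed:320|ms'
```

Pass either packet to `send()` with your Telegraf host's address to deliver it.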

Other systems, like Cisco Telemetry, may require additional input plugins to be enabled and would need to be pointed at the Telegraf host to deliver metrics.

Troubleshooting

Note: If you installed on a machine within the target environment you should make sure that the firewall allows the necessary connections to push metrics in.

Also, make sure that the port you chose for the Prometheus Client plugin within the Telegraf agent is open and that the LogicMonitor Collector can access that port:

Default ports

  • StatsD: 8125 (udp)
  • InfluxDB input: 8186 (http)
  • OpenMetrics: 9273 (http)
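
A quick way to test reachability of the two HTTP ports is to attempt a TCP connection from the Collector machine (or any machine with Python). The sketch below is a generic connectivity probe and not a LogicMonitor tool. The helper names are our own, and it cannot verify the StatsD port because UDP has no connection handshake.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or unreachable


def check_lab_ports(host):
    """Probe the InfluxDB input (8186) and OpenMetrics (9273) ports."""
    return {port: tcp_port_open(host, port) for port in (8186, 9273)}


# Example, replacing the address with your Telegraf host's IP:
# print(check_lab_ports('127.0.0.1'))
```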

Conclusion

This approach expands the ways you can aggregate metrics in the LogicMonitor platform for a unified view, with added benefits such as the ability to store the data for up to two years. If you need to push metrics or already have OpenMetrics-formatted metrics to monitor (e.g. a telemetry stream from Istio, a Kubernetes sidecar, Cisco or others), use some of these ideas and the DataSource provided to get started. There is no reason to fracture your monitoring solution. LogicMonitor can serve as the unified target for all metrics in your environments.