Orchestrate Your Monitoring With LogicMonitor and Ansible

LogicMonitor Best Practices Blog

Back in January, we announced the release of our LogicMonitor Community Ansible Module. Today, we’re proud to announce that with the recent release of Ansible version 2.2, the LogicMonitor module will be included in all official distributions of Ansible! To celebrate and help get everyone up to speed, this blog is going to share some use cases and examples for using the module.

Use Cases

Below I’ll be presenting a few real-world use cases and accompanying playbooks for using the LogicMonitor module for Ansible (full information about integrating LogicMonitor and Ansible can be found here, in addition to the official module documentation which can be found here).

Since every environment is different, these playbooks can’t be copied and pasted as-is, but they provide examples of different ways to use the LogicMonitor module as well as a variety of syntax options.

Before getting started, I’ll provide an example inventory to be used with these use cases and a bit of environment setup information. The example inventory just contains some mock groups and variables used in the playbooks – you won’t need to update your existing inventories to use our module. Setting up the environment for the examples consists of a quick and easy way to pass your LogicMonitor credentials to the module. Please continue using your own internal best practices for handling credentials and Ansible.

Finally, while I will be providing an overview of each use case and a summary of the valuable information contained therein, the nitty gritty technical details and explanations are contained as comments within the playbooks themselves.

Inventory

[linux_hosts:children]
collector_hosts
application_hosts

[collector_hosts]
collector01.logicmonitor.com
collector02.logicmonitor.com

[application_hosts:children]
application_foo_hosts
application_bar_hosts

[application_foo_hosts]
foo01.app.logicmonitor.com lm_display_name=foo1
foo02.app.logicmonitor.com lm_display_name=foo2
foo03.app.logicmonitor.com lm_display_name=foo3

[application_bar_hosts]
bar01.app.logicmonitor.com
bar02.app.logicmonitor.com
bar03.app.logicmonitor.com

Environment Setup

export LM_COMPANY="AnsibleTest"
export LM_USER="ansible"
export LM_PASSWORD="mypassword"
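
If you take the environment-variable approach, a short pre-flight play can catch a missing variable before any LogicMonitor task runs. The following is a minimal sketch rather than part of the original playbooks: the play name, task name, and message text are my own, and it assumes that lookup('env', ...) returns an empty string when a variable is unset.

```yaml
---
# Sketch only: verify that the credential variables are exported before
# running any of the use case playbooks.
- name: Pre-flight check for LogicMonitor credentials
  hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: Fail early if a credential variable is missing from the shell environment
      assert:
        that:
          # An unset variable is assumed to come back as an empty string
          - "lookup('env', item) | length > 0"
        msg: "{{ item }} is not set; export it before running the playbooks"
      with_items:
        - LM_COMPANY
        - LM_USER
        - LM_PASSWORD
```

Running a check like this first means a typo in a variable name shows up as one clear failure instead of an authentication error partway through provisioning.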


Use Case 1: Provisioning an application stack with monitoring

This use case demonstrates using Ansible to provision an application stack and adding the newly-provisioned hosts into LogicMonitor. This use case is also applicable to adding new hosts to an existing environment or application stack.

Cool takeaways and examples:

‌• Installing LogicMonitor collectors
‌• Adding hosts to LogicMonitor monitoring
‌• Bonus: Setting device groups and device properties when adding
‌• Creating LogicMonitor device groups
‌• Updating devices that are already being monitored by LogicMonitor

Playbook
# This playbook provides an example use case for provisioning an application
# stack and orchestrating these hosts for monitoring within LogicMonitor. This
# includes provisioning LogicMonitor collectors, application-specific device
# groups, and monitoring application servers.
#
# NOTE: We're relying on shell environment variables for passing LogicMonitor
# credentials into the logicmonitor module. There are a variety of ways to
# achieve this goal, but for the purposes of this playbook, we're exporting the
# variables: LM_COMPANY, LM_USER, and LM_PASSWORD.
#
# Further documentation can be found here:
#   https://docs.ansible.com/ansible/logicmonitor_module.html

---
# Do some boilerplate, non-LogicMonitor orchestration tasks here. This task or
# tasks will obviously be specific to your own environment
- name: Provision hosts
  hosts: linux_hosts
  become: yes
  tasks:
    - name: Install telnet just for fun
      package:
        name=telnet
        state=present

# Install LogicMonitor collectors on designated hosts here.
# We need a collector before adding our provisioned hosts to monitoring
- name: Provision LogicMonitor Collectors
  hosts: collector_hosts
  become: yes
  tasks:
    - name: Install LogicMonitor collectors
      logicmonitor:
        target=collector
        action=add
        company="{{ lookup('env', 'LM_COMPANY') }}"
        user="{{ lookup('env', 'LM_USER') }}"
        password="{{ lookup('env', 'LM_PASSWORD') }}"

# Add all hosts into basic monitoring here. This is the baseline monitoring
#  config for all devices. We'll do app-specific customizations later.
- name: Add hosts to LogicMonitor monitoring
  hosts: linux_hosts
  become: no
  tasks:
    - name: Add all hosts into monitoring
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
        logicmonitor target=host
        action=add
        collector="collector01.logicmonitor.com"
        company="{{ lookup('env', 'LM_COMPANY') }}"
        user="{{ lookup('env', 'LM_USER') }}"
        password="{{ lookup('env', 'LM_PASSWORD') }}"
        groups="/servers/production,/test-datacenter"
        properties="{'snmp.community':'commstring','dc':'test', 'type':'prod'}"

# Create LogicMonitor device groups for different applications
#
# Note that there's some intelligence here when assigning the LogicMonitor device
# display name. In the inventory, I've assigned a host level variable
# lm_display_name for some hosts but not others. For the displayname parameter
# below, we're using a Jinja2 filter to set the displayname parameter using either
# the host variable lm_display_name or, if that variable isn't set, to default to
# using device's hostname.
#
# Also note that, since there's only one device group in LogicMonitor per app type,
# we don't need to run this task for every host, so we've set run_once to true.
- name: Create LogicMonitor device groups for applications
  hosts: collector_hosts
  become: yes
  vars:
    app_names: ['foo', 'bar']
  tasks:
    - name: Create a host group
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=hostgroup
          action=add
          displayname="{{ lm_display_name | default(inventory_hostname) }}"
          fullpath='/applications/{{ item }}'
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"
          properties="{'app.name':'{{ item }}'}"
      with_items: "{{ app_names }}"
      run_once: true

# Add 'foo' application servers to the 'foo' device group in LogicMonitor.
# This makes these servers more convenient to manage in the portal and allows
# the device group properties configured above to be inherited.
- name: Add foo application hosts to foo device group
  hosts: application_foo_hosts
  become: no
  tasks:
    - name: Add foo application hosts to foo device group
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
        logicmonitor target=host
        action=update
        company="{{ lookup('env', 'LM_COMPANY') }}"
        user="{{ lookup('env', 'LM_USER') }}"
        password="{{ lookup('env', 'LM_PASSWORD') }}"
        collector="collector01.logicmonitor.com"
        groups="/applications/foo"

# Add 'bar' application servers to the 'bar' device group in LogicMonitor.
# This makes these servers more convenient to manage in the portal and allows
# the device group properties configured above to be inherited.
#
# Note that we're also updating the collector field, thereby moving these hosts
# to a new collector. This isn't strictly necessary, but shows an example of
# this process, and for our hypothetical situation, allows us to isolate each
# application on its own collector.
- name: Add bar application hosts to bar device group
  hosts: application_bar_hosts
  become: no
  tasks:
    - name: Add bar application hosts to bar device group
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
        logicmonitor target=host
        action=update
        company="{{ lookup('env', 'LM_COMPANY') }}"
        user="{{ lookup('env', 'LM_USER') }}"
        password="{{ lookup('env', 'LM_PASSWORD') }}"
        collector="collector02.logicmonitor.com"
        groups="/applications/bar"
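
The displayname fallback described in the playbook comments is easy to preview before touching LogicMonitor. Here is a small sketch that prints the value the Jinja2 default filter would produce for each application host. The play and task names are my own; the filter expression, the application_hosts group, and the lm_display_name variable come from the inventory and playbook above.

```yaml
---
# Sketch only: show what the displayname expression resolves to for each host.
- name: Preview LogicMonitor display names
  hosts: application_hosts
  gather_facts: no
  tasks:
    - name: Show the value the displayname parameter would receive
      debug:
        # Uses lm_display_name when the inventory defines it,
        # otherwise falls back to the inventory hostname
        msg: "{{ lm_display_name | default(inventory_hostname) }}"
```

With the example inventory, the foo hosts should report foo1, foo2, and foo3, while the bar hosts, which have no lm_display_name set, should report their inventory hostnames.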

Running the playbook

ansible-playbook -i inventory use_case_1.yml
Results
$ ansible-playbook -i inventory use_case_1.yml

PLAY [Provision hosts] *********************************************************

TASK [setup] *******************************************************************
ok: [collector01.logicmonitor.com]
ok: [bar01.app]
ok: [bar02.app.logicmonitor.com]
ok: [collector02.logicmonitor.com]
ok: [foo01.app.logicmonitor.com]
ok: [foo02.app.logicmonitor.com]

TASK [Install telnet just for fun] *********************************************
changed: [collector01.logicmonitor.com]
changed: [bar01.app]
changed: [bar02.app.logicmonitor.com]
changed: [foo02.app.logicmonitor.com]
changed: [foo01.app.logicmonitor.com]
changed: [collector02.logicmonitor.com]

PLAY [Provision LogicMonitor Collectors] ***************************************

TASK [setup] *******************************************************************
ok: [collector01.logicmonitor.com]
ok: [collector02.logicmonitor.com]

TASK [Install LogicMonitor collectors] *****************************************
changed: [collector01.logicmonitor.com]
changed: [collector02.logicmonitor.com]

PLAY [Add hosts to LogicMonitor monitoring] ************************************

TASK [setup] *******************************************************************
ok: [foo01.app.logicmonitor.com]
ok: [collector01.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]
ok: [collector02.logicmonitor.com]
ok: [bar01.app]
ok: [foo02.app.logicmonitor.com]

TASK [Add all hosts into monitoring] *******************************************
changed: [foo01.app.logicmonitor.com -> localhost]
changed: [foo02.app.logicmonitor.com -> localhost]
changed: [collector02.logicmonitor.com -> localhost]
changed: [collector01.logicmonitor.com -> localhost]
changed: [bar02.app.logicmonitor.com -> localhost]
changed: [bar01.app -> localhost]

PLAY [Create LogicMonitor device groups for applications] **********************

TASK [setup] *******************************************************************
ok: [collector01.logicmonitor.com]
ok: [collector02.logicmonitor.com]

TASK [Create a host group] *****************************************************
changed: [collector01.logicmonitor.com -> localhost] => (item=foo)
changed: [collector01.logicmonitor.com -> localhost] => (item=bar)

PLAY [Add foo application hosts to foo device group] ***************************

TASK [setup] *******************************************************************
ok: [foo01.app.logicmonitor.com]
ok: [foo02.app.logicmonitor.com]

TASK [Add foo application hosts to foo device group] ***************************
changed: [foo01.app.logicmonitor.com -> localhost]
changed: [foo02.app.logicmonitor.com -> localhost]

PLAY [Add bar application hosts to bar device group] ***************************

TASK [setup] *******************************************************************
ok: [bar01.app]
ok: [bar02.app.logicmonitor.com]

TASK [Add bar application hosts to bar device group] ***************************
changed: [bar02.app.logicmonitor.com -> localhost]
changed: [bar01.app -> localhost]

PLAY RECAP *********************************************************************
bar01.app.logicmonitor.com     : ok=6    changed=3    unreachable=0    failed=0
collector01.logicmonitor.com   : ok=8    changed=4    unreachable=0    failed=0
bar02.app.logicmonitor.com     : ok=6    changed=3    unreachable=0    failed=0
collector02.logicmonitor.com   : ok=8    changed=4    unreachable=0    failed=0
foo02.app.logicmonitor.com     : ok=6    changed=3    unreachable=0    failed=0
foo01.app.logicmonitor.com     : ok=6    changed=3    unreachable=0    failed=0


Use Case 2: Deploying applications

This use case demonstrates using the LogicMonitor module in your existing application deployment playbooks to schedule downtime (SDT) for the affected applications and device groups. We use this method at LogicMonitor to suppress spurious alerts when rolling out application updates.

Cool takeaways and examples:

‌• SDT LogicMonitor datasources
‌• SDT LogicMonitor device groups

Playbook
# This playbook provides an example use case for using the LogicMonitor module
# to schedule downtime (SDT) for monitored hosts during an application deploy in
# order to eliminate superfluous LogicMonitor alerts.
#
# There are two examples showing different options for SDTing an application.
# The first example will apply an SDT to an application-specific datasource ID.
# Currently this does require a bit of initial legwork to retrieve the ID from
# the LogicMonitor portal. A benefit of using this approach rather than SDTing
# the entire host is that you will still be alerted to non-deploy related alerts
# during the SDT duration.
#
# The second example demonstrates setting devices' SDT at the LogicMonitor
# device group level. This is more of a hypothetical example and not something
# we'd necessarily recommend as a best practice.
#
# For example, SDTing at the device level using the
# Ansible inventory allows for finer grained control of SDT during deploys,
# while SDTing at the device group level potentially provides broader SDT
# coverage, especially in situations where deploying a particular application
# may trigger alerts in other applications that aren't actually relevant to the
# Ansible inventory.
#
# For the purposes of this example, our application deploy process will simply
# consist of downloading a war file and then copying it to an application
# directory. This playbook can obviously be rearranged to suit your needs,
# but we recommend sequencing your SDT task as close to the first production-
# impacting task as possible. For example, there's no need to SDT your hosts
# while waiting for a deploy artifact to download; this increases the chances of
# missing legitimate alerts that aren't related to the deployment.
#
# NOTE: We're relying on shell environment variables for passing LogicMonitor
# credentials into the logicmonitor module. There are a variety of ways to
# achieve this goal, but for the purposes of this playbook, we're exporting the
# variables: LM_COMPANY, LM_USER, and LM_PASSWORD.
#
# Further documentation can be found here:
#   https://docs.ansible.com/ansible/logicmonitor_module.html

---
# This playbook will demonstrate an application deployment utilizing a
# LogicMonitor SDT at the datasource level.
- name: Deploy app foo
  hosts: application_foo_hosts
  become: yes
  tasks:
    # - name: Download deploy artifact from release artifact server to temp location
    #   get_url:
    #     url: https://releases.logicmonitor.com/applications/foo/foo.war
    #     dest: /tmp/foo.war
    - name: Download deploy artifact from release artifact server to temp location
      command: touch /tmp/foo.war

    # Schedule downtime for the foo application datasource, lasting 5 minutes,
    # starting now.
    # We want to sequence this task as close to the first production-impacting
    # task as possible.
    - name: Schedule Downtime for application datasource
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=datasource
          action=sdt
          id='123'
          duration=5
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # For the sake of simplicity, we're assuming that foo is a Tomcat application
    # and that Tomcat is configured to automatically explode and start new wars
    - name: Deploy application by moving to webapps dir
      copy:
        remote_src=True
        src=/tmp/foo.war
        dest=/usr/local/foo/webapps/foo.war

    - name: Remove temp war
      file:
        path=/tmp/foo.war
        state=absent

    # We always like to use Ansible to verify that our application deploys were
    # successful. For the sake of this example, since every application is
    # different, we're going to cheat a bit and pretend that we already have a
    # functional verification script installed alongside the application. There
    # are a variety of different ways to implement this functionality natively
    # within Ansible, but that's a topic for a whole different blog.
    - name: Verify application was deployed successfully
      shell: "/usr/local/foo/bin/verify.sh status"
      register: result

    - debug: var=result.stdout_lines

# This playbook will demonstrate an application deployment utilizing a
# LogicMonitor SDT at the group level
- name: Deploy app bar
  hosts: application_bar_hosts
  become: yes
  tasks:
    # - name: Download deploy artifact from release artifact server to temp location
    #   get_url:
    #     url: https://releases.logicmonitor.com/applications/bar/bar.war
    #     dest: /tmp/bar.war
    - name: Download deploy artifact from release artifact server to temp location
      command: touch /tmp/bar.war

    # Schedule downtime for the bar device group, lasting 5 minutes, starting now.
    # We want to sequence this task as close to the first production-impacting
    # task as possible.
    #
    # Note that we're using the same device group that we created in the first
    # use case.
    - name: Schedule Downtime for application device group
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=hostgroup
          action=sdt
          fullpath="/applications/bar"
          duration=5
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # For the sake of simplicity, we're assuming that bar is a Tomcat application
    # and that Tomcat is configured to automatically explode and start new wars
    - name: Deploy application by moving to webapps dir
      copy:
        remote_src=True
        src=/tmp/bar.war
        dest=/usr/local/bar/webapps/bar.war

    - name: Remove temp war
      file:
        path=/tmp/bar.war
        state=absent

    # We always like to use Ansible to verify that our application deploys were
    # successful. For the sake of this example, since every application is
    # different, we're going to cheat a bit and pretend that we already have a
    # functional verification script installed alongside the application. There
    # are a variety of different ways to implement this functionality natively
    # within Ansible, but that's a topic for a whole different blog.
    - name: Verify application was deployed successfully
      shell: "/usr/local/bar/bin/verify.sh status"
      register: result

    - debug: var=result.stdout_lines
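
The SDT tasks above use the compact local_action form with key=value arguments. As an example of the other syntax options, the same datasource SDT task could probably be written with delegate_to and YAML dictionary arguments. Treat this as a sketch rather than a tested replacement: the parameter names and values are copied from the playbook above, and delegate_to: localhost is the standard Ansible equivalent of local_action.

```yaml
# Sketch only: the same parameters as the local_action task above,
# written as a YAML dictionary. This is a task list fragment, not a full play.
- name: Schedule Downtime for application datasource
  become: no
  # delegate_to: localhost runs the module on the control machine,
  # just as local_action does
  delegate_to: localhost
  logicmonitor:
    target: datasource
    action: sdt
    id: '123'
    duration: 5
    company: "{{ lookup('env', 'LM_COMPANY') }}"
    user: "{{ lookup('env', 'LM_USER') }}"
    password: "{{ lookup('env', 'LM_PASSWORD') }}"
```

The dictionary form is longer, but it tends to be easier to read in diffs and avoids quoting surprises in long key=value strings.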
Running the playbook
ansible-playbook -i inventory use_case_2.yml
Results
$ ansible-playbook -i inventory use_case_2.yml

PLAY [Deploy app foo] **********************************************************

TASK [setup] *******************************************************************
ok: [foo01.app.logicmonitor.com.app]
ok: [foo02.app.logicmonitor.com.app]

TASK [Download deploy artifact from release artifact server to temp location] **
changed: [foo01.app.logicmonitor.com.app]
changed: [foo02.app.logicmonitor.com.app]

TASK [Schedule Downtime for devices] *******************************************
changed: [foo02.app.logicmonitor.com.app -> localhost]
changed: [foo01.app.logicmonitor.com.app -> localhost]

TASK [Deploy application by moving to webapps dir] *****************************
changed: [foo01.app.logicmonitor.com.app]
changed: [foo02.app.logicmonitor.com.app]

TASK [Remove temp war] *********************************************************
changed: [foo02.app.logicmonitor.com.app]
changed: [foo01.app.logicmonitor.com.app]

TASK [Verify application was deployed successfully] ****************************
changed: [foo01.app.logicmonitor.com.app]
changed: [foo02.app.logicmonitor.com.app]

TASK [debug] *******************************************************************
ok: [foo01.app.logicmonitor.com.app] => {
    "result.stdout_lines": [
        "Success! App is serving."
    ]
}
ok: [foo02.app.logicmonitor.com.app] => {
    "result.stdout_lines": [
        "Success! App is serving."
    ]
}
PLAY [Deploy app bar] **********************************************************

TASK [setup] *******************************************************************
ok: [bar01.app.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]

TASK [Download deploy artifact from release artifact server to temp location] **
changed: [bar02.app.logicmonitor.com]
changed: [bar01.app.logicmonitor.com]

TASK [Schedule Downtime for application device group] **************************
changed: [bar01.app.logicmonitor.com -> localhost]
changed: [bar02.app.logicmonitor.com -> localhost]

TASK [Deploy application by moving to webapps dir] *****************************
changed: [bar02.app.logicmonitor.com]
changed: [bar01.app.logicmonitor.com]

TASK [Remove temp war] *********************************************************
changed: [bar01.app.logicmonitor.com]
changed: [bar02.app.logicmonitor.com]

TASK [Verify application was deployed successfully] ****************************
changed: [bar02.app.logicmonitor.com]
changed: [bar01.app.logicmonitor.com]

TASK [debug] *******************************************************************
ok: [bar01.app.logicmonitor.com] => {
    "result.stdout_lines": [
        "Success! App is serving."
    ]
}
ok: [bar02.app.logicmonitor.com] => {
    "result.stdout_lines": [
        "Success! App is serving."
    ]
}

PLAY RECAP *********************************************************************
bar01.app.logicmonitor.com      : ok=7    changed=5    unreachable=0    failed=0
foo02.app.logicmonitor.com.app  : ok=7    changed=5    unreachable=0    failed=0
foo01.app.logicmonitor.com.app  : ok=7    changed=5    unreachable=0    failed=0
bar02.app.logicmonitor.com      : ok=7    changed=5    unreachable=0    failed=0

Use Case 3: Software updates

This use case demonstrates scheduling downtime (SDT) for LogicMonitor as part of your system software update workflow. This use case is useful for most situations where you’re likely to induce LogicMonitor alerts as part of regular maintenance.

Cool takeaways and examples:

‌• SDT LogicMonitor devices
‌• Use Jinja2 filters, Ansible variables, and the displayname parameter to avoid specifying a collector when performing ‘host’ actions
‌• Rebooting hosts as part of an Ansible play without interrupting Ansible execution

Playbook
# This playbook provides an example use case for using the LogicMonitor module
# to schedule downtime (SDT) for monitored hosts and collectors during system
# updates and perform a reboot.
#
# NOTE: We're relying on shell environment variables for passing LogicMonitor
# credentials into the logicmonitor module. There are a variety of ways to
# achieve this goal, but for the purposes of this playbook, we're exporting the
# variables: LM_COMPANY, LM_USER, and LM_PASSWORD.
#
# Further documentation can be found here:
#   https://docs.ansible.com/ansible/logicmonitor_module.html

---
- name: Perform system updates on application hosts and reboot
  hosts: application_hosts
  become: yes
  tasks:
    # For the sake of example, we're just going to update the telnet package we
    # installed in the first use case.
    - name: Update all packages on the system
      package:
        name=telnet
        state=latest

    # Schedule downtime for each host, lasting 15 minutes, starting now
    #
    # Note that we could also SDT at the device group level if we wanted to, but
    # for this use case, it's going to be easier and more reliable to apply this
    # at the device level. This ensures that all of the hosts being updated get
    # SDT and also ensures we don't unnecessarily SDT other hosts or mistakenly
    # miss SDTing an affected host.
    #
    # Also note that, since we're SDTing a wide range of hosts in our example
    # infrastructure, it becomes cumbersome to specify devices' collectors.
    # Instead, we'll specify the displayname, which allows the module to
    # dynamically look up the correct collector for each host. As in the first
    # use case example, we're using a Jinja2 filter to add some intelligence to
    # the displayname parameter allowing us to use either the inventory variable
    # lm_display_name or default to the device's hostname.
    - name: Schedule Downtime for devices
      # All tasks except for target=collector should use local_action
      become: no
      local_action: >
          logicmonitor
          target=host
          action=sdt
          duration=15
          displayname="{{ lm_display_name | default(inventory_hostname) }}"
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # Here we're going to use a trick to reboot the systems without interrupting
    # our Ansible execution. To accomplish this, we'll asynchronously reboot the
    # hosts and then wait for them to become accessible again. We don't gain a
    # whole lot using this method in this particular example, but it's extremely
    # useful when performing additional tasks after the reboot.
    # Source: https://support.ansible.com/hc/en-us/articles/201958037-Reboot-a-server-and-wait-for-it-to-come-back
    - name: Reboot
      shell: sleep 2 && /sbin/shutdown -r now
      async: 1
      poll: 0
      ignore_errors: true

    - name: Wait for server to reboot
      become: no
      local_action: >
        wait_for host={{ inventory_hostname }}
        port=22
        state=started
        delay=30
        timeout=300

# This playbook is very similar to the one above but handles the scenario of
# updating and rebooting hosts that have collectors on them. In order to prevent
# 'Collector Down' alerts, we'll also need to SDT the collectors.
#
# Note that we've already finished updating our other hosts before touching the
# collector hosts. This ensures that any issues encountered during the previous
# playbook are adequately detected by LogicMonitor monitoring before we begin
# touching our collectors.
- name: Perform system updates on collector hosts and reboot
  hosts: collector_hosts
  become: yes
  tasks:
    # For the sake of example, we're just going to naively update all installed
    # packages. That's totally safe, right? It couldn't possibly have adverse
    # effects, could it? ;)
    - name: Update all packages on the system
      package:
        name=*
        state=latest

    # Schedule downtime for each host, lasting 15 minutes, starting now
    #
    # Note that we could also SDT at the device group level if we wanted to, but
    # for this use case, it's going to be easier and more reliable to apply this
    # at the device level. This ensures that all of the hosts being updated get
    # SDT and also ensures we don't unnecessarily SDT other hosts or mistakenly
    # miss SDTing an affected host.
    #
    # Also note that, since we're SDTing a wide range of hosts in our example
    # infrastructure, it becomes cumbersome to specify devices' collectors.
    # Instead, we'll specify the displayname, which allows the module to
    # dynamically look up the correct collector for each host. As in the first
    # use case example, we're using a Jinja2 filter to add some intelligence to
    # the displayname parameter allowing us to use either the inventory variable
    # lm_display_name or default to the device's hostname.
    - name: Schedule Downtime for devices
      # All tasks except for target=collector should use local_action
      become: no
      local_action: >
          logicmonitor target=host
          action=sdt
          duration=15
          displayname="{{ lm_display_name | default(inventory_hostname) }}"
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # In order to prevent spurious 'Collector Down' alerts, we're also going to
    # SDT the collectors on these hosts.
    - name: Schedule Downtime for collectors
      logicmonitor:
        target=collector
        action=sdt
        duration=15
        company="{{ lookup('env', 'LM_COMPANY') }}"
        user="{{ lookup('env', 'LM_USER') }}"
        password="{{ lookup('env', 'LM_PASSWORD') }}"

    # Same as above, we're now going to reboot the hosts and wait for them to
    # come back up.
    - name: Reboot
      shell: sleep 2 && /sbin/shutdown -r now
      async: 1
      poll: 0
      ignore_errors: true

    - name: Wait for server to reboot
      become: no
      local_action: >
        wait_for host={{ inventory_hostname }}
        port=22
        state=started
        delay=30
        timeout=300
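
The comments above mention that the asynchronous reboot pattern pays off when there are additional tasks to run after the reboot. A minimal sketch of that situation follows. The reboot and wait tasks mirror the ones in the playbook; the uptime check at the end is a hypothetical follow-up task of my own, standing in for whatever you need to run once the host is back.

```yaml
---
# Sketch only: reboot, wait for SSH, then continue with post-reboot work.
- name: Reboot and then run a follow-up task
  hosts: application_hosts
  become: yes
  tasks:
    # Fire and forget: async with poll 0 lets Ansible move on without waiting
    # for the shutdown command to return over a connection that is about to drop.
    - name: Reboot
      shell: sleep 2 && /sbin/shutdown -r now
      async: 1
      poll: 0
      ignore_errors: true

    # Runs on the control machine and polls the SSH port of each rebooting host
    - name: Wait for SSH to come back
      become: no
      local_action: >
        wait_for host={{ inventory_hostname }}
        port=22
        state=started
        delay=30
        timeout=300

    # Hypothetical follow-up task; replace with your own post-reboot work
    - name: Confirm the host is back by checking uptime
      command: uptime
      register: uptime_result

    - debug: var=uptime_result.stdout
```

Because the wait task only returns once port 22 accepts connections again, anything placed after it runs against the freshly rebooted host.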
Running the playbook
ansible-playbook -i inventory use_case_3.yml

Results
$ ansible-playbook -i inventory use_case_3.yml

PLAY [Perform system updates on application hosts and reboot] ******************

TASK [setup] *******************************************************************
ok: [bar01.app.logicmonitor.com]
ok: [foo01.app.logicmonitor.com]
ok: [foo02.app.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]

TASK [Update all packages on the system] ***************************************
changed: [foo01.app.logicmonitor.com]
changed: [bar02.app.logicmonitor.com]
changed: [bar01.app.logicmonitor.com]
changed: [foo02.app.logicmonitor.com]

TASK [Schedule Downtime for devices] *******************************************
changed: [bar01.app.logicmonitor.com -> localhost]
changed: [foo01.app.logicmonitor.com -> localhost]
changed: [foo02.app.logicmonitor.com -> localhost]
changed: [bar02.app.logicmonitor.com -> localhost]

TASK [Reboot] ******************************************************************
fatal: [foo01.app.logicmonitor.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "", "msg": "MODULE FAILURE"}
...ignoring
fatal: [foo02.app.logicmonitor.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "", "msg": "MODULE FAILURE"}
...ignoring
fatal: [bar01.app.logicmonitor.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "", "msg": "MODULE FAILURE"}
...ignoring
fatal: [bar02.app.logicmonitor.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "", "msg": "MODULE FAILURE"}
...ignoring

TASK [Wait for server to reboot] ***********************************************
ok: [bar01.app.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]
ok: [foo02.app.logicmonitor.com]
ok: [foo01.app.logicmonitor.com]

PLAY RECAP *********************************************************************
bar01.app.logicmonitor.com      : ok=5    changed=2    unreachable=0    failed=0
foo01.app.logicmonitor.com      : ok=5    changed=2    unreachable=0    failed=0
foo02.app.logicmonitor.com      : ok=5    changed=2    unreachable=0    failed=0
bar02.app.logicmonitor.com      : ok=5    changed=2    unreachable=0    failed=0

Use Case 4: Decommission devices

This use case provides an example of decommissioning application hosts and removing those devices and corresponding device groups from LogicMonitor.

Cool takeaways and examples:
• Schedule downtime (SDT) for LogicMonitor device groups
• Remove devices from LogicMonitor
• Remove device groups from LogicMonitor
• Shut down hosts as part of an Ansible play without interrupting Ansible execution

Playbook

# This playbook provides an example use case for decommissioning hosts and
# using the LogicMonitor Ansible module to remove them from monitoring. We'll
# pretend that we no longer need the bar application in our infrastructure
# and decommission all of those servers.
#
# NOTE: We're relying on shell environment variables for passing LogicMonitor
# credentials into the logicmonitor module. There are a variety of ways to
# achieve this goal, but for the purposes of this playbook, we're exporting the
# variables: LM_COMPANY, LM_USER, and LM_PASSWORD.
#
# Further documentation can be found here:
#   https://docs.ansible.com/ansible/logicmonitor_module.html

---
- name: Decommission all bar application hosts and remove from monitoring
  hosts: application_bar_hosts
  become: yes
  tasks:
    # Schedule downtime for the bar device group, lasting 60 minutes,
    # starting now. Since we're decommissioning hosts, the exact length of the
    # SDT isn't critical, so an hour gives us a pretty big buffer with no
    # adverse consequences.
    - name: Schedule Downtime for application device group
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=hostgroup
          action=sdt
          fullpath="/applications/bar"
          duration=60
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # Do some boilerplate, non-LogicMonitor orchestration tasks here. This task or
    # tasks will obviously be specific to your own environment. For this example,
    # we'll pretend that a simple halt is sufficient.
    #
    # Note that there's nothing strictly wrong with skipping the SDT and just
    # removing the devices from LogicMonitor before decommissioning. In this
    # example, by leaving the hosts in LogicMonitor until after decommissioning,
    # we can potentially detect hosts that are stranded in an unstable state.
    #
    # As in the previous use case, we're using a bit of Ansible magic to allow us
    # to shut down the host without stopping Ansible execution.
    - name: Decommission host
      shell: sleep 2 && /sbin/shutdown -h now
      async: 1
      poll: 0
      ignore_errors: true

    # Note that, since we're removing a wide range of hosts in our example
    # infrastructure, it becomes cumbersome to specify devices' collectors.
    # Instead, we'll specify the displayname, which allows the module to
    # dynamically look up the correct collector for each host. As in the first
    # use case example, we're using a Jinja2 filter to add some intelligence to
    # the displayname parameter, allowing us to use either the inventory variable
    # lm_display_name or default to the device's hostname.
    - name: Remove devices from LogicMonitor
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=host
          action=remove
          displayname="{{ lm_display_name | default(inventory_hostname) }}"
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # Since there's only one device group to remove from LogicMonitor, we don't
    # need to run this task for every host, so we've set run_once to true.
    - name: Remove bar application device group from LogicMonitor
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=hostgroup
          action=remove
          fullpath="/applications/bar"
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"
      run_once: true

Running the playbook
ansible-playbook -i inventory use_case_4.yml
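
Because the playbook reads its credentials from the shell environment, it can help to confirm that LM_COMPANY, LM_USER, and LM_PASSWORD are actually set before kicking off a destructive run. The snippet below is a minimal sketch of such a pre-flight check; the function name check_lm_env is hypothetical and not part of the LogicMonitor module or Ansible.

```shell
# Hypothetical helper (not part of Ansible or the LogicMonitor module):
# verify that the variables the playbook reads via lookup('env', ...)
# are set and non-empty before launching ansible-playbook.
check_lm_env() {
    missing=0
    for name in LM_COMPANY LM_USER LM_PASSWORD; do
        # eval-based indirect expansion keeps this POSIX sh compatible.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    # Returns 0 when all three variables are present, 1 otherwise.
    return "$missing"
}
```

With this in place, the run can be gated on the check, for example: check_lm_env && ansible-playbook -i inventory use_case_4.yml. Adding --list-hosts to the ansible-playbook command prints the hosts the play would target without executing any tasks, which is a cheap sanity check before decommissioning anything.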

Results
$ ansible-playbook -i inventory use_case_4.yml

PLAY [Decommission all bar application hosts and remove from monitoring] *******

TASK [setup] *******************************************************************
ok: [bar01.app.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]

TASK [Schedule Downtime for application device group] **************************
changed: [bar02.app.logicmonitor.com -> localhost]
changed: [bar01.app.logicmonitor.com -> localhost]

TASK [Decommission host] *******************************************************
ok: [bar01.app.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]

TASK [Remove devices from LogicMonitor] ****************************************
changed: [bar01.app.logicmonitor.com -> localhost]
changed: [bar02.app.logicmonitor.com -> localhost]

TASK [Remove bar application device group from LogicMonitor] *******************
changed: [bar02.app.logicmonitor.com -> localhost]
changed: [bar01.app.logicmonitor.com -> localhost]

PLAY RECAP *********************************************************************
bar01.app.logicmonitor.com      : ok=5    changed=3    unreachable=0    failed=0
bar02.app.logicmonitor.com      : ok=5    changed=3    unreachable=0    failed=0

Conclusion

When you scale IT automation and manage complex environments, your monitoring has to stay aligned with your infrastructure and operations. With the LogicMonitor module for Ansible, users can unify the source of truth and operations – including comprehensive monitoring – into one consistent, repeatable process, using familiar Ansible Playbooks.

Jeff Wozniak

Jeff Wozniak is an employee at LogicMonitor.
