Understanding the context of an IT incident can greatly reduce MTTR and make it easier to determine the root cause. In an IT environment, ‘context’ refers to the subset of information necessary to troubleshoot and diagnose an incident or event. In some scenarios, the context may be the downstream dependencies affected when a high-availability pair of firewalls goes offline; in others, it may be the datastore that multiple VMs are contending for. In either case, the context is the set of critical information needed to understand the relevant scope when an incident occurs.
One of the core capabilities needed to achieve an actionable context is dynamic topology mapping of critical resources. An accurate, dynamic topology map enables ITOps to diagnose and troubleshoot critical issues at a glance. When an issue occurs, spending time in the CLI running traceroute only slows the troubleshooting process and may lead to end-user dissatisfaction.
When establishing context with a topology map, there are two fundamental approaches: additive and subtractive. In a topology map founded on the subtractive principle, users start from the entire stack of resources and must navigate through it until they find what they are looking for. Identifying the problematic resource on such a map may be as tedious as troubleshooting the incident itself. Users are often daunted by the sheer volume of information presented to them and unsure where to focus their troubleshooting efforts. This is even more true in large IT environments, where the infrastructure may span thousands of resources. While a subtractive approach to topology does have its benefits, such as a holistic overview of the network, it often hinders the troubleshooting process and generates excessive noise in the user interface.
In an additive approach, users are given the smallest subset of information needed to understand and troubleshoot an IT incident or event. When an event occurs, ITOps will often be notified via an alert from their monitoring platform. Starting from the alert, seeing only the immediate context and the relevant resources streamlines troubleshooting by removing the pain of sorting through endless devices and resources. Although an additive approach intentionally provides only a limited set of information, the user can choose to expand the map to understand a larger context. Often, however, the incident can be understood and efficiently remediated just from knowing which resources are the first- and second-degree neighbors. For example, if the hosts providing a mail service are experiencing poor performance, seeing immediately that the cause is severe network latency on the core switch enables ITOps to act quickly and reduce end-user dissatisfaction.
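To make the additive idea concrete, here is a minimal sketch in Python of how a map could be grown outward from the alerting resource rather than filtered down from the full inventory. It is an illustration only, not LogicMonitor's implementation: the `neighborhood` function, the `adjacency` dictionary, and every resource name in it are hypothetical. The sketch uses a breadth-first search that stops after a fixed number of hops, so only first- and second-degree neighbors are returned by default.

```python
from collections import deque


def neighborhood(adjacency, start, max_hops=2):
    """Return {resource: hop_count} for every resource within max_hops of start.

    `adjacency` maps each resource name to the resources it is directly
    connected to. This structure is an assumption made for the example.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            # Do not expand beyond the requested context size.
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


# Hypothetical topology loosely based on the mail service example above.
adjacency = {
    "mail-host-1": ["access-switch-1"],
    "mail-host-2": ["access-switch-1"],
    "access-switch-1": ["mail-host-1", "mail-host-2", "core-switch"],
    "core-switch": ["access-switch-1", "firewall-ha-pair", "access-switch-2"],
    "firewall-ha-pair": ["core-switch", "isp-router"],
    "access-switch-2": ["core-switch", "db-host-1"],
    "db-host-1": ["access-switch-2"],
    "isp-router": ["firewall-ha-pair"],
}

# Start from the resource named in the alert and keep only nearby neighbors.
context = neighborhood(adjacency, "mail-host-1", max_hops=2)
for resource, distance in sorted(context.items(), key=lambda item: item[1]):
    print(distance, resource)
```

In this made-up topology, the two-hop context around `mail-host-1` already includes the core switch while leaving out the database host and the ISP router. Raising `max_hops` corresponds to the user choosing to expand the map to a larger context.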
LogicMonitor is proud to announce the addition of dynamic and additive topology mapping. Users can now view their infrastructure through an additional lens and troubleshoot incidents and events with increased agility. From nearly anywhere in the platform, users can create a topology map containing the context needed to troubleshoot any event or incident that has occurred. The topology map provides a real-time view of your resources and spans your services, public cloud, and data center infrastructure. LogicMonitor leverages a variety of methods to create a dynamic map of your infrastructure, such as CDP, LLDP, forwarding tables, and various APIs. To learn more, request a free trial here, or discuss with your account manager.