“Alright. Hi, everyone.
Welcome to the Get to Know LM Envision session. My name is Sarah Terry. I am the VP of product management here at LogicMonitor.
And today, we are joined by a customer, Devin, from Yale, who’s gonna tell us all about how they use LogicMonitor and how it’s helped them on their journey. So, Devin, do you wanna introduce yourself briefly?
Sure thing, Sarah. My name is Devin Biancarelli. I am a network engineer here at Yale. I’ve been here a little over three years, and for the past two years, we’ve been using LogicMonitor to monitor our core infrastructure here at Yale ITS.
Awesome. And can you tell us a little bit more about the infrastructure at Yale, how it’s laid out, and what types of devices you have?
Sure thing. So we rely on multiple pieces of software such as Cisco DNA Center, Cisco ISE, and BlueCat, which we rely on for IPAM, and we use LogicMonitor to kinda centralize our monitoring so that we have a central point for all departments to kinda look at what’s going on in our infrastructure, in our network. And we’ve relied on LogicMonitor heavily, using REST APIs, SNMP MIBs, and out-of-the-box data sources, along with dashboards, to help kinda communicate and be transparent with other departments about how our network is doing.
Amazing. Super helpful. Maybe you can start by telling us a little bit about what it was like before you used LogicMonitor and some of the pain points that you guys felt.
So before we used LogicMonitor, we had a very out-of-date network monitoring system that, for the most part, would only do ping. Switching over to LogicMonitor has been a big increase in troubleshooting efficiency.
We’re able to dive deeper into our network infrastructure and how it’s performing.
We rely heavily on dashboards and role-based access for users so that they can only see what they’re allowed to see and have a better, more transparent view of what’s going on in each department of the network.
Awesome. Okay. And what types of data are you relying on in LogicMonitor? Are you using metrics, logs, config data, topology? Can you kind of walk me through that?
We’re mostly relying on metrics and also a little bit of config data. We rely heavily on SNMP MIBs, such as CPU and network interfaces.
We use the REST API for our Cisco products such as DNA Center and Cisco ISE. We also dive a little bit into ServiceNow and Cisco ThousandEyes for a more in-depth view of what those things are doing.
We also heavily use ConfigSources to monitor our configs. We have it set to every twenty-four hours.
LogicMonitor goes and gathers the configs, the starting config, the running config, and sometimes the inventory, and we’re able to compare every single day whether or not a configuration has changed.
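The daily config comparison described above can be sketched with a simple text diff. This is a minimal illustration, not Yale's actual tooling; the config snippets below are hypothetical:

```python
import difflib

def config_changes(yesterday: str, today: str) -> list[str]:
    """Return the added/removed lines between two config snapshots."""
    diff = difflib.unified_diff(
        yesterday.splitlines(), today.splitlines(),
        fromfile="running-config (yesterday)", tofile="running-config (today)",
        lineterm="",
    )
    # Keep only changed lines, skipping the +++/--- diff headers.
    return [l for l in diff
            if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]

old = "hostname core-sw-1\nsnmp-server community public RO\n"
new = "hostname core-sw-1\nsnmp-server community yale-ro RO\n"
print(config_changes(old, new))
# → ['-snmp-server community public RO', '+snmp-server community yale-ro RO']
```

Flagging any non-empty result once per twenty-four-hour collection cycle gives the same "did anything change since yesterday" signal Devin describes.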
Awesome. That sounds like a lot of data to help speed troubleshooting.
Can you talk me through a little bit about how you guys use logic monitor on a day to day basis? Are you, you know, typically troubleshooting? Are you typically using it more for a proactive view?
It’s mostly a proactive view. We have multiple departments that have branched out and asked about LogicMonitor since we purchased it.
Each department has its own dashboard view. For example, our data center operations team has just a simple NOC dashboard to kinda keep an eye on how the network is doing at present time, while network engineering has a more in-depth view of the core infrastructure here at Yale and how it’s behaving in real time, along with other teams such as the cloud team, which relies on LogicMonitor to gather info on its AWS and Azure environments here at Yale.
Makes sense. So a true hybrid environment. It sounds like you guys have a lot of on-prem equipment as well as equipment in the cloud that you’re monitoring all from LogicMonitor, which is a classic use case.
Can you walk me through a little bit about what else you guys have in your environment? I know you mentioned you use ServiceNow. What kinds of integrations and features of LogicMonitor are you guys using?
Yes. So we have an integration with ServiceNow where we rely on the REST API to gather metrics on change tickets, request tickets, and incident tickets. We’re kinda able to track how many tickets are created over the day and what group they go to.
For example, we have a large number of incident tickets that go to our help desk, and we use the REST API along with LogicMonitor to kinda see the increase over time that the help desk is getting hit with incident tickets. So day to day, we see a regular pattern, and we can also rely on dynamic alerting to alert us on any spikes that we see in incidents.
If we ever see throughout the day a large spike in incident tickets hitting the help desk, we’re able to take a look at it and see, you know, maybe there might be something wrong with the network. We dive into ServiceNow, we dive into our other software solutions, and we see if something is getting bottlenecked in the network and what the cause of all those incident tickets is.
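The spike detection described here can be sketched as a simple statistical threshold over hourly ticket counts. In practice the counts would come from the ServiceNow REST API as Devin describes; the numbers below are hypothetical, and this is only a rough stand-in for LogicMonitor's actual dynamic-threshold logic:

```python
from statistics import mean, stdev

def spike_hours(hourly_counts: list[int], sigma: float = 3.0) -> list[int]:
    """Flag hours whose incident-ticket count exceeds
    mean + sigma * stdev of the series."""
    mu, sd = mean(hourly_counts), stdev(hourly_counts)
    threshold = mu + sigma * sd
    return [i for i, c in enumerate(hourly_counts) if c > threshold]

# A typical help-desk day, with a burst at hour 9 (hypothetical numbers).
counts = [4, 5, 3, 4, 6, 5, 4, 5, 6, 42, 5, 4]
print(spike_hours(counts))  # → [9]
```

An alert fired on the flagged hour would prompt exactly the investigation Devin walks through: check ServiceNow, then the network, for the common cause behind the burst.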
Awesome. Super helpful.
And can you just describe again the benefit that you guys are getting from LogicMonitor now that, you know, you have it in place for your hybrid environment at Yale?
So LogicMonitor has been a huge benefit in terms of being able to dive deeper into what’s going on in Yale’s ITS network environment.
Like I said before, we had a very out-of-date NMS product that would only be used to kinda ping our devices. LogicMonitor has allowed us to dive deeper: using SNMP MIBs for a broader range of metrics, relying on custom data sources and custom Groovy scripts to utilize the REST APIs of other solutions, and even SSHing into our devices to gather data that way, rather than using SNMP or a REST API.
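Custom script data sources like the ones Devin mentions typically emit their collected values as key=value lines on stdout for the monitoring platform to parse. A minimal sketch of that output stage (the metric names and values are hypothetical, and Yale's real collectors are Groovy, not Python):

```python
def emit_datapoints(metrics: dict[str, float]) -> str:
    """Format collected metrics as key=value lines, the convention
    a script-based data source commonly prints for the collector to parse."""
    return "\n".join(f"{name}={value}" for name, value in metrics.items())

# Values an SSH- or REST-based collection script might have gathered.
print(emit_datapoints({"cpu_util": 37.5, "mem_free_mb": 2048, "uptime_days": 112}))
```

The point is that however the data is gathered, via SNMP, REST, or SSH, the script boils it down to named datapoints the platform can graph and alert on.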
Awesome. Well, it sounds like you have a much more comprehensive view with LogicMonitor over the whole environment in a single place, which is gonna make troubleshooting a lot faster. So that’s great to hear.
Alright. Well, Devin, let’s go ahead and pull up the slides, and we’ll take a look at the LogicMonitor platform and which parts of it you’re using, in the context of the LM Envision architecture we have.
Alright. So here we’re looking at the LM Envision platform. I’m gonna go through this in a little bit more detail in a few minutes here. But first, I wanna start off and highlight what aspects of the platform Devin is using at Yale. So let’s start with the left-hand side of this diagram.
And Devin, again, maybe you can highlight for us, you know, on this modern data center chart here, what areas you feel are relevant to Yale. And then on the telemetry side on the right, again, what types of data you guys are relying on from LogicMonitor?
Sure thing. So on the modern data center side of things, we rely heavily on the on-prem infrastructure side to gather metrics.
A lot of our core infrastructure is on-site on physical devices, so we rely heavily on SNMP MIBs to gather metrics for that data.
We also rely heavily on the cloud infrastructure, or at least the cloud department does.
Like I said, they have both their AWS and Azure environments inside of LogicMonitor.
And they also monitor, I believe, the transit gateways, along with their virtual Palo Altos as well.
As for the telemetry side of things, we are using the events portion to monitor any events that pop up in our Cisco DNA Center solution.
We also rely heavily on topology. We do have a few environments where we have kind of a shortened view of the topology of our network. And like I said before, we also rely heavily on the config portion as well; constantly gathering the configuration of our core infrastructure every twenty-four hours allows us to keep an eye on things and to see if any changes were made, which helps us further with troubleshooting.
Awesome. Alright. And that’s in addition to the metrics and the config that you mentioned previously. So you’re hitting almost everything on this telemetry spectrum, which is great. Alright. Well, Devin, thank you so much for your time. Really appreciate you joining us today and giving us this insight into how you use LogicMonitor at Yale and how it has benefited your team in giving you more comprehensive insight into your hybrid environment and helping you troubleshoot faster.
Alright. So let’s move on to go through this diagram in more detail. I wanna really describe the left, the middle features, and then the right solutions as well. So, again, starting on the left-hand side, this is really what we expect the modern enterprise to have across their complex hybrid environment. And, again, it could be, in Devin’s case, right, as simple as just having on-prem and cloud, but we also have customers that are monitoring applications in LogicMonitor. They’ve got IoT to consider, and they’re starting to adopt AI and wanna monitor their LLMs as well. So we really consider the modern data center to be the full spectrum of these things that we wanna provide visibility into in the LM Envision platform.
When it comes to the type of telemetry that we’re collecting across this modern data center environment, again, we’re looking at metrics, logs, and events. We are pulling in traces for applications, and then we do have automated topology mapping, where we are mapping the infrastructure and applications we’re monitoring as well, in addition to config data, as Devin mentioned, where we are periodically pulling the configs from network devices as well as config-driven cloud infrastructure.
And as we move to the middle here, we do have the LM Envision platform and the various features that make it up, which help customers troubleshoot faster, get proactive insight into their environment, and ultimately prevent downtime.
So we have, you know, Service Insights, which enables this service-level view. We’ve got Resource Explorer, customizable dashboards, and reports, which are gonna help make meaningful views for customers of the monitored data that we have. And then we’ve got a whole suite of AI and intelligence features: dynamic thresholding, which uses anomaly detection for intelligent alerting; forecasting and prediction to help with, you know, predicting infrastructure sizing and capacity needs; as well as our Edwin AI root cause analysis and event intelligence, which can really reduce the alert noise that’s happening in the platform and streamline incident response.
And then on the right-hand side here, we have the solutions that we are delivering with these features, everything from our AI agent on the Edwin side, to the complete infrastructure observability we’re providing for hybrid enterprise environments, to cloud monitoring and cloud cost optimization, which I’ll cover, as well as things like sustainability monitoring and, of course, application visibility.
Now, to show this in a slightly different context, this diagram is very similar. On the left-hand side, we have the customer’s environment, whether that’s on-prem or in the public cloud. In the middle, we’ve got the LogicMonitor platform, where you’re able to get dashboards, reports, and really make sense of this rich monitoring data that we’re automatically pulling from the environment, plus the automated topology views that we’re constructing.
And then on the right-hand side are really gonna be the integrations that we have to fit better into the existing IT ecosystem. So we wanna make sure that, you know, alerts are routed to the right people at the right time in the right place using the tools that your teams are used to using. And that if you’re using anything for automation, we’re able to tie into that as well.
And then lastly, I just wanna give a brief overview of Edwin AI. Edwin AI sits on top of LogicMonitor and really takes the intelligence that we have in the platform to the next level to, again, reduce that alert noise. So, really, on the left-hand side, we’re starting with your observability stack, whether that’s only LogicMonitor or whether it’s, you know, LogicMonitor paired with something on the security side, maybe something on the APM side, and really pulling the events from your full observability suite of products into the Edwin AI platform.
From there, along the bottom, we’re actually using a consolidated knowledge graph to make sure we’re enriching that data from sources like the ServiceNow CMDB, where we can actually use that data to enhance the correlation that we are providing across the events we’ve ingested from the observability stack.
And then on the right-hand side, we wanna make sure that we’re outputting these correlated insights to the right platform, whether, again, that’s ServiceNow or otherwise.
Alright. So with that, let’s jump into a demo of the LogicMonitor platform.
So here we have a dashboard that a hybrid enterprise might use as they’re trying to scale and grow, and they wanna stay focused on a few specific departments like the sales department, the finance department, and the operations department. You can see up top that we’ve got comprehensive views that give us the status of these departments and the infrastructure that each is relying on. As we continue to scroll down, you can see that we’ve got the automated topology views that LogicMonitor has constructed. Those are mapped out here, as well as the physical locations of these devices on the right-hand side.
And then as we continue to move through this dashboard, we have insight into the end user experience. So how long is our payroll application taking for the end users that are accessing it from around the world? We actually have our synthetic transactions and our website monitoring powering these graphs here. And then as we continue to move down, we have service-based views of each department. As we look at our sales department, we know that access to the CRM is very important, and so we’re monitoring that with some of the statistics and health data we have here. As we continue on to finance, you know, accessing the payroll is very important, as well as the network shares, and so we’ve got monitoring for our Azure-based environment there, all pulled into this consolidated dashboard.
And then on the operations side, we are actually migrating customers from one VPN to another. And we wanna make sure that as that happens, we’re keeping an eye on the firewall sessions. So we’ll actually use LogicMonitor’s anomaly detection to see if there’s anything unusual. You can see that we do have a few red spikes that have been highlighted by LogicMonitor as unusual activity that we may wanna look into to ensure that this VPN migration is moving along successfully.
And then as we continue to scroll down, we can see that we have more insights into the Azure portion of the environment supporting this organization, from the performance data to the health data, as well as the service limits. So, you know, what are those default limits that Azure has imposed per service and per region that we might need to be aware of? And we’ve got alerts as well. But before we dive into those alerts, I actually wanna start with a slightly different view and show this enterprise environment using the Resource Explorer.
So using the Resource Explorer, we’re gonna rely on the rich metadata that LogicMonitor ingests for the environment. In this case, we’re gonna use tags, because we have our infrastructure tagged by department. We can see a great breakdown of our finance, operations, and sales departments here. We can see that we do have the alert status overlaid, and we’ll go ahead and take a look at one of the issues that the finance team is having with the end-of-year reporting for their Tomcat server. You can see we’ve got the metadata and the alert details. Let’s go ahead and click into the full alert details here and take a look at how LogicMonitor can help us troubleshoot.
So we can see that our Tomcat server has gone from good to bad. It was returning, you know, a 200, and now it’s returning what looks more like a 400 error code. LogicMonitor has highlighted this in red, flagging that it is an anomaly, something that we should look into, and raised an alert based on dynamic thresholds. And as we scroll down, we can see additional metrics like response time, which don’t seem too unusual, but we also have the logs and, specifically, the anomalies in the logs highlighted right here at our fingertips. And we can see that there were a few anomalies flagged right when the alert occurred.
So let’s go ahead and dig into those anomalous logs. Here we can see a particular path doesn’t exist. It looks like web was spelled with two b’s.
That looks like a typo and is likely something that we’re gonna wanna look into. So that’s potentially why this Tomcat server went down and stopped responding, and, therefore, the fix for our end-of-year reporting.
Now this is a fairly simple scenario where we were able to just use metrics and logs, and the answer was right in front of us because LogicMonitor surfaced it in the right way. But what happens during an alert storm, when you have tons of alerts and you don’t even know which alert to start with? That’s really where Edwin AI comes in. With Edwin AI, we’re ingesting events and alerts across your entire observability stack, and we’re correlating them together. You can see in this account, we’ve actually processed over three thousand alerts, and we’ve correlated those into just forty-one insights that were sent to ServiceNow.
You can see we don’t have any singleton or uncorrelated alerts, but if there were any, the system is constantly learning over time, and that number should remain fairly low.
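The correlation idea described here, collapsing thousands of alerts into a handful of insights, can be illustrated with a toy grouping rule: cluster alerts that share a resource group and arrive close together in time. Edwin AI's real logic also uses topology and CMDB data; the alerts below are hypothetical:

```python
from collections import defaultdict

def correlate(alerts: list[dict], window_s: int = 300) -> list[list[dict]]:
    """Group alerts that share a resource group and arrive within
    window_s seconds of the group's first alert."""
    insights: dict[str, list[list[dict]]] = defaultdict(list)
    for a in sorted(alerts, key=lambda a: a["ts"]):
        buckets = insights[a["group"]]
        if buckets and a["ts"] - buckets[-1][0]["ts"] <= window_s:
            buckets[-1].append(a)  # joins the open insight for this group
        else:
            buckets.append([a])    # starts a new insight
    return [b for buckets in insights.values() for b in buckets]

alerts = [
    {"group": "meraki-edge", "ts": 0,   "msg": "tunnel down"},
    {"group": "meraki-edge", "ts": 120, "msg": "db sync timeout"},
    {"group": "meraki-edge", "ts": 200, "msg": "payment svc 5xx"},
    {"group": "core-switch", "ts": 50,  "msg": "high CPU"},
]
print(len(correlate(alerts)))  # → 2
```

Four raw alerts become two insights: one three-alert cascade on the Meraki edge and one singleton, mirroring the 3,000-to-41 reduction shown in the demo at a much smaller scale.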
So let’s go ahead and look at one of these insights and the number of alerts that we have. We’ll take a look at this one with thirteen alerts here.
And the first thing that we can see is that we do have a timeline view as well as the details here. And these details include tags that are auto-generated by Edwin AI.
We have the impacted CIs from the ServiceNow CMDB. And again, we are using that CMDB to pull in metadata, and we’re using that in the correlation.
The timeline view is gonna show us how, you know, the specific Meraki issue cascaded through the environment to actually impact our shipping and payment services.
And then down below, you can actually see that Edwin AI has correlated together alerts from LogicMonitor, from Splunk, and from Dynatrace in this case. So we’ve got alerts from three different observability tools that are being pulled together.
The AI analysis is gonna give us a really nice human-readable view of this that tells us that the Meraki security appliance tunnels issue disrupted the payment and shipping services, causing database sync timeouts and more. And then it gives us some root cause information that can help us go troubleshoot and figure out what we might need to fix to resolve the issue.
Now let’s go ahead and take a look at one of these alerts, ideally the originating alert, which was in LogicMonitor. We’ll click the source ID link right here. That’ll take us back into LogicMonitor, to the same kind of alert detail page we just looked through with our Tomcat server, where we can see that, again, we have the metrics plotted here, nothing too interesting. But we also have the log anomalies pulled in right at our fingertips, and we can see that there was a configuration template change. And that, perhaps, is what happened with the Meraki appliance in the first place that caused the issue that cascaded to disrupt our shipping and payment services. So again, another example of how Edwin AI can help you narrow down to the right alert, and then use the rich data that LogicMonitor has to troubleshoot faster.
And then I just wanna show one more aspect of the platform, which is going to be our new cost optimization product that we launched last year. There are really two aspects to this product. The first one is billing, where we will show multi-cloud billing in a single dashboard here. You can actually see this broken down by a variety of different dimensions, everything from the resource type, to the region, to the detailed breakdown below in that table.
And you can slice and dice and drill in and out of these. We can also break it down based on the metadata that we have for the environment. So let’s say we only wanna look at a few specific business units. We can narrow down the view to focus on the IT ops and product development teams and look specifically at their spend.
The weekly trend is gonna tell us how the spend is changing, right? Are we increasing? Are we decreasing? Where are we gonna end up at the end of the month? And do we need to make any changes to stay within our budget?
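The "where are we gonna end up at the end of the month" question above is, at its simplest, a run-rate extrapolation. A minimal sketch with hypothetical daily spend figures (the real product's trend view is, of course, more sophisticated):

```python
def projected_month_end(daily_spend: list[float], days_in_month: int = 30) -> float:
    """Project month-end cloud spend from the average daily run rate so far."""
    run_rate = sum(daily_spend) / len(daily_spend)
    return round(run_rate * days_in_month, 2)

# Ten days of spend so far this month (hypothetical figures, in dollars).
spend = [310.0, 295.0, 320.0, 305.0, 330.0, 290.0, 315.0, 325.0, 300.0, 310.0]
print(projected_month_end(spend))  # → 9300.0
```

Comparing that projection against the monthly budget is exactly the "do we need to make any changes" check the trend view supports.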
And then we also have views like the provider view, which is gonna tell us how we’re spending across the multiple cloud providers we’re using, which are AWS and Azure in this case.
We can also use the additional tabs to the right here and look at the account and region views that, again, are gonna break it down by various dimensions, like the different regions we have cloud infrastructure running in and the resource types that we’re using across the cloud providers. So we can get a really granular view and slice and dice how we’re spending.
The recommendations tab is gonna give us explicit recommendations on how we can actually optimize our cloud spend. So you can see here, we’re pulling the performance data that LogicMonitor has alongside the spend data to identify cost-savings opportunities.
It looks like we have a few idle EC2 instances that we could clean up, as well as maybe some EC2 instances we could resize.
We have information here including a direct link to the AWS console to perform the action, as well as the data that helps explain why we’re providing this recommendation, and various lifecycle actions that we can perform to ensure that this list stays relevant over time.
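The idle-instance recommendation described above boils down to joining performance data with spend data: instances that cost money but show near-zero utilization. A minimal sketch (instance IDs, numbers, and the 2% cutoff are all hypothetical, not the product's actual heuristic):

```python
def idle_instances(cpu_avgs: dict[str, float], threshold_pct: float = 2.0) -> list[str]:
    """Flag instances whose average CPU utilization sits below threshold_pct,
    candidates for cleanup or resizing."""
    return sorted(i for i, cpu in cpu_avgs.items() if cpu < threshold_pct)

# Week-long average CPU utilization per EC2 instance (hypothetical data).
week_avg_cpu = {"i-0abc": 0.4, "i-1def": 55.2, "i-2ghi": 1.1, "i-3jkl": 23.7}
print(idle_instances(week_avg_cpu))  # → ['i-0abc', 'i-2ghi']
```

Pairing each flagged instance with its monthly cost then yields the dollar savings figure a recommendation would surface.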
So that’s an example of how LogicMonitor can go beyond just performance and availability monitoring and help provide value through optimizing business metrics like cloud spend.
Hopefully, this platform overview was useful, particularly when paired with the information from Devin earlier on how Yale has been able to benefit from the LogicMonitor platform. And, hopefully, everyone feels like they know a lot more about LM Envision at this point.”