How LogicMonitor realizes operational efficiency and reduces risk by using its own products
LogicMonitor is a leading provider of SaaS monitoring and observability for hybrid cloud infrastructure. With 100,000 users across 30 countries, the LM Envision platform monitors 880 billion metrics per day across 3.4 million active devices, roughly a third of which are in the cloud. LogicMonitor is critical to ensuring that our customers’ infrastructure and applications are ‘always on’ and performing well and, when they aren’t, to enabling IT teams to find the root cause of a problem as quickly as possible. Quick Root Cause Analysis (RCA) shortens Mean Time To Restore (MTTR) and preserves SLAs and the uptime of what really matters: the end-user services on which customers and employees rely.
As a B2B enterprise, LogicMonitor faces the same demands as our customers: our business and our reputation rely on the reliability and uptime of our monitoring and observability services. If we failed to deliver our contractual uptime, we could lose revenue as well.
As an Inc. 5000 company operating in a fast-paced environment, we depend on a reliable internal network and infrastructure for normal business operations. Whether in Sales, Marketing, Technical Support, Engineering, or any other division of the company, a globally distributed workforce needs reliable connectivity to keep day-to-day operations running in a hybrid working model. With double-digit company growth, our IT infrastructure must scale to keep up with additional headcount and locations if we are to sustain that growth.
Driving growth and transformation, improving efficiency, increasing productivity, and reducing overall risk are all top of mind for leaders in today’s enterprises, and LogicMonitor is no different. Just as our customers are making their journey to the cloud, LogicMonitor has done the same, evolving our infrastructure from on-premises, to AWS, to containerized services running in AWS, while providing multi-tenancy for our customers to manage costs. We achieve a key part of that operational efficiency by using our own platform to track license and AWS usage, which helps us control our cloud costs.
Walking a mile in our customers’ shoes
LogicMonitor is used across our applications and infrastructure landscape for monitoring network traffic, equipment health, CPU/memory usage, user workload, and logs – whether in the cloud or on-premises – to ensure all business applications are available to our internal users. We monitor approximately 3,000 devices plus 15,000 cloud resources across our data center, AWS environment, and Austin, Santa Barbara, and Pune offices.
In addition, we use our own products to monitor services such as Jira, Confluence, and our corporate website. LogicMonitor alerts the appropriate team if a service becomes unavailable or when the heap fills up, so that we can prevent slow page loads. We also use LogicMonitor to work more proactively and improve workflows, specifically around SSL certificates and VPN users. With certificates installed across our infrastructure, our teams can use the platform to renew certificates before they expire.
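To make the certificate workflow concrete, here is a minimal sketch in Python of the kind of check that flags a certificate for renewal ahead of its expiry date. It illustrates the idea only and is not how the LM Envision platform implements it. The function names and the 30-day renewal window are assumptions chosen for this example.

```python
import ssl
from datetime import datetime, timezone

# Hypothetical renewal window; pick whatever lead time your team needs.
RENEWAL_WINDOW_DAYS = 30


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days remaining before a certificate expires.

    `not_after` uses the certificate "notAfter" text format,
    for example "Jan 31 00:00:00 2018 GMT".
    """
    expiry_ts = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch
    return (expiry_ts - now.timestamp()) / 86400


def needs_renewal(not_after: str, now: datetime,
                  window_days: int = RENEWAL_WINDOW_DAYS) -> bool:
    """True when the certificate expires within the renewal window."""
    return days_until_expiry(not_after, now) <= window_days


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(needs_renewal("Jan 31 00:00:00 2018 GMT", now))
```

In practice the "notAfter" value would come from the certificate itself, and a positive result would raise an alert to the team that owns it.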
LogicMonitor also tracks our VPN user counts and alerts us before we reach maximum thresholds. We monitor Okta as well, gathering the data we would need to make an informed decision should we have to move services in the future.
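The VPN alerting described above comes down to comparing current usage with a known maximum. A minimal sketch in Python, assuming hypothetical warning and critical ratios of 80% and 95% (illustrative values, not LogicMonitor defaults), could look like this:

```python
def vpn_alert_level(active_users: int, max_users: int,
                    warn_ratio: float = 0.80,
                    critical_ratio: float = 0.95) -> str:
    """Classify VPN usage as "ok", "warning", or "critical".

    The ratios are illustrative assumptions, not product defaults.
    """
    if max_users <= 0:
        raise ValueError("max_users must be positive")
    usage = active_users / max_users
    if usage >= critical_ratio:
        return "critical"
    if usage >= warn_ratio:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(vpn_alert_level(active_users=85, max_users=100))
```

Alerting at a warning level well below the hard limit gives the team time to add capacity before users are turned away.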
Experiencing the benefits of our hard work
Risk mitigation is a key benefit of using LogicMonitor’s Envision platform. We saw this firsthand when the firewalls in our data center exceeded their memory thresholds. The Technical Operations team was notified automatically, allowing us to perform a rapid failover before the tunnels crashed. Downtime was avoided and no customer was affected.
Our Root Cause Analysis capability ties alerts together with logs to reduce MTTR, while cutting false positives and providing anomaly detection via AIOps.
“Metrics can tell you when something occurred but you frequently need logs to inform why it happened. The LogicMonitor team has been using centralized log aggregation tools to assist with diagnostics for years,” said Randall Thomson, VP of Technical Operations at LogicMonitor. “First and foremost, it’s nice not to have to switch products for different diagnostic use cases. Second, anomaly detection provides real value both for surfacing one-off events and for needle-in-the-haystack type research.”
During our migration to the AWS Cloud, we used the LM Envision platform to verify our architectural assumptions and optimize our configuration. By keeping our metrics and monitoring consistent throughout the migration, we were able to reduce our overall infrastructure costs in a structured and manageable way, without affecting the latency, response time, or performance of the Envision platform for our customers.
By keeping a close eye on our infrastructure in a unified view, we are able to ensure customer satisfaction (CSAT), both internally and externally with our partners and customers. This high CSAT has enabled us to substantially grow our revenue as a company, resulting in LogicMonitor being recognized by the Financial Times as one of The Americas’ Fastest Growing Companies 2023. With over 2,000 customers relying on us directly and over 10,000 customers monitored by our partners as part of their managed services business, it is safe to say that LogicMonitor’s capabilities are critical to their business and IT operations, as well as our own.