CASE STUDY

SPS Commerce Accelerates Development Agility with Kubernetes Monitoring

Retail solutions leader frees teams to innovate faster with advanced monitoring

Company: SPS Commerce

Employees: 1,231

Industry: Retail Technology

Challenge

SPS Commerce is a leader in providing cloud-based supply chain management software to retailers, suppliers, third-party logistics providers, and partners. Since 2001, this industry innovator has helped thousands of companies successfully replace time-intensive, manual processes with automation.

With the world’s largest retail network, SPS connects more than 75,000 organizations across the retail industry and handles more than $2 billion in gross merchandise value each day. At the heart of its operations are 800 production servers spread across multiple data centers. SPS must manage, monitor, and troubleshoot hundreds of applications on those servers, so it needed an agile, scalable monitoring solution.

“We needed a solution that we could deploy very quickly, as we needed to add instrumentation to hundreds of hosts in a very short timeframe,” says Jamie Thingelstad, CTO at SPS Commerce. “We also wanted a solution that could tie into other services.”

Like many organizations, SPS is also adopting containers and Kubernetes container orchestration in production to reduce costs and accelerate development. To unlock the full potential of containers, SPS needed a solution that could also monitor Kubernetes clusters and the applications running within them.

“We have been using containers for a few years, and there is a big push to use them more,” says Mike Woycheck, Reliability Engineer at SPS Commerce. “We went that route because our product needs to be dynamic and scalable. Given our rapid growth, we need to move fast and grow fast. Containers offer a faster way to achieve that.”

Solution

After evaluating a range of monitoring solutions, SPS Commerce deployed LogicMonitor, which provides automated hybrid infrastructure monitoring and analytics. With it, SPS can monitor, alert, and report on the health of its entire IT environment.

“The general out-of-the-box aspects that come with LogicMonitor are fantastic,” says Andy Domeier, Director, Technology Operations at SPS Commerce. “Everyone’s looking at the same statistics and everyone starts to collaborate and help each other identify potential outliers or areas where performance looks like it could be taking a hit. You’re not spending a lot of time identifying where the issue is. You spend time fixing it instead.”

SPS shares the information from LogicMonitor across the company, giving more than 1,000 employees full visibility into the real-time performance of its cloud services.

To get visibility into Kubernetes clusters and containerized applications, SPS uses LogicMonitor Kubernetes Monitoring. LogicMonitor provides a lightweight app that runs as a pod in the Kubernetes cluster. The app listens to the Kubernetes event stream and uses a LogicMonitor API to add cluster resources (nodes, pods, containers, and services) to LogicMonitor for monitoring. SPS set up a proof of concept comparing LogicMonitor Kubernetes Monitoring with a leading open-source monitoring tool and found the LogicMonitor solution easier to use.
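The event-driven pattern described above can be illustrated with a minimal Python sketch. It is not LogicMonitor's implementation: `InMemoryMonitoringClient`, `sync_events`, and `resource_key` are hypothetical names, and the events are plain dictionaries shaped like Kubernetes watch events (`ADDED`, `MODIFIED`, `DELETED`) rather than a live cluster stream. The sketch only shows how a watcher might keep a monitoring inventory in step with cluster changes.

```python
from dataclasses import dataclass, field

# Event types emitted by the Kubernetes watch API.
ADDED, MODIFIED, DELETED = "ADDED", "MODIFIED", "DELETED"

# Kinds mirrored into monitoring in this sketch. Containers are part of a
# Pod object, so they are not tracked as a separate kind here.
MONITORED_KINDS = {"Node", "Pod", "Service"}


@dataclass
class InMemoryMonitoringClient:
    """Hypothetical stand-in for a monitoring API; keeps resources in a dict."""

    resources: dict = field(default_factory=dict)

    def upsert(self, key, labels):
        self.resources[key] = labels

    def remove(self, key):
        self.resources.pop(key, None)


def resource_key(obj):
    """Build a (kind, namespace, name) key; nodes have no namespace."""
    meta = obj["metadata"]
    return (obj["kind"], meta.get("namespace", ""), meta["name"])


def sync_events(events, client):
    """Apply a sequence of watch-style events to the monitoring client."""
    for event in events:
        obj = event["object"]
        if obj["kind"] not in MONITORED_KINDS:
            continue  # ignore kinds this sketch does not monitor
        key = resource_key(obj)
        if event["type"] in (ADDED, MODIFIED):
            client.upsert(key, obj["metadata"].get("labels", {}))
        elif event["type"] == DELETED:
            client.remove(key)
    return client


if __name__ == "__main__":
    # Assumed sample events; names and labels are illustrative only.
    pod = {"kind": "Pod",
           "metadata": {"name": "web-1", "namespace": "shop",
                        "labels": {"app": "web"}}}
    node = {"kind": "Node", "metadata": {"name": "node-a"}}
    sample = [
        {"type": ADDED, "object": node},
        {"type": ADDED, "object": pod},
        {"type": DELETED, "object": pod},
    ]
    client = sync_events(sample, InMemoryMonitoringClient())
    for key in sorted(client.resources):
        print(key)
```

In a real deployment the event source would be a cluster watch and the client would call the vendor's API; keeping the sync logic separate from both, as here, makes it possible to test the logic without a cluster.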

“We implemented Kubernetes monitoring using the LogicMonitor platform and it was immediately clear that it had several built-in conveniences,” says Woycheck. “Spinning it up, the Kubernetes monitoring app provided a great deal of information at once and it was organized in a way that was easily readable. The other monitoring tool required us to figure out our own queries and set up our own dashboards, creating a lot of extra leg work. LogicMonitor’s platform has those capabilities built in.”

Unlike the other monitoring solution, LogicMonitor’s Kubernetes Monitoring app came with predefined baselines for the health of Kubernetes pods, containers, and the services the SPS development team was running. Woycheck and his team found its automated dashboards especially helpful for tracking cluster capacity metrics.

“LogicMonitor’s Kubernetes Monitoring allowed us to find metrics that we didn’t know would be important—but could be,” says Woycheck. “I sent the dashboard data back to the team, and they said, ‘this is what we should be basing our graphing and scaling out on.’”

Benefits

With LogicMonitor’s Kubernetes Monitoring in place, SPS is experimenting with a variety of Kubernetes services. A second development team at SPS recently set up an Azure Kubernetes Service (AKS) cluster and will use the LogicMonitor Kubernetes Monitoring app to support that environment.

“We are trying to lift and move many of our container services into Kubernetes,” says Woycheck. “We want to develop a solid understanding of Kubernetes and how to build it out safely. The idea is to get as much into it as possible, since it is the platform of choice, and in this complex world it helps organize what we do.”

“Having out-of-the-box data sources that just work with Kubernetes will be a big win for us as we start moving applications over,” he says.

With improved visibility into its own operations, SPS has a foundation to deliver consistently superior services to customers well into the future.

“LogicMonitor has changed the behavior of our operations teams, really buying back time for the engineers, improving customer satisfaction because of system health and availability, and making the systems more transparent to the entire organization, which have been a really big win,” says Domeier.
