Setting Up DevOps Teams for Success

Summary:

The gap between top and low-performing engineering teams is dramatic, whatever angle you look at it from. Whether you analyze their tech stacks and architecture choices, performance metrics, or cultural and team elements, the delta tends to be quite impressive. Despite the wide range of approaches, a few indicators paint a clear picture of the typical characteristics of a well-oiled setup.

We get to the root cause when we zoom in on the cultural aspects. How do teams manage DevOps? How is ownership distributed and thought about? This is where the data is most definitive and where teams that want to drive change can actually start. 100 out of 100 developers in high-performing teams report that they run a true “you build it, you run it” setup, one so streamlined and optimized that it doesn’t get in their way through overly complex design or restrictive abstractions. In comparison, only 3.1% of low-performing teams report this. In 52% of cases, low performers are still stuck in a “throw over the fence” model of operations. In 44.9% of cases, developers are left alone and develop a “shadow operations” model, in which senior developers take over the role of operations to help less experienced developers, creating all sorts of inefficiencies.

Tech Stack & Architecture:

Application architecture

  • Top performers run loosely coupled architectures in 95% of their apps.
  • Low performers run almost twice as many applications that are monoliths.

Infrastructure

  • Top performers choose public cloud as their dominant approach.
  • Low performers rely more on on-premises infrastructure.

Containerization

  • Across all teams, 82% are either already fully on containers, currently migrating, or currently starting a migration.
  • 18% are not planning to ever start a migration. More than twice as many low performers report that they are never going to migrate; aging ecosystems and serverless are the remaining options.
  • Kubernetes is the new legacy and solution of choice for more than 62% of all teams.

Configuration as Code

The 40% of low performers that do not use configuration as code will see skyrocketing amounts of work trying to roll back, follow good governance, meet audit requirements, etc. 
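
To make the rollback and audit argument concrete, here is a minimal sketch of what configuration as code buys you. It assumes an in-memory history as a stand-in for a version-controlled repository; the `ConfigHistory` class and the example keys (`replicas`, `log_level`) are hypothetical and not taken from the report.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ConfigHistory:
    """Keeps every applied configuration version for auditing and rollback."""
    versions: list = field(default_factory=list)

    def apply(self, config: dict, author: str) -> int:
        # Store a serialized snapshot together with who applied it.
        snapshot = json.dumps(config, sort_keys=True)
        self.versions.append({"author": author, "config": snapshot})
        return len(self.versions) - 1  # version number of this change

    def current(self) -> dict:
        return json.loads(self.versions[-1]["config"])

    def diff(self, a: int, b: int) -> dict:
        # Map each changed key to its (old, new) value pair.
        old = json.loads(self.versions[a]["config"])
        new = json.loads(self.versions[b]["config"])
        keys = sorted(set(old) | set(new))
        return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

    def rollback(self, version: int, author: str) -> int:
        # A rollback is recorded as a new version, so the audit trail stays complete.
        return self.apply(json.loads(self.versions[version]["config"]), author)


history = ConfigHistory()
v0 = history.apply({"replicas": 2, "log_level": "info"}, author="alice")
v1 = history.apply({"replicas": 5, "log_level": "debug"}, author="bob")
print(history.diff(v0, v1))           # what changed between the two versions
history.rollback(v0, author="alice")  # restore the known-good configuration
print(history.current())
```

In a real setup a Git repository would typically play the role of `ConfigHistory`. The point of the sketch is only that every change is recorded, attributable, and reversible.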

Infrastructure as Code – article

Performance Metrics

Deployment Frequency: If your environment management isn’t dynamic, you’re on a monolith, or your team composition and process flow are buggy, deploying and shipping software fast gets very hard.

  • High performers enable their developers to deploy to production in more than 50% of cases. Over 80% of top performers deploy at least several times per day. 
  • 22% of low performers say they deploy only “a few times per year” 

Lead Time: The time it takes to implement, test, and deliver code.

  • For more than 20% of low performers, it takes longer than one month to deliver their code through all stages
  • It takes minutes for over 50% of top performers, and almost none take more than a week. This means the individual code changes applied with any given deployment are likely to be significantly smaller, which makes them easier for colleagues to review and in turn lowers Change Failure Rate.

Mean Time to Recovery (MTTR): How long it takes you to get everything fully operational again after a product or system failure.

  • Low performers that are not running on IaC or Config as Code take much longer to fix things when they go sideways.

Change Failure Rate: Signals how much “faulty” code makes its way to production with any given deployment.
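
The four metrics above can be computed from a simple log of deployments. The following is a minimal sketch under stated assumptions: the `Deployment` record and its field names are hypothetical, and recovery time is measured from the moment of the failed deployment, which is only a proxy for when the failure actually began.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was fully operational again


def deployment_frequency(deploys: List[Deployment], days: int) -> float:
    """Average number of production deployments per day over the window."""
    return len(deploys) / days


def median_lead_time(deploys: List[Deployment]) -> timedelta:
    """Median time from commit to production."""
    return median(d.deployed_at - d.committed_at for d in deploys)


def change_failure_rate(deploys: List[Deployment]) -> float:
    """Share of deployments that caused a failure in production."""
    return sum(d.failed for d in deploys) / len(deploys)


def mean_time_to_recovery(deploys: List[Deployment]) -> Optional[timedelta]:
    """Mean time from a failed deployment to full restoration (None if no failures)."""
    durations = [d.restored_at - d.deployed_at
                 for d in deploys if d.failed and d.restored_at]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


start = datetime(2024, 1, 1, 9, 0)
log = [
    Deployment(start, start + timedelta(minutes=10)),
    Deployment(start, start + timedelta(minutes=30), failed=True,
               restored_at=start + timedelta(minutes=90)),
    Deployment(start, start + timedelta(minutes=20)),
]
print(deployment_frequency(log, days=1))
print(median_lead_time(log))
print(change_failure_rate(log))
print(mean_time_to_recovery(log))
```

Using the median for lead time keeps a single slow change from dominating the figure; a mean would be an equally valid choice depending on what you want to track.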

Team Setup and Culture

  • Only 21.2% of teams report that they can do all DevOps tasks on their own.
  • In 44.6% of cases, they are supposed to, but the setup is so complicated that in reality only experienced developers can work on DevOps tasks, and they become a bottleneck for the team.
  • In 34.2% of cases, the reality is a “throw over the fence” split like 20 years ago.
  • 96.6% of top performers report that they invest heavily in improving their developer experience, including self-service capabilities, and consider it a top priority.

Aaron Erickson, who built Salesforce’s Internal Developer Platform:  “Service ownership is a good idea in theory, but in practice people get confused. If developers have to run all the ops for their services, you do not have any economies of scale. To run 1,000 different services around Kubernetes, you shouldn’t need 1,000 Kubernetes experts to do that.”

Optimizing for Cognitive Load

  • Cognitive load: the amount of working memory resources in use (here, by developers).
  • While developers are becoming IaC wizards, they fall back on their area of specialization.
  • Overcommunication is the only way to find the balance between giving dev teams self-service capabilities and abstracting operations away from their dev-based roles.

Golden Paths over Golden Cages

Golden path – about abstracting without abstracting

The most commonly used description for Golden Path-style self-service setups is Internal Developer Platforms. An Internal Developer Platform, or IDP, is a self-service layer that allows developers to interact independently with their organization’s delivery setup, enabling them to self-serve environments, deployments, databases, logs, and anything else they need to run their applications.
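
As an illustration of what such a self-service layer might look like from the developer’s side, here is a minimal sketch. Everything in it is an assumption made for the example: the `request_environment` function, the default values, and the set of overridable settings are hypothetical and do not describe any particular IDP product.

```python
# Hypothetical golden-path defaults maintained by the platform team.
GOLDEN_PATH_DEFAULTS = {
    "runtime": "kubernetes",
    "replicas": 2,
    "database": "postgres",
    "logging": True,
}

# Settings developers may change themselves; everything else stays standardized.
OVERRIDABLE = {"replicas", "database"}


def request_environment(service: str, env: str, **overrides) -> dict:
    """Return the full environment spec for a self-service request."""
    not_allowed = set(overrides) - OVERRIDABLE
    if not_allowed:
        # Guardrail: these settings are owned by the platform team.
        raise ValueError(f"not self-service on the golden path: {sorted(not_allowed)}")
    # Developer overrides win over the defaults for the allowed settings.
    return {"service": service, "env": env, **GOLDEN_PATH_DEFAULTS, **overrides}


spec = request_environment("checkout", "staging", replicas=3)
print(spec)
```

The developer states only what differs from the default, and the platform fills in the rest. Which settings belong in `OVERRIDABLE` is exactly the balance between guardrails and autonomy that a platform team has to negotiate with its users.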

You have to treat developers as users. You have to iterate with them and explain why the golden path makes sense and why they should use it for the good of everybody (“win the hearts and minds of developers”).

Winning DevOps with a Self-Service setup built by platform teams

The mission: to build the tools that enable developers to ship scalable applications with high speed, quality, and performance

The Platform team is not to be seen as some sort of extension of the SRE or Ops teams, but rather as its own product team, serving customers (app developers) within your organization.

Internal balance

Successful Internal Platform teams manage to put in place strong guardrails and standards for their development teams without taking away too much of their autonomy.

Key areas top-performing Internal Platform teams focus on:

  • Treat your platform as a product: Platform teams need to be driven by a product mindset. They need to focus on what provides real value for their internal customers, the app developers, based on their feedback.
  • Optimize iteration speed: Your developers will be able to consistently ship more features and products to your customers while being confident that things won’t break.
  • Solve common problems: start by understanding developer pain points and friction areas that cause slowdowns in development.
  • Be glue, my friend: Platform teams need to define a golden path for their developers: a reduced set of sane, proven choices of tools that get the job done and allow you to build, deploy, and operate your services. The main value you create as an Internal Platform team is to be the sticky glue that brings all the tools together and ensures a smooth development and deployment experience for your engineers.
  • Educate and empower your teams: Foster regular architectural design reviews. Share knowledge, experiences, and collectively define best practices. Ensure engineers have the right tools in place to validate and check for common pitfalls. Organize a hackathon.
