To be a good software leader, you have to give your teams as much autonomy as possible. However, you also have to be ultimately responsible (especially when things go wrong). One of the hardest things about being a manager is owning responsibility for everything while having no direct control.
The way great managers solve this is by setting up processes, tools, or mechanisms that provide insights. These allow them to ask the right questions (at the right time), and gently steer the team in the right direction.
Software engineering managers (or any senior technical leaders) have many responsibilities: the care and feeding of the team, delivering on business outcomes, and keeping the product/system/application up and running and in good order. Each of these areas can benefit from a systematic approach. The approach I present here is setting up checks and balances for the team's operational excellence.
Operational excellence is the ability to consistently deliver high-quality products and services to customers. It is essential for software engineering managers because it helps them ensure that their teams can meet their customers' needs.
There are many benefits to operational excellence, including:
If you are taking on a new team or want to improve the way your current team works, here is a checklist, along with some best practices, that I have used in organizations I have led. Keep in mind that this isn't meant to be comprehensive, and you should plan to adjust the list based on your team, your goals, and your timelines.
Most incidents are caused by bad code pushes or other changes to the environment. As a leader, you should make sure you have visibility into launches and that the team doing the launch has done its homework. For example, consider these items:
Problems and incidents will happen, but you don't want to be bitten by the same problem twice. Make sure you understand what the incident load looks like for your team, and how the team is doing at closing out open action items from those incidents.
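A lightweight way to keep an eye on incident load is to summarize your incident records periodically. The Python sketch below assumes a hypothetical record format (the `Incident` and `ActionItem` classes and the `root_cause` tag are illustrative, not taken from any particular incident-management tool). It reports how many incidents occurred, how many follow-up action items remain open, the closure rate, and which root causes showed up more than once.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ActionItem:
    title: str
    done: bool = False


@dataclass
class Incident:
    incident_id: str
    root_cause: str  # hypothetical free-form tag, e.g. "bad-deploy"
    action_items: list[ActionItem] = field(default_factory=list)


def incident_summary(incidents: list[Incident]) -> dict:
    """Summarize incident load, open follow-ups, and repeated root causes."""
    total_items = sum(len(inc.action_items) for inc in incidents)
    open_items = sum(
        1 for inc in incidents for item in inc.action_items if not item.done
    )
    causes = Counter(inc.root_cause for inc in incidents)
    return {
        "incident_count": len(incidents),
        "open_action_items": open_items,
        # With no action items at all, treat the backlog as fully closed.
        "closure_rate": (total_items - open_items) / total_items if total_items else 1.0,
        # A root cause seen more than once means the team was bitten twice.
        "repeat_causes": sorted(c for c, n in causes.items() if n > 1),
    }


# Made-up incidents for illustration only.
incidents = [
    Incident("INC-1", "bad-deploy",
             [ActionItem("add canary", done=True), ActionItem("add rollback alarm")]),
    Incident("INC-2", "disk-full", [ActionItem("add disk alert", done=True)]),
    Incident("INC-3", "bad-deploy", [ActionItem("enforce canary stage")]),
]
print(incident_summary(incidents))
# {'incident_count': 3, 'open_action_items': 2, 'closure_rate': 0.5, 'repeat_causes': ['bad-deploy']}
```

The point is less the code than the habit: a short, regular summary like this gives you a concrete prompt for asking why an action item is still open or why the same root cause came back.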
Another important part of your leadership role is architecting a sensible on-call rotation for your team(s). For tightly coupled services, where it always takes the same two or three teams to resolve an incident, grouping like services and expanding rotations can be helpful. Some organizations also set up a front-end or mobile on-call rotation to respond quickly to urgent bugs or issues in clients. As a leader, you should think about the following aspects:
As the leader of the team, do you know how your software is performing? This means more than just uptime: you should be paying attention to all of your key user flows through the system, looking at throughput, latency, and so on.
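As a rough illustration of looking past uptime, the sketch below computes per-flow throughput and latency percentiles from one window of request samples. It assumes a hypothetical input of `(flow_name, latency_ms)` pairs collected over a known window; in practice these numbers would usually come from your metrics or observability system rather than hand-rolled code. Percentiles here use the nearest-rank method.

```python
import math
from collections import defaultdict


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (p between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def flow_report(requests, window_seconds: float) -> dict:
    """Per-flow throughput and latency from (flow, latency_ms) pairs seen in one window."""
    by_flow = defaultdict(list)
    for flow, latency_ms in requests:
        by_flow[flow].append(latency_ms)
    return {
        flow: {
            "throughput_rps": len(latencies) / window_seconds,
            "p50_ms": percentile(latencies, 50),
            "p99_ms": percentile(latencies, 99),
        }
        for flow, latencies in by_flow.items()
    }


# Made-up samples for two hypothetical user flows over a 2-second window.
requests = [("checkout", 120), ("checkout", 80), ("checkout", 300),
            ("checkout", 95), ("login", 40), ("login", 45)]
for flow, stats in flow_report(requests, window_seconds=2.0).items():
    print(flow, stats)
# checkout {'throughput_rps': 2.0, 'p50_ms': 95, 'p99_ms': 300}
# login {'throughput_rps': 1.0, 'p50_ms': 40, 'p99_ms': 45}
```

Reporting tail latency (p99) alongside the median matters because a flow can look healthy at p50 while a meaningful slice of users has a poor experience.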
Besides the ability to handle incidents and outages, another important part of operational excellence is understanding the customer experience. In addition to system metrics, you should be paying attention to all the information you get from customers. For example:
This topic could be an article all by itself. Having good testing, automation, robust CI/CD (continuous integration/continuous delivery) pipelines, etc., helps prevent problems. Ask yourself (or your engineers): Is the code you are pushing high-quality, and how do you know? If you can't answer with a confident yes, what would you need to do to get there?
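One signal that can help answer that question is change failure rate: the fraction of deployments that lead to a failure in production, one of the DORA software delivery metrics. The sketch below assumes a hypothetical `Deploy` record with a `caused_incident` flag that your team sets when a deploy is linked to an incident or rollback; how you establish that link depends on your own tooling.

```python
from dataclasses import dataclass


@dataclass
class Deploy:
    service: str
    # Hypothetical flag: set when this deploy is linked to an incident or rollback.
    caused_incident: bool


def change_failure_rate(deploys: list[Deploy]) -> dict[str, float]:
    """Fraction of deploys per service that led to a production failure."""
    totals: dict[str, int] = {}
    failures: dict[str, int] = {}
    for d in deploys:
        totals[d.service] = totals.get(d.service, 0) + 1
        failures[d.service] = failures.get(d.service, 0) + int(d.caused_incident)
    return {svc: failures[svc] / totals[svc] for svc in totals}


# Made-up deploy history for illustration only.
history = [Deploy("api", False), Deploy("api", True), Deploy("api", False),
           Deploy("api", False), Deploy("web", False), Deploy("web", False)]
print(change_failure_rate(history))
# {'api': 0.25, 'web': 0.0}
```

A rising rate for one service is a prompt to ask about its test coverage and pipeline, not a verdict on its own.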
As you go through each of these points, you can find many ways to make adjustments and improve in each area. The first step is asking the right questions. The second step may include things such as:
Operational excellence is a critical part of the success of any software engineering team. As a leader, you have a huge opportunity to improve the way your team is working. Best of luck and wishing you 100 percent uptime.
Kate Matsudaira is VP of technology for SoFi's Money (checking and savings), credit card, Invest, insurance, At Work, and partnerships. Previously, she was a VP at Splunk, where she was responsible for the Observability product suite. She has also worked as an executive at Google and helped build several successful startups that were acquired by eBay, O'Reilly Media, and Limelight. She started her career as a software engineer and lead at Microsoft and Amazon. She is a keynote speaker and published author, and has been honored with recognitions such as the NCWIT Symons Innovator Award. She lives in Issaquah, WA (outside of Seattle), with her husband, Garrett; three boys; and three dogs.
Copyright © 2023 held by owner/author. Publication rights licensed to ACM.
Originally published in Queue vol. 21, no. 5.