The Soft Side of Software


Working Models for Tackling Tech Debt

Understand the options to tailor an approach that suits your needs

Kate Matsudaira

Tech debt is inevitable in any software project. If you lead a large software team, you need to figure out a way to relentlessly chip away at the cruft that accumulates on projects. No engineer wants to work in a messy, hard-to-understand codebase because it becomes harder to make changes, diagnose problems, and perform other essential functions. It's like living in a messy house: You might do it if you have no other choice, but no one really enjoys it.

As a leader, you must find ways to manage tech debt. There's no way around it. Luckily, there are models for tackling tech-debt reduction that you can leverage to improve your codebase.

Your goal is to deliver high-quality software as fast as you can. Accelerating delivery acts like compound interest for any project, since the faster you get features out, the sooner you can get feedback to make improvements, test for product-market fit, and deliver value to your customers. This need for speed, however, can also introduce some significant tradeoffs, since moving fast:

• Doesn't leave a lot of time to address tech debt or improve the codebase.

• May encourage shortcuts. Instead of taking the time to write clean, well-designed code, developers might choose to implement faster but suboptimal solutions just to meet deadlines. Like taking out a loan, these shortcuts come with an interest penalty. In the case of tech debt, that interest takes the form of all the extra time and effort required down the road to fix problems caused by the shortcuts that are taken now.

• Includes too many workarounds. Unforeseen challenges sometimes arise, and workarounds are implemented to get past them. The workarounds may yield short-term fixes, but they aren't ideal for the long-term health of the codebase. For example, messy code can be slow to run, difficult to understand, and prone to bugs. Beyond the most immediate difficulties, this can make it much harder and more time-consuming to add new features later or fix existing ones.

Another consideration is that a software system needs to constantly evolve by way of software upgrades, attention to security vulnerabilities, and the addition of new functionality. The added time and effort that come with maintaining and using outdated technologies should also be considered part of your tech debt, since newer technologies typically offer better performance, security, and maintainability.

All of these requirements need to be balanced with finding ways to make progress on improving your products while also maintaining your current systems. The key is to be vigilant about identifying tech debt and have a plan for dealing with it. This may involve setting aside time specifically to refactor code or upgrade technologies.

A Short Guide to Fixing Big Things

When faced with complex problems, I always come back to the following mental model, since it covers the basics of solving any problem:

1. Determine your baseline. Before you can fix any problem, you need visibility. What is the current state of tech debt in your systems? Do you know what needs to be fixed? Have you assessed the effort, impact, and priority of those fixes?

2. Using your list, figure out what work streams you need. Can this be done in the gaps between other projects, or do you need to carve out dedicated time? Are the problems surfacing mostly in one area, or are they widespread?

3. Have a plan for moving these work streams along. Who is going to lead them? How will you track progress?

4. Clear the blockers. Communicate upward and outward so people are aware of the project, the progress you're making, and the anticipated impact. Address impediments as they arise.

Evaluate how things are going and continuously calibrate the priority and progress of your project.
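As one way to picture the baseline step, here is a minimal sketch that ranks a tech-debt inventory by impact relative to effort. The `DebtItem` type, the 1-to-5 scales, the impact-over-effort score, and the sample backlog are all assumptions made for illustration; they are not a method prescribed here, and your own scoring may weigh other factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    """One entry in a hypothetical tech-debt inventory."""
    name: str
    impact: int  # assumed scale: 1 (minor) to 5 (severe)
    effort: int  # assumed scale: 1 (hours) to 5 (quarters)

    @property
    def score(self) -> float:
        # Assumed heuristic: high impact and low effort rank first.
        return self.impact / self.effort


def prioritize(items):
    """Return items by descending score; ties are broken by name."""
    return sorted(items, key=lambda item: (-item.score, item.name))


# Illustrative backlog; the entries and numbers are made up.
backlog = [
    DebtItem("flaky integration tests", impact=4, effort=2),
    DebtItem("unsupported database version", impact=5, effort=5),
    DebtItem("duplicated error handling", impact=2, effort=1),
]

for item in prioritize(backlog):
    print(f"{item.score:.1f}  {item.name}")
```

A ranking like this is only a starting point for the conversation about work streams; the judgment calls about priority still belong to the people who know the systems.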

Choosing the right staffing model for each work stream is also crucial. There are several factors to consider here, including the project's size, complexity, required expertise, and competing priorities.

 

Setting Up the Work Streams

Many of the following mechanisms can be combined, and you can focus on the elements in each that best fit your work requirements. The examples here include process as well as tech debt-related tasks to illustrate both the work stream and model.

 

Iterate outward

Start with a small scope. Get things working well. Then expand outward.

This model works well whenever changes promise to add incremental value to each area (as opposed to requiring everyone to adopt and onboard to some particular system in order to reap the potential value/benefit). This is especially useful for experimental ideas or riskier initiatives—or in those instances with a lot of inertia/resistance. With an iterative approach, you can show success and results with each iteration, which helps to build buy-in and reduce risk. This approach also works well when it comes to changes in process, since it allows you to focus your initial steps on the willing early adopters and tackle the more resistant parts of your organization later.

Here are some examples that fit well with this model:

Upgrading a database. You can upgrade one instance at a time, starting with smaller, lower-risk ones to ensure a smooth transition.

Adopting a new framework or pattern for a codebase. If you want to change the way that error handling is done, you could start by implementing the change in one repo (or system), then expanding to adjacent components/systems once you've realized success. With each successive iteration, you'll build out better templates, examples, scripts, and tests—all of which should make adoption easier with each expansion in scope.

Setting up a new on-call or incident-management process. Focus initially on just one team before expanding to the parent group of that team, and then continue up the line. With each step, you'll learn about what works and what doesn't. What's more, your documentation, training, and best practices will continue to improve as you expand the scope.
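The expanding scope in these examples can be sketched as a simple rollout plan. The following is a minimal illustration, assuming each target (a database instance, repo, or team) has been given a hypothetical risk number; the function name `plan_waves`, the doubling wave size, and the instance names are all invented for the example.

```python
def plan_waves(targets, first_wave=1, growth=2):
    """Split (name, risk) targets into waves, lowest risk first.

    Each wave is `growth` times larger than the one before it, so early
    waves stay small while lessons are still being learned.
    """
    if first_wave < 1 or growth < 1:
        raise ValueError("first_wave and growth must be at least 1")
    ordered = sorted(targets, key=lambda t: (t[1], t[0]))
    waves, size, start = [], first_wave, 0
    while start < len(ordered):
        waves.append([name for name, _ in ordered[start:start + size]])
        start += size
        size *= growth
    return waves


# Hypothetical instances with made-up risk scores (higher means riskier).
instances = [
    ("billing-db", 5),
    ("staging-db", 1),
    ("analytics-db", 2),
    ("search-db", 3),
    ("orders-db", 4),
]

for number, wave in enumerate(plan_waves(instances), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In practice you would likely pause between waves to fold what you learned back into templates, scripts, and documentation before widening the scope.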

 

Identify and track

When you have "must-do" work or top-down initiatives to satisfy, one tried-and-true model is to identify all the work, set up a milestone/task/ticket to track progress against each requirement, and then methodically chase down each item in turn. This model can be quite effective whenever it's easy to identify all the work that needs to be done. It can prove challenging, however, to drive execution across a large scope.

If this seems familiar, that's because it's the very approach teams often use to track and resolve bugs. Each is opened as a ticket, and then all the tickets are prioritized team by team within their sprints/working model. It's all about divide and conquer—with each task being assigned to the most relevant team or person and then methodically tracking the progress being made to completion.

Here are some examples of where this model may prove useful:

API migration. You can programmatically identify all the consumers of the old API and then create tickets for them to migrate to the new API.

Security vulnerabilities. All the vulnerabilities that result from a scan can be opened as tickets and then tracked in the specified timeframe (or service-level agreement) according to the assigned risk/severity level.

Roadmap slides. If you have a large organization, you'll want each team or area to update their roadmap for quarterly reviews. A good way to frame this is to provide a placeholder slide for each team and then have them follow a template to update their content by a specified date.
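For the API migration example, identifying consumers programmatically might look something like the sketch below. It assumes a Python codebase, an old module called `legacy_billing`, and ownership decided by top-level directory; all of those names, and the ticket fields, are hypothetical. A real version would read files from disk and call your tracker's API rather than build dictionaries.

```python
import re
from collections import defaultdict

# Matches "import legacy_billing" or "from legacy_billing import ...".
# The module name is a placeholder for whatever API is being retired.
OLD_API = re.compile(r"^\s*(?:from|import)\s+legacy_billing\b", re.MULTILINE)


def find_consumers(sources):
    """Given a mapping of file path to file text, return the sorted
    paths that import the old API."""
    return sorted(path for path, text in sources.items() if OLD_API.search(text))


def draft_tickets(paths, owners):
    """Group paths by owning team (keyed on top-level directory) and
    draft one ticket per team."""
    by_team = defaultdict(list)
    for path in paths:
        top_level = path.split("/", 1)[0]
        by_team[owners.get(top_level, "unowned")].append(path)
    return [
        {
            "team": team,
            "title": f"Migrate off legacy_billing: {len(files)} file(s)",
            "files": files,
        }
        for team, files in sorted(by_team.items())
    ]


# In-memory stand-in for a source tree; contents are illustrative only.
sources = {
    "checkout/cart.py": "import legacy_billing\n",
    "checkout/tax.py": "from billing_v2 import rates\n",
    "reports/monthly.py": "from legacy_billing import invoices\n",
    "tools/cleanup.py": "    import legacy_billing.utils\n",
}
owners = {"checkout": "payments-team", "reports": "finance-eng"}

for ticket in draft_tickets(find_consumers(sources), owners):
    print(ticket["team"], "->", ticket["title"])
```

Files with no known owner land in an "unowned" bucket here, which is itself useful: those are the items most likely to stall unless someone is assigned to chase them down.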

 

Set a cutover date

For some projects—particularly those where you don't get the benefit without a critical mass (of the sort that a social network, chat client, videoconferencing software, or code repository might deliver)—simply instituting a cutover date can prove to be the most effective way to go. This tends to work especially well for projects where there is a clear end or start date—for example, the end of a vendor contract, at which point everything will need to be in place for a clean cutover.

The model works just as it sounds: You set a date, and everybody must comply. The challenges often have to do with communication (typically in terms of ensuring that all the impacted parties have received—and understood—the message) and adoption friction (in the sense that humans tend to resist change, meaning there always seem to be edge cases).

This model can also include a grace period or some amount of overlap time between the old approach and the new one. For example, it could be that people are encouraged to use the latest version of the software but aren't actually required to (yet). This can help to mitigate risk and remove some of the friction that comes with any change.

Examples of this type of project include:

IT application changes. Say your company moves from Zoom to Google Meet, or you move from using chat to Slack.

A new build system. Generally, you want only one build system pushing to production so you can run a uniform fleet—meaning this would need to be accomplished as a clean cutover, at least for certain applications.

 

All hands on deck

At times, there are really urgent problems—a pressing security issue, for example—that call for an entirely different model where you run the project as you might handle an incident response. That is, whenever a project calls for "all hands on deck," an entirely different set of guidelines applies. Here's how to proceed:

Establish a leader. This person acts as an incident manager, overseeing all aspects of the project.

Triage work. The project leader should assess and prioritize tasks to ensure that the most critical issues are the first ones addressed.

Set up parallel work streams. Break the project down into smaller, more manageable tasks that can be addressed concurrently by different teams or individuals.

Have regular status updates. Schedule check-ins, war rooms, or other forms of communication that keep everyone promptly informed of progress and alert them to any obstacles that may arise.

This model works well when something is truly urgent and all other work needs to pause to let this goal be accomplished as soon as possible. That said, it's also an approach that should be used sparingly since it is quite disruptive.

Still, when it comes to tech debt or other challenges such as bug fixing, setting aside a week of focus (for example, a "Fix-it Week") can prove to be an effective way to parallelize a lot of work across your team for a fixed period of time. This approach tends to work best whenever the challenge is smaller in scope (can be completed in less than one week) and lends itself to easy parallelization across the entire team.

 

Carve out

For certain types of projects, the best approach may be to dedicate a specific number of people to the task. Those people can be freed from the distractions of normal business to focus on the matter at hand. Generally, this is an approach used for skunkworks projects or small initiatives that don't require a lot of coordination or expertise. If your tech debt resides largely in one system and a small set of experts is required to address that, this model can be used to let them focus on just this problem alone for a period of time.

The downside is that work on carve-out projects can linger because it isn't carried along by the momentum of the larger team. If you choose this model, make sure you manage estimates, deadlines, and progress closely. Also, be sure to scope and define the work so the engineers doing the work gain recognition for their impact (since everyone loves to be able to take credit for new features when making a case for promotion).

 

Big picture

As a leader, you should always understand the state of your systems and have a plan for managing the tech debt in them.

Remember that not all debt is bad, and sometimes, in fact, strategic tech debt can even be used as a valuable tool to achieve certain business goals—just as financial debt can be taken on to obtain capital that can be invested in other profitable ventures. For example, taking a shortcut to get a product to market quickly could prove to be a wise decision if it allows the company to learn from customer feedback and then iterate accordingly on the product. But like barnacles on a ship, too much tech debt can slow you down, so be vigilant about managing it.

Effective management of tech debt is crucial for the long-term success of any software project. By understanding the options these different working models offer, you can tailor an approach that best suits the requirements of your team and project.

 

Kate Matsudaira is VP of technology for SoFi's Money (checking and savings), credit card, Invest, insurance, At Work, and partnerships. Previously, she was a VP at Splunk, where she was responsible for the Observability product suite. She has also worked as an executive at Google and helped build several successful startups that were acquired by eBay, O'Reilly Media, and Limelight. She started her career as a software engineer and lead at Microsoft and Amazon. She is a keynote speaker and published author, and has been honored with recognitions such as the NCWIT Symons Innovator Award. She lives in Issaquah, Washington (outside of Seattle), with her husband, Garrett; three boys; and three dogs.

Copyright © 2024 held by owner/author. Publication rights licensed to ACM.


Originally published in Queue vol. 22, no. 3