Chapter 4. Automated Code Remediation in Action

Now that we’ve covered the background for automated code remediation, we want to share some real-world case studies exploring the practice. You’ll see what leads organizations to automate code remediation and its impact on the way they work.

Case Study: Improved Developer Productivity

Our first case study takes us back to the problem of technical debt. A midsize insurance company, with over 20 million lines of code across 1,200 repositories, was struggling to close code maintenance stories without impacting the development team’s productivity. Their code maintenance work, which covered vulnerability patching, code migration, and dependency upgrades, was folded under technical debt in their systems.

The organization was averaging one large maintenance project per quarter, and each of those projects could consume the entire development team. For example, in Q2 the team had to prioritize a Spring Boot version upgrade to secure their code from the Spring4Shell vulnerability. This became 32 different stories for the development team, one per affected repository. It also required one developer per repository to act as the “migration expert” for the update. The work was all-consuming and reduced their business output, as you can see in the Q2 results shown in Figure 4-1.

Then they started using automated code remediation with Moderne. Instead of a whole development team touching multiple repositories to maintain code, a single developer could, in less than a day, have the Moderne platform update code and generate pull requests (PRs) on their behalf across repositories. The code would then follow the normal workflow to test and deploy. Much of the overhead and work of code migration was handled by the platform.

[Figure: chart showing a productivity increase of 1,100% on maintenance-related tasks and 30% on business value delivery]
Figure 4-1. Insurance company boosts developer productivity with auto-remediation

The team gained comprehensive, accurate visibility into their codebase (a better understanding of code and transitive code health) that is actionable through automation, enabling them to make significant, steady progress in code maintenance. Their productivity results, shown in Figure 4-1, bear this out. In the quarter when they were fully using auto-remediation, they not only closed 384 maintenance stories (a 1,100% increase) but also delivered 40 additional business stories (a 30% increase).
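The percentages above can be checked with a few lines of arithmetic. The sketch below is only a sanity check, and it rests on one assumption we are making for illustration: that the baseline for the 1,100% figure is the 32 maintenance stories from the Q2 Spring Boot upgrade. The business-story baseline is not stated in the case study, so the value computed here is merely what a 30% increase of 40 stories would imply.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


# Assumption: the baseline is the 32 maintenance stories from the Q2 upgrade.
maintenance_before = 32
maintenance_after = 384

maintenance_gain = percent_increase(maintenance_before, maintenance_after)
print(f"Maintenance stories: {maintenance_gain:.0f}% increase")

# 40 additional business stories reported as a 30% increase imply a baseline
# of roughly 133 stories. This baseline is derived, not stated in the text.
implied_business_baseline = 40 / 0.30
print(f"Implied business baseline: about {implied_business_baseline:.0f} stories")
```

Under that assumption, going from 32 to 384 closed stories is a twelvefold jump, which matches the reported 1,100% increase.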

The company is also separating true “technical debt” from “maintenance” stories. Upgrades to software versions and their cascading dependencies, which are beyond the development team’s control, are now recategorized as maintenance. This will significantly shrink the black hole of technical debt and the perception that the team has bad coding practices.

Case Study: “Freedom and Responsibility”

This case study features Jonathan Schneider and his experience with the Netflix “freedom and responsibility” culture and driving change. In the weeks leading up to his joining Netflix as a member of the Engineering Tools team in 2014 (thanks to Mike McGarr), Jonathan wasn’t sure whether to believe the Netflix culture deck, specifically as it related to “freedom and responsibility.” This was a company that relied on a culture of high-performance, creative, self-disciplined workers, not process adherence, to achieve corporate goals. The only “good process” was something that helped talented people get more things done.

Regarding engineering (specifically, the source code), he had only ever seen centrally initiated efforts to manage developer workflow, backed by executive sponsorship and defeating successive waves of organizational resistance through sheer force of will. What Jonathan soon learned is that the best solutions can emerge when centrally forced change is not an option within the culture.

“Freedom and responsibility” at Netflix allowed platform teams to test the value of solutions without the effectiveness bias that comes from being able to force a solution on teams. Jonathan saw this firsthand when he unwittingly stepped into a migration engineering role. How do we get teams from Java 6 to 8? Gradle 2 to 5? Framework version 1 to 2?

The diffusion of responsibility created challenges for these migration projects. Mass communication to development teams turned out not to be the answer. Providing code search and reporting didn’t help either.

Some of his colleagues went so far as to create a hack day project that identified when an engineer made a breaking API change as well as the teams that would be impacted. They automated the production of one-off videos that began with the title “Here’s Who You Hurt Today,” followed by a series of profile pictures of engineers on impacted teams, with background music by Sarah McLachlan singing about angels.

All the communication in the world generated minimal action. Why? The diffusion of responsibility leaves nobody with a sense of urgency, which is why we experience internal resistance, even in top-down organizations.

Required to negotiate (beg might be a better word) with product engineers for each change, Jonathan kept hearing the same refrain: “Do it for me, and I’m happy to accept the change.” But how could he, as a member of a small team, “do the work” for everyone else?

It led him to an entirely different solution, one he would not have imagined without “freedom and responsibility.” He invented an open source automated refactoring solution called OpenRewrite. As a tool that would essentially fix the source code for developers, it fit in nicely with the Netflix culture and “good process” notions.

Case Study: OSS CVE Remediation Change Campaigns

Let’s be honest: developers dread the assignment of new CVEs to pieces of the software supply chain they use. Naturally, vulnerability reporting and CVE assignment are meant as a force for good, exposing vulnerable parts of our application so that we can close gaps before bad actors do. But to the line developer, a new CVE represents unplanned work—issues they neither introduced nor have the energy to fix given the backdrop of feature development commitments they have already made to their business.

The unplanned work has two equally important parts: impact analysis to discover the specific parts of our codebases that are vulnerable and the step-by-step remediation of each of these vulnerable components. Finding a pattern across thousands of repositories is not an easy task to begin with, and the remediation quickly becomes a repetitive task for engineers.
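To make the impact-analysis half of that work concrete, here is a deliberately naive sketch of searching many repositories for a risky pattern. It is a plain text search over an in-memory stand-in for checked-out repositories, not how OpenRewrite or Moderne analyze code. The repository names, file contents, and the example pattern (a dependency repository URL served over plain HTTP) are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for checked-out repositories:
# repo name -> {file path -> file contents}.
Repos = Dict[str, Dict[str, str]]


def find_occurrences(repos: Repos, pattern: str) -> List[Tuple[str, str, int]]:
    """Return (repo, path, line number) for every line matching `pattern`."""
    regex = re.compile(pattern)
    hits = []
    for repo, files in sorted(repos.items()):
        for path, text in sorted(files.items()):
            for number, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    hits.append((repo, path, number))
    return hits


sample_repos: Repos = {
    "billing": {
        "build.gradle": "repositories {\n  maven { url 'http://repo.example.com' }\n}\n"
    },
    "claims": {
        "build.gradle": "repositories {\n  mavenCentral()\n}\n"
    },
}

# Flag build files that fetch dependencies over plain HTTP.
insecure = find_occurrences(sample_repos, r"http://")
for repo, path, line_number in insecure:
    print(f"{repo}/{path}:{line_number}")
```

Even this toy version hints at the problem: the search is the easy part, and every hit it reports still needs a fix applied by someone.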

Security vulnerability repair is a great example of an activity that currently suffers from the diffusion of responsibility. What if, rather than “shifting left” the burden of remediation onto every developer, security researchers instead provided recipes that both identified the problem and automated the fix for developers?

The unit economics of shift-left are out of whack because every developer impacted is independently making the same fix. The unit economics of a security researcher writing a recipe are better. Under the current system, a researcher broadcasts an issue, and each developer spends a fixed amount of time manually remediating every occurrence. The cost is at least linear with respect to the number of occurrences. With auto-remediation, the security researcher can provide an automated fix along with their disclosure. Time is then constant, approaching zero for the impacted developers.
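The difference between the two cost curves can be sketched as a small model. Every number below is hypothetical and chosen only to show the shape of the curves: manual remediation grows with the number of occurrences, while a recipe is a one-time authoring cost plus whatever review time each impacted developer still spends.

```python
def manual_cost(occurrences: int, hours_per_fix: float) -> float:
    """Total developer hours when each occurrence is fixed by hand."""
    return occurrences * hours_per_fix


def recipe_cost(occurrences: int, hours_to_write_recipe: float,
                hours_per_review: float = 0.0) -> float:
    """One-time recipe authoring cost plus any per-occurrence review time."""
    return hours_to_write_recipe + occurrences * hours_per_review


# Hypothetical figures: 2 hours per manual fix, 40 hours to write a recipe.
for n in (10, 100, 1000):
    manual = manual_cost(n, hours_per_fix=2.0)
    automated = recipe_cost(n, hours_to_write_recipe=40.0)
    print(f"{n:>5} occurrences: manual {manual:>7.1f} h, recipe {automated:>5.1f} h")
```

With zero review time, the recipe cost stays flat as occurrences grow. If developers still spend a few minutes reviewing each generated change, the cost is linear again, but with a much smaller slope than fixing each occurrence by hand.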

Jonathan Leitschuh is one of the first security researchers thinking in this new way. He is currently a senior software security researcher working for the Open Source Security Foundation (OpenSSF), and prior to that he was the first ever Dan Kaminsky Fellow at HUMAN Security. The first big-impact security research he was involved with repaired the man-in-the-middle vulnerability inherent to Java dependency resolution over HTTP-supporting Maven repositories. It was an industry-wide security vulnerability that affected the entire Java ecosystem supply chain.

He has worked closely with Moderne since 2022 to produce recipes that remediate Zip Slip and similar vulnerabilities, and to iterate on the process by which fixes are mass-issued across a vast amount of open source code. He says:

I’ve been taking my knowledge of security vulnerabilities and their [Moderne’s] ability to rewrite that actual code and collaborating to try to come up with a set of vulnerabilities that are good candidates for this kind of automated fix.

There’s a lot of low-hanging fruit…​I’m trying to figure out how we can chase those specific ones down and eliminate them. And then on top of that, if I get the opportunity, there are certain vulnerabilities that are low-hanging fruit, but they’re low-hanging fruit because the code itself is vulnerable and its root cause is deeper.

There’s this trend that you’ll see in security where there are all these scanning tools for finding security vulnerabilities, like external entity processing [XXE]; there’s all these scanning tools to find that vulnerability in Java, Python, [and] C++. The root cause of those vulnerabilities is really the libraries themselves, and the underlying infrastructure that these libraries are depending on are themselves vulnerable. [One] of the [other] things that I’ve been trying to chase down is how to fix this from a root cause perspective.

Of course, there is friction here. The industry has over-rotated to mass alerting and manual fixing for too long. Fortunately, GitHub now provides a private pull request mechanism that a security researcher can use to submit a fix to a security vulnerability without publicly outing the affected component at the same time. Researchers like Jonathan are using their strong relationships with vendors like GitHub to mature the process by which security researchers can provide remediations.

Case Study: Migration Engineering Automated

A large bank with more than 6,000 developers, hundreds of millions of lines of code, and thousands of third-party software components was challenged to keep its “living codebase” operational and secure. Application scanning and reviews would relentlessly highlight vulnerabilities, outdated frameworks, API changes, and code quality issues. Unfortunately, due to business priorities, the developers were rarely given the bandwidth to adequately manage the issues.

Migration work in particular was complicated, requiring massive amounts of cross-team coordination and time to change the source code in hundreds of repositories. Developers had to interpret the release notes for each new version of software, identify and evaluate which changes would apply to their code and tests, and manually update the code. During the migration, new vulnerabilities could emerge, shifting focus and slowing the work even more.

For example, the company was in the midst of a major, multiyear Spring Boot 1.5 to 2.7 migration, which included a chain reaction of alignment issues among framework and library versions in both application code and tests. This migration required 12 minor version upgrades that would amass up to 1,200 changes. During the Spring Boot migration work, a newly discovered vulnerability in Spring Boot 2.3 created an “all hands on deck” situation, forcing the team to switch gears and prioritize upgrading the apps at risk.

Adding to the complexity, a JUnit 4 to 5 migration was required for the Spring Boot 2.4 upgrade. The development team had been in the process of manually and incrementally migrating JUnit tests side by side with the business logic of the application for 18 months (and were only 20% complete).

To solve this, the team opted to use Moderne’s platform. They first performed an impact analysis of what it would take to fully migrate one project from Spring Boot 1.5 to 2.7. They estimated that the work would take the team 70 hours manually, but 17 hours or less with Moderne, a savings of roughly 80%. They can now regularly update software and third-party libraries across thousands of repositories, making sure users have the latest version. They were also able to complete the JUnit migration for three different projects in days (an 86% time savings).
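The time-savings figure for the Spring Boot estimate can be worked through as follows. This is a minimal sketch using only the 70-hour and 17-hour estimates from the case study. The 17 hours is an upper bound (“or less”), so the savings computed from it is a floor rather than the exact result.

```python
def percent_savings(manual_hours: float, automated_hours: float) -> float:
    """Share of the manual effort avoided, as a percentage."""
    return (manual_hours - automated_hours) / manual_hours * 100


manual_estimate = 70      # hours, from the team's impact analysis
automated_ceiling = 17    # hours, stated as an upper bound

savings_floor = percent_savings(manual_estimate, automated_ceiling)
print(f"Savings at the 17-hour ceiling: {savings_floor:.1f}%")

# Hours that would correspond to a full 80% savings on a 70-hour estimate.
hours_for_80_percent = manual_estimate * (1 - 0.80)
print(f"Hours for an 80% savings: {hours_for_80_percent:.0f}")
```

At the 17-hour ceiling the savings comes to about 76%. An 80% savings corresponds to roughly 14 hours, which is consistent with the “or less” in the estimate.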

This company also sees the value of working with OSS vendors and security researchers to contribute search and transformation recipes that add to the ecosystem of migrations that help everyone automate.
