Shai Yallin
- Apr 5, 2022

The Anatomy Of A Rotten Codebase

Updated: Apr 27, 2023

Code rots. Just like everything else in life. It's one of those analogies I keep coming back to. But rotten code isn't just a problem those spoiled, nit-picking developers have to deal with. Rotten code contributes to developer attrition, inability to respond to changing market landscapes, and the premature demise of tech startups. Is this just a fact of life? Is there anything we can do to avoid it? In this post, I will describe the process by which code rots, and expand on ways to build systems that do not rot by optimizing for the changeability of our software.

Nobody wants the credit of having started a rotten codebase. We all start a new project with the strict aim of "getting it right this time". We may choose tools, frameworks, technologies and patterns to help us prepare for the unforeseen so that after a few months or years we are still proud of the code we created. However, most of us find ourselves deeply ashamed of systems we built years ago. Why does that happen? Why can't we find ways to create codebases that are truly fun and easy to maintain over long periods of time? My experience has been that it's down to two things: the sense of urgency that always looms in a startup environment, and our tendency as engineers to prepare for problems we have encountered in the past, assuming that the same problems will repeat themselves in the future.

"The Hurrier I Go, The Behinder I Get."

The sense of urgency is a given; it is the driving principle behind a tech startup company - you build things as fast as you can, create a lot of value in a short period of time, then either sell or go public. It's a "get rich quick" scheme. We can discuss the merits and faults of this approach, but if you work in tech, chances are that you'll have to endure a constant sense of urgency, usually accompanied by ever-changing requirements. This is also a given, and it has three common causes. First, we just don't know what we need to build (no one does; if your product manager seems too confident in the requirements, it's a sure sign that they are clueless. Good product managers openly admit that they have no idea what the customers actually need). Second, once we start interacting with our customers, we start to get requests for new features or changes. Third, we may decide to pivot or radically change our offering. So it might very well be that in Q1 we spent 12 hours a day working as fast as we could on a feature, but then in Q2 this feature is no longer needed, or has significantly changed.

Experienced engineers who've been around the block know that this is how the industry works, and we tend to create layers of protection around ourselves and our pristine new codebases so that when the time comes, we can make changes easily and happily. However, when the changes do come, we might be surprised to find that they are still very difficult to make, because the codebase is now too complex to accommodate the new requirements. At this point, we resign ourselves to sticking the changes wherever we can and moving on - because we're still under a significant sense of urgency.

Same Mistakes, New Technology

This happens because in our attempts to protect ourselves, we are creating a lot of code that we don't need, based on our current understanding of the problem domain and our past experience dealing with changing requirements under pressure. But no one guarantees that the future problems we'll experience have anything to do with what happened in the past. In a sense, a codebase with no frameworks, design patterns and conventions might be less rotten than a codebase that has been prematurely (over-)engineered. The frameworks, when you actually need them, might be dated; the design you spent so much time discussing and thinking about might be irrelevant; and technologies you put in place to deal with future scaling issues might solve the wrong problems, or even create new ones. One common example is prematurely adding a caching layer when there's no concrete performance issue. Now you have two problems: your architecture still can't deal with scale, and you now have to deal with cache invalidation.
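To make the caching example concrete, here is a minimal sketch of how a cache added "just in case" creates a new class of bug. Everything in it is hypothetical and invented for illustration: the in-memory `db` map stands in for a real database, and `getPrice`, `setPriceNaive` and `setPrice` are made-up names rather than part of any real library.

```typescript
// Hypothetical in-memory "database" of product prices (illustration only).
const db = new Map<string, number>([["widget", 10]]);

// A caching layer added up front, before any measured performance problem.
const cache = new Map<string, number>();

// Read path: serve from the cache when possible, otherwise fall back to the db.
function getPrice(id: string): number | undefined {
  const cached = cache.get(id);
  if (cached !== undefined) return cached;
  const value = db.get(id);
  if (value !== undefined) cache.set(id, value);
  return value;
}

// Naive write path: updates the database but forgets that the cache exists.
function setPriceNaive(id: string, price: number): void {
  db.set(id, price);
}

// Correct write path: every writer must now remember to invalidate the cache.
function setPrice(id: string, price: number): void {
  db.set(id, price);
  cache.delete(id);
}

const firstRead = getPrice("widget"); // 10, and the value is now cached
setPriceNaive("widget", 12);
const staleRead = getPrice("widget"); // still 10: the cache is stale
setPrice("widget", 15);
const freshRead = getPrice("widget"); // 15: the stale entry was invalidated
```

Without the cache, `setPriceNaive` would have been perfectly correct. With it, every write path in the system has to know about invalidation, and none of that helps with scale unless reads were the actual bottleneck.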

Fast-forward three years of rapid development, and we have a codebase with 5 unnecessary layers of indirection, frameworks that became obsolete back when Trump was president, and a ton of hacks people made trying to make the strict and dated (upfront) design accommodate the actual features they had to develop - under a constant sense of urgency. Working on such a codebase often leads to burnout, developer attrition, and the need to onboard new developers, who will take time to familiarize themselves with a messy codebase that's too hard to navigate. These new people will make further ad-hoc changes to the codebase, then burn out and leave. It's a vicious cycle. Eventually we're left with a Big Ball Of Mud that no one wants to touch. If we're successful, we might eventually be able to start a gradual rewrite project, assuming that we actually have a developer vigilant and skilled enough to do that. More often, though, we might either have to endure the rotten codebase for years, or slowly crumble and die, as our development velocity becomes slower and slower due to burnout and fear of change.

Getting Out Of The Rabbit Hole

So what can we do? Are we doomed to repeat the same mistakes again and again? Luckily, there is a better way, and it boils down to two principles: coding as little as possible, and managing your pace.

One of the biggest mistakes a developer can make when she starts a new software system is to use a code generator, such as Create React App. It may seem counter-intuitive, because those tools are intended to get you up and running quickly, giving you good conventions and productivity tools out-of-the-box. However, these tools and conventions are opinionated, and might not be what you actually need. The telltale sign of a codebase that was spawned using a code generator is that it may contain a lot of unused scripts and pieces of code, and exhibit a low signal-to-noise ratio - lots of layers of indirection, libraries and frameworks, or mechanics, for very little domain-related, or semantic, code. This in and of itself may not necessarily be a problem, until you need to make some changes that do not agree with the opinions of the people who wrote the code generator you used. For instance, create-react-app hides a lot of implementation details of how it works in another module, react-scripts. This is meant to make your app cleaner, and to deal with a broad spectrum of use cases, but it makes it harder to understand exactly how things happen, or to modify, for instance, the way Jest is configured. The same goes for frameworks such as Nest.js, which purports to provide a zero-configuration MVC framework, but brings along a lot of the woes of previous-generation MVC frameworks such as Ruby on Rails or Spring MVC - everything happens by magic, and the codebase is always too structured, even when there's very little of it.

The alternative? Start from scratch. It may seem like too much work, but it gives you absolute control over which dependencies you use and how your codebase is structured, and it makes sure your frameworks or libraries work for you, rather than the other way around. If you work in short iterations, adding dependencies as you need them, you'll find that the time you spent manually creating the initial structure is very quickly regained in the form of shorter build / install times, simpler structure and code that is just easier to understand, and to change. And it is the changeability aspect that is most important to combat code decay in the long term; code that is easy to change will rot less than code that needs to be broken with a hammer to effect a change.

A Turtle Race

Running fast is important. But, as the cliché goes, it's not a sprint - it's a marathon. It's better to work at a sustainable 80% velocity than at 100% but burn out after 6 months. And surprisingly enough, if you follow the first tip and avoid unnecessary structure and dependencies early on, you'll find that you're actually more productive as time goes on. How do you know what's necessary? It's simple - only add structure or libraries to solve existing problems. Work outside-in, preferably by starting with a failing test, and progressively refactor the codebase towards an emergent design.
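As a rough illustration of starting with a failing test, here is a minimal sketch. The feature (a cart total), the `LineItem` type and the `expectEqual` helper are all hypothetical, invented for this example; a real project would more likely use a test runner such as Jest, but the plain helper keeps the sketch free of dependencies.

```typescript
// Hypothetical domain type, introduced only for this illustration.
type LineItem = { unitPrice: number; quantity: number };

// Tiny assertion helper so the sketch needs no test framework.
function expectEqual(actual: number, expected: number): void {
  if (actual !== expected) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}

// Step 1: the test, written first. Before cartTotal existed it failed,
// and that failure is what told us what to build next.
function cartTotalSpec(): void {
  expectEqual(cartTotal([]), 0);
  expectEqual(
    cartTotal([
      { unitPrice: 5, quantity: 2 },
      { unitPrice: 3, quantity: 1 },
    ]),
    13
  );
}

// Step 2: the simplest code that makes the test pass. No classes, no
// repository layer, no pricing framework, because no existing problem
// calls for them yet.
function cartTotal(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.unitPrice * item.quantity, 0);
}

// Run the spec; it throws if any expectation is not met.
cartTotalSpec();
```

If discounts or multiple currencies show up later, the test grows first and the structure follows from it. That is the sense in which the design is emergent rather than planned up front.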

You might say "but I have so many features I need to develop and I only have enough seed money for 12 months"; but really, do you have many features to develop? Who says so? How do you know that these features are actually needed? Many entrepreneurs feel like they have to offer a rich product to gain market share, but the fact remains that most users will only use a small subset of your features. The solution? Use a lean methodology, prove that your value proposition is actually, well, valuable, and only then expand on the features your users are requesting - even if this differs from what you had in mind when you first started.

When you adopt a lean methodology and spend less time developing features no one may actually need, you'll find that suddenly you have a bit more luxury to develop the features you do decide to expand on at a sustainable pace, without compromising on code quality. When you combine this with the practice of optimizing for changeability, you'll have fewer fires to put out, less code decay to work around - and you might even find that you can make do with a smaller team, thereby extending your runway.

In Summary

Code rots because we build too many features, too fast, and use the wrong techniques to help us deal with the fallout that ensues from going to market after months of this type of work. Using emergent design is an effective way to optimize for the changeability of our software, thus allowing us to run at a more sustainable pace, and easily deal with the uncertainty that will inevitably engulf us as we go to market and start interacting with our users.