Legacy Code: What It Is and 5 Tips for Handling It

No doubt you’ve heard of legacy code.  Whether you have an exact definition for it or not, you’ve probably heard the term tossed around a bit.  In fact, you may have tossed it around yourself.

Absent some precision, slapping the “legacy code” label on a codebase usually indicates a feeling more than it does any properties of that codebase.  Let me illustrate.

You’ve got a new assignment coming up, and you’re excited.  You’re thinking it’s probably your chance to use some cool stuff and enhance your resume.  Maybe you’ll make use of .NET Core and some of the latest client-side frameworks.  A NoSQL solution is on the table, and it almost goes without saying that you’ll be using GitHub.  You can’t wait!

But when the day finally comes and you get your new marching orders, you couldn’t be more horrified.  Instead of the greenfield project you fantasized about, it turns out that you’re going to be adding features to some old codebase.  I mean a really old codebase.

You download it from your creaky, old, internal source control tool and take a look at the code with as much enthusiasm as you’d have for opening your fridge after a week of no power.  And it’s worse than you thought.  It’s as if someone captured the aftermath of an earthquake and turned it into source code, and you’re tasked with fixing the sewage treatment facility.

Legacy code is a developer’s fear, disappointment, and disgust.

Defining Legacy Code

Okay, so that’s the feeling most developers get, at least initially.  But that’s not exactly an academic definition of the situation.  So what is legacy code?  How do we define it without metaphors and tales of bitter disappointment?

It turns out it’s not that easy.  People define it in several different ways:

  • Wikipedia uses a relatively common definition: code maintained by someone other than the original author.
  • Some use a variant of the Wikipedia definition and define it as code in an outdated language or platform that is no longer supported.
  • When developers toss the term around to express negative feeling, it takes on a working definition of “old/bad code that I don’t want to touch.”
  • I’ve seen an interesting definition of legacy code as any code that inspires terror — developers fear to touch the code.
  • And, finally, Michael Feathers, in his iconic book Working Effectively with Legacy Code, defines legacy code simply as “code without tests.”

I won’t presume to give you the canonical definition here.  Know that people define it in different ways, and pick the definition that you think best suits your situation.

Personally, as a proponent of TDD, I like the Michael Feathers definition.  But for the purposes of this post, I’m going to relax that definition and use the one about fear.

Why?  Because a fear of touching code presents a broad and definite problem to a software developer.  So, for the rest of this post, I’ll provide tips on how to solve that problem — how to deal with legacy code that you’re afraid to touch.

1. Don’t Judge the Author of the Code

We’re all guilty of this to one extent or another.  Developers seem to have a visceral reaction to code written by others (or even by themselves, long enough ago).  You should really fight this impulse to judge, though.

This advice might seem wishy-washy, but I assure you it isn’t.  I’m not advising you to reserve your judgment because I want to preserve harmony across all of humankind.  Sure, it’s better if you get along with the others in your group who write code.  And it’s probably even a good exercise in tolerance on some level to give a pass to a developer long departed.

But the real issue here isn’t others’ feelings.  It’s your judgment.

If your immediate reaction upon sitting down to work with some legacy code is to snort derisively and start counting the ways that the author is an idiot, you’re framing a narrative for everything you see in that codebase.  Why did the author use a loop instead of unwinding the logic?  He’s an idiot.  Why did he call this particular method here?  Idiot.  Why did he do anything and everything?  Idiot.

People who earn a living writing software may have varying skill levels, but they’re rarely idiots.  There’s some kind of method to the madness.  And you’ll do better with the legacy code if you look for that method.

2. Write Characterization Tests

You’re fretting about how to deal with the legacy code because you need to make changes.  If you didn’t, the code would be an innocent curiosity rather than a problem.

But take your fingers off the keyboard, and don’t you dare touch that production code just yet.  That code has been around a long, long time and people depend on it in all sorts of weird ways in production.  Before you do anything with it, you want to study it, catalog as much of its behavior as possible, and understand it.

And you do that with characterization tests.

All this means is that you create automated tests that exercise the system, as-is, and confirm the results.  Is there an Add() method that returns 2 when you call Add(1, 1)?  Great.  Create a test that supplies those inputs and asserts that output.  Does Add(2, 2) result in 5?  Well, hey, that’s weird, but that’s what it does, so assert a return value of 5 and keep writing tests.

Your goal isn’t to figure out what the system should do but rather what it actually does, hence the name characterization.  You’re like a code anthropologist — just there to observe and categorize.
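
Here is a minimal sketch of what that might look like in Python.  The legacy_add function is a hypothetical stand-in for the Add() method above, quirk included, and run_characterization is just a helper I made up for running the tests.  The point is that the tests record what the code does today and make no claim about what it ought to do.

```python
import unittest


def legacy_add(a, b):
    """Hypothetical stand-in for an old routine we do not yet understand."""
    if a == 2 and b == 2:
        return 5  # surprising, but this is what the system does today
    return a + b


class LegacyAddCharacterization(unittest.TestCase):
    """Pin down current behavior; say nothing about intended behavior."""

    def test_one_plus_one_returns_two(self):
        self.assertEqual(legacy_add(1, 1), 2)

    def test_two_plus_two_returns_five(self):
        # Looks like a bug, but callers may depend on it, so record it as-is.
        self.assertEqual(legacy_add(2, 2), 5)


def run_characterization():
    """Run the characterization tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        LegacyAddCharacterization
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If someone later "fixes" legacy_add so that 2 + 2 gives 4, the second test fails.  That failure is the useful part: it forces a deliberate decision about whether anything in production relies on the old answer.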

3. Bring in Tooling to Help Your Testing and Characterization Efforts

Writing characterization tests is great in principle.  And it can work well in practice when a couple of things are true:

  1. You have a relatively isolated bit of functionality, like the aforementioned, hypothetical Add() method.
  2. The scope of your change is relatively limited and narrow.

But often, one or both of these won’t apply.  Legacy codebases are rarely modular and well factored.  Far more frequently, they’re snarled and confusing masses of interdependence.  And on top of that, you must often make rather sweeping changes to them instead of just a quick tweak here or there.

In this situation, characterization tests are still essential.  But handwriting them individually quickly becomes impractical.  Fortunately, you have a few options.

  1. Tools out there will help you generate unit tests based on properties in the code, thus putting your characterization on auto-pilot.
  2. Other tools work from the inside, so to speak, helping you isolate parts of the code even in tangled codebases.
  3. And finally, there are tools that you can use to automate end-to-end testing of the system, which helps you characterize it even when you can’t isolate individual pieces.
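
To make the first two options a little more concrete, here is a hand-rolled sketch in Python.  It is not how any particular commercial tool works.  It only illustrates the underlying ideas: recording outputs in bulk (sometimes called a golden master) and swapping out a dependency so you can exercise logic in isolation.  Everything named here (legacy_price, LegacyReport, and the helper functions) is hypothetical, and the only library calls are json and unittest.mock from the Python standard library.

```python
import json
from unittest import mock


def legacy_price(quantity, unit_cost):
    """Hypothetical legacy function; stands in for code nobody fully understands."""
    total = quantity * unit_cost
    if quantity > 10:
        total *= 0.9  # undocumented bulk discount buried in the old code
    return round(total, 2)


def record_golden_master(func, inputs):
    """Run func over many inputs and record whatever it returns today."""
    return {json.dumps(args): func(*args) for args in inputs}


def changed_inputs(func, master):
    """List the recorded inputs whose output no longer matches the recording."""
    return [
        json.loads(key)
        for key, expected in master.items()
        if func(*json.loads(key)) != expected
    ]


class LegacyReport:
    """Hypothetical class whose data access is tangled into its logic."""

    def fetch_rows(self):
        raise RuntimeError("would hit the production database")

    def total(self):
        return sum(row["amount"] for row in self.fetch_rows())


def total_with_fake_rows(rows):
    """Isolate total() by replacing fetch_rows with canned data."""
    with mock.patch.object(LegacyReport, "fetch_rows", return_value=rows):
        return LegacyReport().total()
```

In use, you would call record_golden_master once against the untouched code, save the result, and then call changed_inputs after every edit.  An empty list means the behavior you recorded is still intact.  Real tools go much further than this, but the workflow is similar.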

4. Keep Changes to a Minimum and Boy-Scout As You Go

Alright, now you’re ready to start making actual changes.  Once you’ve thoroughly characterized the codebase, both with your own tests and with automated help, you can change the code with more confidence, knowing that you’re less likely to break things in this system.

But make no mistake, change is still risky, even after a good-faith characterization effort.  This is a system that nobody fully understands any longer.

So as you make your changes, keep them to a minimum.  Don’t go wandering around the codebase, formatting the code, renaming things, and imposing your preferences on it.  Touch the smallest amount of code that you can, verify your changes, and make sure your characterization tests pass.  This will keep risk to a minimum.

That said, you should take some opportunity to introduce change, where sensible.  If you’re already in a method, then you’re already incurring the risk of changing it.  So abide by what Robert C. Martin (“Uncle Bob”) calls “the boy scout rule” and leave the code a little better than you found it.

Avoid changing anything you don’t need to, but when you have to change something, try also to improve it a little.  After all, you’re probably not done with this codebase after one single change.

5. Work Gradually Toward Modularity and Better Design

You made your first change to the legacy codebase by characterizing heavily, changing minimally, and boy-scouting a little.  And, great news — it worked out!  You now have an excellent playbook for each subsequent time that you need to touch this codebase.

So I’ll close with a last bit of advice that addresses the long haul instead of the tactical and short term.  You want to gradually work the codebase toward modularity.

With each change, you’re going to add more characterization tests, get to know the codebase more, and boy-scout where you can.  This is going to start to provide you with better test coverage, bolster your confidence, and eliminate the fear of touching the code.  And you can build on this growing confidence.

As time goes on, you can expand your boy-scouting and start to perform more and more refactorings aimed at teasing the code into looser coupling and improving modularity.  Why does this matter?  Well, because if you make the codebase modular enough, you can start to migrate parts of it to newer platforms, or modernize it with language/framework upgrades.
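
As one small illustration of teasing code toward looser coupling, consider a class that used to call the system clock directly.  The sketch below is hypothetical (InvoiceCalculator and today_provider are names I invented for the example) and shows just one of many possible refactorings: passing the dependency in, with a default that preserves the old behavior for existing callers.

```python
from datetime import date


class InvoiceCalculator:
    """Hypothetical legacy class after one small decoupling step."""

    def __init__(self, today_provider=date.today):
        # Before the refactoring, days_overdue() called date.today() itself,
        # so tests could not control what "today" meant.  The default keeps
        # the old behavior for callers that pass nothing.
        self._today = today_provider

    def days_overdue(self, due_date):
        """Return how many days past due_date we are, never below zero."""
        delta = (self._today() - due_date).days
        return max(delta, 0)


# A test can now pin "today" to a fixed date instead of relying on the clock.
fixed_clock = InvoiceCalculator(today_provider=lambda: date(2024, 1, 31))
```

Existing callers keep writing InvoiceCalculator() and see no difference, while the clock is now a separate piece that a test, or a future platform, can replace.  Repeat that kind of step often enough and the seams between modules start to appear.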

Legacy code, in many ways, is a vicious cycle.  The more you fear to touch it, the more you let it languish and grow obsolete.  On the flip side, the more you gain confidence with it, the more you can bring it back from the brink and expand your options.  Once you get rid of the fear, you get rid of the legacy.