One thing I run into when working with legacy code, and that I haven't really seen addressed anywhere else, is how to assess the difficulty of working with a particular code base.
Why is this important? Because we are going to be asked to estimate the amount of time it will take to fix a bug or add a feature to said code base. In an ideal situation we could simply estimate the work as if we had written the legacy code ourselves, with nothing to prevent us from being very efficient in doing the work. I call this the 100% or 1x estimate case. Our estimate should be very accurate because we have a very good grasp of the context the work will be done in and a solid grasp of how many lines of code will need to be modified or added.
But when working with legacy code the 1x estimate is very unlikely. If the code we are working with is new to us, or we haven't had broad experience with it, there may be surprises lurking in the code that make a given bug harder to fix or make a feature take longer to add because of unanticipated refactoring. When this occurs, we need a way to compensate for the risk in our estimate so that we don't run short on time dealing with the unexpected. This helps us give an accurate sense of the true cost of the bug fix or feature, as well as allowing us to stay comfortably on schedule.
Also, our client deserves to be given a good sense of the quality of their existing code base. Not some kind of subjective "Not Invented Here" type of thing, but objective ways to look at legacy code to determine the level of difficulty in working with it. The goal is to allow enough time as we go in to work on the code to actually improve the code base, not just to slap a new feature on top or to (crossing fingers) fix a bug.
What I propose is a grading system where the code gets a letter grade: A, B, C, D, or F with + and - modifiers as appropriate. To do this we just need 11 points' worth of scoring criteria, one point per criterion. 11 points = A, 10 points = A-, 9 points = B+, ... , 0 points = F.
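The score-to-grade mapping above can be sketched as a small lookup. This is just an illustration of the scale described in the text; the function name is my own:

```python
# Map an 11-point score to a letter grade: 11 = A, 10 = A-, 9 = B+,
# and so on down through 1 = D-, with 0 = F.
GRADES = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]

def grade(score: int) -> str:
    if not 0 <= score <= 11:
        raise ValueError("score must be between 0 and 11")
    return GRADES[11 - score]
```

So a code base that earns 9 of the 11 points grades out at a B+.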
So what are the criteria?
1. Code has working unit tests. Tests must work "out of the box" -- no elaborate configuration, no failing tests. A database can be required here (although better if it isn't) as long as it doesn't require manual seeding of test data.
2. Tests cover core application logic. For example: tests on a shopping cart that cover adding/removing items from the cart, but do not cover a complex pricing model, are not worth much. Tests that cover the hard parts earn this point.
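To make the distinction concrete, here is a hypothetical sketch: a tiered-discount pricing rule (the "hard part") with tests that pin down its thresholds. The pricing rules and names are invented for illustration:

```python
import unittest

# Hypothetical pricing rule: 10% off orders of 5+ units, 20% off 20+.
def price(unit_price: float, quantity: int) -> float:
    if quantity >= 20:
        return unit_price * quantity * 0.80
    if quantity >= 5:
        return unit_price * quantity * 0.90
    return unit_price * quantity

class PricingTests(unittest.TestCase):
    # These tests earn the point: they exercise the discount thresholds,
    # not just trivial add/remove behavior.
    def test_no_discount_below_threshold(self):
        self.assertAlmostEqual(price(10.0, 4), 40.0)

    def test_tier_one_discount(self):
        self.assertAlmostEqual(price(10.0, 5), 45.0)

    def test_tier_two_discount(self):
        self.assertAlmostEqual(price(10.0, 20), 160.0)
```

A suite that only checked cart item counts would pass this code unchanged even if every discount boundary were wrong.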
3. Core application logic is separate from database logic. If an application uses the Active Record pattern, that's OK even though it mixes core models and database. What isn't OK is a data model that is utterly indistinguishable from a data store. We want to be able to test our primary logic without a database. We want to be able to switch away from any given data store. Knowing that some frameworks tie these together more tightly than others, obviously we're not going to dock a Rails app for having core models that inherit from ActiveRecord, since AR allows a lot of flexibility in this regard.
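A minimal sketch of the separation this criterion asks for, with hypothetical names: the domain object knows nothing about storage, so it can be tested without a database and the store can be swapped out behind one seam:

```python
class Order:
    """Core logic: no imports, no connection, no SQL."""
    def __init__(self, items):
        self.items = items  # list of (name, price) pairs

    def total(self) -> float:
        return sum(price for _, price in self.items)

class OrderRepository:
    """All database detail lives here, behind one seam."""
    def __init__(self, connection):
        self.connection = connection

    def save(self, order: Order) -> None:
        # e.g. issue INSERTs via self.connection; omitted in this sketch
        ...

# The core logic is testable with no data store at all:
assert Order([("book", 12.0), ("pen", 3.0)]).total() == 15.0
```

The failing case is when `Order` itself holds the connection and `total()` runs a query, at which point the model and the store are the same thing.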
4. Core application logic is separate from presentation logic. No business rules are allowed in MVC views or ASP.NET code-behinds. Indeed, even MVC controllers should be passing most of the work through to Model code for handling.
5. In larger systems the code is properly segregated into different projects, gems, bundles, DLLs, libraries, etc. Obviously dependent on framework used, but an example of a failure here is an ASP.NET MVC solution that has core application logic, database access, and web code all in the same project file.
6. Code compiles without warnings or errors. Some may want to downplay the importance of code that compiles without warnings, but it is almost always a sign of neglect.
7. Code conforms to a style guide of some sort. In .NET this would be something like a set of StyleCop rules. Point being, when we open two different files in the code base, they look and feel the same.
8. No long methods. It is extremely rare that a method should be longer than 32 lines. If a method or function is over 64 lines then this is an automatic fail. If a method or function is between 32 and 64 lines, then there can be no more than two nested if/then types of conditions and no more than three nested loops.
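The length thresholds above are easy to check mechanically. Here is a hypothetical checker (names and structure are my own) that flags functions over 32 lines and auto-fails those over 64, using Python's own parser for illustration:

```python
import ast
import textwrap

# Flag functions over warn_at lines; auto-fail those over fail_at,
# matching the 32/64-line thresholds from criterion #8.
def long_functions(source: str, warn_at: int = 32, fail_at: int = 64):
    results = []
    for node in ast.walk(ast.parse(textwrap.dedent(source))):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > fail_at:
                results.append((node.name, length, "fail"))
            elif length > warn_at:
                results.append((node.name, length, "warn"))
    return results
```

The nested-conditional and nested-loop limits for the 32-to-64-line band could be checked the same way by counting `ast.If` and loop nodes inside each function body.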
Patterns and Paradigms
9. In an object-oriented language, objects must be used as objects, not merely data structures. Behavior specific to a class must be encapsulated by that class. One should not find extensive direct use of an instance's properties. One exception is repository code designed to load or save an instance to a data store. Another exception is giving view model constructors the model instances they represent and having the view model access model properties directly to initialize themselves or to update those models later. In non-object-oriented languages we have similar concerns. For example: extensive subversion of immutability in functional languages (as opposed to properly segregating functions with side-effects from those without).
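A small sketch of the distinction, with a hypothetical class: the overdraft rule belongs to the account, so callers invoke behavior instead of reaching into the balance:

```python
class Account:
    def __init__(self, balance: float):
        self._balance = balance

    def withdraw(self, amount: float) -> None:
        # Encapsulated rule: the class guards its own invariant.
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

# Used as an object: behavior is invoked, not reimplemented by callers.
acct = Account(100.0)
acct.withdraw(30.0)
```

The data-structure style this criterion penalizes would scatter `if amount > acct.balance:` checks at every call site before mutating the balance directly, so the invariant lives everywhere and nowhere.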
10. No premature abstractions, patterns, base classes, or interfaces. Example: implementing a strategy pattern when there is only one strategy is premature. YAGNI. The point here is to prevent the sort of over-architecting we sometimes see that never really gets filled out. Start with working concrete code! Then if the situation evolves and we see a way to abstract a commonality or bring a design pattern to bear, go for it! If you have a repository system that requires you to write every new method in two places just to satisfy the requirement of having both an interface (or abstract class) and a concrete implementation for the data layer, that is a little bit insane (but not rare, I think). The usual justification for this is unit testing. How will we mock the data layer if it's concrete? Easy. We either don't bother (as in Rails) or we build the data layer so that it doesn't directly depend on a database and so that its methods can be overridden in a unit test library.
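The override-in-tests approach at the end of #10 can be sketched like this. All names are hypothetical; the point is that there is one concrete data class and no parallel interface to maintain:

```python
class UserData:
    """Concrete data layer: no interface, no abstract base."""
    def find_email(self, user_id: int) -> str:
        # The real implementation would query the database here.
        raise NotImplementedError

class Mailer:
    def __init__(self, data: UserData):
        self.data = data

    def greeting_address(self, user_id: int) -> str:
        return self.data.find_email(user_id)

# In the unit test library, subclass and override -- no mock framework,
# no second copy of every method signature:
class FakeUserData(UserData):
    def find_email(self, user_id: int) -> str:
        return "test@example.com"

assert Mailer(FakeUserData()).greeting_address(1) == "test@example.com"
```

If a second data store ever genuinely appears, extracting an interface at that point is a mechanical refactoring, which is exactly the "abstract when the situation evolves" order of operations argued for above.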
11. This one is the opposite of #10: no procrastinated abstractions, patterns, base classes, or interfaces. This means duplicated lines of code appearing in two different methods, or duplicated methods occurring in multiple code files: the copy/paste-with-minor-changes style of coding rather than truly engineering a proper system. Even within a function or method, failing to use a looping structure in favor of multiple near-identical lines of code needs to be justified in some way.
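Even the within-a-method case is worth a sketch. The field names here are hypothetical; the duplicated style writes one near-identical line per field, where a loop states the pattern once:

```python
# Copy/paste style (the smell):
#   report.append(f"name: {record['name']}")
#   report.append(f"email: {record['email']}")
#   report.append(f"phone: {record['phone']}")
#
# Looping style: one line to change when the format changes.
def format_record(record: dict) -> list:
    return [f"{field}: {record[field]}" for field in ("name", "email", "phone")]
```

The same reasoning scales up: duplicated methods across files mean every bug fix has to be found and applied N times, which is exactly the estimate risk this grading scheme is trying to surface.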
Once these criteria have been scored and the code graded, we can use the letter grade to help adjust (aka "pad") our estimates so that we don't get burned by the code base's faults.
If the code gets an A, we are probably safe to pad our estimate not at all or only a little bit. The primary issue here will not be dealing with unpleasant surprises, but with our own lack of context and experience with the code base.
On the other hand, if the code gets a D, we will need to up our estimate, perhaps by a very large factor. Changes we need to make will be hard to isolate. Even though we might think we know what needs to be done when we start, we are likely to find some other area that either needs to be modified also because it is too tightly coupled with the code we are working on, or we'll find multiple areas that all need to have the same changes made. We may need to spend more time than anticipated reading through confusing code or running in a debugger to isolate side-effects or process details.
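One way to apply the grade is a padding multiplier per grade band. The text gives no specific numbers, so the multipliers below are purely illustrative placeholders, not a recommendation:

```python
# Purely illustrative multipliers: an A pads little or not at all,
# a D or F pads by a large factor, per the discussion above.
PAD = {"A": 1.0, "B": 1.25, "C": 1.5, "D": 2.0, "F": 3.0}

def padded_estimate(hours: float, letter_grade: str) -> float:
    # Strip any +/- modifier so "B+" and "B-" share the "B" band.
    return hours * PAD[letter_grade.rstrip("+-")]
```

Whatever the actual numbers, keeping them written down makes the padding a defensible, repeatable adjustment instead of a gut feeling.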
The grade doesn't necessarily tell us whether the code before us meets the requirements or not, by the way. This has nothing to do with the user experience or whether the previous programmers were good people (or even good programmers). It doesn't even mean the client got a bad deal on prior development efforts.
The whole point of this exercise is to help us evaluate the level of risk involved in working with this code base, and to give us a way to explain why our estimates are what they are when our managers or clients expected lower ones. I base these criteria on experience I've had inheriting various code bases over the years. My optimistic estimates have fallen victim to an inability to verify that my changes did not break some other aspect of the application (because there were no tests, or insufficient test coverage), difficulties isolating the work to be done (and therefore being surprised by additional work), or extra work I hadn't anticipated, such as refactoring duplicated code into shared abstractions or making the same change in multiple places because of copy/paste code.
It's also possible that a code base could get a straight A on this scale and still be hard to work with, but I'd say that my experience is the opposite; the lower grade a code base gets on this scale, the harder it is to tell ahead of time just how difficult a given task will be.