The Launchpad maintenance teams have been working since the beginning of the year at reducing our Critical bugs count to 0. Without much success this far. The long term trend keeps the backlog at around 300. And it’s not because we haven’t been fixing these. Since the beginning of the year, more than 800 Critical bugs were fixed, but more than 900 were reported
So I investigated what was the source of all these new critical bugs we were finding. A random sample of 50 critical bugs filed were analyzed to see where and why they were introduced. The full analysis is available as a published Google document.
Here are the most interesting findings from the report:
- Most of the new bugs (68%) are actually legacy issues lurking in our code base.
- Performance and spotty test coverage represents together more than 50% of the cause of our new bugs. We should refocus maintenance on tackling performance problems, that’s what is going to bring us the most bang for the bucks (even though it’s not cheap).
- As a team, we should increase our awareness of testing techniques and testing coverage. Always do TDD, maybe investigate ATDD to increase the coverage and documentation our the business rules we should be supporting.
- We also need to pay more attention to how code is deployed, it’s now very usual for scripts to be interrupted, and for the new and ancient version of the code to operate in parallel.
Another way of looking at this is that Launchpad represents a very deep mine of technical debt. We don’t know how exactly deep the mine is, but we are going to find existing Critical issues until we hit the bottom of the mine. (Or until we finish refactoring enough of Launchpad for better testing and performance. That’s what the SOA project is all about.)
In the mean time, we should pay attention to regressions and fallouts, (those are really the new criticals) to make sure that we aren’t extending the mine!
Photo by Brian W. Tobin. Licence: CC BY-NC-ND 2.0.