Legacy, performance, testing: 6 months of new critical bugs analyzed
The Launchpad maintenance teams have been working since the beginning of the year on reducing our Critical bugs count to 0. Without much success so far: the long-term trend keeps the backlog at around 300. And it's not because we haven't been fixing these. Since the beginning of the year, more than 800 Critical bugs were fixed, but more than 900 were reported 🙁
So I investigated the source of all these new critical bugs we were finding. A random sample of 50 critical bugs filed was analyzed to see where and why they were introduced. The full analysis is available as a published Google document.
Here are the most interesting findings from the report:
- Most of the new bugs (68%) are actually legacy issues lurking in our code base.
- Performance and spotty test coverage together account for more than 50% of the causes of our new bugs. We should refocus maintenance on tackling performance problems; that's what is going to give us the most bang for the buck (even though it's not cheap).
- As a team, we should increase our awareness of testing techniques and test coverage. Always do TDD, and maybe investigate ATDD to increase the coverage and documentation of the business rules we should be supporting.
- We also need to pay more attention to how code is deployed: it's now quite common for scripts to be interrupted, and for new and old versions of the code to operate in parallel (see the sketch after this list).
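To make that last point concrete, here is a minimal sketch of one way to write a batch script that survives interruption. This is not Launchpad code; the checkpoint file and the `process_row` helper are hypothetical stand-ins, and the key ideas are simply that each unit of work is idempotent and that progress is checkpointed atomically, so a killed run can be restarted safely:

```python
import json
import os

CHECKPOINT = "migrate_rows.checkpoint"  # hypothetical progress file

def load_checkpoint():
    """Return the id of the last successfully processed row, or 0."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["last_id"]
    return 0

def save_checkpoint(last_id):
    """Persist progress atomically so an interrupted run can resume."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_id": last_id}, f)
    os.rename(tmp, CHECKPOINT)  # rename is atomic on POSIX filesystems

def process_row(row_id):
    """Placeholder for the real per-row work. It must be idempotent:
    safe to run twice if the script dies after doing the work but
    before the checkpoint is written."""
    pass

def run(all_row_ids):
    last_id = load_checkpoint()
    for row_id in all_row_ids:
        if row_id <= last_id:
            continue  # already done by a previous, interrupted run
        process_row(row_id)
        save_checkpoint(row_id)

if __name__ == "__main__":
    run(range(1, 1001))
```

Checkpointing after every row is the simplest variant; a real script might checkpoint every N rows to reduce I/O, at the cost of redoing up to N rows after a crash.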
Another way of looking at this is that Launchpad represents a very deep mine of technical debt. We don't know exactly how deep the mine is, but we are going to keep finding existing Critical issues until we hit the bottom. (Or until we finish refactoring enough of Launchpad for better testing and performance. That's what the SOA project is all about.)
In the meantime, we should pay attention to regressions and fallouts (those are the real new criticals) to make sure that we aren't extending the mine!
Photo by Brian W. Tobin. Licence: CC BY-NC-ND 2.0.
December 10th, 2011 at 2:32 pm
I can see that the new rapid 'disruptions' have helped to speed up development and bug fixing incredibly, so keep it up guys, and congrats!
December 10th, 2011 at 11:06 pm
Does your rate of fixing regressions and fallouts exceed your rate of creating them?
December 13th, 2011 at 7:46 pm
@Jonathan, excellent question. The long-term average of regressions filed is 3.53 regressions per week, and the long-term average fix rate is 3.17 regressions fixed per week. So as of today, the creation rate slightly exceeds the fix rate.
But our overall Critical bug fix rate is much higher: 18.54 bugs per week. So if we direct our attention to regressions and fallouts (to prevent them from accumulating as tech debt), our fix rate far exceeds the rate of creation.
I'm excluding fallout from this because we don't really have trending data on these yet, but I don't think it changes things that much, given what the analysis showed. A very conservative estimate would be a combined creation rate of 7 per week, which is less than half of our fix rate.
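A quick back-of-the-envelope check of the figures in this comment (the 7/week combined rate is the conservative guess above, not a measured number):

```python
# Rates quoted above, in bugs per week.
regressions_filed = 3.53
regressions_fixed = 3.17
overall_fix_rate = 18.54

# Today the regression backlog grows slightly:
net_growth = regressions_filed - regressions_fixed  # ≈ 0.36/week

# A conservative combined creation rate (regressions + fallouts)
# is still well under half the overall fix rate:
conservative_creation = 7.0
ratio = conservative_creation / overall_fix_rate  # ≈ 0.38

print(round(net_growth, 2), round(ratio, 2))  # 0.36 0.38
```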