Here is the golden rule of organizing bugs: fix bugs in the order most likely to result in success. Sounds obvious, right? Wrong. I'd bet that more than half of the buggy and unreliable software you've ever used was that way not because the developers didn't have time to make it better; they simply fixed the wrong bugs. Wanting to fix the right bugs and knowing how to do it are two different things.
There are two challenges to making smart bug decisions: first, understanding how to make good bug-fix decisions; and second, creating and following a process that makes it easy to stick to those decisions when the pressure is high.
Wise leaders and smart teams know that toward the end of a project, milestone, or iteration, they'll be tired. They may be sleeping less, working more, and consuming frightening amounts of caffeinated substances. Thinking ahead, clever leaders put simple rules and survival kits in place early. Then, when times get tough, the resources for fast and easy bug fix decisions are already there.
This two-part essay is a primer on those rules and survival kits, giving you basics to follow. But more importantly, I'll provide the core ideas needed to make your own rules. The advice is organized into four levels, from scrappy first aid (level 1) to higher-caliber planning (level 4). But first, an entirely unnecessary but entertaining summary of approaches to avoid.
The Top 10 worst ways to decide:
Hopefully, you've never seen any of these in action; and now that you've been warned, you can avoid them in the future. Should your manager suggest any of these, please stand up, turn around quietly, and run as fast as you can.
In the best web and software development shops, the management of bug fixing is like medical triage. Someone takes the lead role of going through incoming bugs and putting them into three or four basic piles. (This is called triage, bug wrangling, or defect management.) Like any large number of things you might have--CDs, books, debts, girlfriends--the only way to get a handle on bugs is to organize them into higher-level groups. That makes it easier to understand what you have, discuss it with others, and find an appropriately qualified therapist. As a universal rule, it's always easier to work with three or four piles of things than with hundreds or thousands of individual things.
So the best first aid when overwhelmed by bugs is to stop everything else for an afternoon and triage. (Test: if you can't remember the last time you did triage, you should stop reading and do it now.) You can't sprinkle magic dust on your bug database to put the bugs in order; someone courageous has to get in there, get their hands dirty, and sort things out. I promise you there is no way around this. If you're disciplined, you might be able to triage regularly, once a day throughout the entire project, never letting things get out of control. Or you might motivate every programmer to triage their own bugs frequently. That's great. But however you do it, it must be done.
Before you skip ahead, saying, "I know about triage, but I don't do it because of blah, blah, blah," know that triage is required for sanity in any first aid effort, whether medical or technical. There is no sense in putting a Band-Aid on a patient's scraped knee if there are a half dozen poisoned arrows in his back. Without triage, you have no way of knowing where the team's energy is best spent. And unlike the obvious seriousness of a patient frantically pointing over his shoulder to a quiver of uninvited arrows lodged in his person, your code base won't tell you where it hurts most. You have to figure it out yourself.
Triage forces other smart things to happen. By going through your bugs, the leader of the triage effort forces everyone to have a better view of the project. He'll find many bugs that are duplicates, already fixed, missing information (and in need of being sent back to the opener of the bug), ignorable, or simply ridiculous--for example, a complaint that the web site doesn't predict winning lottery numbers. It's common to see bug count numbers drop by 30 percent after the first triage, making it a surefire morale boost. But you can't get that easy win without doing the work.
In terms of specific piles for sorting, there are an infinite number of ways to organize bugs, and every veteran has his or her own opinions on the best way. As usual, there's no single right answer. Use something simple, and plan to improve it next time based on what you learn and how things change for the next project.
The simplest scheme has three piles: Must Fix; Might Fix; Won't Fix. As you triage each bug, put it into one of these three piles. The more bugs that make it into Might Fix and Won't Fix, the more effective your triage is, as those piles represent clear decisions.
What you don't want is for 99 percent of your bugs to go into Must Fix; this is called "coward's triage." If everything is a Must Fix, you're saying that everything is equally important, which is meaningless. You've chickened out. Remember the golden rule: put bugs in the order most likely to lead to success. If all bugs are equal, there is no order, and success is improbable.
If you are doing bug first aid, aim for 50 percent Must Fix, and the rest Might Fix or Won't Fix. Put a stake in the ground and get serious about your bugs. Apply your judgment about remaining resources (time and people) and what you believe is most important to the project (customers and business). Late in a project, no leadership action is greater than the strong, proactive management of bugs and open issues. Aim for 50 percent Must Fix and push until it hurts; it's only when there's pain that you know you're doing real triage. ER doctors don't wave the white flag or call a time-out every time they're forced to prioritize one patient over another. If they can do it for people's lives, you can do it for bugs. So don't hide behind the ignorance of a coward's triage: get serious, get tough, and lead your team.
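To make the 50 percent target concrete, here is a minimal sketch that computes the share of bugs in each pile after a triage pass. The pile labels come from the text; the function names and the sample data are assumptions made up for illustration, not part of any real bug tracker.

```python
from collections import Counter

# The three piles named in the text.
PILES = ("Must Fix", "Might Fix", "Won't Fix")


def pile_shares(piles):
    """Return the fraction of bugs in each pile (0.0 for every pile if the list is empty)."""
    counts = Counter(piles)
    total = len(piles)
    return {pile: (counts[pile] / total if total else 0.0) for pile in PILES}


def exceeds_must_fix_target(piles, target=0.5):
    """True when the Must Fix share is above the target (50 percent by default)."""
    return pile_shares(piles)["Must Fix"] > target


# Hypothetical result of a first triage pass over ten bugs.
triaged = ["Must Fix"] * 6 + ["Might Fix"] * 3 + ["Won't Fix"] * 1
shares = pile_shares(triaged)
needs_another_pass = exceeds_must_fix_target(triaged)
```

If `needs_another_pass` comes back true, that is the signal to go through the Must Fix pile again and push harder.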
The basic rules for the three piles are as follows:
If a bug is a Must Fix, it must be more important than any bug in Might Fix and Won't Fix.
If the bug is a Might Fix, it must be more important than any bug in Won't Fix.
If the bug is a Won't Fix, it must be less important than the least important Might Fix bug.
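These rules amount to an ordering check across the piles. The sketch below assumes, purely for illustration, that each bug has been given a numeric importance score where higher means more important; real teams rarely have such a number, so treat this as a way to see the rules rather than a tool to adopt. Note that the third rule is the second rule stated from the other side, so one comparison covers both.

```python
def piles_are_consistent(must, might, wont):
    """Check the pile rules on lists of hypothetical importance scores.

    Each argument is a list of numbers, one per bug, higher = more important.
    An empty pile cannot violate a rule.
    """
    lower = might + wont
    # Rule 1: every Must Fix outranks every Might Fix and Won't Fix.
    rule_1 = not must or not lower or min(must) > max(lower)
    # Rules 2 and 3: every Might Fix outranks every Won't Fix.
    rule_2 = not might or not wont or min(might) > max(wont)
    return rule_1 and rule_2


consistent = piles_are_consistent(must=[9, 8], might=[5, 4], wont=[2])
# A Must Fix scored 3 sits below a Might Fix scored 5, so this breaks rule 1.
inconsistent = piles_are_consistent(must=[9, 3], might=[5], wont=[1])
```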
It helps to admit that all bug-fix decisions are relative; there are no absolutes. Defining importance involves many factors, and it can be difficult to place bugs into the three piles. Smart teams put well-defined markers in place, called exit criteria, to help make those decisions easier. (I'll cover exit criteria in Level 3, coming in Part 2.) But if arguments are frequent, don't worry. I guarantee it will be a minority of bugs in each pile that are contentious. Focus people on the positive, the bugs everyone has agreed on. If there are 50 bugs everyone agrees are Must Fixes, there might be days or weeks of development work by the team before the other bug debates need to be resolved.
The Must Fix pile is what programmers should be working from. They shouldn't be touching anything in the Might Fix pile unless they're helping to triage. The reason is simple: don't worry about what you might need until you have everything you know you need. (For example, don't worry about dessert until you've figured out dinner.)
It's always tempting to steal bugs from the Might Fix pile. Many fun things will end up in there, including bugs that annoy your team, that impact beloved features, or that are just fun to fix. But the criteria for prioritizing work should be what is most important to the project and the customer. A team's commitment to serving the project goals over other desires is often the difference between quality and shoddy work.
Often, weekly triage is enough. Every Monday morning, you get the right people in the room, triage, and leave with a plan for the best use of your team that week. The Must Fix issues should be distributed across the team intelligently--either they self-select, or programmers are designated owners of particular areas and kinds of issues. If the Must Fix pile is too large, repeat the triage process on that pile alone, dividing it in two: Must Fix this week, and Must Fix eventually. Depending on how fast new bugs are coming in and what changes customers or managers make, you might need more frequent triage sessions.
Unless you're fixing bugs as you go (a good but surprisingly unpopular strategy), most of the bug-fix decisions will be made late, when you're under the most pressure. Knowing this, you want the data for each bug to be good enough to let you make decisions quickly. If you spend half of your triage time struggling to reproduce bugs, or even trying to comprehend what the issue is, you're wasting time. Quality descriptions and reproduction information could have been provided days or weeks earlier.
What you want, then, is a bug database that's a brightly lit, well-organized supply cabinet, not a demon-haunted, cobweb-filled, rat-infested attic. You want programmers to get in, easily identify what they need, and get back to work. This requires regular maintenance of the bug database, and diligence from anyone opening bugs. The higher the quality of information in the bug database, the less time you'll spend in triage, and the more time your team will spend actually fixing bugs. (Warning: often it's the first triage that makes visible the quality, or lack thereof, of bug reports.)
One way to improve the quality of bug information is to create smarter piles. Instead of only one piece of data (Must/Might/Won't), track two: Priority and Severity.
Priority is easy: Instead of Must Fix, call it Priority 1. Instead of Might Fix, it's Priority 2. And Won't Fix becomes Priority 3. Some teams go as far as creating a Priority 4: they make Priority 3 mean "probably won't fix," and 4 becomes "won't fix until hell freezes over, warms up nicely, and then freezes again." I've never seen a successful team use more than 4 priority levels, so if someone insists on 15 of them, by all means run for the hills.
Severity describes how serious the bug is to the customer when it occurs. Separating this from Priority gives you a better view of the bug, since you can understand its impact separately from the significance of its occurrence. For example, you might have a bug that causes the user's monitor to explode (Severity 1), but since it occurs only when she triple-clicks on a menu while singing the Australian national anthem in German, it's a low-priority issue (Priority 3).
For this to work, someone has to sit down and define the difference between Severity 1, 2, and 3, preferably using examples of real bugs to help people understand the difference. Then, whenever a new bug is opened, this field is set appropriately. Someone will have to go back and add this information for old bugs (and it's probably you).
Here's one basic severity system. I recommend that you and your team get together and negotiate these:
Severity 1--Data loss. Customer loses information or sustains damage to his or her work. May be impossible to repair or require reinstallation (or a browser refresh).
Severity 2--Functionality impossible or difficult. A major feature doesn't work as expected and is either impossible to use or requires a significant workaround.
Severity 3--Annoyance. A minor feature doesn't work as expected. A workaround may exist but is annoying, frustrating, or difficult to discover.
Using these two bits of information, you can now sort remaining bugs in smarter ways. Instead of just working with three big piles, you can now ask more sophisticated questions. Not only can you prioritize bugs by overall priority, but within each priority pile you can also sort by how serious the defect is. It's one quick way to arrange bugs within any particular priority level.
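As a rough sketch of that two-level sort, the snippet below orders bugs by Priority first and Severity second. The `Bug` class, its field names, and the sample bugs are hypothetical; most bug trackers offer an equivalent sort or query, so this only shows the idea.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    bug_id: int
    title: str
    priority: int  # 1 = Must Fix, 2 = Might Fix, 3 = Won't Fix
    severity: int  # 1 = data loss, 2 = functionality, 3 = annoyance


def work_order(bugs):
    """Sort by priority, then by severity within each priority, then by ID."""
    return sorted(bugs, key=lambda b: (b.priority, b.severity, b.bug_id))


# Hypothetical bugs, invented for this example.
bugs = [
    Bug(101, "Export drops rows", priority=1, severity=1),
    Bug(102, "Typo in footer", priority=3, severity=3),
    Bug(103, "Search results fail to load", priority=1, severity=2),
    Bug(104, "Print preview is blank", priority=2, severity=2),
    Bug(105, "Saved draft vanishes", priority=2, severity=1),
]

ordered_ids = [b.bug_id for b in work_order(bugs)]
```

Priority stays the primary key on purpose: a Severity 1 bug that almost never occurs still sorts below every Priority 1 bug, which matches the exploding-monitor example.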
The third most important bit of data to add to bugs is the area of the project they impact. The larger your team, the more important this is. The area should signify what part of the project is impacted by the bug. Is it the print feature? The search engine? Break the entire project into four or five areas, and include an area field in the bug database. This gives you a third way to view your project: you can identify which areas of the project have the most issues, or prioritize around the areas of your project that are most important to you and your customers. If each programmer is responsible for a single area, this field gives them a way to filter out bugs that aren't currently relevant to them.
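Here is a small sketch of what the area field buys you: a count of bugs per area, and a filter a programmer could use to see only his or her own area. The area names, dictionary keys, and sample records are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical bug records; "area" is the extra field described above.
bugs = [
    {"id": 1, "area": "search", "priority": 1},
    {"id": 2, "area": "print", "priority": 2},
    {"id": 3, "area": "search", "priority": 1},
    {"id": 4, "area": "login", "priority": 3},
]


def count_by_area(bugs):
    """Count how many bugs fall into each area."""
    return Counter(bug["area"] for bug in bugs)


def bugs_for_area(bugs, area):
    """Return only the bugs that belong to one area."""
    return [bug for bug in bugs if bug["area"] == area]


area_counts = count_by_area(bugs)
search_ids = [bug["id"] for bug in bugs_for_area(bugs, "search")]
```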
There are many other bits of data to include. Common ones are: quality steps to reproduce the bug, the version of the software the bug was found in, a unique ID number, a one-sentence (human-comprehensible) description, and the name of the person who found the bug. Every project is different, and the kind of data you want to track will change from project to project.
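One way to keep those fields honest is to check for blanks when a bug is opened, so incomplete reports go back to the opener before triage rather than during it. The sketch below is one possible shape for such a record; the class name, field names, and sample report are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class BugReport:
    bug_id: int               # unique ID number
    summary: str              # one-sentence description
    steps_to_reproduce: str   # how to make the bug happen
    found_in_version: str     # version of the software the bug was found in
    found_by: str             # name of the person who found the bug


def missing_fields(report):
    """Return the names of text fields that were left blank."""
    blank = []
    for field in fields(report):
        value = getattr(report, field.name)
        if isinstance(value, str) and not value.strip():
            blank.append(field.name)
    return blank


# Hypothetical report with no reproduction steps.
report = BugReport(
    bug_id=42,
    summary="Print preview is blank for two-page documents",
    steps_to_reproduce="",
    found_in_version="2.1",
    found_by="Pat",
)

to_send_back = missing_fields(report)
```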
In the not particularly unexpected conclusion to this essay, I'll cover:
Level 3: exit criteria
Level 4: early planning
Exceptions to all of these rules
Frequently asked questions
References and resources on making bug decisions
In April 2005, O'Reilly Media, Inc., released The Art of Project Management.
Chapter 3: How to figure out what to do (PDF) is available free online.
Scott Berkun is the bestselling author of Confessions of a Public Speaker, The Myths of Innovation, and Making Things Happen. His work as a writer and public speaker has appeared in The Washington Post, The New York Times, Wired Magazine, Fast Company, Forbes Magazine, and other media. His many popular essays and entertaining lectures can be found for free on his blog at Scott Berkun.
Copyright © 2009 O'Reilly Media, Inc.