ONLamp.com

How to Decide What Bugs to Fix When, Part 1

The Rules of Ultrasimple Triage

The basic rules for the three piles are as follows:

  1. If a bug is a Must Fix, it must be more important than any bug in Might Fix or Won't Fix.

  2. If the bug is a Might Fix, it must be more important than any bug in Won't Fix.

  3. If the bug is a Won't Fix, it must be less important than the least important Might Fix bug.
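The three rules amount to an ordering check between piles. Here is a minimal sketch of that check in Python. It assumes each bug carries a numeric importance score, which is purely an illustration device: the names `Bug`, `importance`, and `piles_are_consistent` are hypothetical, and in practice importance is a judgment call made in the triage room, not a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bug:
    title: str
    importance: int  # hypothetical score: higher means more important


def piles_are_consistent(must_fix, might_fix, wont_fix):
    """Return True if every bug in a higher pile outranks every bug below it."""
    # Skip empty piles, then compare each remaining pile with the next one down.
    non_empty = [pile for pile in (must_fix, might_fix, wont_fix) if pile]
    for upper, lower in zip(non_empty, non_empty[1:]):
        least_important_upper = min(b.importance for b in upper)
        most_important_lower = max(b.importance for b in lower)
        if least_important_upper <= most_important_lower:
            return False
    return True


must = [Bug("crash on save", 90), Bug("login fails", 80)]
might = [Bug("slow search", 50)]
wont = [Bug("typo in About box", 10)]
print(piles_are_consistent(must, might, wont))
```

If the check fails, some bug is sitting in the wrong pile relative to its neighbors, which is exactly the kind of bug worth a second look in triage.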

It helps to admit that all bug-fix decisions are relative; there are no absolutes. Defining importance involves many factors, and it can be difficult to place bugs into the three piles. Smart teams put well-defined markers in place, called exit criteria, to help make those decisions easier. (I'll cover exit criteria in Level 3, coming in Part 2.) But if arguments are frequent, don't worry. I guarantee it will be a minority of bugs in each pile that are contentious. Focus people on the positive, the bugs everyone has agreed on. If there are 50 bugs everyone agrees are Must Fixes, there might be days or weeks of development work by the team before the other bug debates need to be resolved.

The Must Fix pile is what programmers should be working from. They shouldn't be touching anything in the Might Fix pile unless they're helping to triage. The reason is simple: don't worry about what you might need until you have everything you know you need. (For example, don't worry about dessert until you've figured out dinner.)

It's always tempting to steal bugs from the Might Fix pile. Many fun things will end up in there, including bugs that annoy your team, that impact beloved features, or that are just fun to fix. But the criteria for prioritizing work should be what is most important to the project and the customer. A team's commitment to serving the project goals over other desires is often the difference between quality and shoddy work.

When to triage

Often, weekly triage is enough. Every Monday morning, you get the right people in the room, triage, and leave with a plan for the best use of your team that week. The Must Fix issues should be distributed across the team intelligently--either they self-select, or programmers are designated owners of particular areas and kinds of issues. If the Must Fix pile is too large, repeat the triage process again by dividing Must Fix into two piles: Must Fix this week, and Must Fix eventually. Depending on how fast new bugs are coming in and what changes customers or managers make, you might need more frequent triage sessions.
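One way to picture the "this week" versus "eventually" split is as a cut through a ranked list. The sketch below assumes two things the process above doesn't require: that the Must Fix bugs are already ranked, and that each has a rough estimate in days. The function name `split_must_fix` and the capacity figure are hypothetical.

```python
def split_must_fix(ranked_bugs, capacity_days):
    """Split a ranked Must Fix list into (this_week, eventually).

    ranked_bugs is a list of (title, estimate_days) pairs, most important first.
    Bugs are taken in order until one no longer fits; everything after that
    point waits, so a less important bug never jumps ahead of a more important one.
    """
    this_week, eventually = [], []
    used = 0.0
    for title, estimate_days in ranked_bugs:
        if not eventually and used + estimate_days <= capacity_days:
            this_week.append(title)
            used += estimate_days
        else:
            eventually.append(title)
    return this_week, eventually


ranked = [("crash on save", 2), ("login fails", 2), ("export corrupts file", 3), ("bad tooltip", 1)]
this_week, eventually = split_must_fix(ranked, capacity_days=5)
print(this_week)
print(eventually)
```

The cut is deliberately strict: the small "bad tooltip" fix would fit in the leftover day, but pulling it forward is the same temptation as stealing from the Might Fix pile.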

Level 2: Smarter Piles

Unless you're fixing bugs as you go (a good but surprisingly unpopular strategy), most of the bug-fix decisions will be made late, when you're under the most pressure. Knowing this, you want the data for each bug to be good enough to let you make decisions quickly. If you spend half of your triage time struggling to reproduce bugs, or even trying to comprehend what the issue is, you're wasting time. Quality descriptions and reproduction information could have been provided days or weeks earlier.

What you want, then, is a bug database that's a brightly lit, well-organized supply cabinet, not a demon-haunted, cobweb-filled, rat-infested attic. You want programmers to get in, easily identify what they need, and get back to work. This requires regular maintenance of the bug database, and diligence from anyone opening bugs. The higher the quality of information in the bug database, the less time you'll spend in triage, and the more time your team will spend actually fixing bugs. (Warning: often it's the first triage that makes visible the quality, or lack thereof, of bug reports.)

One way to improve the quality of bug information is to create smarter piles. Instead of only one piece of data (Must/Might/Won't), track two: Priority and Severity.

Figure 1.

Priority is easy: Instead of Must Fix, call it Priority 1. Instead of Might Fix, it's Priority 2. And Won't Fix becomes Priority 3. Some teams go as far as creating a Priority 4: they make Priority 3 mean "probably won't fix," and 4 becomes "won't fix until hell freezes over, warms up nicely, and then freezes again." I've never seen a successful team use more than 4 priority levels, so if someone insists on 15 of them, by all means run for the hills.

Severity describes how serious the bug is to the customer when it occurs. Separating this from Priority gives you a better view of the bug, since you can understand its impact separately from the significance of its occurrence. For example, you might have a bug that causes the user's monitor to explode (Severity 1), but since it occurs only when she triple-clicks on a menu while singing the Australian national anthem in German, it's a low-priority issue (Priority 3).

For this to work, someone has to sit down and define the difference between Severity 1, 2, and 3, preferably using examples of real bugs to help people understand the difference. Then, whenever a new bug is opened, this field is set appropriately. Someone will have to go back and add this information for old bugs (and it's probably you).

Here's one basic severity system. I recommend that you and your team get together and negotiate these:

  • Severity 1--Data loss. Customer loses information or sustains damage to his or her work. May be impossible to repair or require reinstallation (or a browser refresh).

  • Severity 2--Functionality impossible or difficult. A major feature doesn't work as expected and is either impossible to use or requires a significant workaround.

  • Severity 3--Annoyance. A minor feature doesn't work as expected. A workaround may exist but is annoying, frustrating, or difficult to discover.

Using these two bits of information, you can now sort remaining bugs in smarter ways. Instead of just working with three big piles, you can now ask more sophisticated questions. Not only can you prioritize bugs by overall priority, but within each priority pile you can also sort by how serious the defect is. It's one quick way to arrange bugs within any particular priority level.
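As a rough sketch of what sorting on both fields looks like, here is a small Python example. The `Bug` record and the `triage_order` function are hypothetical names, and the numeric encodings simply follow the Priority 1 to 3 and Severity 1 to 3 scales described above; most bug databases offer an equivalent two-column sort.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    bug_id: int
    priority: int  # 1 = Must Fix, 2 = Might Fix, 3 = Won't Fix
    severity: int  # 1 = data loss, 2 = functionality impossible or difficult, 3 = annoyance


def triage_order(bugs):
    """Sort by priority first, then by severity within each priority level."""
    return sorted(bugs, key=lambda b: (b.priority, b.severity))


bugs = [
    Bug(101, priority=2, severity=1),
    Bug(102, priority=1, severity=3),
    Bug(103, priority=1, severity=1),
    Bug(104, priority=3, severity=2),
]
ordered_ids = [b.bug_id for b in triage_order(bugs)]
print(ordered_ids)  # [103, 102, 101, 104]
```

Note that bug 101 is a Severity 1 but still sorts below both Priority 1 bugs: severity only breaks ties inside a priority pile, it never overrides the pile.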

The third most important bit of data to add to bugs is the area of the project they impact. The larger your team, the more important this is. The area should signify what part of the project is impacted by the bug. Is it the print feature? The search engine? Break the entire project into four or five areas, and include an area field in the bug database. This gives you a third way to view your project: you can identify which areas of the project have the most issues, or prioritize around the areas of your project that are most important to you and your customers. If each programmer is responsible for a single area, this field gives them a way to filter out bugs that aren't currently relevant to them.
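To make the area field concrete, here is a short Python sketch of the two views it enables: a count of bugs per area, and a per-area filter for a programmer who owns one part of the project. The area names and the helper functions `count_by_area` and `bugs_for_area` are made up for illustration.

```python
from collections import Counter

# Hypothetical bug records; only the fields needed for this example.
bugs = [
    {"id": 1, "area": "print", "priority": 1},
    {"id": 2, "area": "search", "priority": 2},
    {"id": 3, "area": "print", "priority": 1},
    {"id": 4, "area": "setup", "priority": 3},
    {"id": 5, "area": "print", "priority": 2},
]


def count_by_area(bugs):
    """Count open bugs per area to show where the project hurts most."""
    return Counter(bug["area"] for bug in bugs)


def bugs_for_area(bugs, area):
    """Return only the bugs in one area, for the programmer who owns it."""
    return [bug for bug in bugs if bug["area"] == area]


print(count_by_area(bugs).most_common(1))  # [('print', 3)]
print([bug["id"] for bug in bugs_for_area(bugs, "search")])  # [2]
```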

There are many other bits of data to include. Common ones are: quality steps to reproduce the bug, the version of the software the bug was found in, a unique ID number, a one-sentence (human-comprehensible) description, and the name of the person who found the bug. Every project is different, and the kind of data you want to track will change from project to project.
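One inexpensive way to keep the database from turning into the attic is to check new reports for empty fields before they reach triage. The sketch below models the common fields listed above as a Python record. The field names and the `missing_fields` helper are hypothetical, and your own project will likely track a different set.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BugReport:
    bug_id: int
    summary: str            # one-sentence description
    found_in_version: str   # version of the software the bug was found in
    reported_by: str        # person who found the bug
    steps_to_reproduce: list = field(default_factory=list)


def missing_fields(report):
    """Return the names of fields left empty, in declaration order."""
    return [
        f.name
        for f in fields(report)
        if getattr(report, f.name) in (None, "", [])
    ]


report = BugReport(
    bug_id=42,
    summary="Export drops the last row of the table",
    found_in_version="2.1",
    reported_by="",
)
print(missing_fields(report))  # ['reported_by', 'steps_to_reproduce']
```

A report that comes back with a non-empty list can be returned to whoever opened it days before triage, rather than discovered as a problem in the room.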

Coming in Part 2

In the not particularly unexpected conclusion to this essay, I'll cover:

  • Level 3: exit criteria

  • Level 4: early planning

  • Exceptions to all of these rules

  • Frequently asked questions

  • References and resources on making bug decisions


In April 2005, O'Reilly Media, Inc., released The Art of Project Management.

Scott Berkun is the best-selling author of Confessions of a Public Speaker, The Myths of Innovation, and Making Things Happen. His work as a writer and public speaker has appeared in The Washington Post, The New York Times, Wired Magazine, Fast Company, Forbes Magazine, and other media. His many popular essays and entertaining lectures can be found for free on his blog at Scott Berkun.

