Five Lessons Open Source Developers Should Learn from Extreme Programming
by chromatic, author of Extreme Programming Pocket Guide
Extreme Programming, or XP, isn't so much revolutionary as it is evolutionary. Developers have known the value of code reviews, testing, and good communication for decades, though we've ignored that knowledge far too often in practice. Five Lessons You Should Learn from Extreme Programming explained several XP practices that apply to non-XP projects. A little common sense, a bit of learning from failure, and a lot of discipline can improve your team.
It's harder to see how XP can apply to open source projects, especially those that apparently lack a formal customer and are generally immune from budget and schedule pressures. The other challenges of software development apply, though. Sustained development requires managing complexity. To build a successful open source project, you must solve many of the same problems you'd face with a successful in-house project.
As you'd expect, there are several lessons open source developers can learn from Extreme Programming.
The second most valuable artifact in any project is an automated test suite. (The first is the code itself.) Because few open source projects have the luxury of pair programming or even mentoring with any reasonable physical proximity, a good test suite is invaluable to understanding the code.
There are two parts to a test suite. The first and most important is the set of customer tests. Think of these as executable specifications. The second part is the set of programmer tests. These exercise individual pieces (functions, classes, and methods) in isolation. They're also important, but secondary to the customer tests.
For example, in my Mail::SimpleList project, customer tests simulate sent and received email messages. That's it; I only need to mock up the system's entry and exit points. From there, I can test the essential, user-visible behaviors.
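A customer test of this kind might look like the following Python sketch. The handle_message function, the in-memory outbox, and the addresses are all hypothetical stand-ins, not Mail::SimpleList's actual interface. The only point is that the test touches the entry point (an incoming message) and the exit point (the messages sent out) and nothing in between.

```python
from email.message import EmailMessage


def handle_message(incoming, subscribers, send):
    """Hypothetical entry point: forward an incoming post to the list."""
    sender = incoming["From"]
    for address in subscribers:
        if address == sender:
            continue  # assumption for this sketch: posters get no copy
        outgoing = EmailMessage()
        outgoing["From"] = sender
        outgoing["To"] = address
        outgoing["Subject"] = incoming["Subject"]
        outgoing.set_content(incoming.get_content())
        send(outgoing)  # exit point; the test replaces real delivery


def test_post_reaches_other_subscribers():
    # Simulate a received message at the entry point.
    incoming = EmailMessage()
    incoming["From"] = "alice@example.com"
    incoming["To"] = "list@example.com"
    incoming["Subject"] = "Hello"
    incoming.set_content("First post")

    outbox = []  # mocked exit point: collect messages instead of sending
    subscribers = ["alice@example.com", "bob@example.com"]
    handle_message(incoming, subscribers, outbox.append)

    # Check only user-visible behavior.
    assert [message["To"] for message in outbox] == ["bob@example.com"]
    assert outbox[0]["Subject"] == "Hello"
    assert outbox[0].get_content().strip() == "First post"


test_post_reaches_other_subscribers()
```

Because the test says nothing about how handle_message does its work, the internals can change freely as long as the same messages come out.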
Examining the customer tests allows new hackers to get a feel for how the system works and the features it supports. Anything not tested doesn't exist. Don't count on it working in the next version. Don't count on it working in this version.
The programmer tests operate at a much lower level. They're slightly harder to write because they expose more details. They help immensely while debugging, though, as they can often pinpoint bugs to certain lines of code. As well, programmer tests help keep developers honest. From the developer's point of view, a class, function, or method's necessary behavior is enshrined in an executable test. You can be more confident developing if you know that the tests will catch accidental breakage.
The de facto testing tools in the XP world are all open source. Given the existence of so many excellent open-source testing frameworks, testing is the best place to start. Explore the appropriate testing framework for your language. Write a new feature in a test-first fashion. After all your tests pass and you can't think of any more tests to write, refactor what you've just written. It takes a while to get the hang of testing, but even a few naive tests are much better than nothing.
Retrofitting tests to an existing project is difficult. Instead, write a test every time you touch code, whether you fix a bug or add a feature. This will concentrate your tests where you need them the most. As a side benefit, after you've fixed a few bugs in the same section of code, you'll likely have enough tests to refactor it; bugs tend to congregate.
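As a rough illustration, a bug fix might arrive with a programmer test like the one below. The parse_subscriber_line helper and the bug it once had are invented for this sketch; what matters is that the test is written at the moment the code is touched.

```python
def parse_subscriber_line(line):
    """Hypothetical helper: return a normalized address, or None to skip."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # the (imagined) fix: blank lines used to return ""
    return stripped.lower()


def test_blank_line_is_ignored():
    # Regression test added with the fix, so the bug cannot quietly return.
    assert parse_subscriber_line("   \n") is None


def test_address_is_normalized():
    # A second test written while in the neighborhood.
    assert parse_subscriber_line(" Bob@Example.COM \n") == "bob@example.com"


test_blank_line_is_ignored()
test_address_is_normalized()
```

After a few fixes like this, the buggy corner of the code ends up with the densest test coverage, which is exactly where a refactoring needs it.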
Of course, you need a well-designed system to get the most use out of your tests. Test-driven development will help. So will the next lesson.
The Unix philosophy of writing simple tools that each solve a single problem simply and can be combined easily and flexibly has worked well for decades. Simplicity's not limited to operating systems. While it's possible to go overboard, perhaps linking against dozens of libraries to avoid writing a few lines of code, too many projects commit the opposite sin.
XP promotes simplicity in two stages of the development process. First, the scope of each release is kept small. Each release represents a fixed amount of development time. Only work that is estimated to fit into this time can be scheduled. Second, each development task is implemented as simply as possible to pass the tests. No unrequested features are added unless the schedule is readjusted.
Open source projects usually don't have the time or budget constraints to require hard and fast release dates, but getting frequent feedback from users and customers is vital to the survival of the project. Since "customers" are often potential developers, having a good feedback loop can increase the resources at your disposal. Keeping the source code public with regular snapshots or anonymous CVS or Subversion access helps, but if features take a long time to land or to stabilize, it can be difficult to know when the code is worth using.
As with in-house projects, soliciting feedback and scrutiny can be scary. It's integral to solving real problems correctly, though.
Anyone can come into the code at any point, so keep the code accessible. Keep the main source tree passing all of its tests. Fix any problems as soon as they occur. Several large projects, including Mozilla and Perl, have regular smoke tests that run the full test suite as often as possible on as many platforms as possible. It's much easier to track down errors if you can narrow down the breakage to a single change or set of changes. (Andreas Koenig has a script that finds previously unknown regressions in Perl by performing a binary search on changesets. It's very handy.)
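The idea behind a binary search on changesets can be sketched in a few lines of Python. This is not Andreas Koenig's script, only an illustration of the technique; first_bad_changeset and the passes callback are hypothetical names, and the sketch assumes the first changeset passes, the last one fails, and everything after the breakage fails too.

```python
def first_bad_changeset(changesets, passes):
    """Return the earliest changeset for which passes() is false.

    changesets must be in chronological order. Assumes changesets[0]
    passes, changesets[-1] fails, and nothing passes after the breakage.
    """
    good, bad = 0, len(changesets) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if passes(changesets[middle]):
            good = middle  # breakage happened later
        else:
            bad = middle   # breakage happened here or earlier
    return changesets[bad]


# Pretend revision 113 broke the test suite.
revisions = list(range(100, 120))
tested = []


def passes(revision):
    tested.append(revision)  # stand-in for "check out, build, run tests"
    return revision < 113


culprit = first_bad_changeset(revisions, passes)
print(culprit, "found after", len(tested), "test runs")
```

With twenty candidate revisions, the search needs only a handful of full test runs, which is why it pays to keep every changeset small and testable on its own.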
Work in steps as small as possible. Minimize each set of changes. Not only is this less to test and less to debug, but there's less migration between changes. Watch how Linus manages big changes to the Linux kernel; he prefers small, steady patches. They're easier to read as they change one thing at a time.
By working in small steps, it's easier to have regular releases. Subversion is a good example. While they have goals for beta and final releases, they release a new snapshot every three weeks. Scheduling releases can be difficult, with random contributions, but the bulk of development likely comes from a few, dedicated coders anyway. You can't control what outside developers produce, but you can focus their contributions into small, manageable pieces.
A small, simple application that does one thing well is much more valuable than a hundred applications with lofty goals that never actually do anything. There's nothing wrong with writing a framework, provided you really need it. It's much easier to generalize a framework out of working projects than to design a framework to fit future, uncoded projects.
A few projects have survived despite going the other way around. Mozilla comes to mind, giving the appearance of building everything but a decent web browser for a couple of years. Don't count on a combination of luck, skill, funding, and determination pulling you through.
Practicing simplicity can be difficult. Test-driven development helps, at least when you're implementing a feature. It's harder to design features simply. If you have other developers, talk about upcoming features and try to find the simplest way to implement them. It may take a couple of rounds of brainstorming on IRC to find a better approach than any one person's best idea, but it will happen.
Refactoring is changing the design of your code without changing its behavior. If you work in small steps and have good test coverage, you can clean up amazing amounts of code without changing externally visible behaviors. Think of it as reducing a complex algebraic equation; you may start with something horribly complex, but you simplify, one step at a time, according to well-known rules and guides, ending up with something you can understand at a glance.
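Here is a small, made-up Python example of one such step. Both functions and the numbers in them are invented for illustration; the same checks run against the old and new versions, which is what makes the cleanup safe.

```python
def shipping_cost_before(weight_kg, express):
    # Original version: correct, but the two concerns are tangled together.
    if express:
        if weight_kg > 10:
            return 20 + 15
        else:
            return 10 + 15
    else:
        if weight_kg > 10:
            return 20
        else:
            return 10


def shipping_cost_after(weight_kg, express):
    # Refactored version: same behavior, one decision per line.
    base = 20 if weight_kg > 10 else 10
    surcharge = 15 if express else 0
    return base + surcharge


def check_behavior(shipping_cost):
    # The tests describe behavior only, so they apply to either version.
    assert shipping_cost(5, False) == 10
    assert shipping_cost(5, True) == 25
    assert shipping_cost(12, False) == 20
    assert shipping_cost(12, True) == 35
    assert shipping_cost(10, False) == 10  # boundary stays where it was


check_behavior(shipping_cost_before)
check_behavior(shipping_cost_after)
```

If the second call to check_behavior fails, the step changed behavior and was not a refactoring; back it out and try a smaller one.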
Having maintainable, clean code is an excellent goal. Keep it as a high priority, especially as you write new code. Write tests and practice simplicity. On the other hand, having code that works today is much more important than having beautiful, perfect code that might work again when it's finished. Too many projects rewrite themselves from scratch every few versions as the authors come up with new ideas, forget how their code works, or decide the existing code base isn't worth salvaging.
Rewriting seems tempting; it seems faster and easier. For very simple cases, it may be. If you've been writing tests as you go and if you can demonstrate that your software meets the customer needs (because it passes the customer tests), it's almost always less work to improve existing, working code. Would you build a new house just because your kitchen is full of dirty dishes?
It's also tempting to start over from scratch when you come across code you don't immediately understand, whether or not you wrote it. You'll be better off learning how to read code, though. Even if you're only coding for your own pleasure and education, you'll likely learn more by exploring how other people have already solved problems you might not even realize you'd have encountered. Very few programming problems are as simple as they first seem.
Of course, if you're migrating to a different language or platform, if you don't have any users, if you have licensing conflicts, if you don't have any tests, or if all efforts to work a critical feature into the existing codebase have been unsuccessful or too expensive, sometimes rewriting from scratch is worth the cost. Don't start without giving serious consideration to what you can reuse, though.
Several large and influential projects, including Mozilla, Apache 2, Enlightenment, and Perl 6, have opted for rewrites. It's hard to say whether large-scale refactorings would have worked better, but it's easy to see common drawbacks, including slow migration rates to the new versions and questions of the quality of unknown, unproven new code. Splitting development efforts across two major branches may, as in the case of Perl 5 and Perl 6, spur on extra development effort and help recruit new developers. It's also possible, as in the case of Mozilla versus Netscape 4, that your developers won't want to maintain the old codebase and will provide only the minimum possible upgrades for months or years. An extended period of low maintenance and little visible progress can frustrate existing users and discourage potential new users.
Don't lightly throw away the knowledge embedded in existing, working code.
Common wisdom says "Release early, release often." The earlier you release working code, the better the chance of finding other, like-minded people to give you feedback and to refine your ideas. The more often you release code, the more often you can receive feedback from users. XP projects deliver code to the customer every three weeks or so. These aren't alphas, betas, or even release candidates. They're stable, high-quality releases, capable of being delivered to end users immediately.
You might think XP developers would go crazy trying to get everything done. The secret is three-fold.
First, all features are broken into small pieces that can be completed in a day or two. These iterations are also scheduled and monitored closely. Not only does the schedule hold only as much work as the developers estimate they can accomplish, but work is rescheduled if the schedule is too conservative or too liberal. Second, comprehensive programmer and customer tests help identify when features are truly finished. Finally, any new features that require data conversion (such as database schema changes) must be accompanied by migration utilities.
Since the software is always kept in a working, ready-to-release state, it's easy to release regularly. This helps keep migration risks low and the feedback loop between developers and users short. It pays to automate as much of the release process as possible, from smoke tests to packaging to installation tests.
Several projects have predictable release schedules, from Mozilla to Subversion to OpenBSD. Though there are no hard and fast deadlines and a project needs no financial backing (or even users) to survive, managing schedules and change wisely can only help a project survive.
XP gives the power to manage the schedule to the customer. That'd be scary, if it didn't also give the power to estimate development tasks to developers. A scheduling meeting generally has the developers saying, "We can do X hours of work in the next three weeks. Here is a list of tasks and the amount of time we estimate each will take. Please choose enough tasks to add up to X."
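The arithmetic of such a meeting is simple enough to sketch in Python. The plan_iteration function, the story names, and the hour figures are all hypothetical. The sketch also assumes one particular policy: walk the customer's priority list in order and skip any story that no longer fits, where a real customer might prefer to split the story instead.

```python
def plan_iteration(stories, budget_hours):
    """Choose stories in priority order until the budget is used up.

    stories is a list of (name, estimated_hours) pairs, already sorted
    by the customer's priority; the estimates come from the developers.
    """
    chosen = []
    remaining = budget_hours
    for name, hours in stories:
        if hours <= remaining:
            chosen.append(name)
            remaining -= hours
    return chosen, remaining


stories = [
    ("digest mode", 16),
    ("web archive", 30),
    ("bounce handling", 12),
    ("unsubscribe by reply", 8),
]
chosen, left_over = plan_iteration(stories, budget_hours=40)
print(chosen, left_over)
```

With a budget of 40 hours, the 30-hour story does not fit after the first 16 hours are committed, so it waits for a later iteration while two smaller stories go in.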
Few open source projects have the luxury of a single customer who can set development priorities. (It's nice to get one piece of feedback a month from a happy user, let alone a useful feature request!) That leaves the lead developers to wear the customer hat when appropriate.
XP customers decide what features to request based on actual business problems. The whole point of the software is to make their lives easier by helping them get work done. It may take some work to describe the needs of your project in those terms, especially if you're writing a game, but it's possible.
XP customers use stories to communicate feature requests to developers. A story is just a sentence or two describing the feature from the customer's point of view. Stories have to be short, concrete, and testable; they must fit into the normal release cycle and they should suggest customer tests.
Make your goals clear. Keep them small. They're easier to explain to other people and they're easier to schedule. If they're public, they're easier for other people to do for you; I occasionally look through projects I use to look for small, well-defined tasks to do in an afternoon.
Scheduling volunteers is hard. You don't know what they'll work on, unless they tell you. You also don't know how much time they'll have to contribute. Of course, you probably don't have financial pressures to release in a given quarter, though Perl 5, Python, and Ruby developers have all been scurrying to release the latest versions in time for integration in Panther (Mac OS X 10.3).
There's nothing like a code freeze to bring out latent brilliant ideas for potentially risky new features. If you wait for idea after idea to materialize, your release cycle can stretch out far longer than you anticipated.
If you build schedules around stories, you can adjust the amount of work scheduled for each release. It's okay to add features to a release as volunteers appear with patches and ideas, but, if you release once a month, it's easy to delay work on an idea for a couple of weeks until the start of a new release cycle.
If you're the customer, you have the authority to say when a release is ready. Rely on your customer tests; the first task of any story card should be to write customer tests. You may not ship these tests (though if you write them well, they'll be invaluable debugging aids), but they're a great way to keep track of your status.
Every story scheduled for a release needs customer tests. If the tests aren't even written, no one's worked on the story yet. If the tests are written and they fail, the story is started. If the tests are passing, the story is finished.
When all of the tests for the current stories are written and all of the tests in the system pass, make a release.
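These status rules are mechanical enough to write down. The Python sketch below is one hypothetical way to track them: each story maps to a list of customer test results, and the names story_status and ready_to_release are inventions for this example.

```python
def story_status(test_results):
    """Classify a story from its customer test results.

    test_results is a list of booleans, one per customer test.
    An empty list means no tests have been written yet.
    """
    if not test_results:
        return "not started"
    if all(test_results):
        return "finished"
    return "started"


def ready_to_release(stories):
    """True when every scheduled story has tests and they all pass."""
    if not stories:
        return False  # assumption: nothing scheduled, nothing to release
    return all(story_status(results) == "finished"
               for results in stories.values())


release = {
    "digest mode": [True, True],
    "bounce handling": [True, False],
    "unsubscribe by reply": [],
}
for name, results in release.items():
    print(name, "->", story_status(results))
print("release ready:", ready_to_release(release))
```

This covers only the customer tests for the current stories; the rest of the test suite still has to pass before the release goes out.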
After a release, make a list of the next features you'd like to add. Write them as stories, from the point of view of the user. Give each story a rough time estimate and arrange the stories by priority, again, from the user's point of view. Then choose the two or three most important stories and schedule the next release based on their estimates. It may be a small release, but if you resist the temptation to add features without going through the scheduling process, it will be predictable. You may have to automate your release process, but that's a good thing!
It's hard to adapt "traditional" software development processes to account for the realities of open source development. The twin goals of excellent software are the same, though: to write high-quality, maintainable software that meets the customer's real needs.
Several projects practice these lessons. Some have come about after XP was introduced. Some came about before. One good example of project management is Subversion. They have good tests, they reuse lots of good code from other projects, and they have a stable, predictable release schedule.
As before, the best weapon in your arsenal is the knowledge and talent of your development team. Find the most pressing problem and solve it. If you're a typical open source project, you'll likely benefit from one or more of the above lessons.
chromatic manages Onyx Neon Press, an independent publisher.
O'Reilly & Associates recently released (July 2003) Extreme Programming Pocket Guide.
Sample Excerpt, Roles in Extreme Programming, is available free online.
Copyright © 2009 O'Reilly Media, Inc.