ONLamp.com



Extreme System Administration

by Andrew Cowie
03/31/2005

Learning from Programmers

I know, it sounds like a horrible thought. Most sysadmins I know look upon programmers as anathema. The world of system administration is different, you might say?

Programmers have to work together in teams. They have to communicate effectively with each other to do so. They collaborate on a common code base. While they make extensive use of tools that help them, they ultimately cannot avoid the necessity of interacting to solve problems, one human to another. Sysadmins sometimes miss this point.

Extreme Programming

It might once have been presumptuous to say so, but today most software development projects are late, have cost overruns, fail to meet expectations, and are bug-ridden. If you're very lucky, the product might work when delivered.

In the last five years, several software development processes have arisen that challenge the status quo quite dramatically. Open source software, developed in a distributed, collaborative setting and enjoying explosive growth in popularity and success, is certainly one. Another is Agile Development, a family of schools of thought that advocate unconventional practices, the most radical of which goes by the moniker Extreme Programming (XP).

I came across the following list in a book by Bruce Tate about the pitfalls found in Java software development (appropriately titled Bitter Java). I often reflect on how we--the sysadmins, DBAs, and network engineers who make IT systems run--can improve our work, and I quickly realized that some of the practices from "XP" could very well apply to the operations world:

  • Choose simple solutions.

    "Simple solutions are less likely to create antipatterns, or to mask those which already exist."1

    How often do we dream up horribly complex solutions, then gripe about users who constantly complain, puzzle over why our idea turned out to be so difficult to maintain, and wonder why we have no time to work on new projects?

  • Ensure that the customers are on site.

    "End users provide guidance, insight, and opinions."

    Customers? Who? If we create processes that people inside our company have to follow in order to get what they need from our systems, then they are our customers. They are the ones who actually have to use the system, so involve them early.

  • Write user stories.

    Jot down, in just a few sentences, what the user experience will be for interacting with your group, your systems, or some process interface you are creating. Share this with people--your team, your boss, and above all the people who will use it. If they get it, and if they think it's OK, then you're on the right track. If they don't, then listen to them.

  • Divide larger projects into measured, planned, small releases.

    "Smaller cycles result in user feedback, diminished risk, and adaptability to change."

    The temptation is often to accumulate a list of needed changes and then attempt to execute all of them in a single composite event. Pressure to minimize the risk of downtime makes this worse. Unfortunately, this drives up the complexity of such events, making it harder to isolate problems. If you can keep changes discrete, and have good telemetry from your systems, then you can troubleshoot more effectively.

  • Program in pairs.

    "This practice seems wasteful but has enormous power. Pair programming improves quality and reduces tunnel vision. Antipatterns are more likely to be spotted with an extra set of eyes."

    In systems work, I've seen this dramatically increase productivity and reduce the kinds of small human errors that can wreak so much havoc on computer platforms. It gives a person working on a critical system someone to bounce ideas off and verify the sensibility of actions. Two heads are better than one, because you get different perspectives, ideas, and experiences looking at a problem. Above all, it means that there is more than one person who knows what changes were made and why.

  • Code test cases before the rest of the system.

    This is a neat one. In conventional software development, programmers normally write the code, then create the tests, then perform QA to see whether the code works. Agile Development advocates reversing this process: write the tests first. It takes some getting used to, but if you work out what the interfaces to the system should be, and how its outputs should behave, then you are well placed to concentrate on developing the system to satisfy those requirements. Indeed, one school of thought goes so far as to say that when all the unit tests pass, the code is by definition done.

    I think we can learn a couple of things from this. Certainly one of the biggest problems in operations is simply knowing when something has broken. If we can establish effective monitoring, telemetry, and alarm event notification systems from the outset, and make maintaining and updating those systems a rigorous part of our change management, then we are a lot more likely to know when a casual, supposedly unrelated change causes a problem. This doesn't have to be expensive--there are some excellent open source projects that do the job really well.

    Of course, if you can predict in advance that things are going to fail, then you're really cooking. Integrate QA's staging and testing systems with the production-monitoring defenses, and you can get there.

  • Do not use overtime.

    "Overtime increases available hours and reduces clear thinking--with predictable results."

    Any questions?

    Seriously: take care of yourself and your people. Don't burn the midnight oil just because you think it's the macho thing to do. If you're tired, you're more likely to make mistakes. People need a mental break from work--especially from high-pressure environments as found in the IT world. At the end of the day, if you continually drive yourself or your people into overtime, you will lose the extra productivity through fatigue and reduced effectiveness.
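
The small-releases and test-first practices in the list above translate fairly directly into operations work. The following Python sketch is one hypothetical way to combine them, assuming a system whose observable state is modeled as a plain dictionary: the health checks are defined before anything changes, the baseline must pass them, and the changes are then applied one at a time with every check rerun after each, so a failure points at a single change. The check names, change names, and the apply_in_small_steps helper are illustrative inventions, not part of XP or of any monitoring tool; in practice the checks would be your monitoring system's own probes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical model: the "system" is just a dict of observable state.
System = Dict[str, Any]
# A health check returns True when the system behaves as expected.
Check = Callable[[System], bool]


@dataclass
class Change:
    name: str
    apply: Callable[[System], None]


def failing_checks(system: System, checks: Dict[str, Check]) -> List[str]:
    """Return the names of all checks that currently fail."""
    return [name for name, check in checks.items() if not check(system)]


def apply_in_small_steps(
    system: System, changes: List[Change], checks: Dict[str, Check]
) -> Tuple[List[str], Optional[str], List[str]]:
    """Apply changes one at a time, rerunning every check after each one.

    Returns (changes applied cleanly, change that broke something or None,
    names of the checks it broke).
    """
    # Test first: the checks exist, and must pass, before anything changes.
    baseline = failing_checks(system, checks)
    if baseline:
        raise RuntimeError(f"checks fail before any change: {baseline}")

    applied: List[str] = []
    for change in changes:
        change.apply(system)
        broken = failing_checks(system, checks)
        if broken:
            # Stop here: the failure is attributable to this one change.
            return applied, change.name, broken
        applied.append(change.name)
    return applied, None, []


# Usage with made-up state, checks, and changes.
system: System = {"httpd_running": True, "disk_free_pct": 40, "db_port": 3306}
checks: Dict[str, Check] = {
    "web server up": lambda s: s["httpd_running"] is True,
    "disk has headroom": lambda s: s["disk_free_pct"] >= 10,
    "database on expected port": lambda s: s["db_port"] == 3306,
}
changes = [
    Change("rotate logs", lambda s: s.update(disk_free_pct=55)),
    Change("move database port", lambda s: s.update(db_port=3307)),
    Change("upgrade web server", lambda s: s.update(httpd_running=True)),
]

applied, culprit, broken = apply_in_small_steps(system, changes, checks)
print(applied, culprit, broken)
# ['rotate logs'] move database port ['database on expected port']
```

Because each change is verified in isolation, the report names the one change that broke the database check, instead of leaving you to untangle a single composite maintenance event. The sketch only stops and reports; rolling back the offending change is left to whatever change-management process you already have.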

Nothing New About Professionalism

None of these ideas are particularly new; in fact, the discipline of programming has been well established for more than 30 years. Part of the problem is that many of us have either forgotten what we learned in school (brushed aside by day-to-day work pressures), or we picked up programming or systems on our own and never had an opportunity to study and learn from the early pioneers and masters of our field.2 Continually striving to learn new things not just in our own narrow specialties, but across a broad range of disciplines, is ultimately one of the best tools we have to prepare ourselves for change.

Learn more by attending Andrew's presentation, entitled "Surviving Change: Bringing Operations Professionalism to the IT World," at the MySQL User Conference, April 18-21 in Santa Clara, California.

Notes

1 Bullet items and quoted text from Bruce Tate, Bitter Java (Manning, Greenwich, Connecticut, 2001), p. 49. These are, in turn, a paraphrase of material about Extreme Programming from Don Wells et al., Extreme Programming: A Gentle Introduction, available at www.extremeprogramming.org.

2 Compliments to Steve Landers of Digital Smarties Pty. Ltd., Western Australia, who first pointed this out to me. He recommends László Böszörményi, Jürg Gutknecht, and Gustav Pomberger, The School of Niklaus Wirth: The Art of Simplicity (Morgan Kaufmann, 2000), a recent book of essays on the topic.

Andrew Cowie runs Operational Dynamics, an operations and infrastructure engineering consultancy.




