Published on ONLamp.com (http://www.onlamp.com/)


Rethinking Community Documentation

by Andy Oram
07/06/2006

A new era is halfway here, and nobody has recognized its impact--even though we've all participated eagerly in its arrival. The way we educate ourselves to use and program computers is shifting along many of the same historic lines as journalism, scientific publication, and other information-rich fields. Researchers have pounced on those other trends, but computer education remains short on commentary.

People say casually, "I find my information about using computers by searching online," but few have asked how that information gets online, or how it changes the way they use their computers. It's delightful to see thousands of ordinary people writing up guidelines to help each other; the outpouring of energy is impressive. When, in the week this article was written, Microsoft--the most resource-rich software development center in the world--announced the MSDN wiki for community contributions and eBay created its own eBay wiki, you could tell the movement had taken root. Yet before it goes much further, the public should explore some key questions.

I care about community documentation for several reasons. First, as a heavy user of many computer technologies (with a focus on free and open source software), I benefit personally from amateur online sources. Second, I'm intrigued by the culture of mutual aid this movement reveals, and its meaning for the democratic sharing of information. Most pressingly, this movement cuts into my living as an editor of conventional documentation--a shift I urgently need to understand.

Community documentation has swept up so many talented people and accomplished so much that I've decided to join in and help it flourish. In this article I'll offer a definition of community documentation, ask why the shift toward it has happened, examine its problems, point to examples I like, defend the continuing role of conventional books, and suggest potential improvements.

A Definition

I use the term community documentation for anything generated by developers and users (mostly amateurs in the field of writing) that helps people use their systems. The documentation could be as small as a question and an answer on a mailing list. As people answer more questions, some write up their answers in a more formal way and post them as web pages. Wikis are a further development. Zealous technology fans have written and released several complete how-to books.

Much of this documentation is:

Fast-changing
No published book could keep up with the evolution of software, particularly software distributed over the Internet. In fact, many important developments become old hat before a book or magazine publisher can even solicit and publish an article on them. This is why wikis are popular; they can tack and turn along with the software.
Interactive
Information emerges on mailing lists bit by bit as a newbie poses a problem, experienced users toss back suggestions or questions for further research, and the truth of the situation comes into focus. The process can be exciting to watch, like a puzzle everyone is competing to solve. Interactivity is not limited to discussion forums; even a relatively fixed piece of community documentation (such as an online tutorial) has probably undergone rounds of review and comment.
Free
While closed forums provide valuable and safe places for some kinds of discussions, this article covers the documentation on mailing lists and web sites that anyone in the world can view. This openness is central to the culture and ecology of community documentation, because when you recommend some reading to a correspondent, you don't want to worry about whether he can get access to (or can afford) the document. That is sometimes a problem with printed materials.
Ad-hoc
A posting to a mailing list may concern a particular bug in a particular release of software, or a problem unique to one person setting up an unusual environment. However, not all community documentation is like this; extensive essays on stable topics also turn up.

These four criteria are not unique to information about computer technology. I described the same four criteria over four years ago in an article proposing new online media. I wrote that article as a bystander, thinking it referred to pop culture. I didn't think I would be anticipating the field I myself would have to work in.

Why the Shift to Community Documentation?

People have been asking friends for help ever since the discovery of language. In a more recent development, mailing lists started in the early days of computer networking. The Linux Documentation Project--completely volunteer-run--goes back to 1992, practically as far as Linux itself.

Community documentation has increased its hold on computer education over the last few years. Traditional book sales and training courses have stayed robust in some areas of technology, but a noticeable amount of learning has gone online.

The reasons for this change rest on several levels: technology, motivations, and social attitudes.

Technology

Technological changes affecting documentation include:

Improved search capabilities
Lots of good information (and bad information, too) is buried in three-month-old archives or other obscure sites; modern search engines can turn up quite a bit of it.
Improved network connections
When you're always online, the temptation is overwhelming to slide over to that little Google window in your browser and type in a few words. I did it a dozen times while writing this article.
Improved authoring tools
The inventions of the blog and the wiki were modest technological tweaks to web technology, but had a huge qualitative effect in bringing down barriers to grassroots communication.
Improved research opportunities
Free software opens up every aspect of a system to examination by anyone with modest computing skills. Even proprietary systems and systems involving hardware come with more information today; vendor attitudes are also more open to sharing ideas.

Technological phenomena create an environment where community documentation can flourish, but each person must have an individual reason for contributing to the documentation.

Motivations

The motivations for doing community documentation are well worth some formal research. This research would parallel the extensive research into why free software developers give away their work. For free software, there are several trends. Often, the software developer is simply sharing what he needed to write for purposes of his own. As Tim O'Reilly has pointed out, the situation is not so simple for free documentation: authors don't benefit from it directly, but must have reasons to generate it for other people.

Anecdotal evidence suggests reasons people write community documentation:

Informal support
Many developers or other members of software projects go on the online forums to offer technical support, which in turn promotes use of the software.
Mutual aid
People help others with the expectation they themselves will get help when they need it.
Gratitude
Closely related to mutual aid, people say, "I received help when I was new to this technology, so I want to help others."
Reputation building
Consultants, trainers, job-seekers, authors, and others hoping to build a career go online hoping for recognition and respect.
Personal growth
By offering advice to others and by tracking them through the process of repairing a problem, advisers build their own diagnostic and communication skills.
Payment
Mixed in among the free documentation are sites with advertising. Some of these sites draw enough traffic to make a modest but noteworthy income for their authors.
Thrills
There's a wholesome pleasure in seeing your insights turn up almost instantly on a forum with worldwide scope, as well as watching others succeed with your help and praise you for it.

We should look past these immediate motivations. Something much bigger is going on, and it's particularly important to traditional publishing--a change in social attitudes.

Social Attitudes

It's not surprising that authors face a choice between participating on the Internet and writing for traditional books and magazines. The most successful authors find time for all of these media. When they do choose between them, books increasingly lose to the competition. Something about books suggests a bygone era.

Perhaps in this age of wikis and instant communication through online chat, people don't want to wait nine months to write and release a book. Yet even that doesn't explain the shift. More fundamentally, in our day and age, connections with other people have taken precedence over book publishing.

Taking nine months to write and publish a book is more than information sharing; it's self-expression. Nowadays, this degree of self-expression can seem like self-indulgence.

Our sense of individual identity urges us to self-expression. Back in the 1950s and 1960s, the search for identity became a central concern for millions of people. Social commentaries of all sorts quoted such psychoanalysts as Erik Erikson, who wrote books with the titles Identity: Youth and Crisis and Identity and the Life Cycle, and who defined the early stages of life as the search for individual identity. (Erikson said, however, that one must move beyond one's "identity"--the collection of one's social traits--in order to truly be oneself.)

Now we live in a world of over six billion people, heading toward nine billion. In such a world, are you really so important? The search for an individual identity just doesn't seem to make that much of a difference.

Meanwhile, the Internet connects us with millions of people, offering instant gratification when we share an idea quickly. Reciprocal interaction and the co-development of ideas, in my opinion, give a lot of people more satisfaction now than the self-indulgence that comes from writing a book.

Regardless of whether you accept my sociological explanation of the move to community documentation, consider the problems that it brings.

Problems with Community Documentation

Community documentation is indispensable, and it makes the difference for many people between abandoning a system in frustration and engaging in productive work. Unfortunately, community documentation isn't everything computer users need, or even everything it could be. I divide my critique into a few areas: failures of interactive help, failures of writing, and failures of organization.

I follow up with examples of community documentation I like, to show that my goals are achievable. Parts of the following sections build on an earlier article of mine, "Splitting Books Open: Trends in Traditional and Online Technical Documentation."

Failures of Interactive Help

Some interactions on mailing lists are wonderfully effective. A struggling user can learn of a bug fix to download, a document for background reading, or a subtle typo in her input. Everybody goes away happy and gets back to business.

These interactions can also turn into a crutch. Modern computer users need to develop mental models to handle new situations that come along, and quick fixes can prevent that from happening. In fact, interactive help can retard learning.

System administration is particularly at risk. Imagine someone trying to connect a Linux system to a Windows server that has files the Linux user wants to access. The system administrator has configured the Linux system to grant access by treating the remote folder as a Linux filesystem, but she gets only an error message saying the filesystem type is incorrect.

The documentation indicates that the requested type is indeed correct, so now the system administrator has to take the error message as grist for further study. If she has the background knowledge to step through some tests, she will soon realize that the Linux system lacks built-in support for the filesystem type and needs a special module to be loaded. For various reasons (such as the desire to conserve space, and some legal uncertainty about Microsoft's tolerance for this kind of Windows-compatible software), this module is not present by default.

A better error message might have been "No support for filesystem type" or "No module found for this filesystem," but software itself is limited in diagnostic capabilities and knowledge of the environment. It's really up to a system administrator to keep hold of the big picture.

By stepping through a diagnostic inquiry, the system administrator can learn something about how Linux is put together, something about how filesystems work, and something about licensing controversies.
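To make those diagnostic steps concrete, here is a small sketch in Python (the function name and the canned sample text are my own, not part of any real tool) that checks whether the kernel claims support for a filesystem type by parsing the format of /proc/filesystems:

```python
def fs_supported(fstype, proc_filesystems_text):
    """Return True if a /proc/filesystems listing includes fstype.

    Each line looks like either "nodev\tproc" (virtual filesystems)
    or "\text3" (block-device filesystems); the type name is always
    the last field on the line.
    """
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == fstype:
            return True
    return False

if __name__ == "__main__":
    # On a real Linux system you would read the live file:
    #   text = open("/proc/filesystems").read()
    # Here a canned sample lets the sketch run anywhere.
    sample = "nodev\tproc\nnodev\tsysfs\n\text3\n\tvfat\n"
    print(fs_supported("vfat", sample))   # True: vfat is listed
    print(fs_supported("smbfs", sample))  # False: needs a module loaded
```

If the type is absent, the natural next step is to look for a loadable kernel module (with modprobe); walking that chain of reasoning is exactly what a quick mailing-list answer short-circuits.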

Now suppose she had simply reported her error message to a mailing list. A helpful user would probably have told her to load the right module, with or without further background.

This is "give a man a fish" behavior. The recipient of the information will be better off the next time she faces the exact same problem, but has lost an opportunity to practice skills she needs as a system administrator.

Where do computer users most often miss out? Some types of knowledge may be amenable to learning in dribs and drabs. Certain other subjects are deep and require a holistic approach. These include:

Security
Security doesn't consist of installing all the patches from the vendor. It consists of an integrated approach to policies, risk assessment, and the disciplined monitoring of systems.
Performance tuning
Few optimizations can be made in isolation. The term "tuning" is apt here, because tuning performance is like tuning a keyboard instrument. On the keyboard, it's easy to tune two strings to a perfect fifth. Yet if you go around tuning every pair of strings to perfect fifths, the instrument as a whole will never be in tune. Tuning takes a sophisticated, nuanced approach--and different tunings are appropriate for different time periods and pieces.
Troubleshooting
As my example of Linux/Windows filesharing suggested earlier, you need a broad understanding of many different levels of a system to do problem-solving.
Robust programming
In any programming language, bad habits are easy to fall into, and they come back to torment you later.

With this longer view in mind, it's worrisome that a lot of advice given on mailing lists is unsystematic. Not, "This is why your system is failing, and here's how to fix it," but "Gee, I had a similar problem a year ago, and when I did the following it worked." The generous donor may, unfortunately, be setting up the recipient of the advice for future failure.

Interactive help can also be highly inefficient. Often I have seen someone out of his depth returning to a mailing list over and over for advice doled out in inadequate quantities by list members. Pointers to more complete documentation are hard to offer, because they tend to sound like a dismissive and insulting "Read the eff-ing manual." And even when offered, the community documentation pointed to might not meet the user's needs.

Failures of Writing

My 13 years as an editor have shown me innumerable times how difficult it is to write effective explanations of technical topics. Most would-be authors need intensive mentoring to write at a level the reader can understand, and people doing community documentation usually lack the time to even try.

Everyone suffers from a bad turn of phrase now and then, or forgets to define a term before using it. These problems--along with grammar and spelling errors--are fixable by copy editors, who are readily available on a freelance basis. Many readers can also help resolve these lapses. I consider this an easy problem, and won't discuss it any further.

What interests me are the conceptual problems that copy editors cannot find or fix. These lie with the intended use of the document, not its readability. Such problems lead to the common complaints "This was too abstract," or "What am I supposed to learn from this?" Regardless of the endless manifestations of these sins, I think that most fall into two categories: approaching an explanation at the wrong level and drowning the reader in details.

Approaching an Explanation at the Wrong Level

Many authors have learned the system at one level--particularly if they are developers--and cannot move mentally to the level of the reader. Take a trivial case as an example: configuring an automatic login.

Suppose an author documents the purpose of automatic logins with:

If you configure your system for an automatic login, your initial screen will come up as soon as the computer starts, without any request for user name and password.

That text is perfectly understandable to most computer users, and would likely pass muster during a copy edit. It might even look useful to someone evaluating the document.

However, the passage is useless, because it is tautological. The term "automatic login" already conveys everything the passage says. The text adds nothing to the reader's comprehension.

A useful discussion of automatic logins would deal with the fundamental purpose of the user name and password prompts: security. The discussion must focus on one key point: by configuring automatic login, you allow anyone with physical access to your system to log in as you. A secondary point is that the feature is predicated on the assumption you are the only user who will want to use the system.

In some situations, this approach is reasonable. For a laptop, for instance, you may be the sole user, and if it is stolen (as we know all too well from recently publicized incidents) the data is readily available even to someone who can't log in. The recommended security approach for a laptop is an encrypted filesystem, and requiring a user name and password is just an annoyance that makes extra work for the legitimate user.

A lot of authors have a hard time investigating which aspects of technology interest its users. I consider the problem to lie with differing levels.

Useful documentation normally starts at a very high level, with the goals of the readers, and then descends into the system operations that meet those goals. Because the hardware and software reflect the lower levels more directly, and because a knowledgeable author is comfortable at those levels, the higher levels are the hardest parts of the document to write. Yet these are the introductory paragraphs that the reader should see first! It's amazing that readers persist as often as they do in reading the documents, despite incomprehensible or missing introductions.

Drowning the Reader in Details

Related to the previous problem is the proliferation of guides that jump into details too quickly. Open most technical magazines or books for computer users, and you find lists of tasks: "load a disk ... build some software on it ... enter a command to make the operating system recognize it ... give it a name ...."

Supposedly, following the directions to a T gives you a functioning system. Usually, your environment differs in some subtle way from the author's environment, so your attempt to follow the directions fails. Even if you do get the system working today, it may fail tomorrow when you log back in.

Will background documents help? Finding them may be hard. Just because a background document describes a system doesn't mean you can make use of that explanation. There may be a crucial link that you can't make between the background you're reading and the particular task you're trying to solve.

One software project with extensive, well-written documentation is Volity, a new game development platform I described in a recent blog post. Like most projects, Volity depends heavily on software developed elsewhere. Its documentation properly points to background documentation for each of those systems. That background does not, obviously, explain the relation between Volity and those other systems. The Volity website fills the gap with several web pages. All software projects call for that level of investment in documentation.

Failures of Organization

One nice thing about books is their linear arrangement. However, many people say they don't read technical books in order. Our reading technology implicitly accommodates this behavior; if we always read things in order, we'd still have scrolls instead of books.

When I'm in the middle of getting something to work on my system, I notice an interesting pattern in how I read a book. I think other people do something similar.

I usually skim the book all the way through to get a sense of how things fit together, but when I want to accomplish some task, I just flip to an example or step-by-step procedure (yes, the kind I ridiculed in the previous section) and try it out, with any changes that seem appropriate.

If the procedure works, I'll congratulate myself as a clever fellow, without feeling guilty that I haven't invested in a deep investigation of the system. If the procedure doesn't work, I'll look back at nearby descriptions and try to find out what was missing in my rendition of the procedure. If that doesn't work, I may return to more basic material, perhaps in a different chapter. Essentially, I read the book backward.

Backward reading is an ineffective way to process material the first time you see it, but it may be very effective when you're applying the material to concrete tasks. In fact, such a learning style may be mandatory. Skim some material, try to apply it, struggle a bit, and read the relevant sections again.

Perhaps the step-by-step documentation could work after all, if it had background links. People could write step-by-step documentation and background material at different times. To tie it all together, the Web provides hypertext. Such a system is a flexible way of organizing documentation: it allows authors to write in their spare time, and the resulting chunks are closer in size to what readers like to read at a single sitting.

The thread through the various links can become pretty tangled, though. Over time, more and more documents on each topic build up. Because the price of disk space has fallen so much, nobody feels an urgency to remove outdated documents. It's hard to tell what you really need, and hard to know whether a document will help you.

The journal Nature sparked intense debate in December 2005 with its study claiming that Wikipedia is almost as accurate as the iconic Encyclopedia Britannica. (Another way to summarize the claim is that the hyper-professional Encyclopedia Britannica contains almost as high a rate of errors as the freely donated work in Wikipedia.) Expert opinion differs on the accuracy of the study, but one claim in it has largely escaped discussion: Nature found that Encyclopedia Britannica articles were better organized and easier to find information in.

This suggests that, beyond the question of accuracy, the value of documentation increases with a concentrated, consistent authorial voice--and some editing wouldn't hurt either. Technical documents on the Web are more likely than Wikipedia articles to be the product of a single author, but the collections of multiple documents, taken as a whole, tend to reflect problems of organization, chosen audience, and tone.

Community Documentation I Like

What I'm asking for is not impossible. Occasionally I come across a superb example of documentation produced for the community. In addition to the Volity site mentioned previously, two examples are the NFS How-To and the Linux Sound How-To, both from the Linux Documentation Project.

NFS (Network File System) is one of the earliest ways to share files between computer systems in the manner I mentioned earlier. The Linux NFS How-To has had no revision in four years, but this is no drawback because NFS is a stable system (some people would call it a legacy system). The authors did it right the first time. The how-to compares favorably to commercial documentation on the topic.

The language is quite professional throughout, while remaining conversational and easy to scan for the information you want. Sections have a logical organization, and titles bring you to the information appropriate for your setup.

The document's introductory sections lay out the goals of the paper, the software needed to get NFS working, and the knowledge the reader is expected to bring to the project; there is also a reference to another how-to.

Numerous warnings about special cases show that the authors (probably, I'm guessing, with input from readers) have an intimate grasp of real-life use. Security, performance, and troubleshooting--the areas of system administration I indicate as requiring a holistic and deep understanding--have their own in-depth sections, with hints scattered throughout the document.

The Linux Sound How-To is even older than the NFS How-To, but remains a model for how to deliver complicated information in an easily digestible form. The author, Jeff Tranter, has provided me with material on related subjects for O'Reilly books, and he brings his top-notch professionalism and care to this free document as well.

Several aspects of computer sound systems make documentation difficult. First, they require the cooperation of software at many different levels, from the kernel and device drivers up to the particular utilities invoked by users. Second, some Linux installations do more for the user than others. On one installation, everything may be in place; in another, the user has to carry out a lot of manual installation.

A wide range of activities are related to sound (from playing CDs and files in various formats to doing professional sound editing); in this how-to Tranter just explains how to get audio working. Starting with the sound card and moving up through the layers of the system, he helps the reader find out when the Linux system has done the setup work and how to repair the situation when it hasn't. There is a long troubleshooting section in the form of a Frequently Asked Questions list.

The Continuing Role of Conventional Books

Until online documentation (whether free or proprietary) can consistently provide a professional and well-organized learning experience, printed books will remain important. This remains true even though many books fall short of providing a professional and well-organized learning experience. A few readers, dazzled by the possibilities of new media, might greet this claim dismissively, saying "The author has to say that, because he edits books." In fact, the reverse is the case: I edit books because they still play an important role in learning.

The book remains the medium in which an author is most likely to take time to consider a topic from many angles and at many levels. It is the medium where professionals in many disciplines can deliver an effective message. People recognize that.

Recognizing the unique contributions books make, online forums should promote books just as they promote online documentation. Some people, as I've said already, have trouble getting hold of the printed books, but the majority of online visitors can benefit from them. This is an age of grassroots marketing; the buzz generated by everyday computer users has a much greater impact on book sales than conventional advertising.

Some online communities feel comfortable talking about the books they like and don't like. These communities generate more book sales, and ultimately allow publishers to put more resources into those communities: more books, more journals and websites, and even more sponsorships for conferences and other activities.

Economically, of course, such investments represent a transfer of money from individual book buyers to community-based activities, with the publishers as mediators. My point is that the community benefits a lot more from book sales than most participants realize.

Other forums, for reasons I don't understand, show a reluctance to recommend a book, even when a highly respected publication exists that directly addresses the confusion and misunderstandings expressed by participants on the forum. By suppressing sales, this reluctance makes it very hard to publish further books in that space.

In short, learning is an ecosystem that flourishes best when all resources and contributors have recognition. Free online resources may someday provide everything that is needed--but communities have to do some work to reach that point.

Potential Improvements to Community Documentation

Community documentation can do a lot better. There is potential for online training, as an antidote to the mailing-list crutch, and for funding and rating systems, as a fix for writing problems.

Online Training--an Antidote to the Mailing List Crutch

Graduate schools throw case studies at their students to toughen them up for the outside world. The Army toughens its recruits through very realistic simulations (computerized as well as on the ground). In the computer field, online training programs put students through realistic problem-solving steps.

These are all expensive and require expertise to develop. To ask software communities to do the same is a tall order, but they might be able to make headway if they had streamlined development tools and modest expectations.

Funding and Rating Systems--a Fix for Writing Problems

Learning to write for the reader takes practice and guidance. It best occurs under the tutelage of an experienced author. Given the excellent energy going into community documentation, its flexibility, and its universal availability, the entire computer field would benefit if some professional resources were shifted from books to the efforts of amateurs online.

What I'm suggesting is very different from the common phenomenon (over the past decade) of conventional publishers printing and selling community documentation. It's nice if somebody's free book, written as a gift to his fellow software users, can earn a buck or two in print. Yet it's rare for community documentation to make that transition profitably, because--as I explained earlier--it's fast-changing, interactive, free, and ad-hoc.

Two forms of investment can improve the writing of community documentation, and both require funding. First, writers need to take extra time to sit with their material and figure out what's really relevant to readers. Second, writers need to work with editors who possess a deep knowledge of the computer field and how to write for it.

Finding money for authors and editors (as well as artists, website administrators, and others involved in the production and distribution of community documentation) could lead to even more of the inspired efforts we've seen in communities over the past several years. Naturally, an injection of money could also corrupt the system, with undeserving authors and editors trying to direct funds their way. We have to enter this area with caution, but doing so can reap vast rewards for readers.

Money could come from multiple sources: the projects that develop computer systems, the vendors who sell them, and even communities of users pooling their contributions. It seems almost pie-in-the-sky right now to anticipate that people will care so much about documentation, but the money might start to flow if the investment made a demonstrable difference.

Thus, we need a rating system to show that good writing and professional editing actually produce more educated and successful computer users. Such a rating system has three main uses: readers can consult ratings to decide what to read, projects can use them to decide what to feature on their sites, and ratings could be incorporated into search results to let the more effective documents rise to the top.

It's important to find a rating system with some validity. Neither the experts nor the ordinary users can be entrusted with this task.

Problems with Expert Evaluators

It may seem natural to ask project leaders, or other experts on a topic, to read and rate documents. Such experts will probably make their own judgments anyway about who's a good author, and will base informal decisions on these judgments, such as whom to invite to write new documents. As a rating system, though, this approach suffers from several problems.

Let's leave our experts to do what they do best--exercise their expertise--and find other ways to rate documents.

Problems with Formal Rating Systems

There is enough research on reputation and rating systems to design a proposal for a documentation rating system. I'll do one as a thought experiment here, just so people don't have to ask, "What about the formal research in reputation?"

The goal is to produce a measurable rating (such as a number on a scale from zero to ten) for each document, based on user ratings. We must try to prevent a user from rating a document more than once, because uncooperative users will want to submit a hundred ratings for their own document, or for ones written by their friends.

The first step in a rating system is user registration, which serves as a kind of proof of work: registration should be onerous enough that nobody will bother to complete it a hundred times. For instance, you can register each person who buys a computer system; for downloaded software, you can present a series of questions that must be answered before registration completes. While someone sufficiently determined to game the system can still do so, these measures give reasonable confidence that most users have registered only once.

The second step is to present logged-in users with a scale on which to rate documents. Results can then be tabulated to show which documents people like. You could even set up collaborative filtering, so that a reader can find other documents rated highly by people who tend to share her taste.
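Such a tabulation is mechanically simple. Here is a minimal Python sketch of the scheme just described; the class, its in-memory storage, and the method names are my own illustration, not part of any real system:

```python
class RatingSystem:
    """Tabulate 0-10 ratings, allowing each registered user one vote per document."""

    def __init__(self):
        self.registered = set()   # user IDs that completed registration
        self.ratings = {}         # doc_id -> {user_id: score}

    def register(self, user_id):
        # In practice, registration would involve the proof-of-work
        # steps described above; here we just record the ID.
        self.registered.add(user_id)

    def rate(self, user_id, doc_id, score):
        if user_id not in self.registered:
            raise ValueError("unregistered user")
        if not 0 <= score <= 10:
            raise ValueError("score must be on a 0-10 scale")
        # Keying votes by user ID means a repeat vote overwrites,
        # never duplicates, the earlier rating.
        self.ratings.setdefault(doc_id, {})[user_id] = score

    def average(self, doc_id):
        votes = self.ratings.get(doc_id, {})
        return sum(votes.values()) / len(votes) if votes else None
```

The per-user dictionary is what enforces the one-rating-per-user rule: a hundred submissions from the same account collapse into a single vote.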

The most obvious problem with this system is its crushing weight. Nobody (except me) cares enough about documentation to erect or go through such a cumbersome process.

Even worse, the system violates the principle of minimal disclosure, asking users for far more information than we need. We've asked each user to associate himself with a rating, when we don't really care which user liked which document. (We might care if collaborative filtering proves useful, but that's a separate feature users could opt in to.) All we want to know is how many users collectively benefited from each document.

This rating system doesn't actually tell us what we want to know. We don't know whether someone read the document before rating it, and the rating is just a popularity contest--it shows whether the reader had a pleasant experience, but not whether the reader can take the information back to work and be productive with it.

If a reader has a distinct, one-time flaw to repair, she can tell whether a document has helped her repair it, and can indicate that with a simple click on a yes/no box. Such ratings can be useful in these limited situations. For anything more conceptual or long-range, user ratings are of uncertain reliability.

Rating Through Quizzes

Finally, I arrive at the system I'd like to use to prove to project leaders and users that professional help produces better documentation.

The goal is to associate each document with a short quiz--say, two or three questions--that determines whether the reader has learned what he needs from the document.

Choosing questions is a delicate task. When testing whether someone can set up a network, for instance, it's not particularly helpful to ask, "What is a Class C network mask?" It would be more pertinent to real-life concerns to ask, "What network mask would someone setting up a home or small office network use?" These questions should be fairly difficult, to minimize the chances that readers already knew the answers before they read the document.
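To make the discussion concrete, here is one hypothetical way to represent such a question and grade an answer in Python; the data shape and the `grade` helper are my own illustration, with the accepted answers reflecting the common 255.255.255.0 mask for a small /24 network:

```python
# A quiz question as a simple record: the practical phrasing the text
# recommends, plus the set of accepted answers. (This structure is an
# illustration, not part of any real quiz system.)
QUESTIONS = [
    {
        "prompt": "What network mask would someone setting up a "
                  "home or small office network use?",
        "accepted": {"255.255.255.0", "/24"},
    },
]

def grade(question, answer):
    """Return True if the reader's answer matches an accepted form."""
    return answer.strip().lower() in {a.lower() for a in question["accepted"]}
```

Accepting several equivalent forms of an answer matters here: the quiz is testing comprehension, not the reader's luck in guessing which notation the author prefers.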

The choice of questions should probably not be left to the author of each document. Ultimately it rests with the people who host the document or fund the system, and they may choose to turn it over to subject-matter experts.

From the reader's point of view, the system demands little overhead or time. It works like this:

  1. The reader clicks on a link to access the first page of the document.
  2. The reader can then access pages in any order, and reread pages. The only hitch comes if she takes more than a few days to read the document: because URLs are dynamically generated, the document may disappear from the website if she tries to return to a page after a long lapse.
  3. On the last page, the reader is asked to test her knowledge. She clicks through two or three pages, answering one question on each. She can return to the document while taking the test. (I believe quizzes should be "open book," to use a metaphor from the classroom. Quiz results should depend on deep comprehension of concepts, not mere memorization.)
  4. The reader is told at the end how well she fared.

On the server side, the system works like this:

  1. Each page of the document, and each page of the quiz, is dynamically generated and associated with a random character string on the server. Doing this puts some barriers in the way of somebody who wants to undermine the system by submitting hundreds of quiz results. The server maintains information on the reader's session for a few days, giving the reader plenty of time to access all pages and the quiz.
  2. The reader's answers are stored permanently together with the random character string, which functions as a primary key with which to retrieve results. No information on the user is kept, however.
  3. Results can be retrieved and manipulated for any research purpose: to determine, for instance, how well the document fulfilled its purpose, or whether recent changes to the document improved it.
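The server-side scheme above can be sketched in a few lines of Python. This is a thought-experiment implementation, with in-memory dictionaries standing in for a real database; the function names and the three-day lifetime are my own illustrative choices:

```python
import secrets
import time

SESSION_LIFETIME = 3 * 24 * 3600   # "a few days," per the scheme above

sessions = {}   # token -> creation time; ignored after the lifetime expires
results = {}    # token -> quiz answers, kept permanently for research

def open_session():
    """Generate the random string that ties a reader's pages together."""
    token = secrets.token_urlsafe(16)
    sessions[token] = time.time()
    return token

def session_valid(token):
    created = sessions.get(token)
    return created is not None and time.time() - created < SESSION_LIFETIME

def record_answers(token, answers):
    """Store quiz answers keyed only by the random token -- no user data."""
    if not session_valid(token):
        raise KeyError("session expired or unknown")
    results[token] = list(answers)
```

Because the token is the only key, the stored results reveal nothing about who took the quiz, satisfying the minimal-disclosure principle discussed earlier; the unguessable token is also what makes mass fabrication of results harder.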

While standard practice might be to use a CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart") system to stop automated programs that corrupt the rating system by submitting multiple quiz results, I think such a system would impose too much overhead for readers. It's hard enough to get feedback from users without putting them through the hassle of filling in a field with gibberish.

A more fruitful way to suppress ballot-stuffing might be to limit the number of submissions from a single IP address, the number of submissions during a certain time interval, or some combination of the two. Rate-limiting (no pun intended) might cap the number of legitimate quizzes submitted by users behind a shared firewall, but that could be a reasonable price to pay to fight fraud. If you know you're soliciting a huge number of responses from one site--for instance, from students at a college--just put your site on their side of the firewall.
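A sliding-window limiter combining the two limits might look like the following Python sketch; the window length and per-window cap are illustrative numbers, not recommendations:

```python
import time
from collections import defaultdict, deque

WINDOW = 3600        # seconds in the sliding window (illustrative)
MAX_PER_WINDOW = 5   # quiz submissions allowed per IP per window (illustrative)

_recent = defaultdict(deque)   # ip -> timestamps of recent submissions

def allow_submission(ip, now=None):
    """Allow a submission only if this IP hasn't hit its per-window cap."""
    now = time.time() if now is None else now
    stamps = _recent[ip]
    while stamps and now - stamps[0] > WINDOW:
        stamps.popleft()               # drop submissions outside the window
    if len(stamps) >= MAX_PER_WINDOW:
        return False
    stamps.append(now)
    return True
```

A deque of timestamps per address keeps the check cheap: expired entries fall off the front, and the cap applies only to what remains inside the window.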

Search Facilities--a Fix to the Organizational Mess

I'll finish with a few ideas about helping people find their way through the wealth of documentation. On some projects it is like a tropical rainforest, overflowing with abundance but easy to get lost or drowned in.

Every author has to consider what he expects his readers to know before starting his document. Well-organized authors list the topics they expect the readers to know, and a few point to related documents. A list of such prerequisites, as well as suggestions for further reading, would enhance most documents.

In a rich environment such as the Web, an author should not have the burden of guiding readers through the forest. Other sites can sprout up freely to offer the personal choices of different readers. People can offer portals or catalogs that guide readers with particular needs to particular sequences of documents. This development will be particularly important for the spread of documentation in less commonly spoken languages. The challenge is to match each new reader with the selections most relevant and congenial to him.

Google's famous link-count algorithm has shown that the number of citations is a useful guide to the value of a site. Yet just because a document was right for many other people doesn't mean it's right for your particular needs. Maybe lots of other people use Windows and you're on a Mac or a Linux box. Maybe they code in PHP but you use Python.

The previous section introduced another rating system that can help raise the profiles of the best sites; each project can publish links to the documents with the most successful quiz outcomes.
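As a sketch of how such a featured list might be computed, assuming each project tracks pass and attempt counts per document (the data shape here is hypothetical):

```python
def rank_by_quiz_success(outcomes):
    """Order documents by the fraction of quiz takers who passed.

    `outcomes` maps a document ID to (passed, attempted) counts --
    an illustrative shape, not any real project's data model.
    """
    def success_rate(item):
        doc_id, (passed, attempted) = item
        return passed / attempted if attempted else 0.0
    return [doc for doc, _ in
            sorted(outcomes.items(), key=success_rate, reverse=True)]
```

Sorting by success rate rather than raw pass counts keeps a heavily trafficked but confusing document from crowding out a short, effective one.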

Selecting the most useful document for each reader may be a fuzzy activity for a long time. Some combination of personal recommendations, search engine results, and rating systems will help readers find the information they need.

Finally, a complex and potentially powerful relationship exists between the two main types of documentation I've discussed in this article: spontaneous interaction on mailing lists and more formal documentation. It would be worth exploring how to make this link more useful.

For instance, imagine that advisers on mailing lists would not only point novices to useful documents, but offer themselves as guides. Not "read the eff-ing manual," but "I know a document that explains what you want, and I will interact with you while you read it." That could open a whole new era in literacy--not just what teachers used to call computer literacy, but the ability to glean valuable information from a fixed text. That, I believe, is a perennial goal of education.

Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is www.praxagora.com/andyo.



Copyright © 2009 O'Reilly Media, Inc.