

Rethinking Community Documentation

The Continuing Role of Conventional Books

Until online documentation (whether free or proprietary) can consistently provide a professional and well-organized learning experience, printed books will remain important. This remains true even though many books themselves fall short of that standard. A few readers, dazzled by the possibilities of new media, might greet this claim dismissively, saying "The author has to say that, because he edits books." In fact, the reverse is the case: I edit books because they still play an important role in learning.

The book remains the medium in which an author is most likely to take time to consider a topic from many angles and at many levels. It is the medium where professionals in many disciplines can deliver an effective message. People recognize that.

Recognizing the unique contributions books make, online forums should promote books just as they promote online documentation. Some people, as I've said already, have trouble getting hold of the printed books, but the majority of online visitors can benefit from them. This is an age of grassroots marketing; the buzz generated by everyday computer users has a much greater impact on book sales than conventional advertising.

Some online communities feel comfortable talking about the books they like and don't like. These communities generate more book sales, and ultimately allow publishers to put more resources into those communities: more books, more journals and websites, and even more sponsorships for conferences and other activities.

Economically, of course, such investments represent a transfer of money from individual book buyers to community-based activities, with the publishers as mediators. My point is that the community benefits a lot more from book sales than most participants realize.

Other forums, for reasons I don't understand, show a reluctance to recommend a book, even when a highly respected publication exists that directly addresses the confusion and misunderstandings expressed by participants on the forum. By suppressing sales, this reluctance makes it very hard to publish further books in that space.

In short, learning is an ecosystem that flourishes best when all resources and contributors have recognition. Free online resources may someday provide everything that is needed--but communities have to do some work to reach that point.

Potential Improvements to Community Documentation

Community documentation can do a lot better. There is potential for improvement in three areas: online training, funding and rating systems, and search facilities.

Online Training--an Antidote to the Mailing List Crutch

Graduate schools throw case studies at their students to toughen them up for the outside world. The Army toughens its recruits through very realistic simulations (computerized as well as on the ground). In the computer field, online training programs put students through realistic problem-solving steps.

These are all expensive and require expertise to develop. To ask software communities to do the same is a tall order, but they might be able to make headway if they had streamlined development tools and modest expectations.

Funding and Rating Systems--a Fix for Writing Problems

Learning to write for the reader takes practice and guidance. It best occurs under the tutelage of an experienced author. Given the excellent energy going into community documentation, its flexibility, and its universal availability, the entire computer field would benefit if some professional resources were shifted from books to the efforts of amateurs online.

What I'm suggesting is very different from the common phenomenon (over the past decade) of conventional publishers printing and selling community documentation. It's nice if somebody's free book, written as a gift to his fellow software users, can earn a buck or two in print. But it's rare for community documentation to make the transition profitably, because--as I explained earlier--it's fast-changing, interactive, free, and ad-hoc.

Two forms of investment can improve the writing of community documentation, and both require funding. First, writers need to take extra time to sit with their material and figure out what's really relevant to readers. Second, writers need to work with editors who possess a deep knowledge of the computer field and how to write for it.

Finding money for authors and editors (as well as artists, website administrators, and others involved in the production and distribution of community documentation) could lead to even more of the inspired efforts we've seen in communities over the past several years. Naturally, an injection of money could also corrupt the system, with undeserving authors and editors trying to direct funds their way. We have to enter this area with caution, but doing so can reap vast rewards for readers.

Money could come from multiple sources: the projects that develop computer systems, the vendors who sell them, and even communities of users pooling their contributions. It seems almost pie-in-the-sky right now to anticipate that people will care so much about documentation, but the money might start to flow if the investment made a demonstrable difference.

Thus, we need a rating system to show that good writing and professional editing actually produce more educated and successful computer users. The rating system has three main uses:

  • To choose which documents are distributed and recommended.
  • To determine which authors and editors get paid for their contributions.
  • To justify the whole effort of producing better documentation.

In short, readers will make use of rating systems to decide what to read, and projects will use them to decide what to feature on their sites. As we'll see, ratings could be incorporated into search results to let the more effective documents rise to the top.

It's important to find a rating system with some validity. Neither the experts nor the ordinary users can be entrusted with this task.

Problems with Expert Evaluators

It may seem natural to ask project leaders, or other experts on a topic, to read and rate documents. Such experts will probably make their own judgments anyway about who's a good author, and will base informal decisions on these judgments, such as whom to invite to write new documents. As a rating system, this approach suffers from several problems:

  • Who has time to read all the relevant documents? And how can one expect the experts to rate them along some consistent scale?
  • Project leaders and experts likely have conflicts of interest. They may well write documents themselves, or have close friendships with people who write them.
  • Most fundamentally, experts look at documents differently from the people who need to learn the material. For instance, an expert often likes a document because it delivers an accurate formal description of a system, yet such a document might be of little use to someone trying to learn the system.

Let's leave our experts to do what they do best--exercise their expertise--and find other ways to rate documents.

Problems with Formal Rating Systems

There is enough research on reputation and rating systems to design a proposal for a documentation rating system. I'll do one as a thought experiment here, just so people don't have to ask, "What about the formal research in reputation?"

The goal is to produce a measurable rating (such as a number on a scale from zero to ten) for each document, based on user ratings. We must try to prevent a user from rating a document more than once, because uncooperative users will want to submit a hundred ratings for their own document, or for ones written by their friends.

The first step in a rating system is user registration. This is considered a proof of work in rating systems. Registration should be hard enough that someone won't try to do it a hundred times. For instance, you can register someone who buys a computer system. If software is downloaded, one can present a series of questions that must be answered to register. While someone who is sufficiently determined to game a system can do so, there are ways to feel confident that most users have registered only once.

The second step is then to present users with a scale that allows them to rate documents after logging in. Results can be tabulated to show which documents people like. You could even set up collaborative filtering, so that you can find other documents that were rated highly by people who tend to share your taste.
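The tabulation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name RatingBoard, the user and document identifiers, and the choice to let a repeat rating overwrite the earlier one are all hypothetical, and registration is assumed to have already produced one trustworthy user ID per person.

```python
from collections import defaultdict
from statistics import mean


class RatingBoard:
    """Hypothetical tally that keeps at most one rating per (user, document)."""

    def __init__(self, low=0, high=10):
        # The zero-to-ten scale follows the example in the text.
        self.low = low
        self.high = high
        self._ratings = defaultdict(dict)  # doc_id -> {user_id: score}

    def rate(self, user_id, doc_id, score):
        if not self.low <= score <= self.high:
            raise ValueError(f"score must be between {self.low} and {self.high}")
        # Assumption: a second rating from the same user replaces the first,
        # so nobody can count more than once for a given document.
        self._ratings[doc_id][user_id] = score

    def average(self, doc_id):
        scores = self._ratings.get(doc_id)
        return mean(scores.values()) if scores else None


board = RatingBoard()
board.rate("alice", "network-howto", 8)
board.rate("alice", "network-howto", 10)  # overwrites Alice's earlier 8
board.rate("bob", "network-howto", 6)
print(board.average("network-howto"))
```

Note that the table has to remember which user rated which document in order to reject duplicates.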

The most obvious problem with this system is its crushing weight. Nobody (except me) cares enough about documentation to erect or go through such a cumbersome process.

Even worse, this system suffers from several failings. It fails the principle of minimal disclosure, because we are asking users for much more information than we need. We've asked each user to associate himself with a rating, when we don't really care which user liked which document. (We might care if collaborative filtering can be useful, but that's a separate feature that users might not opt in to.) All we want to know is how many users collectively benefited from each document.

This rating system doesn't actually tell us what we want to know. We don't know whether someone read the document before rating it, and the rating is just a popularity contest--it shows whether the reader had a pleasant experience, but not whether the reader can take the information back to work and be productive with it.

If a reader has a distinct, one-time flaw to repair, she can tell whether a document has helped her repair it, and can indicate that with a simple click on a yes/no box. Such ratings can be useful in these limited situations. For anything more conceptual or long-range, user ratings are of uncertain reliability.

Rating Through Quizzes

Finally, I arrive at the system I'd like to use to prove to project leaders and users that professional help can make better documentation.

The goal is to associate each document with a short quiz--say, two or three questions--that determines whether the reader has learned what he needs from the document.

Choosing questions is a delicate task. When testing whether someone can set up a network, for instance, it's not particularly helpful to ask, "What is a Class C network mask?" It would be more pertinent to real-life concerns to ask, "What network mask would someone setting up a home or small office network use?" These questions should be fairly difficult, to minimize the chances that readers already knew the answers before they read the document.

Probably, the choice of questions should not be up to the author of each document. Ultimately, the choice is in the hands of the people who put up the document or fund the system. They may choose to turn it over to subject-matter experts.

From the reader's point of view, the system is low on overhead and investment of time. It looks like:

  1. The reader clicks on a link to access the first page of the document.
  2. The reader can then access pages in any order, and reread pages. The only problem the reader encounters is if he takes more than a few days to read the document. Because URLs are dynamically generated, the document may disappear from the website if the reader tries to return to a page after a time lapse.
  3. On the last page, the reader is asked to test her knowledge. She clicks through two or three pages, answering one question on each. She can return to the document while taking the test. (I believe quizzes should be "open book," to use a metaphor from the classroom. Quiz results should depend on deep comprehension of concepts, not mere memorization.)
  4. The reader is told at the end how well she fared.

On the server side, the system works like:

  1. Each page of the document, and each page of the quiz, is dynamically generated and associated with a random character string on the server. Doing this puts some barriers in the way of somebody who wants to undermine the system by submitting hundreds of quiz results. The server maintains information on the reader's session for a few days, giving the reader plenty of time to access all pages and the quiz.
  2. The reader's answers are stored permanently together with the random character string, which functions as a primary key with which to retrieve results. No information on the user is kept, however.
  3. Results can be retrieved and manipulated for any research purpose: to determine, for instance, how well the document fulfilled its purpose, or whether recent changes to the document improved it.
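The server-side steps above might look roughly like the following Python sketch. It simplifies the design by issuing one random string per reading session rather than one per page, and it keeps everything in memory. The names QuizStore and SESSION_LIFETIME, the three-day lifetime, and the answer format are assumptions made for illustration, not part of any existing system.

```python
import secrets
import time

# "A few days" in the text; three days is an assumed value.
SESSION_LIFETIME = 3 * 24 * 60 * 60


class QuizStore:
    """Hypothetical store keyed only by a random string, never by user."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._sessions = {}  # token -> expiry timestamp (temporary)
        self._results = {}   # token -> list of answers (kept permanently)

    def start_session(self):
        # Called when the reader opens the first page of the document.
        token = secrets.token_urlsafe(16)
        self._sessions[token] = self._clock() + SESSION_LIFETIME
        return token

    def is_live(self, token):
        expiry = self._sessions.get(token)
        return expiry is not None and self._clock() < expiry

    def submit(self, token, answers):
        if not self.is_live(token):
            raise KeyError("unknown or expired session")
        if token in self._results:
            raise ValueError("quiz already submitted for this session")
        # The token is the primary key; no information on the reader is stored.
        self._results[token] = list(answers)

    def score(self, token, answer_key):
        # Number of answers that match the key, for later research queries.
        return sum(a == k for a, k in zip(self._results[token], answer_key))
```

Because each token accepts only one submission and expires after a few days, stuffing the results requires requesting the document afresh for every fake quiz, which is the modest barrier the design aims for.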

While standard practice might be to use a CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart") system to stop automated programs that corrupt the rating system by submitting multiple quiz results, I think such a system would impose too much overhead for readers. It's hard enough to get feedback from users without putting them through the hassle of filling in a field with gibberish.

A more fruitful way to suppress ballot-stuffing might be to limit the number of submissions from a single IP address, the number of submissions during a certain time interval, or some combination of the two. Rate-limiting (no pun intended) might place a limit on the number of legitimate quizzes submitted by users behind a firewall, but that could be a reasonable price to pay to fight fraud. If you know you're soliciting a huge number of responses from one site--for instance, from students at a college--just put your site on their side of the firewall.
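One way to combine the two limits is a sliding window per IP address, sketched here in Python. The class name SubmissionLimiter and the default of five submissions per hour are assumed values chosen for illustration; a real site would tune them to its own traffic.

```python
from collections import defaultdict, deque


class SubmissionLimiter:
    """Hypothetical limiter: at most N quiz submissions per IP per time window."""

    def __init__(self, max_per_window=5, window_seconds=3600):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # ip -> timestamps of accepted submissions

    def allow(self, ip, now):
        recent = self._seen[ip]
        # Forget submissions that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_per_window:
            return False
        recent.append(now)
        return True


limiter = SubmissionLimiter(max_per_window=2, window_seconds=60)
print(limiter.allow("10.0.0.1", now=0))
print(limiter.allow("10.0.0.1", now=10))
print(limiter.allow("10.0.0.1", now=20))  # third submission inside the window
```

Readers who share one address behind a firewall also share one quota, which is the trade-off described above.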

Search Facilities--a Fix to the Organizational Mess

I'll finish with a few ideas about helping people find their way through the wealth of documentation. On some projects the documentation is like a tropical rainforest, overflowing with abundance but easy to get lost or drowned in.

Every author has to consider what he expects his readers to know before starting his document. Well-organized authors list the topics they expect the readers to know, and a few point to related documents. A list of such prerequisites, as well as suggestions for further reading, would enhance most documents.

In a rich environment such as the Web, an author should not have the burden of guiding readers through the forest. Other sites can sprout up freely to offer the personal choices of different readers. People can offer portals or catalogs that guide readers with particular needs to particular sequences of documents. This development will be particularly important for the spread of documentation in less commonly spoken languages. The challenge is to match each new reader with the selections most relevant and congenial to him.

Google's famous link-count algorithm has shown that the number of citations is a useful guide to the value of a site. Yet just because a document was right for many other people doesn't mean it's right for your particular needs. Maybe lots of other people use Windows and you're on a Mac or a Linux box. Maybe they code in PHP but you use Python.

The previous section introduced another rating system that can help raise the profiles of the best sites; each project can publish links to the documents with the most successful quiz outcomes.
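As a rough illustration of how a project might order its links, the Python sketch below ranks documents by the fraction of quiz questions their readers answered correctly. The function name rank_documents, the min_quizzes threshold, and the sample document names are hypothetical; the input format, a list of (correct, total) pairs per document, is likewise an assumption.

```python
def rank_documents(results, min_quizzes=1):
    """Order document IDs from most to least successful quiz outcomes.

    results maps each document ID to a list of (correct, total) pairs,
    one pair per submitted quiz. Documents with fewer than min_quizzes
    submissions are left out, since a handful of results proves little.
    """
    ranked = []
    for doc_id, quizzes in results.items():
        if len(quizzes) < min_quizzes:
            continue
        correct = sum(c for c, _ in quizzes)
        total = sum(t for _, t in quizzes)
        if total == 0:
            continue
        ranked.append((correct / total, doc_id))
    # Highest success rate first; ties are broken alphabetically by ID.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in ranked]


sample = {
    "dns-howto": [(3, 3), (2, 3)],
    "mail-howto": [(1, 3), (1, 3)],
    "brand-new-doc": [],
}
print(rank_documents(sample))
```

A search facility could fold the same success rate into its result ordering alongside link counts and personal recommendations.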

Selecting the most useful document for each reader may be a fuzzy activity for a long time. Some combination of personal recommendations, search engine results, and rating systems will help readers find the information they need.

Finally, a complex and potentially powerful relationship exists between the two main types of documentation I've discussed in this article: spontaneous interaction on mailing lists and more formal documentation. It would be worth exploring how to make this link more useful.

For instance, imagine that advisers on mailing lists would not only point novices to useful documents, but offer themselves as guides. Not "read the eff-ing manual," but "I know a document that explains what you want, and I will interact with you while you read it." That could open a whole new era in literacy--not just what teachers used to call computer literacy, but the ability to glean valuable information from a fixed text. That, I believe, is a perennial goal of education.

Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is
