

Why Do People Write Free Documentation? Results of a Survey
Pages: 1, 2, 3, 4, 5

Survey Results and Interpretation

With the questions from the previous section churning in my head, I wrote a survey in late 2006 to ask why people contributed their time to write documentation. Although a lot of online documentation is on corporate or advertising-supported web sites (including those put up by O'Reilly Media), I explicitly excluded such sites, asking for responses only "from people who do this for non-monetary reasons." [Full text of survey]

Data Collected

The survey collected three types of information:

What kinds of documentation people contribute

My goal was to cast the net as wide as possible. A visitor clicking through to the survey was greeted with the question, "Do you answer questions on mailing lists about how to use a software tool or language?" By starting with the most casual, intermittent contributions to online documentation, I signaled that visitors should be very inclusive in thinking about their contributions. The survey went on to ask not only about writing but also about translation, editing, technical review, and administration.

The projects to which the respondents contribute

I was hoping to learn whether different types of projects garner different types of help--in particular, to test the thesis mentioned in the previous section that free software is unique in its patterns of participation.

The reasons for contributing

This section was the crux of the survey. It listed eight reasons that I thought were likely candidates for people writing online documentation, along with a text box where respondents could list "other factors."

Two of my reasons tested the mutual back-scratching mentioned in the previous section: mutual aid and community building. I also put in gratitude, which I considered another angle from which to view mutual aid, but which I thought many respondents would see as a separate motivation because of its emotional connotations.

Community building particularly interested me because I had mentioned in an earlier article on mailing lists that the key goal of a technical mailing list was to meet users' technical needs. A researcher in the field of democracy and policy bluntly informed me that I was wrong, and the main goal of the list was to build community. This challenge to my purely instrumental approach intrigued me, but I wanted to test whether contributors had more directly self-rewarding goals as well. I included two reasons that I thought would elicit self-seeking motives. The first was reputation building. I removed any ambiguity about its self-seeking basis, as I described in the previous section, by defining reputation building as follows:

Consultants, trainers, job seekers, authors, and others who hope to build a career go online to build up recognition and respect.

The other selfish reason I offered was personal growth. The inclusion of this reason drew on teachers' well-known observation that they learn as much by teaching as their students do.

Situated halfway between generalized, half-altruistic community building and directly self-seeking behavior was another reason: informal support. I could have tried to tie this reason directly to personal rewards by narrowing the definition to support given by product developers, as I described in the previous section. But this would have been too hard to define, because as I pointed out there, it's hard to tell why someone is associated with a software project. I can't distinguish the people who feel that they get a personal payback from offering project support from those who do it altruistically. So this reason remains ambiguous as a motive for writing documentation.

Finally, I felt I had to include two reasons that didn't fit in with any particular research agenda, in order to capture all the motivations I could think of. The first was enjoyment of writing. I know from my own experience, and that of my authors, that this must usually be present for successful writing, although in the field of technical documentation it would hardly be the primary motivation. The last reason I offered was thrills, which I defined as follows:

There's pleasure in seeing your insights turn up almost instantly on a forum with worldwide scope, as well as watching others succeed with your help and praise you for it.

This reason was prompted by an idea I drew from Joseph Weizenbaum's classic Computer Power and Human Reason and wrote up in another article, suggesting that a quest for power drives contributions to free software.


My documentation survey, after some editing by O'Reilly staff familiar with surveys, went up in January 2007. I contacted many leaders throughout the software field, asking them to promote participation among people with whom they had influence, and called on my fellow editors at O'Reilly to make similar appeals to leaders in their technical areas. Tim O'Reilly featured the survey on his popular O'Reilly Radar blog, and an announcement appeared on various O'Reilly Network sites.

I allowed the survey to run for three months, as long as new submissions were being added. Finally, noticing that additions had essentially come to a halt in April 2007, I shut down the survey.

I should make it clear that the sample was self-selected, so we have to be careful when drawing conclusions from the collection of responses. To keep the survey short and make it easy to respond, I decided not to collect demographic information that could have been interesting, such as age, gender, or national origin. Any survey that tried to remedy these deficiencies would have to be backed up by a much more costly and professional strategy for recruiting respondents.

[Complete results as a CSV file] My only changes to the responses were to remove a few phrases that identified individual respondents, in order to adhere to the survey's promise: "Individual responses are strictly confidential and names of participants in this survey will remain private."


Personal connections and coincidences determined where the 354 responses came from. Communities where I am well known (such as Perl and GNU/Linux) or where leaders encouraged participation (GNU/Linux and GNOME) contributed the bulk of the responses. When leaders failed to act on our requests for publicity and communities didn't take much note of our postings, few responses emerged.

Therefore, one can't draw any conclusions from the pattern of responses from different projects. The most serious consequence of the skewed responses is that only a dozen respondents claimed to contribute documentation on proprietary software. This crippled my attempts to test for differences between free and proprietary software communities, as described in the previous section. [Breakdown of submissions by major projects]

Most Popular Reasons for Contributing

The reasons for contributing documentation turned out to be the data that offered the most insights.

Providing a familiar four-point scale made it easy for people to fill out the survey, but it created a dilemma during interpretation. Suppose Respondent A uses a lot of 3s and 4s, whereas Respondent B has mostly 2s and 3s. Does Respondent A really care more about writing documentation than Respondent B? Should zealots have a greater weight in the results?

My reference to "zealots," of course, is a joke. The problem of weighting responses is endemic to any research based on language, because phrases such as "extremely important" mean different things to different people.

To resolve the dilemma, I use two measures for every calculation. The first measure is just the raw rating chosen by each respondent: a 2 is always a 2 and a 4 is always a 4. If someone doesn't choose a category, the submission is rated a 0. People who consistently choose higher ratings therefore carry greater weight in the results. I call these the raw results.

The second measure adjusts the eight ratings made by each respondent so that every respondent has equal weight in the results. Calculating this measure is trivial: just add up all eight ratings by each respondent and divide each rating by the total. Any reason that a respondent leaves unrated, or rates at the lowest level ("Not important at all"), takes on a value of 0. These adjusted results for each respondent add up to 1.
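The two measures can be sketched in a few lines of Python. This is only an illustration, not the code used for the survey analysis: the function names are hypothetical, an unrated reason is assumed to be represented as `None`, and the lowest rating ("Not important at all") is assumed to be already coded as 0. The handling of a respondent whose ratings are all 0 is also an assumption, since the text does not say what was done in that case.

```python
from typing import List, Optional, Sequence


def raw_scores(ratings: Sequence[Optional[int]]) -> List[int]:
    """Raw measure: keep each rating as given; an unrated reason counts as 0."""
    return [0 if r is None else r for r in ratings]


def adjusted_scores(ratings: Sequence[Optional[int]]) -> List[float]:
    """Adjusted measure: divide each rating by the respondent's total,
    so that every respondent's scores add up to 1."""
    raw = raw_scores(ratings)
    total = sum(raw)
    if total == 0:
        # Assumption: a respondent with no nonzero ratings contributes nothing.
        return [0.0] * len(raw)
    return [r / total for r in raw]


# Hypothetical respondents, each rating the eight reasons.
respondent_a = [4, 3, 4, 3, 4, 3, 4, 3]     # mostly 3s and 4s
respondent_b = [2, 3, 2, 3, None, 3, 2, 3]  # mostly 2s and 3s, one reason unrated

for name, ratings in (("A", respondent_a), ("B", respondent_b)):
    print(name, "raw total:", sum(raw_scores(ratings)),
          "adjusted total:", round(sum(adjusted_scores(ratings)), 6))
```

Under the raw measure, Respondent A contributes a larger total than Respondent B. Under the adjusted measure, each contributes a total of 1, spread across the reasons in proportion to the original ratings.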

Raw results are more fair, but they might not be more accurate. It all depends on how consistently respondents interpreted terms such as "extremely." I believe such phrases lead to a wide range of interpretations, so I trust the adjusted results more.

Raw and adjusted results proved to be almost the same for the most basic measurement (Figures 1 and 2): what were the most popular reasons for contributing?

Figure 1. Total raw results

Figure 2. Total adjusted results
