
Unit Testing Your Documentation

by Leonard Richardson

When O'Reilly editor Mike Loukides contacted me about co-writing the Ruby Cookbook, I was apprehensive. I wasn't worried about the size of the project; I was concerned about quality. How could I work to the level of quality I expected from O'Reilly books--especially the Python and Perl Cookbooks, against which I knew people would measure this book? I'd heard horror stories of books that didn't meet O'Reilly's usual standard, books rushed through second editions because the code was full of bugs. I didn't want that to happen to my book. (I still hope it doesn't!)

At first, I thought it would be especially difficult to ensure the quality of a cookbook. Instead of a few application-sized examples illustrating a few coherent topics (say, database access in Java), we had to test 350 separate pieces of code on 350 wide-ranging topics. As with a software project, we had deadlines to meet.

Worse, due to the structure of our contract and the scarcity of proofreader time, this book was essentially a Waterfall project. Up to the very end, our focus was on getting everything written, with some time allocated afterward to edit the text and code. This isn't quite as crazy as it sounds, because bad text is easier to whip into shape than bad code, but it meant I didn't have a lot of time to spend on building test infrastructure.

Fortunately, despite my early misgivings, the Cookbook format actually made the individual recipes easy to test. I was able to turn pre-existing features of the recipes (the worked examples) into unit tests. I ran the tests in an instrumented irb session and generated a report that flagged any failures.

Thanks to the test framework, on a good day I could proofread, debug, and verify the correctness of 30 recipes. I worked faster and with greater confidence than I could doing everything by hand. I was also able to incorporate the test results into the general "confidence score" calculated for each recipe on my unofficial Ruby Cookbook homepage: a visible, though somewhat vague, metric of quality.

In this article, I present a simplified, cleaned-up version of my testing script. It parses recipe text into a set of code chunks and assertions. It then runs the code chunks in an instrumented irb session, and compares the assertions to reality. It works in a way similar to Python's doctest library.

Defining the Problem

Most of the recipes in the Ruby Cookbook contain a series of code samples depicting a single irb session. I've annotated important Ruby expressions in these samples with comments depicting their value or output. Here's a code sample from Recipe 1.15, "Word-Wrapping Lines of Text":

def wrap(s, width=78)
  s.gsub(/(.{1,#{width}})(\s+|\Z)/, "\\1\n")
end

wrap("This text is too short to be wrapped.")
# => "This text is too short to be wrapped.\n"

puts wrap("This text is not too short to be wrapped.", 20)
# This text is not too
# short to be wrapped.

Here, an ASCII arrow indicates the value of the first call to wrap, just like in irb. The second call is part of a puts statement, so instead of the value of that statement (a boring nil), the example shows the string printed to standard output.

Both the value and the output are hidden in comments, so that the reader can copy and paste the sample code directly into irb. By following along with the recipe, the reader can try out techniques used in the recipe solutions. At every important step, the reader can compare his results against what it says in the book to see if he understands the code correctly. After reaching the end of a recipe, the reader has libraries loaded and objects set up for further experimentation.

The flip side: what the book says had better be right. Running all that code and cross-checking the results against the comments would take a long time. However, it wouldn't require a lot of brainpower, so why not do it automatically?

We couldn't stick Test::Unit calls in the sample code: it would distract from the main point of the recipes. Yet those annotated lines of code are, effectively, unit tests: assertions about what happens when you use the previously defined code. They serve a pedagogical purpose, but they can also help verify quality.

The Recipe Format

The first step is to parse out the code from the English text of the recipe. Fortunately, we wrote the Ruby Cookbook in a wiki markup format similar to RedCloth. Lines containing three backticks delineate chunks of code:

 This is the English text of the book.

 ```
 puts "This is Ruby code."
 # This is Ruby code.
 ```

 This is more English text.

What about the format of the Ruby code? If a line ends with a comment containing an arrow, that's an assertion about the value of the expression on that line.

'pen' + 'icillin'                # => "penicillin"
['pen', 'icill'] << 'in'         # => ["pen", "icill", "in"]

If a line begins with a comment containing an arrow, that's an assertion about the value of the expression on the previous line.

'banana' * 10
# => "bananabananabananabananabananabananabananabananabananabanana"

If a line begins with a comment with no arrow, that's an assertion about the previous expression's output. An expression can yield multiple lines of output.

puts 'My life is a lie.'
# My life is a lie.

puts ['When', 'in', 'Rome'].join("\n")
# When
# in
# Rome

Any other line in a code chunk is a normal line of Ruby code with no associated assertion.
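The two annotation styles can be told apart with regular expressions. Here's a sketch; these patterns are my reconstruction of the idea, not necessarily the script's exact expressions:

```ruby
# Hypothetical patterns for the two annotation styles described above.
VALUE_COMMENT  = /#\s*=>\s*/          # arrow comment: asserts a value
OUTPUT_COMMENT = /^\s*#(?!\s*=>)\s?/  # leading plain comment: asserts output

VALUE_COMMENT.match("'pen' + 'icillin'  # => \"penicillin\"")  # matches
OUTPUT_COMMENT.match("# My life is a lie.")                    # matches
OUTPUT_COMMENT.match('# => "an arrow comment"')                # nil
```

The negative lookahead keeps a line-start arrow comment from being misread as an output assertion, since the output pattern must be checked first.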

Parsing a Recipe into Assertions

The rest of this article presents test_recipe.rb, a modified version of the Ruby script I used to parse and test our recipes' assertions. It starts with a simple struct class to hold chunks of code and the associated assertions:

# test_recipe.rb

Assertion = Struct.new(:code, :should_give, :how)
class Assertion
  SEPARATOR = "\n#{'-' * 50}\n"

  def inspect(against_actual_value=nil)
    s = "*** When I run this code: ***" +
      SEPARATOR + code.join("\n") + SEPARATOR +
      "*** I expect this #{how}: ***" +
      SEPARATOR + should_give.join("\n") + SEPARATOR
    if against_actual_value
      s += "*** What I get is this: ***" +
        SEPARATOR + against_actual_value + SEPARATOR + "\n"
    end
    return s
  end
end

Each recipe can be treated as a self-contained file. The AssertionParser class transforms one such file into an array of Assertion objects representing data to be fed into an irb session.

It starts by splitting a recipe on the triple backtick and examining each snippet of code. In the book, most of these snippets are part of the recipe's irb session, but some are sample shell sessions, standalone Ruby files, or code in a language other than Ruby. The program needs to filter out those snippets. For simplicity's sake I omitted that code, which is just lots of checks against the first few bytes of a snippet.
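Such a filter might look like the following sketch; the helper name and the particular heuristics are my own assumptions, not the book's actual checks:

```ruby
# Hypothetical filter: guess from a snippet's first bytes whether it
# belongs to the recipe's irb session. (These heuristics are assumptions.)
def irb_snippet?(snippet)
  first_line = snippet.lines.first.to_s
  # Shell sessions start with a prompt; standalone files with a shebang.
  !(first_line.start_with?('$ ') || first_line.start_with?('#!'))
end

irb_snippet?("puts 'hi'\n")      # => true
irb_snippet?("$ ruby foo.rb\n")  # => false
```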

# Parses a Ruby Cookbook-formatted recipe into a set of assertions
# about chunks of code.
class AssertionParser
  attr_reader :assertions

  # A comment containing an arrow is an assertion about a value.
  # (These two patterns are reconstructions, not the script's originals.)
  EXPRESSION_VALUE_COMMENT = /#\s*=>\s*/
  # A leading comment with no arrow is an assertion about output.
  EXPRESSION_OUTPUT_COMMENT = /^\s*#(?!\s*=>)\s?/

  def initialize(code)
    @assertions = []

    # Strip out the code snippets from the English text.
    snippets = []
    code.split(/```\s*\n/).each_with_index do |x, i|
      # Not shown: filter snippets that aren't part of the irb session.
      snippets << x if (i % 2 == 1)
    end

The second step is to separate the Ruby code into chunks, each of which terminates in an assertion to check. AssertionParser scans the Ruby code line by line, gathering chunks of code, finding each assertion and associating it with the foregoing chunk.

This section handles a line containing an assertion about an expression's expected standard output:

    # Divide the code into assertions.
    create_assertion # Set up the first, empty assertion.
    snippets.join("\n").split("\n").each do |loc|
      if loc.size > 0
        if EXPRESSION_OUTPUT_COMMENT.match(loc)
          # The code so far is supposed to write to standard output.
          # The expected output begins on this line and may continue
          # in subsequent lines.
          @assertion.how = :stdout if @assertion.should_give.empty?

          # Get rid of the comment symbol, leaving only the expected output.
          loc = loc.sub(EXPRESSION_OUTPUT_COMMENT, '')
          @assertion.should_give << loc

Another section handles a line containing an assertion about an expression's expected value:

        elsif EXPRESSION_VALUE_COMMENT.match(loc)
          # The Ruby expression on this line is supposed to have a
          # certain value. If there is no expression on this line,
          # then the expression on the previous line is supposed to
          # have this value.

          # The code up to this line may have depicted the standard
          # output of a Ruby statement. If so, that's at an end now.
          create_assertion if @assertion.how == :stdout && !@assertion.code.empty?

          expression, value = \
            loc.split(EXPRESSION_VALUE_COMMENT, 2).collect { |x| x.strip }
          @assertion.should_give = [value]
          @assertion.code << expression unless expression.empty?

This section handles all other lines of code:

        else
          # This line of code is just another Ruby statement.

          # The code up to this line may have depicted the result or
          # standard output of a Ruby statement. If so, that's now at
          # an end.
          create_assertion unless @assertion.should_give.empty?

          @assertion.code << loc unless loc.empty?
        end
      end
    end
    create_assertion # Finish up the last assertion.
  end

  # A convenience method to append the existing assertion (if any) to the
  # list, and create a new one.
  def create_assertion
    if @assertion && !@assertion.code.empty?
      @assertions << @assertion
    end
    @assertion = Assertion.new([], [], :value)
  end
end

Scripting an irb Session

Now the program has a list of Assertion objects: chunks of code and the expected values when that code is run. We wrote the Ruby Cookbook code to run in an irb session, so that's how I tested it. I decided it would be easier to script an irb session than to figure out how to get the equivalent behavior out of eval.

The hardest part of scripting an irb session is knowing what to override. The class to modify is IRB::Irb. To get an Irb instance to accept an alternate source of input, give it an input class that supports the methods gets and prompt=.

Comparing the actual result of an expression to the expected result is also easy. The value of the most recent expression in an irb session is available through the Irb object's instance variable @context.

Here's an Irb subclass that takes a harness object as its input source and sets up the interpreter to use it. irb calls output_value every time it evaluates an expression; to keep this class simple, its implementation simply fetches the expression's value from the context and delegates to the harness. This class and an appropriate harness are all you need to instrument an irb session and inspect the value of every expression.

require 'irb'
class HarnessedIrb < IRB::Irb

  def initialize(harness)
    IRB.setup(nil) # Load irb's default configuration.
    # Prevent Ruby code from being echoed to standard output.
    IRB.conf[:VERBOSE] = false
    @harness = harness
    super(nil, harness)
  end

  # Pass the value of the most recent expression to the harness.
  def output_value
    @harness.output_value(@context.last_value)
  end

  def run
    IRB.conf[:MAIN_CONTEXT] = self.context
    eval_input
  end
end

Here's an AssertionTestingHarness class that takes a list of Assertion objects and feeds the code into irb, one line at a time.

require 'stringio'

class AssertionTestingHarness
  def initialize(assertions)
    @assertions = assertions
    @assertion_counter, @line_counter = 0, 0
    @keep_feeding = false
    $stdout = StringIO.new
  end

  # Called when irb wants a line of input.
  def gets
    line = nil
    assertion = @assertions[@assertion_counter]
    @line_counter += 1 if @keep_feeding
    line = assertion[:code][@line_counter] + "\n" if assertion
    @keep_feeding = true
    return line
  end

  # Called by irb to display a prompt to the end-user. We have no
  # end-user, and so no prompt. Strangely, irb knows that we have no
  # prompt, but it calls this method anyway.
  def prompt=(x)
  end

The irb interpreter calls output_value every time it evaluates a line of code, but nothing happens except on the final line of a code chunk, when it's time to test the assertion.

  # Compare a value received by irb to the expected value of the
  # current assertion.
  def output_value(value)
    begin
      assertion = @assertions[@assertion_counter]
      if @line_counter < assertion[:code].size - 1
        # We have more lines of code to run before we can test an assertion.
        @line_counter += 1

The interpreter passes the result of the Ruby expression as an argument to output_value. If the assertion is a :value-type assertion, the harness simply compares the expected value to that argument.

If the assertion is a :stdout-type assertion, the argument is ignored. Instead, the harness captures the standard output gathered during the code chunk and compares that to the expected value. This is why the initialize method replaces $stdout with a StringIO object.

      else
        # We're done with this code chunk; it's time to check its assertion.
        value = value.inspect
        if assertion[:how] == :value
          # Compare expected to actual expression value.
          actual = value.strip
        else
          # Compare expected to actual standard output.
          actual = $stdout.string.strip
        end
        report_assertion(assertion, actual)
        # Reset standard output and move on to the next code chunk.
        @assertion_counter += 1
        @line_counter = 0
        $stdout.string = ""
      end
    rescue Exception => e
      # Restore standard output and re-throw the exception.
      $stdout = STDOUT
      raise e
    end
    @keep_feeding = false
  end
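The $stdout swap at the heart of the harness can be demonstrated in isolation. Here's a minimal sketch; the helper is mine, not part of test_recipe.rb:

```ruby
require 'stringio'

# Temporarily replace $stdout with a StringIO, run a block, and return
# whatever the block printed: the same trick the harness uses.
def capture_stdout
  old_stdout = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = old_stdout
end

capture_stdout { puts "captured" }  # => "captured\n"
```

The ensure clause restores the real standard output even if the block raises, just as the harness's rescue clause does.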

The report_assertion method compares an assertion to reality. When testing the book, my harness printed out an HTML report for each recipe, flagging failed assertions in red (here's the report for "Word-Wrapping Lines of Text"). The implementation presented here is much simpler; it inspects the assertion in light of the code chunk's actual value. A third implementation might make a Test::Unit assertion here.

  # Compare the expected value of an assertion to the actual value.
  def report_assertion(assertion, actual)
    STDOUT.puts assertion.inspect(actual)
  end
end

Finally, here is code that reads a recipe from standard input when you run this file as a script.

if $0 == __FILE__
  assertions = AssertionParser.new($stdin.read).assertions
  HarnessedIrb.new(AssertionTestingHarness.new(assertions)).run
end

Pipe a recipe into this script to extract, evaluate, and test its code listings. The following is the result of running this script against the sample code from "Word-Wrapping Lines of Text" given above. There are five lines of Ruby code here and two assertions:

*** When I run this code: ***
def wrap(s, width=78)
  s.gsub(/(.{1,#{width}})(\s+|\Z)/, "\\1\n")
end
wrap("This text is too short to be wrapped.")
*** I expect this value: ***
"This text is too short to be wrapped.\n"
*** What I get is this: ***
"This text is too short to be wrapped.\n"

*** When I run this code: ***
puts wrap("This text is not too short to be wrapped.", 20)
*** I expect this stdout: ***
This text is not too
short to be wrapped.
*** What I get is this: ***
This text is not too
short to be wrapped.


That script works well enough to test most of the code in the Ruby Cookbook, but there are a few twists that make the script I used more complex. The biggest complications are examples that indicate a thrown exception:

10 / 0
# ZeroDivisionError: divided by 0

When Irb encounters an exception, it prints an error message to standard output and keeps on calling gets. Given the example above, it will try to divide 10 by 0, and instead of calling output_value on the "result," it will print the exception to standard output and call gets again. Because there is no more code in that assertion, this second gets call crashes test_recipe.rb. My original script detects this condition and compares the standard output (containing exception information) to the expected value, just like output_value does.

Another complication is code that generates a lot of output. We didn't put all that output in the book, but we wanted to show at least the first part:

('a'..'z').each { |x| puts x }
# a
# b
# c
# ...

In the book, we use ellipses to cut short unimportant parts of the sample output. When comparing expected output to actual output, my code considers an ellipsis as a wildcard that matches any subsequent output.
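One way to implement that wildcard looks like this; a sketch, not the book's actual matching code:

```ruby
# Treat a trailing ellipsis in the expected output as a wildcard that
# matches any subsequent actual output.
def output_matches?(expected, actual)
  if expected.include?('...')
    prefix = expected.split('...', 2).first
    actual.start_with?(prefix)
  else
    expected == actual
  end
end

actual = ('a'..'z').map { |x| x + "\n" }.join
output_matches?("a\nb\nc\n...", actual)  # => true
output_matches?("a\nz\n...", actual)     # => false
```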

I also made several minor enhancements to the script to handle tests that initially failed even though the code was correct. I seeded the random number generator before starting the test, so that tests depending on randomness would give the same results every time. I ran each test in a temporary working directory prepopulated with sample files needed by certain recipes. This also kept the main test directory clean, because some tests create new files and directories.
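Those two fixes can be sketched as a small helper; the seed value and directory prefix here are arbitrary choices of mine:

```ruby
require 'tmpdir'

# Run a block with a fixed random seed inside a throwaway working
# directory, so rand-based tests repeat exactly and file-creating tests
# leave no clutter behind.
def in_clean_test_env(seed = 0)
  srand(seed)
  Dir.mktmpdir('recipe_test') do |dir|
    Dir.chdir(dir) { yield }
  end
end

in_clean_test_env { rand(1000) } == in_clean_test_env { rand(1000) }  # => true
```

Dir.mktmpdir removes the temporary directory when the block finishes, so any files a test creates disappear with it.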

Success Rate

We had a tight deadline, so I focused on the 80 percent solution. Out of 364 recipes and chapter intros, my automated tester was able to parse 273 of them (75 percent). The tester detected 2,279 tests, and it could verify that 1,830 of those (80 percent) gave the expected results.

Some recipes we simply had to test by hand. The automated tester can't test standalone Ruby programs, CGI scripts, or Rakefiles. It can't test recipes that use libraries incompatible with an irb session, such as Curses or Rails. As expected, those recipes took longer to edit, and we were less confident in the results.

Even the testable recipes contained false failures, which meant manual verification. For instance, many of the tests in Chapter 3 ("Date and Time") depend on the current time, which will forever be different from what's printed in the book. Some of the failed "tests" weren't tests at all: they were explanatory comments that the test framework misinterpreted as expected sample output. These shortcomings were annoying, but only problematic to the extent that they masked real failures.

It didn't do all our testing for us, but the test framework saved enough time to justify its development cost. It pointed out lots of errors we never would have found manually and, once we fixed them, we enjoyed the advantages of unit tests: if the code changed, we could rerun the tests to see whether or not we'd broken something.


Of course, we also had the problems of unit tests. If you don't have good code coverage, the unit tests you have can lure you into a false sense of security. Our tests were pretty sparse, because our main goal was to give the reader an idea how to use a piece of code, not to test every feature and edge case.

This worked well when we were demonstrating third-party libraries and features of the standard library, which have tests independent of our little book. It worked less well on code introduced within the book. Several times reviewers found bugs in our code: edge cases we didn't test in the text of the book.

Buggy unit tests don't prove anything. I admit that I sometimes wrote the examples by running code through irb, instead of figuring out the right answers on my own. In other words, I used the (possibly buggy) code to write the tests. Occasionally the code was buggy but the answers it gave weren't obviously wrong. The tests passed, of course, and (once again) only outside technical review spotted the buggy code.

Finally, unit tests are useless if you don't look at the results. In the very final stages of editing, while preparing a demo, I noticed two failed tests that looked like real problems. They were. The code was buggy. The unit tests had flagged the problem months earlier but I'd simply overlooked it.


Conclusions

It's often possible to automatically test the code examples in a book or other piece of documentation. Doing this lets you make sure the results you wrote down are correct. Automatically testing your examples also gives you some unit test-like coverage of the code used in the examples.

Ruby code is fairly easy to test with an instrumented irb session, as long as the examples can be run in an irb session. Automatic example testing improves reliability and makes it easier to proofread your writing, but it can't replace another pair of eyes on your code. Like other unit tests, these can only do so much to protect you from yourself.

Leonard Richardson has been programming since he was eight. Recently the quality of his code has improved somewhat. He is responsible for libraries in many languages, including Rubyful Soup. A California native, he now works in New York. He maintains a website at


Copyright © 2009 O'Reilly Media, Inc.