

Unit Testing Your Documentation


That script works well enough to test most of the code in the Ruby Cookbook, but a few twists made the script I actually used more complex. The biggest complication is an example that shows a raised exception:

10 / 0
# ZeroDivisionError: divided by 0

When irb encounters an exception, it prints an error message to standard output and keeps on calling gets. Given the example above, it tries to divide 10 by 0; instead of calling output_value on the "result," it prints the exception to standard output and calls gets again. Because there is no more code in that assertion, this second gets call crashes test_recipe.rb. My original script detects this condition and compares the standard output (containing the exception information) to the expected value, just as output_value does.
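The comparison can be sketched roughly like this. This is a minimal illustration, not the script's actual code: it assumes the expected exception appears in the book as a comment line, and that we have already captured irb's standard output as a string. The helper name exception_matches? is hypothetical.

```ruby
# Hypothetical sketch: treat an expected line like
# "# ZeroDivisionError: divided by 0" as a match against the text
# irb prints when the evaluated code raises an exception.
def exception_matches?(expected_comment, captured_output)
  # Strip the leading "# " that marks the line as a comment in the book.
  expected = expected_comment.sub(/\A#\s*/, '')
  # irb prints the exception class and message; consider the test
  # passed if that text shows up in the output we captured.
  captured_output.include?(expected)
end
```

The real script has to do this only after noticing that the follow-up gets call failed, which is what distinguishes an exception from ordinary output.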

Another complication is code that generates a lot of output. We didn't put all that output in the book, but we wanted to show at least the first part:

('a'..'z').each { |x| puts x }
# a
# b
# c
# ...

In the book, we use ellipses to cut short unimportant parts of the sample output. When comparing expected output to actual output, my code considers an ellipsis as a wildcard that matches any subsequent output.
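A wildcard comparison along those lines might look like the following sketch. The helper name output_matches? is an assumption for illustration; the book's framework presumably does something similar line by line.

```ruby
# Hypothetical sketch of ellipsis handling: an expected-output line
# consisting of "..." acts as a wildcard matching any remaining output.
def output_matches?(expected_lines, actual_lines)
  expected_lines.each_with_index do |expected, i|
    # An ellipsis means "and so on": accept whatever follows.
    return true if expected.strip == '...'
    return false unless actual_lines[i] == expected
  end
  # No ellipsis: the outputs must match exactly, line for line.
  actual_lines.size == expected_lines.size
end
```

With the a-through-z example above, expected lines of ["a", "b", "c", "..."] would match all twenty-six lines of actual output.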

I also made several minor enhancements to the script to handle tests that initially failed even though the code was correct. I seeded the random number generator before starting the test, so that tests depending on randomness would give the same results every time. I ran each test in a temporary working directory prepopulated with sample files needed by certain recipes. This also kept the main test directory clean, because some tests create new files and directories.
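Both enhancements are easy to picture. Here is a minimal sketch, not the actual script: the seed value, the helper name run_in_sandbox, and the sample_dir fixture location are all assumptions made for illustration.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical sketch: run one recipe's tests in a fresh temporary
# directory, optionally prepopulated with sample files, after fixing
# the random seed so rand-based examples repeat exactly.
def run_in_sandbox(sample_dir = nil)
  srand(42)  # fixed seed: rand produces the same sequence every run
  Dir.mktmpdir('recipe-test') do |dir|
    # Copy any fixture files the recipe expects to find.
    FileUtils.cp_r(Dir.glob(File.join(sample_dir, '*')), dir) if sample_dir
    # Run the test with the sandbox as the working directory;
    # mktmpdir deletes the directory afterwards, keeping things clean.
    Dir.chdir(dir) { yield }
  end
end
```

Because the seed is reset before each run, two invocations of the same rand-using recipe produce identical output, so its printed results can be checked like any other test.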

Success Rate

We had a tight deadline, so I focused on the 80 percent solution. Out of 364 recipes and chapter intros, my automated tester was able to parse 273 (75 percent). The tester detected 2,279 tests, and it could verify that 1,830 of them (80 percent) gave the expected results.

Some recipes we simply had to test by hand. The automated tester can't test standalone Ruby programs, CGI scripts, or Rakefiles. It can't test recipes that use libraries incompatible with an irb session, such as Curses or Rails. As expected, those recipes took longer to edit, and we were less confident in the results.

Even the testable recipes contained false failures, which meant manual verification. For instance, many of the tests in Chapter 3 ("Date and Time") depend on the current time, which will forever be different from what's printed in the book. Some of the failed "tests" weren't tests at all: they were explanatory comments that the test framework misinterpreted as expected sample output. These shortcomings were annoying, but only problematic to the extent that they masked real failures.

It didn't do all our testing for us, but the test framework saved enough time to justify its development cost. It pointed out many errors we never would have found manually, and once we fixed them, we enjoyed the advantages of unit tests: if the code changed, we could rerun the tests to see whether we'd broken something.


Of course, we also had the problems of unit tests. If you don't have good code coverage, the unit tests you have can lure you into a false sense of security. Our tests were pretty sparse, because our main goal was to give the reader an idea how to use a piece of code, not to test every feature and edge case.

This worked well when we were demonstrating third-party libraries and features of the standard library, which have tests independent of our little book. It worked less well on code introduced within the book. Several times reviewers found bugs in our code: edge cases we didn't test in the text of the book.

Buggy unit tests don't prove anything. I admit that I sometimes wrote the examples by running code through irb instead of figuring out the right answers on my own. In other words, I used the (possibly buggy) code to write the tests. Occasionally the code was buggy but the answers it gave weren't obviously wrong. The tests passed, of course, and once again only outside technical review caught the buggy code.

Finally, unit tests are useless if you don't look at the results. In the very final stages of editing, while preparing a demo, I noticed two failed tests that looked like real problems. They were. The code was buggy. The unit tests had flagged the problem months earlier but I'd simply overlooked it.


It's often possible to automatically test the code examples in a book or other piece of documentation. Doing this lets you make sure the results you wrote down are correct. Automatically testing your examples also gives you some unit test-like coverage of the code used in the examples.

Ruby code is fairly easy to test with an instrumented irb session, as long as the examples can be run in an irb session. Automatic example testing improves reliability and makes it easier to proofread your writing, but it can't replace another pair of eyes on your code. Like other unit tests, these can only do so much to protect you from yourself.

Leonard Richardson has been programming since he was eight. Recently the quality of his code has improved somewhat. He is responsible for libraries in many languages, including Rubyful Soup. A California native, he now works in New York. He maintains a website at

