BSD DevCenter

BSD Hacks

The Making of BSD Hacks

by chromatic, editor of BSD Hacks
06/03/2004

How We Wrote BSD Hacks

Most technical book authors seem to use Microsoft Word to write their books. Yes, that includes Linux and other open source books. Another option is XML, especially the DocBook XML format. (There are likely other popular formats in certain areas; TeX and LaTeX come to mind for scientific publishing.)

My colleagues tell me that Word is passable, if you have a good set of macros and religiously tag data semantically, without worrying about formatting at that point. Since the data has to undergo a conversion during production anyway, any display tweaking you do within the document will slow things down.

Of course, Word doesn't run natively on Linux or BSD. I don't even own a machine that will run it natively. AbiWord and OpenOffice.org open Word documents fairly well, letting me read and convert documents into a more appropriate format, if necessary.

They don't handle the macros quite so well, so there goes one big advantage of writing in an application that can save in the Microsoft Word format.

Then there's XML, which marks up the semantic structure of a document rather nicely and modularly, though it's hard to write. Unfortunately, tools to write XML aren't so popular either. I hear there's a nice Emacs mode, which would be handy if I used Emacs. (Having written two books in DocBook XML, I do have some decent vim bindings, but it's still painful.)

Also, transforming XML into something easier for readers (and, admittedly, authors) to proofread is a little trickier. We have an XSLT stylesheet that sorta works, mostly, to produce fairly decent XHTML.

Related Reading

BSD Hacks
100 Industrial-Strength Tips & Tools
By Dru Lavigne

Of course, XML does have a huge advantage in that you can check the book's well-formedness and its validity with standard tools. If you start from a good template and validate the book with every new section you write, you'll spend a lot less time tracking down missing tags later.
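As a rough illustration of checking as you go, here is a minimal Python sketch that tests a fragment for well-formedness using only the standard library. It is not the tooling used for the book; the function name check_well_formed is made up for this example, and it does not check validity against the DocBook DTD, which would need a validating parser.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a short description of the error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) pair pointing at the problem.
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


if __name__ == "__main__":
    good = "<chapter><title>Hacks</title><para>Text.</para></chapter>"
    bad = "<chapter><title>Hacks</title><para>Text.</chapter>"
    for sample in (good, bad):
        print(check_well_formed(sample) or "well-formed")
```

Running a check like this after every new section keeps a missing closing tag from hiding until the end of the project.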

XML is also much closer to a plain-text format, which means that if you have a text-munging bent, you can run regular expressions, XML processing, or XSLT programs over your text to make structural changes. Try that in Word! Then again, Word has a change-tracking feature that some of my colleagues swear by.

Neither option really seemed right. For BSD Hacks, we wanted to use open source tools and to write where we were comfortable (that was "in vim," for both of us). I also wanted to make things easier for the production process, as a book can either slide through smoothly or take an amazing amount of time there.

Fortunately, we had another option.

PseudoPOD

Practical Perl programmers already know about POD, the Plain Old Documentation format used for modules, tutorials, and the voluminous documentation that ships with Perl. It's a plain text, mostly human-readable format designed to make writing documentation very simple.

The most common markup elements necessary for a book -- headings, paragraphs, code sections, references to other parts of the book or to web sites, and font styles -- are easy in POD. That covered most of what we needed.

POD lacks a few niceties, though. For example, there are no good ways to produce a table in POD, nor to include screenshots. There are also some subtleties about character entities in preformatted code snippets. Finally, in any book in progress, editors, authors, and production editors need some way to write notes to each other that won't appear in the final output.

Fortunately, there's PseudoPOD, which apparently came about to address some of POD's shortcomings during the writing of Programming Perl. That's what we chose.

Of course, Perl's POD tools don't know about these extensions, so we would have lost the benefit of tools to verify POD syntax and to transform it to nice XHTML output if it weren't for Allison Randal's Pod::PseudoPod modules. These saved us a tremendous amount of time. Of course, since she's also working on a book in PseudoPOD, it wasn't an entirely selfless act, but I did report bugs and request new features to make the module more useful for everyone.

Building valid POD didn't mean that we had produced output that the production editors could understand, so it took a few weeks to work out the right semantic markup to use in certain situations. I submitted some writing samples several weeks in advance, though, so they'd fixed most of the problems on their end and I'd fixed most of the problems on our end before the book absolutely had to go to production.

PseudoPOD gave us a nice, simple, text-based, and open format with which we could write in our preferred editor. It's also really easy to add PseudoPOD formatting to a plain-text dump from any other format, if contributors didn't want to learn yet another markup language. "Just send text; we'll convert it!"

We didn't have change control yet, though.

Subversion

I'd used a local Subversion repository to store the data from my previous book. This bought me peace of mind, knowing that I didn't have weird revisions lying around, taking up space. I could also back up the repository regularly and restore it.

Editing a book is a little different from writing one. As an editor, I want to see regular progress. I want to provide regular feedback to the author, and I want her to see the exact changes I've suggested.

I created a networked Subversion repository for the book and set up a simple directory structure. Each chapter had its own subdirectory. The tools/ subdirectory held tools for working with the book and the build/ and build/html directories held generated files.

Adding a new file was easy: put it in the proper directory, tell Subversion to add it, and commit the change. The next time I updated my copy, I'd see the author's new files right there. The next time the author updated her copy, she'd see my changes. That's a lot easier than mailing Word documents or XML files back and forth.

Towards the end of the project, we shuffled around a lot of hacks. This was also very easy in Subversion, which allows moving files around without losing their histories. Instead of keeping a list of which file went where, our repository stored this information.

There was one drawback, however: we depended on a single server. During a period of particularly high Microsoft Windows virus activity, the Subversion server was starved of bandwidth. Another time, a power outage revealed that the server wasn't actually on a UPS. Neither problem proved fatal; they merely delayed work by several hours.

Tools

Armed with Pod::PseudoPod, it was very easy to convert a chapter into valid XHTML. Of course, with a hacks book, there were no chapters yet, just individual hacks in chapter directories.

Because we needed a place to write the chapter introductions, as well as to host the chapter title and main chapter link, I put a chapter_nn.pod file in every chapter subdirectory. Aside from the chapter-specific boilerplate, this had links to all of the hacks within the chapter.

Every time I finished editing a hack, I'd add it to the appropriate chapter file.

My tools/build_chapters.pl program loops through all of the chapter subdirectories, grabs the chapter file, replaces each hack's link with the contents of the hack, and writes the results to a build/chapter_nn.pod file.
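The original program was written in Perl and isn't shown here. As a minimal sketch of the same idea in Python, the code below assumes that a hack link is a line containing only L<filename.pod> and that chapter files match chapter_*.pod one directory below the book root. Both conventions, and the names build_chapter and build_all, are assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: a hack link is a line holding only L<filename.pod>.
HACK_LINK = re.compile(r"^L<([\w.-]+\.pod)>\s*$")


def build_chapter(chapter_file):
    """Return the chapter text with each hack link replaced by that hack's contents."""
    chapter_dir = chapter_file.parent
    built = []
    for line in chapter_file.read_text().splitlines():
        match = HACK_LINK.match(line)
        if match:
            hack_text = (chapter_dir / match.group(1)).read_text()
            built.append(hack_text.rstrip("\n"))
        else:
            built.append(line)
    return "\n".join(built) + "\n"


def build_all(book_root, build_dir):
    """Build every chapter file found one level below book_root into build_dir."""
    build_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for chapter_file in sorted(book_root.glob("*/chapter_*.pod")):
        # Skip files generated by an earlier run.
        if chapter_file.parent.resolve() == build_dir.resolve():
            continue
        target = build_dir / chapter_file.name
        target.write_text(build_chapter(chapter_file))
        written.append(target)
    return written
```

Calling build_all(Path("."), Path("build")) from the top of a working copy would write one combined file per chapter, leaving the individual hack files untouched.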

The tools/pod2html program is a thin wrapper around Pod::PseudoPod::HTML. It converts the built chapters into HTML files in build/html. I could view these files in a web browser or bundle them up and send them to reviewers or other editors.

This program also reports validity errors in POD files. It reports them against the built files, not the individual hack files, but working in small pieces and validating as often as possible minimizes the work involved in tracking down errors.

I did shamelessly borrow the stylesheet from the DocBook converter I used for the previous book, but I added a few rules so that author and editor comments stand out nicely.

One of the comments from the production tools check revealed that we hadn't indented code snippets appropriately. This would make all computer input and output display in a proportional font -- not a fixed-width font, as people would expect. Fortunately, with POD parsing tools, it took only a few minutes to write a program to scan for unindented sections and to correct them.

My reward was two-fold. First, production commented that the preproduction check, while taking up a lot of time, allowed them to smooth the process. Second, they told the series editor that the book came in as clean as anything they'd ever seen. At least, that's what Rael told me.

Other Ideas

Because of the quick schedule of a Hacks book and my other responsibilities, there are still a few rough edges to smooth out. Also, I have further ideas to explore.

Lightweight Technical Review

Allison Randal, working on Perl 6 and Parrot Essentials, is also using PseudoPOD and Subversion. One of the nicest features of Subversion is that it optionally serves its files through the Apache web server -- so you can use the standard Apache logging and authentication features.

Better yet, you can always grab the latest version of the files in the repository through a standard web browser.

This makes it very easy to give reviewers the URL of the repository, usernames, and passwords, and allow them to see draft copies -- even if you've made changes since sending out the message. Since Pod::PseudoPod is on the CPAN, they can convert the files to HTML on their own, if they like. Of course, for a Perl book, almost everyone qualified to review the book can also read and write POD, so it's not as necessary as it would be for a book on a different subject.

Allison has set up a simple, temporary mailing list using my Mail::SimpleList module, where interested reviewers can make comments. By all accounts, it's working pretty well.

Automated Events

Subversion provides hooks to perform specified behavior whenever certain events occur. For example, it can send an email whenever someone checks in a changeset. I toyed with the idea of having the repository email a simplelist containing myself and the author, but rejected the idea as generating too much mail. It might be nice on different projects.

Another, better, idea might be to have the repository check the validity of a document before finishing a check-in. This could be annoying at first, but it would prevent any invalid documents from entering the repository.
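A pre-commit hook along these lines might look like the following Python sketch. Subversion runs the hook with the repository path and the transaction name, and a non-zero exit status rejects the commit; svnlook changed and svnlook cat are the standard ways to inspect the pending transaction. The validity check itself is only a stand-in that looks for unbalanced =begin/=end and =over/=back pairs. A real setup would call the PseudoPOD tools instead, and the function names are hypothetical.

```python
import subprocess
import sys

CLOSERS = {"=begin": "=end", "=over": "=back"}


def pod_errors(text):
    """Stand-in validity check: report unbalanced =begin/=end and =over/=back."""
    errors = []
    stack = []  # (expected closing directive, line number of the opener)
    for number, line in enumerate(text.splitlines(), start=1):
        # POD directives must start in the first column.
        directive = line.split()[0] if line.startswith("=") else ""
        if directive in CLOSERS:
            stack.append((CLOSERS[directive], number))
        elif directive in CLOSERS.values():
            if stack and stack[-1][0] == directive:
                stack.pop()
            else:
                errors.append(f"line {number}: unexpected {directive}")
    errors.extend(f"line {n}: block never closed with {closer}" for closer, n in stack)
    return errors


def changed_pod_paths(changed_output):
    """Pick added or modified .pod paths out of `svnlook changed` output."""
    paths = []
    for line in changed_output.splitlines():
        status, path = line[:4].strip(), line[4:]
        if path.endswith(".pod") and not status.startswith("D"):
            paths.append(path)
    return paths


def svnlook(*args):
    """Run svnlook and return its standard output as text."""
    result = subprocess.run(["svnlook", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def main(repos, txn):
    """Return 1 (reject the commit) if any changed .pod file has errors."""
    failures = 0
    for path in changed_pod_paths(svnlook("changed", "-t", txn, repos)):
        for error in pod_errors(svnlook("cat", "-t", txn, repos, path)):
            # Subversion relays the hook's stderr to the committer.
            print(f"{path}: {error}", file=sys.stderr)
            failures += 1
    return 1 if failures else 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Keeping the check in a plain function means the same code can run from an editor or a build script before anyone tries to commit.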

Finally, because I've spent so long as a programmer, I love changelogs. They're very handy for keeping track of the current state and, occasionally, seeing just where things went wrong.

I kept a detailed ChangeLog file for the book. Expecting an author to maintain this is probably a silly idea. I should have added a hook to write to this file with every commit message. That would have made both of our lives a little easier.
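Such a hook was never written for the book, so the following Python sketch is purely illustrative. It assumes a post-commit hook, which Subversion runs with the repository path and the new revision number, and it appends to a ChangeLog file kept outside the repository. The entry layout, the function names, and the changelog_path parameter are all assumptions; svnlook author, svnlook date, and svnlook log are standard subcommands.

```python
import subprocess


def format_entry(revision, author, date, message):
    """Format one commit as a ChangeLog-style entry."""
    lines = message.strip().splitlines() or ["(no log message)"]
    body = ["\t* " + lines[0]] + ["\t  " + line for line in lines[1:]]
    header = f"{date}  {author}  (r{revision})"
    return header + "\n\n" + "\n".join(body) + "\n\n"


def svnlook(subcommand, revision, repos):
    """Ask svnlook about a committed revision and return the trimmed output."""
    result = subprocess.run(["svnlook", subcommand, "-r", revision, repos],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def record_commit(repos, revision, changelog_path):
    """Append an entry for the given revision to the ChangeLog file."""
    author = svnlook("author", revision, repos)
    # svnlook date prints a full timestamp; keep only the leading date.
    date = svnlook("date", revision, repos).split()[0]
    message = svnlook("log", revision, repos)
    with open(changelog_path, "a", encoding="utf-8") as handle:
        handle.write(format_entry(revision, author, date, message))
```

Writing to a file outside the repository avoids having the hook make a commit of its own, which would trigger the hook again.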

My first goal was to produce an excellent book -- and I'm satisfied with what we've done. My second goal was to do this as easily as possible. There are still some rough edges, but I think we're really on to something good here.

chromatic manages Onyx Neon Press, an independent publisher.


In May 2004, O'Reilly Media, Inc., released BSD Hacks.

