This week's column describes style and diction, two venerable UNIX tools for checking your writing that have been rewritten for Linux.
Old-timers probably remember these names -- the originals came with AT&T UNIX as part of the much-loved ``Writer's Workbench'' (WWB) suite of tools back in the late 1970s and early 1980s. (There had also been a group who planned a ``Reader's Workbench''; we can only guess at what that might have been, but today we do have Project Gutenbook, a new etext reader.)
AT&T unbundled the Writer's Workbench from their UNIX System 7, and as the many flavors of UNIX blossomed over the years, these tools fell by the wayside -- eventually becoming the stuff of UNIX lore.
In 1997, Michael Haardt wrote new Linux versions of these tools from scratch. They support both the English and German languages, and they're now part of the GNU Project; if you don't already have them installed on your system, you can get them from gnu.org.
Let's take a look at some of the things that these tools can do.
Use the diction tool to check for wordy, trite, clichéd or misused phrases in a text. It checks for the kind of expressions William Strunk has warned us about in his Elements of Style.
According to Andrew Walker's excellent book The UNIX Environment, the diction tool that came with the old Writer's Workbench just found the phrases, and a separate command called suggest would output suggestions. In the GNU version that works for Linux, both functions have been combined in the single diction command.
In GNU diction, the words or phrases are enclosed in brackets [like this]. If diction has any suggested replacements, it gives them preceded by a right arrow, -> like this.
When checking more than just a screenful of text, you'll want to pipe the output to a tool such as less, so that you can peruse it on the screen. For example, to check a file called banquet-speech.txt for clichés or other misused phrases, you'd type:
$ diction banquet-speech.txt | less RET
You could also redirect the output to a file if you wanted to look at it later:
$ diction banquet-speech.txt > banquet-speech.diction RET
Here, the output is written to a text file called banquet-speech.diction.
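If you have several drafts to check, the same redirection can be wrapped in a small shell function. This is only a sketch: the function name check_all and the DICTION_CMD variable are made up for this illustration, and it assumes that each input file name ends in .txt.

```shell
# check_all: write a diction report for each file named on the command line.
# DICTION_CMD is a hypothetical override (handy for trying the function on
# a system without diction); it defaults to the real diction command.
check_all() {
    for f in "$@"; do
        # banquet-speech.txt -> banquet-speech.diction
        "${DICTION_CMD:-diction}" "$f" > "${f%.txt}.diction"
    done
}
```

With the function defined in your shell, typing check_all followed by a list of .txt files should leave one .diction report next to each of them.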
If you don't specify a filename, diction reads text from the standard input until you type Control-D on a line by itself -- this is especially useful when you want to check the diction of a single sentence:
$ diction RET
So finally, tonight, let us ask the question we wish to state. RET
(stdin):1: [So -> (do not use as intensifier)] finally, tonight, let us [ask the question -> ask] [we wish to state -> (cliche, avoid)].
^D
$
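Because every flag in a diction report sits inside square brackets, it is easy to pull the flags out on their own. The following is a minimal sketch, not part of diction itself: the function name list_flags is made up, and it relies on the -o option of grep, which GNU and BSD grep provide but which is not in every grep.

```shell
# list_flags: read diction output on standard input and print each
# bracketed flag (phrase plus any suggestion) on a line of its own.
list_flags() {
    # -o prints only the matching part: "[", any run of non-"]" text, "]"
    grep -o '\[[^]]*]'
}

# Try it on the report line for the sample sentence (no diction needed):
printf '%s\n' '(stdin):1: [So -> (do not use as intensifier)] finally, tonight, let us [ask the question -> ask] [we wish to state -> (cliche, avoid)].' | list_flags
```

This prints the three flags from the sample sentence, one per line. To use it on a real report, pipe the output of diction to list_flags instead of to less.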
To check the text of a Web page, use the text-only Web browser lynx with the -dump and -nolist options to output the plain text of a given URL, and pipe it to diction. (If you expect there to be a lot of output, add another pipe at the end to the less tool so you can peruse it.)
For example, to check the text on the Web page http://example.org/page.html for wordy and misused phrases, you'd type:
$ lynx -dump -nolist http://example.org/page.html | diction | less RET
One of the things that diction looks for is doubled words -- words that appear twice in a row. It encloses the second member of the doubled pair in brackets followed by a right arrow and the text "Double word", like this [this -> Double word.].
If you only want to check a text file for doubled words, and not any of the other things diction checks for, use grep to find only those lines in diction's output that contain the text "Double word", if any. For example, to output all lines containing double words in the file banquet-speech.txt, you'd type:
$ diction banquet-speech.txt | grep 'Double word' RET
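If all you want is a tally, grep can count the matching lines instead of printing them. A small sketch, with the made-up function name count_doubles; note that it counts report lines carrying the flag, so a line with two doubled words still counts once.

```shell
# count_doubles: read diction output on standard input and print how many
# lines carry at least one "Double word" flag.
count_doubles() {
    grep -c 'Double word'
}
```

You would use it by piping the output of diction to count_doubles in place of the grep command.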
The style command analyzes the writing style of a given text. It performs a number of readability tests on the text and outputs their results, and it gives some statistical information about the sentences of the text.
Give as an argument the name of the text file to check. For example, to check the readability of the file banquet-speech.txt, you'd type:
$ style banquet-speech.txt RET
Like diction, style reads text from the standard input if no file name is given.
The various readability formulas that style uses and outputs are as follows:
The sentence characteristics of the text which style outputs are as follows:
To output just "difficult" sentences of a text, use the -r option followed by a number; style will output only those sentences whose ARI readability index is greater than the number you give.
For example, to output all sentences in the file banquet-speech.txt whose readability is greater than a value of 20, type:
$ style -r 20 banquet-speech.txt RET
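To get a feel for what an ARI (Automated Readability Index) value of 20 means, it helps to work the formula by hand. The sketch below uses the commonly published form of the index, which combines average word length with average sentence length. The function name ari is made up for this illustration, and style may count characters and sentences somewhat differently, so treat the results as approximate.

```shell
# ari: print the Automated Readability Index for a text, given its counts
# of characters (letters and digits), words, and sentences.
# Formula: 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
ari() {
    awk -v c="$1" -v w="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", 4.71 * (c / w) + 0.5 * (w / s) - 21.43 }'
}

# One long sentence: 200 characters, 40 words, 1 sentence
ari 200 40 1    # prints 22.12
```

A single 40-word sentence with five-character words scores just above 20, so by this reckoning it is the kind of sentence the -r 20 example would report.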
You can use style to output sentences longer than a certain length by giving the minimum number of words as an argument to the -l option.
For example, to output all sentences longer than 14 words in the file banquet-speech.txt, type:
$ style -l 14 banquet-speech.txt RET
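The word-count test itself is simple enough to try in isolation. The sketch below is a rough stand-in, not how style works internally: it assumes the text has already been split so that each line holds one sentence (style finds sentence boundaries itself), and the function name longer_than is made up.

```shell
# longer_than: print input lines that contain more than N words, where a
# "word" is any run of non-blank characters (awk's default field splitting).
# Usage: longer_than N < one-sentence-per-line-file
longer_than() {
    awk -v n="$1" 'NF > n'
}
```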
Two additional commands that Walker says were part of the Writer's Workbench have long been standard on Linux: look and spell. Both tools work on the system dictionary file, /usr/dict/words (on many current systems it is found at /usr/share/dict/words instead). This file is nothing more than a word list (albeit a very large one), sorted in alphabetical order and containing one word per line. Words that are correct regardless of case are listed in lower-case letters, and words which rely on some form of capitalization in order to be correct (such as proper nouns) appear in that form.
The look tool outputs words in the system dictionary that begin with the text you give as an argument. It's useful for checking to see which words begin with a particular phrase or prefix.
For example, to list all the words in the dictionary that begin with the text "homew", you'd type:
$ look homew RET
This command will output words such as "homeward" and "homework."
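The idea behind look can be approximated with standard tools, which is handy when you want to search a word list of your own. This is a sketch under assumptions: the function name starts_with is made up, it ignores case, and it scans every line of the file, whereas look can take advantage of the list being sorted.

```shell
# starts_with: print the lines of a word list that begin with PREFIX,
# ignoring case. Usage: starts_with PREFIX WORDLIST
starts_with() {
    # index() returns 1 only when the prefix sits at the start of the line
    awk -v p="$1" 'index(tolower($0), tolower(p)) == 1' "$2"
}
```

Given the system dictionary file as the second argument and homew as the first, it should list the same kind of words as the look example.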
When you're unsure whether or not a particular word is spelled correctly, use spell to find out. It reads from the standard input and outputs any words that don't appear in the system dictionary file -- so if a word is potentially misspelled, it will be echoed back on the screen after you type it.
For example, to check if the word "occurance" is spelled correctly, you'd type:
$ spell RET
occurance RET
occurance
^D
$
In this example, spell echoed the word "occurance" after it was typed, meaning that this word was not in the system dictionary and therefore was likely a misspelling. A Control-D was typed to exit spell and return to the shell prompt.
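The core of what spell does here, reporting words that are missing from a word list, can be sketched with grep. This is a simplified illustration only: the function name unknown_words is made up, the comparison is exact (including case), and the real spell may do more than a plain lookup.

```shell
# unknown_words: read words on standard input, one per line, and print
# those that do not appear in WORDLIST. Usage: unknown_words WORDLIST
unknown_words() {
    # -F fixed strings, -x match whole lines, -v invert, -f read patterns
    grep -F -x -v -f "$1"
}
```

As with spell, a word that comes back out is one the list does not contain, and so is a likely misspelling.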
Next week: How to make and manage documents with SGML-tools.
Michael Stutz was one of the first reporters to cover Linux and the free software movement in the mainstream press.
Copyright © 2009 O'Reilly Media, Inc.