Here's a quick quiz. Which scripting language ...
Through the first three questions you might well have answered "Perl," but question four is a stumper. Powerful and useful as it is, few might claim that Perl is a language that is easy to learn or master.
Rexx is the answer. Rexx was the first widely used scripting language. Though IBM invented it 25 years ago, it may come as a surprise that this language is more popular today than ever. There are now nine free and open source Rexx implementations. These run under virtually any operating system on any platform. All but one meet the Rexx language standard, and each has optimizations or extensions for a specific purpose. For example, two are object-oriented, one integrates with Java, and still others target handhelds, Linux, or Windows.
Rexx's worldwide user community numbers in the hundreds of thousands and is strongest in the new Europe. This community conducts active online forums in many spoken languages and produces tons of free tools and interfaces. That a language of this impact garners almost no recognition in the American Linux open source community is truly amazing.
While the "Rexx story" may be ironic, what really counts is what the language can do for you. That's what this article addresses. Rexx makes a nice complement to languages like Perl, Bash, or Korn because it evolved from an entirely different scripting tradition and features a contrasting personality. By the end of the article, you'll be scripting.
Rexx is as easy as PHP or Basic, but powerful enough to run mainframes.
Ease of use and power normally conflict. How does Rexx pull it off? The key is that power does not require complexity. Rexx is free-format and case-insensitive. Use spacing, blank lines, and capitalization to style your code however you prefer. Rexx has very little syntax. It largely avoids special characters and special or default variables. This runs counter to the Unix tradition, in which scripting languages like Perl, Bash, Korn, and Awk leverage syntactical complexity as the basis for their power.
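Here is a minimal sketch of what free format and case insensitivity mean in practice. It assumes any standard Rexx interpreter, such as Regina. The variable name Total is arbitrary:

```rexx
/* Keywords and variable names ignore case; layout is up to you. */
Total = 10
say TOTAL total Total        /* one variable, three spellings: 10 10 10 */

IF total > 5 THEN SAY 'big'  /* same as: if total > 5 then say 'big' */
if total > 5
   then say 'big'            /* a clause may also continue on the next line */
```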
Rexx automates the data definition and typing tasks that other languages require of developers. As in Tcl/Tk, all Rexx variables are variable-length character strings. Strings that represent numbers can take part in calculations. All others are simply strings (character, bit, or hex) that allow string manipulation but cannot participate in calculations. Rexx transparently converts data between "types" as needed. For consistent computational results across platforms, Rexx employs decimal rather than binary arithmetic.
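The following small sketch makes the string-and-number behavior concrete. The variable names are illustrative, and it assumes the default NUMERIC DIGITS setting of 9:

```rexx
/* Every value is a character string; numeric strings also take part in arithmetic. */
price = '19.99'              /* a string that happens to look like a number */
total = price * 3            /* used in a calculation: 59.97 */
label = 'Total:' || total    /* used as a string: || concatenates without a blank */
say label                    /* Total:59.97 */
say length(label)            /* 11 */
say 0.1 + 0.2                /* decimal arithmetic gives exactly 0.3 */
say datatype(price) datatype('abc')   /* NUM CHAR */
```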
You don't need to declare or predefine your variables. Rexx automatically allocates them when you first refer to them. These principles also apply to Rexx arrays or tables. Arrays don't have predefined, fixed sizes. They expand to the size of available memory, can be sparse or dense, and hold either homogeneous or heterogeneous elements. Rexx relieves programmers of the details of variable and memory management.
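This short sketch shows undeclared variables and a sparse array, again assuming a standard interpreter. The stem name sparse. is made up for illustration. Whether unassigned elements take up memory is an implementation detail, though typical interpreters store only the elements you actually assign:

```rexx
/* Nothing is declared: variables and array (stem) elements appear when assigned. */
say greeting                 /* never assigned: shows its own name, GREETING */

sparse. = 0                  /* default value for every element of the stem */
sparse.1 = 'first'
sparse.1000000 = 'far away'  /* sparse: nothing is assigned in between */
sparse.7 = 42                /* heterogeneous: a number beside strings */
say sparse.1 '|' sparse.500 '|' sparse.1000000   /* first | 0 | far away */
```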
Like C, Rexx has a tiny nucleus of two dozen instructions. It derives power from the set of 70 built-in functions that surround this nucleus. You can program immediately with little knowledge of the language, and then expand into broader use of the function set over time. Add-in functions from external libraries or packages are coded just like those built into the language.
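Here are a few of the built-in string functions applied to an arbitrary sample string. All of them belong to the standard function set, so the results should be the same on any conforming interpreter:

```rexx
/* A handful of the built-in string functions. */
s = 'Rexx is a glue language'
say words(s)                 /* 5 */
say word(s, 4)               /* glue */
say substr(s, 11, 4)         /* glue */
say translate(s)             /* REXX IS A GLUE LANGUAGE */
say reverse('Rexx')          /* xxeR */
say pos('glue', s)           /* 11 */
```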
Rexx is a glue language. It leverages existing code--operating system commands, programs, services, interfaces, tools, function libraries, widgets, objects, controls, shared libraries, and dynamic link libraries. Rexx can act either as a macro language or an embedded language. It's good at interacting with operating systems, editors, or other programmable systems.
This all adds up to a unique scripting paradigm. Rexx's power comes from simplicity--not from adding on features or complex syntax. Simplicity results in reliable code. It means that programs are understandable, enhanceable, and maintainable. It renders developers productive because it allows them to concentrate on the logic of solutions rather than programming language linguistics. Occasional users don't have to pick up a reference manual to refresh their memories when they script.
If Rexx is as simple yet as powerful as I claim, you should be able to understand useful scripts almost immediately. Consider the following three examples.
The first is a very easy one. This program accepts a file name as an input
argument from the command line, and then reads through that file. It displays on
the screen any lines from the input file that contain the phrase
PAYMENT_OVERDUE:
/*******************************************************************/
/* Find Payments: */
/* Reads accounts lines one by one, and displays overdue */
/* payments (lines containing the phrase PAYMENT_OVERDUE). */
/*******************************************************************/
1 parse arg filein /* Read the input file name */
2 do while lines(filein) > 0 /* Do while a line to read */
3 input_line = linein(filein) /* Read an input line */
4 if pos('PAYMENT_OVERDUE',input_line) > 0 then
5 say 'Found it:' input_line /* Write line if $ overdue */
6 end
The script consists of but six lines of executable code. I've numbered them
on the left-hand side for easy reference. (The numbers are not part of Rexx;
they're just there to facilitate discussion.) Text enclosed between
/* and */ is a comment. Comments can follow executable
code, occur on lines of their own, or even span lines (the latter can be useful
for quickly decommissioning blocks of code during testing). This script starts
with a comment block that explains its purpose.
The parse arg instruction reads the input file name supplied as an
argument on the command line when the program is invoked. (The shorter
arg instruction would also work, but it translates the argument to upper
case, which breaks mixed-case file names on case-sensitive file systems such as
Linux.) The second line uses the lines function to test for end of file on the
input file. As long as there is a line left to read, the do while loop
executes. Note that functions in Rexx, whether built-in or external, are
immediately followed by parentheses enclosing any arguments. If there are no
function arguments, you still code the parentheses, as in
my_function().
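The parentheses are not decoration. In this sketch, the same word TIME is either a call to the built-in TIME function or an ordinary, uninitialized variable, depending only on whether parentheses follow it:

```rexx
/* Parentheses are what make TIME a function call rather than a variable. */
say time()                   /* calls the built-in TIME function: current time as hh:mm:ss */
say time                     /* no parentheses: the uninitialized variable TIME, shows TIME */
say length(time())           /* 8: the default format is hh:mm:ss */
```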
Line 3 reads a line from the input file via the linein
function, while line 4 scans that input line for the search phrase. The
pos function returns the position of the phrase within the string being
searched if it is present, or 0 if not. Line 5 completes the if
instruction and writes any line containing the search phrase to the display
screen.
The say instruction evaluates the expression that follows it and displays
the result. Here the expression has two terms, a literal character string
enclosed in quote marks and the input_line variable. Because a blank
separates them, Rexx concatenates the two with a single blank in between. The
end keyword in line 6 ends the do while loop.
What's remarkable about this script is what it is missing--what Rexx
automatically does for the programmer. There is no declaration, opening, or
closing of the input file. Rexx automatically opens a file when it's used and
closes it when the script ends. There are no variable predeclarations or
specifications of their "data types." The program establishes the variable
filein by reading a value into it in line 1, while the assignment
statement of line 3 creates the variable named input_line.
Neither variable has a fixed data type or size.
Look at the power of the lines function in line 2. It tests
the end-of-file condition on the input file such that you do not have to code
two linein functions in the script to create the top-driven read
loop required in structured programming. Just code the one linein
function and the code is still structured. Rexx has all of the tools to support
structured programming and modularity, including a full set of instructions to
implement structured logic, internal and external functions and subroutines,
and variable scoping and protection. Modular, structured code reduces errors
and produces more reliable, maintainable code. (Rexx is a power language,
though, so it includes unstructured instructions and easily allows you to
override its many automatic behaviors whenever you want.)
Someone might look at this first example program and say, "Rexx is not that powerful. All you did was write a simple filter, and it took you six lines of code." Actually, Rexx replaces each function invocation you code with its return value, so functions can be nested inside one another. If your goal were to accomplish the most work in the fewest lines, you could nest the functions in the example as deeply as you like to shrink the script. I chose not to. Deeply nested, "fortune-cookie" programming may be powerful, but it is also unreadable, unmaintainable, and hard to debug. Rexx supports function nesting to an arbitrary depth, but encourages simplicity. The most powerful scripting language is not the one that solves a problem in the fewest lines of code. It is the language that solves the problem deftly while producing reliable, readable, maintainable code.
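To illustrate how return values are substituted, this sketch performs the same kind of test as line 4 of Find Payments twice, once step by step and once nested. The sample record and the added translate call, which makes the search case-insensitive, are hypothetical additions for illustration. In the script itself, linein(filein) could likewise be placed directly inside pos(). The line would then no longer be saved in a variable for line 5 to display:

```rexx
/* Hypothetical sample line, as linein() might return it. */
record = 'acct 1002 payment_overdue 30 days'

/* Step by step: each result is saved in a variable first. */
upper_record = translate(record)
found = pos('PAYMENT_OVERDUE', upper_record) > 0

/* Nested: Rexx replaces translate(record) with its return value, */
/* then pos() with its return value, then evaluates the comparison. */
found_nested = pos('PAYMENT_OVERDUE', translate(record)) > 0

say found found_nested       /* 1 1 */
```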
Someone else might look at this first example program and say, "You just
rewrote the Linux or Unix fgrep utility in six lines of code." Well,
granted, the script duplicates a utility that already exists, but it was only
the first example. If you prefer the fgrep approach, that's fine. Here's a
complete Rexx script that implements it:
parse arg filein
'fgrep PAYMENT_OVERDUE' filein
The Rexx interpreter parses, analyzes, and executes source lines one by one.
When it encounters something it does not recognize as a valid part of the Rexx
language--but is still syntactically legitimate--it evaluates it as an
expression and passes the resulting string to the "outside environment" for
execution. By default, this outside environment is the operating system's
command line. These two lines are a complete Rexx program, the second line of
which builds a single Linux or Unix fgrep command. Rexx concatenates the quoted
literal with the value of the variable filein. The result is not part of the
Rexx language, so Rexx passes the command to the operating system, waits for
its completion, and then continues. The quotes matter: unquoted, fgrep would be
treated as an uninitialized variable and sent as FGREP, which a case-sensitive
Linux shell will not find. For the same reason, the script uses parse arg
instead of arg, so that the file name keeps its case.
(Of course, like all of Rexx's default behaviors, you can override this default
process.)
In this simple example, Rexx processes the operating system command and ends, because there is no more to the script. This principle has far-reaching consequences. Rexx scripts easily issue commands to the operating system--or any other external environment, interface, or package. Developers can bring the full power of a string-processing language to bear in dynamically preparing and managing those commands. Scripts can inspect return codes from commands, parse and analyze command outputs and error messages, and intelligently interact with and respond to them. Now, that's power!
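The following sketch shows how a script can inspect return codes. It runs a search command and then branches on rc, one of Rexx's few special variables, which holds the return code of the most recent command. It assumes a POSIX grep: grep -F performs the same fixed-string search as fgrep, and -q suppresses output so that only the exit status matters. The default file name accounts.txt is hypothetical. The literal part of the command is quoted so that Rexx passes it through without translating it to upper case:

```rexx
/* Sketch: run a command, then branch on its return code in RC. */
parse arg filein
if filein = '' then filein = 'accounts.txt'   /* hypothetical default file name */

'grep -F -q PAYMENT_OVERDUE' filein   /* POSIX grep: 0 = match, 1 = none, >1 = error */
select
  when rc = 0 then say filein 'contains overdue payments'
  when rc = 1 then say filein 'contains no overdue payments'
  otherwise say 'grep failed with return code' rc
end
```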
You've read one Rexx script, so given that the language is easy to learn, you're ready for something interesting. The next script illustrates a Rexx feature common to some of today's more powerful languages: the ability to index arrays or tables by non-numeric subscripts.
This example program includes an array that defines three Chicago-area telephone area codes. It prompts the user to enter the name of a town in the Chicago metro area. Then it retrieves and displays the telephone area code associated with that town. The interesting feature of this script is that the town name is the index into the array the script uses to retrieve the telephone area code. Here's the script:
/*******************************************************************/
/* Code Lookup: */
/* Looks up the area code for the town the user enters. */
/*******************************************************************/
1 area. = '' /* Initialize array entries to null */
2 area.CHICAGO = 312 /* Define a table of area codes */
3 area.HOMEWOOD = 708
4 area.EVANSTON = 847
5 do while town <> '' /* Loop until user enters null line */
6 say 'For which town do you want the area code?'
7 pull town
8 if town <> '' then do
9 if area.town = ''
10 then say 'Town' town 'is not in my database'
11 else say 'The area code for' town 'is' area.town
12 end
13 end
The first line in the program defines an array or table named
area. You can tell it's an array because it ends with a period.
You don't have to declare or define an array prior to using it, but I found it
useful because I wanted to initialize all array elements to the null string.
The script does this without specifying how many elements the array might
contain or the "data types" of those elements. (Rexx arrays can typically grow
to the size of available memory.)
Lines 2 through 4 populate the array by associating three towns with their respective telephone area codes. Variables that contain internal periods are array elements, which Rexx calls compound variables. The interesting thing here is that the array elements have string subscripts (the names of towns) rather than numeric ones. This means that the array is content-addressable. It is a form of associative memory, in that it associates or relates values represented by arbitrary strings.
Line 5 repeats the loop as long as the value of the variable town is anything
other than the null string. The town variable has had no previous
use or declaration, so its "uninitialized value" is its own name in upper case
(TOWN), and the loop therefore runs at least once. After prompting the user in
line 6, the script reads a town from the user in line 7. The pull instruction
reads a line of input into one or more variables and automatically translates
it to upper case.
If the user declines to enter a town by not entering anything and pressing
the Enter key, line 8 identifies the situation, skips the
if statement in lines 9 through 11, and exits the do
while loop and the program. If the user does enter a town, line 9 looks
it up in the area table. Either line 10 displays a message that it is not in
the script's "database" (the area array) or line 11 displays the
proper area code for the town. In line 11, the compound variable reference
area.town is what displays the relevant area code to the user.
There's one other feature to mention in this script. The table lookup works
properly because the pull instruction automatically translates the
user's input to upper case. Because Rexx translates symbols, including those
used as array subscripts, to upper case internally, the lookup would also have
worked if I had coded the array elements as area.Chicago, area.chicago,
AREA.CHICAGO, or ... you get the idea. Rexx routinely provides
this kind of convenient automation, but always offers easy ways to avoid it.
For example, to avoid automatic upper-case translation in strings, quote them.
To avoid it when reading input, use the parse pull
instruction instead of pull.
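The following sketch shows why the upper-case translation matters for the lookup. It assumes the input arrives in mixed case, as parse pull would leave it. The value Chicago stands in for a hypothetical user entry:

```rexx
area. = ''
area.CHICAGO = 312             /* the tail symbol CHICAGO is stored as 'CHICAGO' */

town = 'Chicago'               /* as PARSE PULL would leave it: case preserved */
say '['area.town']'            /* [] : the tail 'Chicago' is not 'CHICAGO' */

town = translate(town)         /* upper-case it, as PULL does automatically */
say area.town                  /* 312 */
```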
As variable-sized, content-addressable entities, Rexx arrays have a wide range of uses. Here's another example. This script implements the weighted-retrieval algorithm that forms the basis for bibliographic search services in libraries. The idea is that resources like books, magazine articles, and videos have assigned lists of descriptors. The user inputs his own search descriptors along with a weight: the number of his descriptors that must match the resource's descriptors in order for that resource to match his query. The search system retrieves the most relevant resources, as determined by the weight of their matches, and typically displays them in ranked (weighted) order.
For clarity of illustration, I've simplified the algorithm. I hardcoded the search descriptors (or keywords) and only allow the user to input the weight or threshold for retrieval. I also coded the resources (book titles) and their descriptors right into the program, rather than reading them from a database.
Here's the script. It uses two arrays. The first is a list of keywords that describe the retrieval topics. The second is a list of three books, categorized by three descriptors apiece:
/*********************************************************************/
/* Find Books: */
/* This program illustrates how arrays may be of any dimension */
/* in retrieving book titles based on their keyword weightings. */
/*********************************************************************/
1 keyword. = '' /* Initialize both arrays to all null strings */
2 title. = ''
/* The array of keywords to search for among the book descriptors */
3 keyword.1 = 'earth' ; keyword.2 = 'computers'
4 keyword.3 = 'life' ; keyword.4 = 'environment'
/* The array of book titles, each having several descriptors */
5 title.1 = 'Saving Planet Earth'
6 title.1.1 = 'earth'
7 title.1.2 = 'environment'
8 title.1.3 = 'life'
9 title.2 = 'Computer Lifeforms'
10 title.2.1 = 'life'
11 title.2.2 = 'computers'
12 title.2.3 = 'intelligence'
13 title.3 = 'Algorithmic Insanity'
14 title.3.1 = 'computers'
15 title.3.2 = 'algorithms'
16 title.3.3 = 'programming'
17 arg weight /* Get number keyword matches required for retrieval */
18 say 'For weight of' weight 'retrieved titles are:' /* Output header */
19 do j = 1 while title.j <> '' /* Look at each book */
20 count = 0
21 do k = 1 while keyword.k <> '' /* Inspect its keywords */
22 do l = 1 while title.j.l <> '' /* Compute its weight */
23 if keyword.k = title.j.l then count = count + 1
24 end
25 end
26 if count >= weight then /* Display titles matching the criteria */
27 say title.j
28 end
The first two lines of the program initialize all positions in both arrays
to the null string. Lines 3 and 4 define the elements in the
keyword array, while lines 5 through 16 define the three titles
and the list of descriptors for each. I've defined all array elements as
quotation-delimited character strings. This preserves their case. Unquoted,
they would be translated to upper case. I've placed more than one Rexx
statement per line in defining the keyword array by using the semicolon to
separate the statements. I created a hierarchical or
tree structure in the title array merely by using multiple
subscripts to describe array elements. Rexx enables subscripting to an
arbitrary depth, with any number of subscripts or dimensions, limited only by
available memory.
Line 17 reads the weight, which the user supplies as a command-line argument to the script. Line 18 writes a header for the program's output list of retrieved titles.
The do while loop of lines 19 through 28 processes one book title per
iteration. Line 20 initializes the weight or count of descriptor matches for
the book title to 0. The loop of lines 21 through 25 processes all of the
retrieval keywords against a single book title, while the innermost loop of
lines 22 to 24 accumulates the matching weight for an individual book title.
Lines 26 and 27 display the books with a number of matches at least equal to
the hit count or weight dictated by the user.
Unlike most programming languages, Rexx does not have many built-in data structures. This script shows why it does not need them: associative arrays make it easy to implement data structures of arbitrary complexity. The processing logic in lines 19 through 28 would work without alteration even if I changed the number of keywords or book descriptors, assigned different numbers of descriptors per book, or read the contents of either array from the user, a file, or a database.
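For instance, the following sketch loads the keyword array from a blank-separated list rather than hardcoding it. The list contents are hypothetical stand-ins for input from a user or a file. The consuming loop uses the same do ... while keyword.k <> '' test as Find Books, so it needs no changes when the list grows:

```rexx
/* Hypothetical input: a blank-separated keyword list, as it might come  */
/* from the command line or a file instead of being hardcoded.           */
list = 'earth computers life environment recycling'

keyword. = ''                      /* null marks the end, as in Find Books */
do i = 1 to words(list)
  keyword.i = word(list, i)
end

/* The same end-of-list test that Find Books relies on. */
count = 0
do k = 1 while keyword.k <> ''
  count = count + 1
end
say count 'keywords loaded'        /* 5 keywords loaded */
```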
The second scripting example demonstrated a lookup table, a data structure embodying the key-value pairs popular in Perl and in the open source Berkeley DB database. This example script demonstrates a list in its keywords table and a tree in its array of titles. This tree happens to be a balanced tree, but skewed or unbalanced trees are also possible.
Arrays can also group heterogeneous data items to implement the equivalent of C/C++ structures or Pascal or COBOL record definitions. They can even implement structures requiring symbolic pointers, such as linked lists and doubly linked lists. Rexx arrays may be dense or sparse, contain homogeneous or heterogeneous elements, and expand and contract as necessary.
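As a hypothetical sketch of symbolic pointers, the following code builds a singly linked list in one stem. Each node's next element holds the subscript of the following node, and 0 marks the end of the list. The node numbers and values are arbitrary:

```rexx
/* Hypothetical sketch: a singly linked list held in one stem.          */
/* Node ids are numbers; a NEXT of 0 marks the end of the list.         */
node. = ''
node.1.val = 'earth';      node.1.next = 3
node.3.val = 'life';       node.3.next = 2
node.2.val = 'computers';  node.2.next = 0
head = 1

/* Insert node 4 after node 1 by rewriting one "pointer". */
node.4.val = 'environment'
node.4.next = node.1.next
node.1.next = 4

/* Walk the list by following the symbolic pointers. */
out = ''
p = head
do while p <> 0
  out = out node.p.val
  p = node.p.next
end
say strip(out)        /* earth environment life computers */
```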
Here is Rexx, a language that "lacks data structures"--and yet permits you to create them without any special syntax. Power comes from ease of use, not from adding features or complexity.
This article offers just a taste of what Rexx is about. It excludes object-oriented Rexx, which runs standard procedural Rexx programs without alteration, yet fully supports object-oriented scripting.
Open Object Rexx includes classes, methods, messaging, encapsulation, abstraction, multiple inheritance, polymorphism, and a huge hammer of a class library. It retains Rexx's ease of use while providing the full power of object-oriented programming.
I also omitted NetRexx, a "Rexx-like" language that extends Rexx's ease of use into the Java environment. NetRexx runs on both clients and servers. Use it to develop classes, applets, applications, servlets, and beans. NetRexx functions as an interpreter or compiler, so you can run it with or without a Java Virtual Machine and even use it to generate formatted Java code.
No single programming language is best for every task--which is one reason why there are so many of them--but Rexx is certainly a useful one to have around. It makes a nice complement to syntax-based power languages like Perl, Bash, and Korn. You can become fluent in Rexx in a matter of days, yet you won't run out of power as your knowledge grows.
Here is a list of free software and other resources.
r4 (Windows)
roo! (Windows)
Howard Fosdick is an independent consultant who has worked with most major scripting languages.
Copyright © 2009 O'Reilly Media, Inc.