 Published on ONLamp.com (http://www.onlamp.com/)


Rexx: Power Through Simplicity

by Howard Fosdick
05/26/2005

Here's a quick quiz. Which scripting language ...

  1. Runs on virtually all platforms and operating systems?
  2. Enjoys a strong standard to which all implementations adhere?
  3. Comes in object-oriented versions that are upwardly compatible with the standard procedural language?
  4. Is known for combining ease of use with power?

Through the first three questions you might well have answered "Perl," but question four is a stumper. Powerful and useful as it is, few would claim that Perl is easy to learn or master.

Rexx is the answer. Rexx was the first widely used scripting language. Though IBM invented it 25 years ago, it may come as a surprise that this language is more popular today than ever. There are now nine free and open source Rexx implementations. These run under virtually any operating system on any platform. All but one meet the Rexx language standard, and each has optimizations or extensions for a specific purpose. For example, two are object-oriented, one integrates with Java, and still others target handhelds, Linux, or Windows.

Rexx's worldwide user community numbers in the hundreds of thousands and is strongest in the new Europe. This community conducts active online forums in many spoken languages and produces tons of free tools and interfaces. That a language of this impact garners almost no recognition in the American Linux open source community is truly amazing.

While the "Rexx story" may be ironic, what really counts is what the language can do for you. That's what this article addresses. Rexx makes a nice complement to languages like Perl, Bash, or Korn because it evolved from an entirely different scripting tradition and features a contrasting personality. By the end of the article, you'll be scripting.

Why Rexx?

Rexx is as easy as PHP or Basic, but powerful enough to run mainframes.

Ease of use and power normally conflict. How does Rexx pull it off? The key is that power does not require complexity. Rexx is free-format and case-insensitive. Use spacing, blank lines, and capitalization to style your code however you prefer. Rexx is virtually devoid of syntax. It eschews special characters and special and default variables. This contradicts the Unix tradition in which scripting languages like Perl, Bash, Korn, and Awk leverage syntactical complexity as the basis for their power.

Rexx automates the data definition and typing tasks that other languages require of developers. As in Tcl/Tk, all Rexx variables are variable-length character strings. String values that represent numbers can take part in calculations; all others are simply strings (character, bit, or hex) that allow string manipulation but cannot participate in calculations. Rexx transparently converts data between "types" as needed. For consistent computational results across platforms, Rexx employs decimal rather than binary arithmetic.
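
A short sketch may make these typing rules concrete. The variable name is invented for illustration, and the comments show the output a standard-conforming interpreter should produce:

        count = '12'                    /* a string that looks like a number  */
        say count + 3                   /* used in arithmetic: 15             */
        say count || ' items'           /* used as a string: 12 items         */
        say datatype(count) datatype('twelve')         /* NUM CHAR            */
        say 0.1 + 0.2                   /* decimal arithmetic: 0.3            */
        numeric digits 30               /* raise precision from the default 9 */
        say 1 / 3                       /* now carried to 30 digits           */

The same value serves as a number or as a string depending only on the operator applied to it, and the numeric digits instruction sets the precision of the decimal arithmetic.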

You don't need to declare or predefine your variables. Rexx automatically allocates them when you first refer to them. These principles also apply to Rexx arrays or tables. Arrays don't have predefined, fixed sizes. They expand to the size of available memory, can be sparse or dense, and hold either homogeneous or heterogeneous elements. Rexx relieves programmers of the details of variable and memory management.

Like C, Rexx has a tiny nucleus of two dozen instructions. It derives power from the set of 70 built-in functions that surround this nucleus. You can program immediately with little knowledge of the language, and then expand into broader use of the function set over time. Add-in functions from external libraries or packages are coded just like those built into the language.
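
As a rough illustration of that function set, the sketch below applies a handful of standard built-in functions to an arbitrary sample string. The comments show the expected output:

        phrase = 'power through simplicity'
        say words(phrase)               /* number of words: 3                 */
        say word(phrase, 2)             /* second word: through               */
        say length(phrase)              /* length in characters: 24           */
        say translate(phrase)           /* POWER THROUGH SIMPLICITY           */
        say reverse('Rexx')             /* xxeR                               */
        say copies('-', 10)             /* ----------                         */

Each call reads the same way: a function name, then its arguments in parentheses.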

Rexx is a glue language. It leverages existing code--operating system commands, programs, services, interfaces, tools, function libraries, widgets, objects, controls, shared libraries, and dynamic link libraries. Rexx can act either as a macro language or an embedded language. It's good at interacting with operating systems, editors, or other programmable systems.

This all adds up to a unique scripting paradigm. Rexx's power comes from simplicity--not from adding on features or complex syntax. Simplicity results in reliable code. It means that programs are understandable, enhanceable, and maintainable. It renders developers productive because it allows them to concentrate on the logic of solutions rather than programming language linguistics. Occasional users don't have to pick up a reference manual to refresh their memories when they script.

Example Scripts

If Rexx is as simple yet as powerful as I claim, you should be able to understand useful scripts almost immediately. Consider the following three examples.

The first is a very easy one. This program accepts a file name as an input argument from the command line, and then reads through that file. It displays on the screen any lines from the input file that contain the phrase PAYMENT_OVERDUE:

        /*******************************************************************/
        /*  Find Payments:                                                 */
        /*     Reads accounts lines one by one, and displays overdue       */
        /*     payments (lines containing the phrase PAYMENT_OVERDUE).     */
        /*******************************************************************/

1       arg filein                              /* Read the input file name*/

2       do while lines(filein) > 0              /* Do while a line to read */

3         input_line = linein(filein)           /* Read an input line      */

4         if pos('PAYMENT_OVERDUE',input_line) > 0 then        
5            say 'Found it:' input_line         /* Write line if $ overdue */

6       end

The script consists of but six lines of executable code. I've numbered them on the left-hand side for easy reference. (The numbers are not part of Rexx; they're just there to facilitate discussion.) Text enclosed between /* and */ is a comment. Comments can follow executable code, occur on lines of their own, or even span lines (the latter can be useful for quickly decommissioning blocks of code during testing). This script starts with a comment block that explains its purpose.

The arg instruction reads the input file name supplied as an argument on the command line from the program's invocation. The second line uses the lines function to test for the end of file on the input file. As long as there is a line left to read, the do while loop executes. Note that function calls in Rexx, whether to built-in or external functions, are followed immediately by parentheses enclosing any arguments. If there are no function arguments, you still code the parentheses, as in my_function().

Line 3 reads a line from the input file via the linein function, while line 4 scans that input line for the search phrase. Its pos function returns the position of the phrase in the searched string if it is present, or 0 if not. Line 5 continues the if instruction and writes any line containing the search phrase to the display screen.

The say instruction takes any number of operands, evaluates them, concatenates them, and displays them. Here it has two operands, a literal character string enclosed in quote marks and the input_line variable. The end keyword in line 6 ends the do while statement.

What's remarkable about this script is what it is missing--what Rexx automatically does for the programmer. There is no declaration, opening, or closing of the input file. Rexx automatically opens a file when it's used and closes it when the script ends. There are no variable predeclarations or specifications of their "data types." The program establishes the variable filein by reading a value into it in line 1, while the assignment statement of line 3 creates the variable named input_line. Neither variable has a fixed data type or size.

Look at the power of the lines function in line 2. It tests the end-of-file condition on the input file such that you do not have to code two linein functions in the script to create the top-driven read loop required in structured programming. Just code the one linein function and the code is still structured. Rexx has all of the tools to support structured programming and modularity, including a full set of instructions to implement structured logic, internal and external functions and subroutines, and variable scoping and protection. Modular, structured code reduces errors and is more reliable and maintainable. (Rexx is a power language, though, so it includes unstructured instructions and easily allows you to override its many automatic behaviors whenever you want.)
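
As one illustration of overriding those automatic behaviors, here is a sketch of the same filter with two defaults replaced. It is a variation for discussion, not a correction of the listing above. The script name findpay in the usage message is hypothetical, and the 'close' command passed to the stream function is an assumption: stream commands vary by interpreter, although Regina and Open Object Rexx both accept this form.

        parse arg filein                /* parse arg keeps the name's case,   */
                                        /* which matters on Linux and Unix    */
        if filein = '' then do          /* no file name was supplied          */
           say 'Usage: findpay filename'
           exit 1
        end

        do while lines(filein) > 0      /* same read loop as before           */
           input_line = linein(filein)
           if pos('PAYMENT_OVERDUE', input_line) > 0 then
              say 'Found it:' input_line
        end

        call stream filein, 'c', 'close'    /* close explicitly (optional)    */

The plain arg instruction translates its input to upper case, so on a case-sensitive file system a name typed in lower case would not be found; parse arg reads the argument exactly as typed.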

Someone might look at this first example program and say, "Rexx is not that powerful. All you did was write a simple filter and it took you six lines of code." Actually, Rexx replaces the function invocations you code with their return values. So if your goal is to accomplish the most work in the smallest number of lines, you could nest the functions in the example as deeply as you like to reduce the number of lines in the script. I chose not to. Deeply nested, "fortune-cookie" programming may be powerful, but it is also unreadable, unmaintainable, and hard to debug. Rexx supports function nesting to an arbitrary depth, but encourages simplicity. The most powerful scripting language is not the one that solves a problem in the fewest lines of code; it is the language that solves the problem in deft fashion while producing reliable, readable, maintainable code.
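
To see the trade-off on a small scale, consider this sketch, which extracts the first word of a line in upper case. The sample text and variable names are made up for illustration. Both forms display the same result:

        text = '  payment_overdue  account 1047  '

        /* One function per line */
        trimmed = strip(text)           /* remove leading and trailing blanks */
        shouted = translate(trimmed)    /* convert to upper case              */
        say word(shouted, 1)            /* PAYMENT_OVERDUE                    */

        /* The same work, nested into a single line */
        say word(translate(strip(text)), 1)            /* PAYMENT_OVERDUE     */

Three levels of nesting are still readable. The point is that Rexx leaves the choice to you.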

Someone else might look at this first example program and say, "You just wrote a Linux or Unix fgrep utility in six lines of code." Well, granted, the script does perform an unnecessary task, but it was the first example. If you prefer the fgrep approach, that's fine. Here's a complete Rexx script that implements it:

        parse arg filein                /* parse arg keeps the name's case    */
        'fgrep PAYMENT_OVERDUE' filein  /* quoted, so it is not upper-cased   */

The Rexx interpreter parses, analyzes, and executes source lines one by one. When it encounters something it does not recognize as a valid part of the Rexx language--but is still syntactically legitimate--it passes it to the "outside environment" for execution. By default, this outside environment is the operating system's command line. These two lines are a complete Rexx program, the second line of which consists of a single Linux or Unix operating system fgrep command. After evaluating the second statement and substituting in the proper value for the variable filein, Rexx does not recognize this string as a part of the Rexx language, so it passes the command to the operating system and then waits for its completion and continues. (Of course, like all of Rexx's default behaviors, you can override this default process.)

In this simple example, Rexx processes the operating system command and ends, because there is no more to the script. This principle leads to a very powerful place. Rexx scripts easily issue commands to the operating system--or any other external environment, interface, or package. Developers can bring the full power of a string-processing language to bear in dynamically preparing and managing those commands. Scripts can inspect return codes from commands, parse and analyze command outputs and error messages, and intelligently interact and respond to them. Now, that's power!
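
A minimal sketch of return-code handling follows, assuming a Linux or Unix system where fgrep is on the path and follows the usual grep convention (0 for a match, 1 for no match, anything higher for an error). Rexx places a command's return code in the special variable rc. The command text is quoted so that Rexx passes it through in lower case rather than treating fgrep as an uninitialized variable.

        parse arg filein                /* keep the file name's case          */

        'fgrep PAYMENT_OVERDUE' filein  /* passed to the operating system     */

        select                          /* rc holds the command's return code */
           when rc = 0 then say 'Overdue payments found in' filein
           when rc = 1 then say 'No overdue payments in' filein
           otherwise        say 'fgrep failed with return code' rc
        end

The script could just as easily build the command string piece by piece, or issue a different command depending on the value of rc.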

Content-Addressable Arrays

You've read one Rexx script, so given that the language is easy to learn, you're ready for something interesting. The next script illustrates a Rexx feature common to some of today's more powerful languages: the ability to index arrays or tables by non-numeric subscripts.

This example program includes an array that defines three Chicago-area telephone area codes. It prompts the user to enter the name of a town in the Chicago metro area. Then it retrieves and displays the telephone area code associated with that town. The interesting feature of this script is that the town name is the index into the array the script uses to retrieve the telephone area code. Here's the script:

        /*******************************************************************/
        /* Code Lookup:                                                    */
        /*     Looks up the areacode for the town the user enters.         */
        /*******************************************************************/

1       area. = ''                     /* Initialize array entries to null */

2       area.CHICAGO  = 312            /* Define a table of area codes     */    
3       area.HOMEWOOD = 708   
4       area.EVANSTON = 847   

5       do while town <> ''            /* Loop until user enters null line */

6          say 'For which town do you want the area code?'
7          pull town 

8          if town <> '' then do
9             if area.town = '' 
10               then  say 'Town' town 'is not in my database'
11               else  say 'The area code for' town 'is' area.town
12         end

13      end

The first line in the program defines an array or table named area. You can tell it's an array because it ends with a period. You don't have to declare or define an array prior to using it, but I found it useful because I wanted to initialize all array elements to the null string. The script does this without specifying how many elements the array might contain or the "data types" of those elements. (Rexx arrays can typically grow to the size of available memory.)

Lines 2 through 4 initialize the table by associating three towns with their respective telephone area codes. Variables that contain internal periods are array elements, or compound variables. The interesting thing here is that the array elements have string subscripts (the names of towns) rather than numeric ones. This means that the array is content-addressable. It is a form of associative memory, in that it associates or relates values represented by arbitrary strings.

Line 5 continues the script while the value of variable town is anything other than the null string. The town variable has had no previous use or declaration, so its "uninitialized value" is its own name in upper case (TOWN). After prompting the user in line 6, the script reads a town from the user in line 7. The pull instruction reads one or more variables and automatically translates them into upper case.

If the user declines to enter a town by not entering anything and pressing the Enter key, line 8 identifies the situation, skips the if statement in lines 9 through 11, and exits the do while loop and the program. If the user does enter a town, line 9 looks it up in the area table. Either line 10 displays a message that it is not in the script's "database" (the area array) or line 11 displays the proper area code for the town. In line 11, the compound variable reference area.town is what displays the relevant area code to the user.

There's one other feature to mention in this script. The table lookup works properly because the pull instruction automatically translates the user's input to upper case. Because Rexx considers variable names internally as upper case, the comparison would also have worked if I had coded the array names as area.Chicago, area.chicago, AREA.CHICAGO, or ... you get the idea. Rexx routinely provides this kind of convenient automation, but always offers easy ways to avoid it. For example, to avoid automatic upper-case translation in strings, quote them. To avoid it in reading variable values, use the parse pull instruction instead of pull.
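
Here is a sketch of the lookup using parse pull, so that the script can echo the town exactly as the user typed it. The variable key is introduced only for this illustration. Because parse pull does not translate the input, the script upper-cases a copy itself before using it as the subscript:

        area. = ''                      /* default for towns not in the table */
        area.EVANSTON = 847             /* subscript is the string EVANSTON   */

        say 'For which town do you want the area code?'
        parse pull town                 /* read the line exactly as typed     */

        key = translate(strip(town))    /* trimmed, upper-case copy for lookup*/

        if area.key = ''
           then say 'Town' town 'is not in my database'
           else say 'The area code for' town 'is' area.key

If the user types Evanston, the script should answer "The area code for Evanston is 847", preserving the user's capitalization in the message while still finding the upper-case table entry.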

Data Structures

As variable-sized, content-addressable entities, Rexx arrays have a wide range of uses. Here's another example. This script implements the weighted-retrieval algorithm that forms the basis for bibliographic search services in libraries. The idea is that resources like books, magazine articles, and videos have assigned lists of descriptors. The user inputs his own search descriptors along with a weight: the number of his descriptors that must match the resource's descriptors in order for that resource to match his query. The search system retrieves the most relevant resources, as determined by the weight of their matches, and typically displays them in ranked (weighted) order.

For clarity of illustration, I've simplified the algorithm. I hardcoded the search descriptors (or keywords) and only allow the user to input the weight or threshold for retrieval. I also coded the resources (book titles) and their descriptors right into the program, rather than reading them from a database.

Here's the script. It uses two arrays. The first is a list of keywords that describe the retrieval topics. The second is a list of three books, categorized by three descriptors apiece:

        /*********************************************************************/
        /*  Find Books:                                                      */
        /*     This program illustrates how arrays may be of any dimension   */
        /*     in retrieving book titles based on their keyword weightings.  */
        /*********************************************************************/

1       keyword. = ''         /* Initialize both arrays to all null strings  */
2       title.   = ''
     
        /* The array of keywords to search for among the book descriptors    */

3       keyword.1 = 'earth'   ;   keyword.2 = 'computers'
4       keyword.3 = 'life'    ;   keyword.4 = 'environment'

        /* The array of book titles, each having several descriptors         */

5       title.1 = 'Saving Planet Earth'
6          title.1.1 = 'earth' 
7          title.1.2 = 'environment' 
8          title.1.3 = 'life'
9       title.2 = 'Computer Lifeforms'   
10         title.2.1 = 'life'
11         title.2.2 = 'computers'
12         title.2.3 = 'intelligence'
13      title.3 = 'Algorithmic Insanity'
14         title.3.1 = 'computers'
15         title.3.2 = 'algorithms'
16         title.3.3 = 'programming' 

17      arg weight      /* Get number keyword matches required for retrieval */

18      say 'For weight of' weight 'retrieved titles are:'  /* Output header */

19      do j = 1  while title.j <> ''                /* Look at each book    */
20         count = 0
  
21         do k = 1  while keyword.k <> ''           /* Inspect its keywords */
   
22            do l = 1  while title.j.l <> ''        /* Compute its weight   */
23               if  keyword.k = title.j.l  then count = count + 1
24            end
   
25         end

26         if count >= weight then   /* Display titles matching the criteria */
27            say title.j
28      end

The first two lines of the program initialize all positions in both arrays to the null string. Lines 3 and 4 define the elements in the keyword array, while lines 5 through 16 define the three titles and the list of descriptors for each. I've defined all array elements as quotation-delimited character strings. This preserves their case. I've placed more than one Rexx statement per line in defining the keyword array by using the semicolon to separate the statements. I created a hierarchical or tree structure in the title array merely by using multiple subscripts to describe array elements. Rexx enables subscripting to an arbitrary depth, with any number of subscripts or dimensions, limited only by available memory.

Line 17 reads the weight from the command-line argument the user provides to the script. Line 18 writes a header for the program's output list of retrieved titles.

The do while loop of lines 19 through 28 processes one book title per iteration. Line 20 initializes the weight or count of descriptor matches for the book title to 0. The loop of lines 21 through 25 processes all of the retrieval keywords against a single book title, while the innermost loop of lines 22 to 24 accumulates the matching weight for an individual book title. Lines 26 and 27 display the books with a number of matches at least equal to the hit count or weight dictated by the user.

Unlike most programming languages, Rexx does not have many built-in data structures. This script shows why. Associative arrays are easy to use to implement data structures of arbitrary complexity. The processing logic in lines 19 through 28 would work without alteration even if I changed the number of keywords or book descriptors, assigned different numbers of descriptors per book, or read the contents of either array from the user or from a file or a database.

The second scripting example demonstrated a lookup table, a data structure embodying the key-value pairs popular in Perl and in the open source Berkeley DB database. This example script demonstrates a list in its keyword table and a tree in its array of titles. This tree happens to be a balanced tree, but skewed or unbalanced trees are also possible.

Arrays can also group heterogeneous data items to implement the equivalent of C/C++ structures or Pascal or COBOL record definitions. They can even implement structures requiring symbolic pointers, such as linked lists and doubly linked lists. Rexx arrays may be dense or sparse, contain homogeneous or heterogeneous elements, and expand and contract as necessary.
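
As a sketch of the symbolic-pointer idea, the following builds a singly linked list in one array. The names node, item, link, head, and p are all chosen for this illustration, and the script assumes that item and link are never assigned values of their own, so that they act as constant subscripts. Each element stores a value and the subscript of its successor, with 0 marking the end of the list:

        node. = ''
        node.1.item = 'earth'       ; node.1.link = 2
        node.2.item = 'life'        ; node.2.link = 3
        node.3.item = 'computers'   ; node.3.link = 0  /* 0 ends the list     */
        head = 1

        /* Insert a new element after the first by relinking, not moving data */
        node.4.item = 'environment'
        node.4.link = node.1.link
        node.1.link = 4

        p = head                        /* walk the list from its head        */
        do while p <> 0
           say node.p.item              /* earth, environment, life, computers*/
           p = node.p.link
        end

Adding a second subscript that points backward would turn this into a doubly linked list.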

Here is Rexx, a language that "lacks data structures"--and yet permits you to create them without any special syntax. Power comes from ease of use, not from adding features or complexity.

What Next

This article offers just a taste of what Rexx is about. It excludes object-oriented Rexx, which runs standard procedural Rexx programs without alteration, yet fully supports object-oriented scripting.

Open Object Rexx includes classes, methods, messaging, encapsulation, abstraction, multiple inheritance, polymorphism, and a huge hammer of a class library. It retains Rexx's ease of use while providing the full power of object-oriented programming.

I also omitted NetRexx, a "Rexx-like" language that extends Rexx's ease of use into the Java environment. NetRexx runs on both clients and servers. Use it to develop classes, applets, applications, servlets, and beans. NetRexx functions as an interpreter or compiler, so you can run it with or without a Java Virtual Machine and even use it to generate formatted Java code.

No single programming language is best for every task--which is one reason why there are so many of them--but Rexx is certainly a useful one to have around. It makes a nice complement to syntax-based power languages like Perl, Bash, and Korn. You can become fluent in Rexx in a matter of days, yet you won't run out of power as your knowledge grows.

Here is a list of free software and other resources.

Interpreters with Tools

Object-Oriented Rexx

For Handhelds

For Java Integration

Example Scripts

Books

The Rexx Language Forums

International Users Group

Howard Fosdick is an independent consultant who has worked with most major scripting languages.

Copyright © 2009 O'Reilly Media, Inc.