Published on (
 See this if you're having trouble printing code examples

Perl 6 Essentials

Building a Parrot Compiler

by Dan Sugalski, coauthor of Perl 6 Essentials

The Background

The company I work (Yarde) for has a large system written in an '80s vintage 4GL called DecisionPlus. This is one of those languages with integrated database, screen handling, and report generation facilities. The current code base is in excess of 460K lines of (often very nasty) code across nearly 1,300 files. Yarde has a full set of source for the system, with new code and bug fixes for the system rolling out into production on a daily basis.

The problem Yarde faces, like many other companies with custom or semi-custom software, is that limits in the system are starting to affect day-to-day operations. In our case, we're running into problems with the integrated database library, which limits some of our critical data files to around 2 million records. That might sound like a lot, but we're putting nearly 5,000 records a day into those files, and that daily rate is constantly increasing. On the one hand, that's good for the company as it means our business is growing, but on the other hand, we can only keep about 400 business days of data online at once. This hasn't affected the business yet, but sooner or later it will.

This is the equivalent of having the full C source to an application, but not having the source to the C compiler or runtime library, and finding that there's a large and unpleasant bug in one of the essential support routines. There was no flaw in the application—for which we had the source—that was causing the problem; rather, the problem was in the runtime, for which we didn't have source.

The common solution to a problem this big, especially when the original system is written in a language that's fallen out of favor, is to rewrite the system in a new language, or to migrate to another software package. These choices are both risky and expensive, and the experiences of other companies show that projects like this fail more often than succeed. We have more than a decade of customized business logic tied up in the current system, which we'd have to tease out and recreate; hundreds of employees we'd need to retrain; and lots of custom reports to recreate. Given the amount of customization needed, it was doubtful that even if we used another off-the-shelf product, we could finish the project for a reasonable cost (or even an unreasonable cost) before the database issues stopped being annoyances and started being problems.

Related Reading

Perl 6 Essentials
By Allison Randal, Dan Sugalski, Leopold Tötsch

Neither rewriting the system nor ditching it for someone else's software was an appealing option, especially given that the reason to do so was essentially an internal issue and mostly invisible to the users. We decided instead to do the sensible thing—we rewrote DecisionPlus. Or, rather, we wrote a compiler that understood DecisionPlus and, rather than targeting the DecisionPlus virtual machine and runtime that was causing us problems, we targeted a newer and less restrictive system.

That may sound mad, but it really was the simplest thing to do, by far. DecisionPlus the language is, well, let's call it charmingly simplistic. The screen handling code was all terminal based and very straightforward to map to ncurses with its form handling library. The database system required a bit of thought, as the original was a fielded ISAM database; we originally looked at the Berkeley DB, but realized we could use an SQL database with some database trickery and runtime library shims, and settled on PostgreSQL. (We needed writable views and triggers, so MySQL wasn't an option.)

The final project design ultimately has four pieces: a compiler, a database runtime library, a screen runtime library, and some general utility code.

We started writing the compiler in Perl 5, to emit Perl 5 code. Writing the compiler in Perl 5 meant we could use a Parse::RecDescent grammar, which takes care of the tokenizing and some of the parsing. This was a huge win. With Parse::RecDescent we got a preliminary compiler written in less than two months. Unfortunately, because of this it became obvious that Perl 5 wasn't a good fit as a target language. DecisionPlus, because of its age, didn't have the concept of subroutines or blocks. Instead, it made do with line labels, a goto, and a gosub statement. It also has typed variables and strings with predeclared (and enforced) lengths.

Perl could do this. The compiler turned labels into subroutines, gotos into goto &subroutines, and gosubs into sub calls. Typed data was implemented with scalars tied into packages that enforced the typed behavior. The result was messy; even before we wrote any runtime library or support code it was pretty clear that the resulting code would be unwieldy and potentially slow.

Writing a compiler still was the right answer. The speed with which I built it made that clear. Perl 5 was just the wrong back end to target. We just needed a different back end. Conveniently, we had Parrot handy, and the results have been entirely satisfactory.

Our compiler started out as a set of prototype single-function programs. There's a simple C-style preprocessor, a parser, and the actual compiler. Splitting out the functions into separate programs made development and testing easier, and helped keep the different functional parts of the compiler from bleeding into each other. This wasn't really necessary. At some point, we may merge the different pieces into a single program.

The first step in the compilation sequence is preprocessing the source. DecisionPlus has no macro or include functionality, but at some point in the past someone figured out that you could use C's preprocessor if you didn't mind ignoring all the warnings and complaints. Since only a small subset of the preprocessor functionality is needed — constant substitution and includes— we wrote our own C-style preprocessor in Perl so we could cut down on the complaints.

Defining the Grammar

After preprocessing, the resulting source is fed into the parser. The parser is again written in Perl, using Damian Conway's Parse::RecDescent module to do the parsing. We'd considered using lex and yacc or some of the other parser generators that are available, but using a perl-based parser made our prototyping much easier and faster. Recursive descent parsers are notoriously slow, as unoptimized grammars trigger a lot of backtracking. There are some tricks you can use to speed up the parsing, and we'll get to those in a bit.

O'Reilly Open Source Convention.

Parse::RecDescent grammars are pretty standard as grammars go, and the examples you'll find in most any compiler or parsing book will work just fine. (Though you may have to do a bit of translation for LALR grammars, like the ones fed to yacc) Each rule in a grammar has a set of one or more strings, regular expressions, or sub-rules that must be matched in the input stream for the rule to successfully match.

Parse::RecDescent has many handy features you can, for example, have pieces of code that fire off whenever a rule matches, allowing your parser to alter Parse::RecDescent's default tree. By default the tree that's built is based on hashes. I found that this wasn't as easy to process as a more traditional tree, but P::RDs documentation notes that if the $::RD_AUTOACTION variable is set to q{[@items]} you get back an array of arrays instead. I found this easier to process in the compiler, though that is a matter of taste.

Since having an example is useful, we'll use the grammar in Example 1 throughout the article. This is a grammar specifically for a recursive descent parser (and will work fine with Parse::RecDescent). If you've never written a grammar before, or are only familiar with LALR grammars such as the ones that yacc uses, it may seem a little odd.

Example 1. the example mathematical grammar

field: /\b\w+\b/

stringconstant: /'[^']*'/ | /"[^"]*"/

float: /[+-]?(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?/

constant: float | stringconstant

addop: '+' | '-'

mulop: '*' | '/'

modop: '%'

cmpop: '<>' | '>='| '<=' | '<' | '>' | '='

logop: 'and' | 'or'

parenexpr: '(' expr ')'

simplevalue: parenexpr | constant | field

modval: <leftop: simplevalue modop simplevalue>

mulval: <leftop: modval mulop modval>

addval: <leftop: mulval addop mulval>

cmpval: <leftop: addval cmpop addval>

logval: <leftop: cmpval logop cmpval>

expr: logval

declare: declare field

assign: field = expr

print: print expr

statement: assign | print | declare

In this case the grammar is for a mathematical or logical expression, with standard precedence. (That is, modulus comes first, then multiplication and division, then addition and subtraction, then comparisons, then logical operations.) We also have a variable declaration rule, a value assignment rule, and a print rule. There's a separate rule for each level of precedenceadding in more levels is just a matter of adding in more rules. We've also added a variable declaration rule, an assignment rule, and a printing rule so we can hold temporary values and see what we've produced.

If you've used yacc before, you might wonder why we go to all this trouble. This is because an LL, or recursive descent, grammar can't be left-recursive. The leftmost term of a rule can't be the rule itself, as this would lead to infinite recursion when evaluating the grammar. LR and LALR grammars, such as those that yacc uses, don't have this limitation, making precedence rules much simpler.

When this grammar is run over an expression, it will build up a tree that represents the expression. We can feed that tree to our compiler to produce code. In our case we're going to produce Parrot code, but you could easily use this to generate C, Perl, or Python code, or even native machine code if you were so inclined.

Optimizing the Grammar

Recursive descent parsers do a left-most, depth-first match of the grammar against the input source. While this is a powerful technique, it's also potentially very slow — there's a lot more backtracking with a recursive descent parser than with a LALR parser. Because of this, it's a good idea to optimize your grammars, otherwise you'll find your parser too slow, regardless of the language it's written in, to be useful.

Optimizing recursive descent grammars deserves an article (or more) all by itself, but there are a few easy tricks you can use to get a respectable speedup in parse times. The easiest way to make a recursive descent parser faster is to make it do less work. Since rule failures (that is, rules that don't match) make up a significant amount of the work the parser does, make sure the parser fails as rarely as possible, or, rather, succeeds as often as possible.

For example, if you have a grammar with the following keywords:

abort: abort

else: else

if: if

print: print

read: read

then: then

It's useful to have a grammar element that encompasses them all. As a first try, you'd probably throw something together like:

keyword: abort | else | if | print | read | then

and that will work. It'll also be slower than it has to be.

Why? Well, that's due to the way that recursive descent parsers work. When the parser is looking to see if the keyword rule matches, it tries each sub-rule in turn. In this case the rules are listed alphabetically, so it tries abort first, then else, then if, and so on. Programming languages are no different than any other language. Some words are more common than others. In this case, it's likely that abort is the least common of the words in your program, so you're guaranteed that the keyword rule will fail the first test most of the time. The rule will still work, just not as fast as if you had the subrules ordered by frequency of occurrence. Performing a simple frequency analysis on a representative sample of what you're parsing will give you the information you need to order the subrules more efficiently.

Straight frequency sorting works when each subrule costs about the same amount to evaluate. When your subrules are each constant strings they each take about the same amount of time to evaluate. If you have some subrules that are significantly more expensive — because you've got a complex regular expression, or have to execute code to check if a subrule matched — you may find it better to reorder the subrules based both on frequency and expense. Often this is a matter of trial and error, but proper ordering of subrules can cut parsing time by half or more, depending on how many subrules you have and the frequency distribution of your keywords.

Generating the Code

Once you've transformed your source program into a parsed tree, you need to then turn that tree into executable code. In our case, we generate PIR, which is Parrot's Intermediate Representation, a sort of high-level assembly language. While we could have generated the assembly directly, PIR has some features that are nice for compiler writers, including automatic register allocation, subroutine call handling, and subroutine preamble and postamble handling. If you've never had to deal with any of these things count yourself lucky. If you have, you'll be happy you won't have to do so now.

Our example language is pretty simple. We can declare variables, use them in expressions, store numbers and strings in them, and print them out. There are no loops, decisions, blocks, subroutines, or flow control words. While that's not sufficient for a real language, it's enough for example purposes.

Sample Program

Since we need a source program to go with our grammar, let's use the following program:

declare foo
declare bar

foo = 15
bar = (foo + 8) * 32 7

print bar
print \n
print foo % 3
print \n

Once again, nothing at all fancy. The output should be 729 and 5, each on separate lines. Each line is a single statement, so we don't have to worry about statements spanning lines, multiple statements on a line, or other unpleasant complications. This is a good thing, at least for example purposes.

Source Code

Download the example file,

The basic parser and compiler is in Listing 1. It integrates the grammar we detailed earlier with the code necessary to handle the output of the parser and generate PIR code to feed into Parrot. When this program runs, it parses our source and, if there are no errors, it feeds the results into the compilation code, which produces a file of PIR code. From there you can feed the PIR to Parrot and see it run. This compiler fully parses the source before handing the results off for code generation, but you can easily intersperse parsing and code generation if you want.

Basic Parser Behavior

The way we have Parse::RecDescent set up, it generates a tree for each statement. This is a series of arrays of arrays (of arrays of arrays, and so forth) with each matching rule producing an array whose elements are the rule name and the matching sub-rule elements. If what matched for a rule is another rule the element will be an array reference, while if the rule matches one or more strings and regexes, the elements will be string constants.

For example, consider the statement

declare foo

That matches the declare rule, which has two elements, the string constant declare and a field expression. The field expression itself is just a regex that matches a word, at least as Perl defines it. That means the result is:

['declare', 'declare', ['field', 'foo'] ]

The inner rule for the field produces an array of two elements, the rule name (field) and the string that matched the regex (foo). The outer rule produces a result with three elements, the rule name (declare), the string declare, and the result of the field subrule.

The parser will, when it's done, hand the code generation routine a tree for each statement that it parses. Since each node and subnode is tagged with the rule that it matches, the easiest way to process this tree is with a table of functions for the rules. For our grammar, the table would look like:

my (%handlers) = (
    addval         => \&handle_generic_val,
    assign         => \&handle_assign,
    cmpval         => \&handle_generic_val,
    constant       => \&delegate,
    declare        => \&handle_declare,
    expr           => \&delegate,
    field          => \&handle_field,
    float          => \&handle_float,
    logval         => \&handle_generic_val,
    modval         => \&handle_generic_val,
    mulval         => \&handle_generic_val,
    parenexpr      => \&handle_paren_expr,
    print          => \&handle_print,
    simplevalue    => \&delegate,
    statement      => \&delegate,
    stringconstant => \&handle_stringconstant,

Using a table like this makes the compiler easily extendable — if you add in a new node you just add in a new entry in the table. You'll also note that some of the rules share functions. This is fine, and we'll get into why in a little bit.

Emitting Header Information

Before we start processing the output of the parser, we need to put out some boilerplate header information. This is a common thing for compilers to do, and you'll find yourself with a small catalog of boilerplate code. In this case, the boilerplate looks like:

.sub __MAIN prototyped, @MAIN
.param pmc argv

This declares that we're in a subroutine named __MAIN, which is prototyped, and takes a single PMC parameter argv. Being in a subroutine is mandatory; Parrot requires that all code executed must be in a subroutine, function, or method. For us, with our simple sub-less language, the name is irrelevant, but in languages with real subs and methods the name is often meaningful. The @MAIN tag on the subroutine marks it as the starting subroutine for our program. When parrot executes our program, this is where it starts.

At the moment, Parrot assumes that the first subroutine in the first file that's executed is the driving sub, though we may change that later on.

The parameter, argv, goes unused in our example language, but it's an array PMC that holds the command-line parameters passed into parrot when our program started.

Emitting PIR Code

After the boilerplate is emitted, we can start processing the trees of nodes the parser has generated. If we mandate that each node-handling function returns the PIR code needed to process that node, the code driving code generation can look like:

foreach my $node (@nodes) {
    my (@lines) = process_node(@$node);
    print join("", @lines);

In this case, we assume that the @nodes array holds all the statement nodes from our parsed program. We flatten and process each node in turn, printing out all the code for the node. The function to process a node looks like:

sub process_node {
    my (@elems) = @_;

    if(exists($handlers{$elems[0]})) {
        return $handlers{$elems[0]}->(@elems);
    } else {
        return "***", $elems[0], "***\n";

This function (the version in the listing has some extra error checking) just looks to see if it can find a function in our global handler hash to handle the node that's been passed in. If it can, it calls that function, passing in the flattened node data. If it can't, it emits an easy-to-find string that notes the node that it couldn't find a handler for. This comes in handy when debugging the compiler later on, as it makes it easy to find where you've found a token you don't know how to deal with.

With the driver code and the node processing code, it's just a matter of writing the code for the individual nodes. This code can be complex, so it's not necessarily as simple as it might seem, but its not that tough. It is important to remember, when writing a compiler, that you need to evaluate things from the inside out. When evaluating the node for print, for example, you have to evaluate the expression to be printed before doing the actual print. This implies that all nodes that get evaluated return a value that can be used, as well as the code to be executed to produce this value. Since we're already having the nodes return the code needed to produce that value, we need an alternate way to get the value produced by a node.

The following code in the compiler does this for us:

sub last_expr_val {
    return $::last_expr;

sub set_last_expr_val {
    $::last_expr = $_[0];

We use a global variable, which is a perfectly fine thing to do in this case, but we hide it behind a pair of accessing functions. This is less a stylistic convention (as you'll see, we'll introduce more globals later — compilers seem to accumulate them) as one of convenience. Using accessor functions means we can easily wedge more code in later if necessary.

There are three types of nodes that we'll need to deal with in our compiler. The first type is the terminal node. These are nodes that resolve to a single concrete thing. The float, field, and stringconstant rules produce terminal nodes, as what they generate is a single concrete thing. (In this case a floating point constant, field name, or string constant) The handlers for nodes like these are generally simple. For example, the handler for a floating point constant is just:

sub handle_float {
    my ($nodetype, $floatval) = @_;

These nodes don't even produce any executable code, since theres no code that actually needs to be executed for them. It's sometimes useful to put validation code in a terminal node, if there's something you want to check at code generation time. For example, the field node handler may look like:

sub handle_field {
    my ($nodetype, $fieldname) = @_;

    if (!exists $global_vars{$fieldname}) {
        die "undeclared field $fieldname used";


if you want the compiler to emit a fatal error if a variable has been accessed before it has been declared. The field access code could also fetch the fields out of the global namespace if you were using it. (Our compiler uses the named local variable feature of Parrot's PIR compiler, so we don't do this.)

The next type of node you'll deal with is the non-terminal node. These are nodes generated by rules that have multiple subrules, pieces of text, or regular expressions that match. The declare rule is a good example of this, as it has both the constant text declare that matches as well as the field rule to note the name of the field being declared. The declare handler looks like:

sub handle_declare {
    my ($nodetype, undef, $var) = @_;
    my @lines;
    my $varname = $var->[1];

    # Does it exist?
    if (defined $global_vars{$varname}) {
        die "Multiple declaration of $varname";

    push @lines, " .local pmc $varname\n";
    push @lines, " new $varname, .PerlInt\n";
    return @lines;

This handler illustrates an important feature of compilers — they cheat, sometimes extraordinarily. In this case all we're doing is peeking at into the insides of what the field name subrule produced, rather than just delegating to the field handler. Why? Well, if we delegated to the field handler it would complain that the field hasn't yet been declared. While that's true, it's not a useful error as we're in the field declaration rule. So we cheat and peek ahead a bit.

This particular cheat isn't strictly necessary, so if you object to it you can get rid of it. There are several ways to do that, whether making a separate pass over the tree to note the declarations before making the compilation pass, tweaking the parser to make entries into the global variable list as it parses, using a different rule for field names in the declaration rule, or adding in a context marker that the field handler can note. Consider them exercises for the reader if you're so inclined.

The third type of node you'll encounter is a node that has no purpose other than to delegate to another node. The constant and simplevalue rules are examples of this. While each rule has several alternative subrules, each may match exactly one subrule. That means that the node that's generated for the rule will consist of just a single subrule node, in which case we can just punt the processing to the subrule handler and be done with it. This is done with the delegate handler that's attached to these rules. It looks like:

sub delegate {
    my ($nodetype, $nodeval) = @_;
    return process_node(@$nodeval);

All it does is call the subnode handler and return whatever it returns.

Now, you may wonder why we even bother with nodes like this. The simple reason is partly sheer laziness, as we could make a pass over the tree first and splice out nodes like this. Making that pass over the tree does take some time, though, so it makes the compiler a bit faster if we don't bother. Leaving the tree alone also leaves us open to some flexibility in the future. We may change the grammar a bit and make these nodes not just passthrough nodes.

Most of the terminal and nonterminal nodes are simple enough to process, but the one place you'll probably run into the most trouble is in processing expressions. The grammar does proper precedence-based and parenthesized grouping of expression terms, but processing the expression can still be somewhat tricky. You've probably noticed that the five different expression term rules are essentially identical. That's because they are. The only reason they're split into separate rule is that's what needs to be done with top-down parsers to properly handle precedence. (As we've said, bottom-up parsers, like ones that yacc generates, don't have this issue)

Complex Handlers

All our expression rules can be handled by a single handler, and they are. The handler is a bit complex, so we'll look at only a part of it:

sub handle_generic_val {
    my (undef, $terms) = @_;
    my (@terms)        = @$terms;
    my $lhs            = shift @terms;
    my @tokens;

    push @tokens, process_node(@$lhs);

    my ($op, $rhs);

    while (@terms) {
        $op      = shift @terms;
        $rhs     = shift @terms;
        my $val  = last_expr_val();
        my $oper = $op->[1];

        push @tokens, process_node(@$rhs);

        my $other_val = last_expr_val();
        my $dest      = $temps{P}++;

        foreach ($oper) {
            # Simple stuff -- addition, subtraction, multiplication,
            # division, and modulus. Just a quick imcc transform
            /(\+|\-|\*|\/|%)/ && do {
                push @tokens, "new \$P$dest, .PerlInt\n";
                push @tokens, "\$P$dest = $val $oper $other_val\n";

            /=/ && do {
                my $label = "eqcheck$tempcount";
                push @tokens, "new \$P$dest, .Integer\n";
                push @tokens, "\$P$dest = 1\n";
                push @tokens, "eq $val, $other_val, $label\n";
                push @tokens, "\$P$dest = 0\n";
                push @tokens, "$label:\n";

    return @tokens;

This code only handles basic math and equality testing, but that's sufficient for illustration. The basic math operations each first create a new temporary variable, perform the requested operation, and then set the last expression value to be the temp we just created.

This takes advantage of the PIR compiler's infinite temporary register feature. While Parrot only has 32 integer, string, floating point, and PMC registers, PIR code can use as many as it needs, with the PIR compiler mapping these temp registers to real registers as it needs to. Temp registers are noted with a leading dollar sign, so while P3 refers to the fourth PMC register, $P3 is the fourth temporary PMC register. The PIR compiler handles calculating temporary lifetimes, spilling excess temps to backing store, and moving them around as needed for expressions. It's a very useful feature, to be sure, and worth using as it takes care of the single most annoying characteristic of a register-based system.

Translating Complex Expressions to PIR

Since Parrot's opcodes only do a single thing, such as add two things and return a result, complex expressions have to be broken into simple pieces and have the result stitched together. In its simplest form the expression a = b + c + d turns into:

$P0 = b + c
$P1 = $P0 + d
a = $P1

though Parrot does need you to explicitly create the temporary PMCs that are in $P0 and $P1, since math operations assign their result into existing PMCs, rather than creating new PMCs to assign the result into. Our compiler could be clever and see that $P1 is unnecessary, as it only gets used in the assignment that immediately follows, but doing so requires extra work analyzing the results of expressions. You could, if you wanted, add an optimizing pass to detect this and emit the more efficient code.

The equality testing rule is a bit trickier, since it needs to do some flow control. While Parrot can do all sorts of comparison testing, all the comparisons are test-and-branch operations, unlike the math operations that just produce a value. That means we need to do a bit of bouncing around to figure out what the results should be. In pseudo-code, it looks like:

temp = true
if equal goto templabel
temp = false


What we do is create a temporary variable and set its value to true, which for our compiler means 1. Then we test the result of the equality expression, and if it is true, then we bounce to the temporary label that marks the end of the expression. If our expression is false we don't branch, but instead fall through to the next statement, which sets our temporary variable to false.

Emitting Postamble Code

Finishing off our compiler is just a matter of filling in the code for the rest of the nodes, and emitting a bit of postamble code. (The example compiler provides that code, so feel free to use it) For us, the postamble is:


the end tells parrot to exit, and the .end marks the end of the subroutine. When writing parrot code you must explicitly exit you can't just fall off the end of the world, as Parrot will generally crash. (Parrot assumes that compilers emit correct code, which includes properly exiting.)

Running the Program

The provided compiler for our example language is simple enough — it takes the names of the source and destination files as parameters on the command-line. Assuming the source we're compiling is in a file named example.simple, the command would be:

perl example.simple example.imc

This produces a file named example.imc. The .imc extension is standard for parrot PIR code files, though you could use a .pir extension instead if you'd prefer.

Once you have the PIR code from your source, run it to see what happens. Doing this is straightforward:

parrot example.imc

Parrot will recognize that the input is PIR code by the extension and will compile and run your program. If you want parrot to generate bytecode rather than execute your program, use the -o switch:

parrot -o example.pbc example.imc

The generated bytecode file should have a .pbc extension. To execute the bytecode, just hand it to parrot as you did the PIR code:

parrot example.pbc

Compiling to bytecode isn't necessary for Parrot to run your program, but Parrot will start your program a bit faster if you have the bytecode rather than the source.


That's it for our simple compiler. The example language lacks some features of many useful languages, such as control flow, but the compiler we built is fully functional, and you should now have the tools you need to add in control flow capabilities if you want them.

Dan Sugalski is the chief architect for Parrot, the interpreter engine for Perl 6. He's been a Perl 5 core developer for years, writing more than a dozen modules in the process.

Return to

Copyright © 2009 O'Reilly Media, Inc.