Apache DevCenter

A Day in the Life of #Apache
Examples of RewriteMap in Action

by Rich Bowen, coauthor of Apache Cookbook
04/28/2005

Editor's note: Rich Bowen is back with another installment in his ongoing series based on conversations on #apache. This week, he provides examples of RewriteMap in action. Rich is a coauthor of O'Reilly's Apache Cookbook.

#apache is an IRC channel on the irc.freenode.net IRC network. To join it, install an IRC client (XChat, mIRC, and BitchX are popular choices) and enter the following commands:

/server irc.freenode.net
/join #apache

Day Twelve

A huge number of the questions on #apache have to do with mod_rewrite. And, fairly frequently, I find myself thinking that the problem being discussed would be so much easier to solve if we could just write a Perl script to deal with it.

Of course, you can, using the RewriteMap directive, but good examples of it are hard to come by, either in the documentation or elsewhere online.

As some of you may know, I'm working on the documentation, and, hopefully, it will soon contain some good examples of using RewriteMap. But, until then, this article will serve to provide a simple, as well as a not-so-simple, example.

I'll go ahead and give the caveat here, since you'd be really irritated with me if you got to the end and only realized this little fact then. Although you can use a rewrite map anywhere (including .htaccess files), you can only define one in the main server configuration, either server-wide or inside a <VirtualHost> block. This is because the map is loaded at server startup, so defining it in a .htaccess file wouldn't work.

We'll start with the simplest RewriteMap example, so that you can see how the syntax works and how you'd use it in a basic map scenario. In this simplest form, RewriteMap lets you create a one-to-one map between lookup keys and URLs. You can frequently use it to replace a lengthy list of RewriteRules with a single map file.

We'll start by creating the map file. We'll call it fish.map and put it in /usr/local/apache/conf, and it will look like this:

carp http://fish.net/carp.pl
trout http://fishermen.com/trout.php
guppy http://guppies.org/about.html
whale http://moby.dick.net/great.white.cfm

In the next step, we create a map name for that file, so that we can use it in RewriteRules.

RewriteMap fishmap txt:/usr/local/apache/conf/fish.map

And, finally, we'll use it in an actual RewriteRule. In this case, we want to redirect some requests for various fishes to sites about those fishes.

RewriteEngine On
RewriteRule ^/fish/(.*) ${fishmap:$1} [R]

Now, when someone visits http://myserver/fish/guppy, they will be redirected to http://guppies.org/about.html instead.

There's still one small problem, though. If they request the URL http://myserver/fish/salmon, the rule will be run, the fish will be looked up in the map, and nothing will match, so the lookup returns an empty string. If we want to provide a default place to go when nothing matches, we can add that to the RewriteRule after a | character:

RewriteRule ^/fish/(.*) ${fishmap:$1|http://no.fish.com/} [R] 
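To see what the lookup-with-default syntax does, here is a minimal Perl sketch that imitates it outside of Apache. It is not how mod_rewrite is implemented; parse_map and lookup are hypothetical helper names, and the map text is simply the contents of fish.map from above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse map text (same layout as fish.map) into a hash of key => value.
sub parse_map {
    my ($text) = @_;
    my %map;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/ or $line !~ /\S/;   # skip comments and blank lines
        my ($key, $value) = split ' ', $line;        # first two whitespace-separated fields
        $map{$key} = $value if defined $value;
    }
    return \%map;
}

# Imitate ${mapname:key|default}: the mapped value, else the default, else ''.
sub lookup {
    my ($map, $key, $default) = @_;
    return $map->{$key} if exists $map->{$key};
    return defined $default ? $default : '';
}

my $fishmap = parse_map(<<'END');
carp http://fish.net/carp.pl
trout http://fishermen.com/trout.php
guppy http://guppies.org/about.html
whale http://moby.dick.net/great.white.cfm
END

print lookup($fishmap, 'guppy',  'http://no.fish.com/'), "\n";
print lookup($fishmap, 'salmon', 'http://no.fish.com/'), "\n";
```

The first lookup prints the guppy URL and the second falls back to http://no.fish.com/, which is the same choice the RewriteRule above makes.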

Alright, that's pretty simple, you say, but how does this help me if my needs are more complex than a simple one-to-one mapping? Well, that's where the prg: type of RewriteMap comes in. Whereas many rewrite rules can be expressed as a single line of regular expressions, some require several RewriteRule statements in a row, and others just seem to be more complex than one really wants to encode in an Apache configuration file. But you could write it in a few lines of Perl, right?

In fact, in a recent Apache class I taught, one of my students was rather irate that I left RewriteMap to the end. If I'd told them about that first, he said, the rest of it would have been unnecessary. I don't know if I'd go that far, but, let's give a couple small examples to illustrate.

In my first example, I want to replace all dashes (-) with underscores (_) in a URL. Now, you could do this with standard RewriteRule directives, using the [N] flag. But that gets icky, and people tend to get it wrong. However, it's pretty simple in Perl, so let's do it that way instead.

First of all, here's the Perl program that does the transformation. (This gets fired up when Apache starts, so you're not launching Perl with every request, or anything silly like that.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    $| = 1;             # Turn off output buffering
    while (<STDIN>) {
        s/-/_/g;        # Replace every - with _
        print $_;
    }

We turn off buffering in the script because, in many cases, having buffered output can cause the rewriting process to hang indefinitely, waiting for the output to be returned.
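You can watch that line-at-a-time exchange outside of Apache. The following sketch is only an approximation of what the server does: it starts the same dash-to-underscore filter as a child process (inlined with -e so that the example is self-contained), sends it one key per line, and waits for one line back. The ask helper and the sample paths are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;

# Child process: the same filter as dash2score.pl, with unbuffered output.
my @child = ($^X, '-e', '$| = 1; while (<STDIN>) { s/-/_/g; print $_; }');
my $pid = open2(my $from_child, my $to_child, @child);

# Send one key, then block until one reply line comes back.
sub ask {
    my ($key) = @_;
    print {$to_child} "$key\n";          # open2 turns on autoflush for this handle
    my $reply = <$from_child>;
    defined $reply or die "map program closed its output\n";
    chomp $reply;
    return $reply;
}

my @replies = map { ask($_) } ('/my-first-post', '/a-b-c');
print "$_\n" for @replies;

close $to_child;                         # end of input lets the child exit
waitpid $pid, 0;
```

If you take the `$| = 1` out of the child, the first call to ask will most likely never return: the child's reply sits in its output buffer while both sides wait for each other.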

We'll put this script in a file named dash2score.pl and put it in /usr/local/apache/conf, as we did with the other map, just for consistency. Make sure to make that script executable. Then we'll give the map a name:

RewriteMap dash2score prg:/usr/local/apache/conf/dash2score.pl 

Now we can use it in a RewriteRule:

RewriteEngine On
RewriteRule (.*-.*) ${dash2score:$1} [PT]

The pattern that I've used, (.*-.*), matches any requested URL that contains at least one dash, and causes the entire URL to be passed to the conversion script. The script does the conversion in one step and returns the result, and the [PT] flag makes the RewriteRule pass that resulting URL back to the URL mapping engine to see what happens next.

The more complex example involves database access. I came up with this example when trying to persuade WordPress to give me a particular kind of URL. I should note that, since then, some helpful WordPress developers have pointed out easier ways to do this. However, the technique itself was interesting enough that it inspired me to think of doing this article in the first place. So here it is.

In this case, we're going to look in a database for the information that we want:

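    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    $| = 1;  # Turn off output buffering
    my $dbh = DBI->connect('DBI:mysql:wordpress:dbserver',
                           'username', 'password')
        or die "Cannot connect: $DBI::errstr\n";
    my $sth = $dbh->prepare("SELECT ID FROM wp_posts
                            WHERE post_name = ?");

    # Rewrite permalink style links to article IDs
    while (my $post_name = <STDIN>) {
        chomp $post_name;
        $sth->execute($post_name);
        my ($ID) = $sth->fetchrow_array;  # undef if no post has this name
        $sth->finish;

        # NULL tells mod_rewrite that the lookup failed
        print defined $ID ? "/wordpress/index.php?p=$ID\n" : "NULL\n";
    }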

We create the rewrite function using the RewriteMap directive:

RewriteMap permalink prg:/usr/local/apache/conf/permalink.pl 

And then we can use it in rewrite rules:

RewriteRule ^/perm/(.*) ${permalink:$1} [PT]

In this case, a URL like http://servername/perm/wooga will cause a database lookup using the keyword "wooga."

One final word about how this works, and why it's not monstrously inefficient. The Perl script referred to in the RewriteMap starts when the Apache server is started, and keeps running for the life of the Apache server process. This is why you need a while <STDIN> loop, and that's why it doesn't need to relaunch the program with each request. If the directive were permitted in .htaccess files, it would mean that the program would need to be launched with every request. This would be hugely inefficient.
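The shape of such a long-running map program can be sketched without a database. The mod_rewrite documentation says that a prg: map should answer with the four-character string NULL when it has no value for a key, so this sketch does that. Everything else here is an assumption made for illustration: the %posts hash stands in for the wp_posts table, answer and serve are hypothetical names, and the demo feeds the loop from an in-memory filehandle rather than from Apache.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;   # unbuffered STDOUT, which matters when serve() writes to it

# Hypothetical stand-in for the wp_posts table: post_name => ID.
my %posts = (wooga => 17, 'hello-world' => 1);

# One reply per key; the literal string NULL signals "no match".
sub answer {
    my ($post_name) = @_;
    return exists $posts{$post_name}
        ? "/wordpress/index.php?p=$posts{$post_name}"
        : 'NULL';
}

# The loop that keeps running: one key in, one line out, until input ends.
sub serve {
    my ($in, $out) = @_;
    while (my $key = <$in>) {
        chomp $key;
        print {$out} answer($key), "\n";
    }
}

# Demo with in-memory handles; under Apache this would be
# serve(\*STDIN, \*STDOUT) instead.
open my $in, '<', \"wooga\nno-such-post\n" or die "open: $!";
my $replies = '';
open my $out, '>', \$replies or die "open: $!";
serve($in, $out);
close $out;
print $replies;
```

The demo prints /wordpress/index.php?p=17 for wooga and NULL for the post name that isn't in the hash.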

I hope that this little tutorial will help you use RewriteMap for those cases when the RewriteRules are getting just a little too hairy.

See you on #apache.

Rich Bowen is a member of the Apache Software Foundation, working primarily on the documentation for the Apache Web Server. DrBacchus, Rich's handle on IRC, can be found on the web at www.drbacchus.com/journal.

Related Reading

Apache Cookbook
By Ken Coar, Rich Bowen
