ONLamp.com

A Canary Trap for URI Escaping

by Robert Spier
02/23/2006

When building web applications, one thorny issue is URI escaping and unescaping. This is especially important when passing data between different systems or through multiple redirects. It's possible to end up with double- or triple-escaped URIs, which the application might not handle correctly.

For example, if you pass "Los Angeles" through escaping once, you get "Los%20Angeles". Web applications expect this, so they decode their input. If there is a redirect in the path, however, you may end up with the double-escaped string "Los%2520Angeles". Triple escaping looks like "Los%252520Angeles". Obviously, you wouldn't want to enter one of those into a database ... or use it for output.
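To see the layers pile up, here is a minimal sketch. It assumes the uri_escape function from the URI::Escape module; the helper name escape_times is made up for illustration.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical helper: apply URI escaping $n times, the way a chain of
# redirects that each re-encode the value might.
sub escape_times {
    my ($value, $n) = @_;
    $value = uri_escape($value) for 1 .. $n;
    return $value;
}

print escape_times('Los Angeles', $_), "\n" for 1 .. 3;
# Los%20Angeles
# Los%2520Angeles
# Los%252520Angeles
```

Each extra pass only re-escapes the percent sign (% becomes %25), which is why the strings grow by "25" per layer.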

From experience, it's always safe to decode query parameters once, because browsers will encode things if necessary. However, what if the data you pass around actually looks like escaped data, or was input for a calculator (say 65%20)? You don't want to overdecode that, because then the calculator will output 5 instead of 20. (That's how Mars Rovers get lost.)
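The over-decoding risk can be sketched in a few lines, again assuming URI::Escape. The helper decode_times is hypothetical and exists only to make the number of passes explicit.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_unescape);

# Hypothetical helper: decode a value $n times.
sub decode_times {
    my ($value, $n) = @_;
    $value = uri_unescape($value) for 1 .. $n;
    return $value;
}

# The user literally typed 65%20; escaped once for transit it is 65%2520.
my $on_wire = uri_escape('65%20');

printf "decoded once:  '%s'\n", decode_times($on_wire, 1);   # '65%20'
printf "decoded twice: '%s'\n", decode_times($on_wire, 2);   # '65 '
```

One pass recovers what the user typed. The second pass silently turns the trailing %20 into a space, and nothing in the data itself says that a pass too many was applied.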

For reference, see RFC 2396 and RFC 2732, both of which have since been obsoleted by RFC 3986.

Canary Traps

How many times should you decode a URL? There's no good way to know. To solve this, turn to an old solution used by coal miners. (Or so I've been led to believe.) The story goes something like this.

Coal miners brought a bird, often a yellow canary, into the mine with them. Because the bird has a relatively high respiratory rate and small lung capacity, it is more susceptible to bad air. If the bird fell over, it was a good indication that the air in the mine wasn't safe to breathe, due to a gas leak or perhaps just a lack of oxygen. That was a sign to get out as fast as they could.

A similar technique works in software.

Software Yellow Birds

First, you need some sort of symbol to use as a canary. It should be short, yellow, and something that gets escaped. Because you can't control the color of the text, give up on the yellow thing. I chose =: as the canary. It's short, distinct, and contains two escaped characters (%3D%3A). =: is also very unlikely to be the start of a real data string.

You can use other characters for your canary, but it is easier if you select something that is more likely (or required) to be escaped in transit. If you were to select ZZ as your canary, it wouldn't help you much, because hardly anything will escape those characters. =: will always be escaped, because they are both part of the set of reserved characters in URI strings.
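A quick way to vet a candidate canary is to check whether escaping changes it at all. This sketch assumes URI::Escape's uri_escape with its default character set; changes_when_escaped is a made-up name.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical check: a useful canary must look different after escaping.
sub changes_when_escaped {
    my $candidate = shift;
    return uri_escape($candidate) ne $candidate;
}

for my $candidate ('=:', 'ZZ') {
    printf "%s -> %s (%s)\n",
        $candidate,
        uri_escape($candidate),
        changes_when_escaped($candidate) ? 'usable' : 'useless as a canary';
}
# =: -> %3D%3A (usable)
# ZZ -> ZZ (useless as a canary)
```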

The technique looks something like this. Put the canary on the front of anything you believe might be improperly escaped somewhere along the line. Then, when you receive that parameter back, decode until you see the canary on the front, and then trim it off.

The code might look something like:

use CGI;
use URI::Escape;    # provides uri_unescape

my $cgi   = CGI->new;
my $param = $cgi->param('user_value');
for (my $i=0;  ++$i <= 5;) {  
  last if _remove_canary(\$param);
}

sub _remove_canary {
    my $string = shift;
    return 1 if $$string =~ s/^=://;
    $$string = uri_unescape($$string);
    return 1 if $$string =~ s/^=://;
    return 0;
}

This code needs some data to work on.

my $param = $cgi->param('user_value');

It supports up to five layers of escaping. For each layer, call the _remove_canary subroutine. That routine will return a true value if it has detected the canary, and false otherwise.

The five is arbitrary but reasonable. If you're ever in a circumstance when you have more than five levels of escaping, it probably means something very, very, very, very, very, very bad has happened. What's important is that the number is finite. Otherwise, there is the potential for an infinite loop if someone passes in data without a canary.

for (my $i=0;  ++$i <= 5;) {  
    last if _remove_canary(\$param);
}

The _remove_canary subroutine takes a reference to the original data so that it can modify it in place. This has two benefits. First, you can use the return value of the function to determine when to stop looping. Second, if the data is large, the code won't keep making copies.

sub _remove_canary {
    my $string = shift;

If the code detects the canary at the beginning, remove it and return true.

    return 1 if $$string =~ s/^=://;

The code didn't find the canary, so it unescapes the data and hopes the canary appears.

    $$string = uri_unescape($$string);

Now it checks for the canary again. This isn't strictly necessary, because the next iteration through the loop would catch it. It does save the overhead of one function call. Overoptimization? Maybe.

    return 1 if $$string =~ s/^=://;

Otherwise, it hasn't found the canary, so return false. This isn't an error, but it is the trigger to run through the loop again.

    return 0;
}
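To convince yourself that the loop recovers the original value no matter how many layers were added, a small round-trip test helps. The sketch below repeats _remove_canary from the walkthrough and assumes URI::Escape; send_through and recover are hypothetical helpers that stand in for the outbound and inbound sides.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_unescape);

# Same logic as the walkthrough above.
sub _remove_canary {
    my $string = shift;
    return 1 if $$string =~ s/^=://;
    $$string = uri_unescape($$string);
    return 1 if $$string =~ s/^=://;
    return 0;
}

# Hypothetical outbound side: add the canary, then escape $layers times.
sub send_through {
    my ($value, $layers) = @_;
    my $wire = "=:$value";
    $wire = uri_escape($wire) for 1 .. $layers;
    return $wire;
}

# Hypothetical inbound side: the five-pass loop from the article.
sub recover {
    my $param = shift;
    for (my $i = 0; ++$i <= 5;) {
        last if _remove_canary(\$param);
    }
    return $param;
}

for my $layers (0 .. 5) {
    printf "%d layer(s): %s\n", $layers, recover(send_through('65%20', $layers));
}
```

Every line of output should show 65%20, the value that went in. That is exactly the case that blind repeated decoding gets wrong.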

Using Your Canary

There are many different ways for reading arguments in CGI and similar scripts (and that would be good content for another article entirely), but you might do something like:

sub check_canary {
    my $param = shift;
    for (my $i=0;  ++$i <= 5;) {  
        last if _remove_canary(\$param);
    }
    return $param;
}

my @wanted_params = qw(color size shape);
my @canary_params = qw(session reference);
my %params;
$params{$_} = $cgi->param($_) 
    for @wanted_params;
# scalar() forces a single value; param() returns a list in list context
$params{$_} = check_canary(scalar $cgi->param($_))
    for @canary_params;

You'll end up with %params containing all of your parameters, unescaped if necessary.

This doesn't remove the need (obligation?) to validate the values. They could still contain unsafe or corrupted values--but they won't be double-, triple-, or quadruple-escaped.
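One validation step worth considering is to notice when the canary never shows up at all, because in that case the value has been decoded several times and cannot be trusted. The following is a sketch of one possible policy, not part of the code above: check_canary_strict is a hypothetical variant that returns undef when no canary is found. It assumes URI::Escape.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_unescape);

# Hypothetical variant of check_canary: returns the cleaned value, or
# undef if no canary appears after at most $max_layers decoding passes.
sub check_canary_strict {
    my ($param, $max_layers) = @_;
    $max_layers = 5 unless defined $max_layers;
    return unless defined $param;

    # One more check than layers: the last check follows the last decode.
    for (1 .. $max_layers + 1) {
        return $param if $param =~ s/^=://;
        $param = uri_unescape($param);
    }
    return;    # canary never found: treat the input as invalid
}

my $good = check_canary_strict('%253D%253ALos%2520Angeles');
my $bad  = check_canary_strict('Los%20Angeles');

print defined $good ? "good: $good\n" : "good: rejected\n";   # good: Los Angeles
print defined $bad  ? "bad: $bad\n"   : "bad: rejected\n";    # bad: rejected
```

Whether to reject such input, log it, or fall back to a single decode is an application decision; the point is only that the absence of the canary is itself a signal.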

On the output side, things are simple. Just prefix any parameters in the @canary_params list (or your application's equivalent) with =:. There are some things to watch out for:

For example, in Template::Toolkit, don't write:

<INPUT TYPE="HIDDEN" NAME="color" VALUE="=:[% color | uri %]">

The color value will be escaped one more time than the canary. This causes problems later: the receiving code stops decoding as soon as it sees the canary, so it will not call uri_unescape enough times to recover the color value. Instead, always escape and unescape the canary and the entire value together:

<INPUT TYPE="HIDDEN" NAME="color" VALUE="[% "=:" _ color | uri %]">
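The same rule applies when building the value in Perl rather than in a template. A minimal sketch, assuming URI::Escape; add_canary is a hypothetical helper name.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical helper: attach the canary first, then escape the whole string.
sub add_canary {
    my $value = shift;
    return uri_escape("=:$value");
}

my $color = 'sky blue';

# Right: canary and value carry the same number of escaping layers.
print add_canary($color), "\n";          # %3D%3Asky%20blue

# Wrong: the canary is one layer behind the value it is supposed to track.
print '=:' . uri_escape($color), "\n";   # =:sky%20blue
```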

In summary, canaries are a very simple solution to a very annoying problem. With a few lines of code and a little change to the way you pass data around, you (like the coal miners before you) avoid a very big headache.

Postscript

The inspiration for this article came from a conversation I had with Ask Bjørn Hansen, who is developing the Bitcard single sign-on system. Bitcard integrates tightly with Combust, the framework we use at perl.org. Bitcard passes a lot of its data back and forth in the URI via HTTP GET requests. We were having issues with multiple redirections creating double- and triple-escaping conditions. I suggested a canary solution like the one discussed above, and Bitcard has been happy ever after.

Robert Spier is a member of the Site Reliability Engineering group at Google.
