In the last two articles, we covered writing output filters for Apache 2.0. In this article, we'll focus on input filters. The two filter types are very similar and many of the concepts that we have already covered with output filters can be successfully applied to input filters. However, input filters differ from output filters enough that it is important to dedicate an article to them.
The first difference between input and output filters is that there are two different types of input filters -- those that filter all content and those that filter only request data. The big difference is that the former will filter headers as well as body data, while the latter only filters body data. The example input filter we'll cover today will modify the headers of each request.
The second difference is how filters are called. There are no functions
analogous to the set of ap_f* functions discussed in the last article. This means that developers who write input filters must handle buckets and bucket brigades directly.
The final difference is the order in which filters affect the data. With output filters, the content generator started with nothing, generated the base content, and passed that content down the filter stack to be modified; at the bottom of the filter stack, the data is sent to the network and an empty brigade is returned up the stack.
Input filters work in reverse. The first thing a function that requests
data from the network does is call ap_get_brigade() with an empty
brigade. Each filter passes the brigade to the next filter until the last
filter in the stack receives it, still empty. The last filter then fills
the brigade with data from the network and returns it to the previous
filter. Each filter in turn modifies the data and returns it, until the
original function receives a full bucket brigade.
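To make the direction of data flow concrete, here is a minimal stand-alone sketch of the same pull model. It does not use the Apache API at all: toy_filter, toy_get_data, network_filter, upcase_filter and toy_read_request are hypothetical names invented for this illustration, and a plain character buffer stands in for the bucket brigade.

```c
#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Hypothetical stand-ins for ap_filter_t and a bucket brigade. */
typedef struct toy_filter toy_filter;
struct toy_filter {
    int (*func)(toy_filter *self, char *buf, size_t cap);
    toy_filter *next;   /* filter closer to the network, or NULL */
};

/* Plays the role of ap_get_brigade(): hand the buffer to the next filter. */
static int toy_get_data(toy_filter *next, char *buf, size_t cap)
{
    return next->func(next, buf, cap);
}

/* Last filter in the stack: fills the empty buffer "from the network". */
static int network_filter(toy_filter *self, char *buf, size_t cap)
{
    (void)self;
    snprintf(buf, cap, "%s", "get /index.html");
    return 0;
}

/* A filter higher up: ask for data first, then modify it on the way back. */
static int upcase_filter(toy_filter *self, char *buf, size_t cap)
{
    size_t i;
    int rv = toy_get_data(self->next, buf, cap);

    if (rv != 0) {
        return rv;
    }
    for (i = 0; buf[i] != '\0'; i++) {
        buf[i] = (char)toupper((unsigned char)buf[i]);
    }
    return 0;
}

/* Build a two-filter stack and read through it, as a caller would. */
int toy_read_request(char *buf, size_t cap)
{
    toy_filter net = { network_filter, NULL };
    toy_filter up  = { upcase_filter, &net };

    buf[0] = '\0';                 /* start with an "empty brigade" */
    return toy_get_data(&up, buf, cap);
}
```

Note that upcase_filter does nothing to the buffer until the call to the next filter has returned, which is the same ordering a real input filter follows.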
The one function we haven't covered yet is ap_get_brigade.
This function calls the next filter in the stack. It has four arguments to
control its behavior:
apr_status_t ap_get_brigade(ap_filter_t *filter, apr_bucket_brigade *bucket, ap_input_mode_t mode, apr_size_t *readbytes);
The first argument is the next filter in the stack. The bucket brigade is
the location to use to store the data from the network. Remember that this
argument is always empty when the function is first called, and is filled out
when ap_get_brigade returns. The mode defines how data is read
from the filters. There are three options for this parameter: AP_MODE_BLOCKING,
AP_MODE_NONBLOCKING, and AP_MODE_PEEK. The first two are self-explanatory --
we either read from the network in blocking or non-blocking mode. The third
is more complex. After the first request has been made, Apache needs to
determine whether another request is coming over the same socket. If one
is, Apache doesn't send the end of the response immediately; to save
network bandwidth, it waits until the second request has been fully
processed. Most filters can safely ignore this parameter when it is
AP_MODE_PEEK, because Apache's core filters will return the correct
information along with an empty
brigade. The final parameter, readbytes, is the number of bytes requested
from the network on input, and the number of bytes returned on output. This
is used to inform the requesting function of how much information is available
to be processed. If this value is "0", input filters will return one
line of data.
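The in/out behavior of readbytes is easier to see in isolation. The following is only a sketch of that convention, not the Apache implementation: toy_source and toy_read are hypothetical names, and an in-memory string plays the part of the network.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "network" used only for this illustration. */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;
} toy_source;

/*
 * Copy data from src into out (capacity cap).
 * On input, *readbytes is the number of bytes wanted; 0 means "one line".
 * On output, *readbytes is the number of bytes actually copied.
 */
int toy_read(toy_source *src, char *out, size_t cap, size_t *readbytes)
{
    size_t avail = src->len - src->pos;
    size_t want = *readbytes;

    if (want == 0) {
        /* Line mode: read up to and including the next '\n', if any. */
        const char *start = src->data + src->pos;
        const char *nl = memchr(start, '\n', avail);
        want = nl ? (size_t)(nl - start) + 1 : avail;
    }
    if (want > avail) {
        want = avail;
    }
    if (want > cap) {
        want = cap;
    }

    memcpy(out, src->data + src->pos, want);
    src->pos += want;
    *readbytes = want;      /* tell the caller how much it really got */
    return 0;
}
```

The caller always inspects *readbytes after the call, because the amount returned may be smaller than the amount requested.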
Now that we have the basics of input filters, let's look at the details. As
in the previous article, there is an example module that
implements the input filter described below. This module was written
after the London ApacheCon. At that event, CDs were passed out to conference attendees. The problem was that the CD was created on
Windows, so all of the HTML files used backslashes and spaces in their
URLs instead of forward slashes and %20. This made the CD unusable for anybody on a non-Windows platform. Because most of the conference attendees
were not using Windows, most people were upset about the CD. To resolve
this problem, I created a simple Apache 2.0 module that filters each request
to ensure that it is valid.
Now, let's dissect the ApacheCon input filter:
static apr_status_t apcon_filter_in(ap_filter_t *f, apr_bucket_brigade *b, ap_input_mode_t mode, apr_size_t *readbytes)
{
const char *str;
const char *begin;
int length;
apr_bucket *e;
apr_bucket *d;
char data[256];
int i,j;
We start by just declaring all of the variables the filter needs. Each of the variables will become obvious as we proceed through the filter.
ap_get_brigade(f->next, b, mode, readbytes);
I will stress again that the very first call in every input filter should
be to ap_get_brigade. This fills out the brigade to be used by the rest of the filter.
e = APR_BRIGADE_FIRST(b);
if (e->type == NULL) {
return APR_SUCCESS;
}
Once we have a brigade, the first thing we must do is access the first bucket
in the brigade. If the type of this bucket is NULL, then the brigade is
empty, and we can just return APR_SUCCESS to the higher filters.
apr_bucket_read(e, &str, &length, 1);
if (strncmp("GET ", str, strlen("GET "))) {
return APR_SUCCESS;
}
apr_bucket_split(e, strlen("GET "));
e = APR_BUCKET_NEXT(e);
At this point, we know that we have a valid brigade, and that there is data
in it. The first thing we must do is to read from the bucket to get a string
of data that we can process. In this case, this filter is very simple and
only knows how to handle GET requests. Once we know that we have a GET
request, we split the bucket so that we are just dealing with the URL and
the HTTP version.
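Data read from a bucket is not guaranteed to be NUL-terminated, so a more defensive version of the method check would compare against the returned length rather than rely on string functions. The helper below is a stand-alone sketch of that idea; get_split_offset is a hypothetical name and is not part of the example module.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the offset at which the request line should be split when it
 * starts with "GET ", or 0 when the filter should leave the data alone.
 * buf is not assumed to be NUL-terminated, so len is checked first.
 */
size_t get_split_offset(const char *buf, size_t len)
{
    static const char method[] = "GET ";
    const size_t mlen = sizeof(method) - 1;

    if (len < mlen || memcmp(buf, method, mlen) != 0) {
        return 0;
    }
    return mlen;
}
```

A non-zero result is the value that would be handed to apr_bucket_split in the filter above.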
apr_bucket_read(e, &str, &length, 1);
/* this should work, because we are just searching for HTTP/1.0 or HTTP/1.1 */
begin = str + (strlen(str) - 3);
do {
begin--;
} while (strncmp("HTTP", begin, 4) && (begin > str));
apr_bucket_split(e, begin - str - 1);
This segment isolates the URL from the HTTP version. We don't care about the HTTP version, but the filter needs to have the URL isolated from everything else.
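The same backwards search can be written against an explicit length, which avoids calling strlen on bucket data. This is a sketch under the assumption that the version token is always preceded by a single space; url_length is a hypothetical helper, not a function from the module or from APR.

```c
#include <stddef.h>
#include <string.h>

/*
 * Given "<url> HTTP/1.x..." of length len (not NUL-terminated), return the
 * length of the URL part, i.e. the offset of the space before "HTTP/".
 * Returns len when no version token is found.
 */
size_t url_length(const char *buf, size_t len)
{
    static const char marker[] = " HTTP/";
    const size_t mlen = sizeof(marker) - 1;
    size_t i;

    if (len < mlen) {
        return len;
    }
    /* Search backwards so a URL that itself contains " HTTP/" stays whole. */
    for (i = len - mlen + 1; i-- > 0; ) {
        if (memcmp(buf + i, marker, mlen) == 0) {
            return i;
        }
    }
    return len;
}
```

Searching from the end matters for the ApacheCon case in particular, because the broken URLs may contain spaces of their own.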
apr_bucket_read(e, &str, &length, 1);
i = 0;
j = 0;
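/* stop before data[] overflows; each pass writes at most three bytes */
while (i < length && j < (int)sizeof(data) - 3) {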
if (str[i] == ' ') {
data[j++] = '%';
data[j++] = '2';
data[j++] = '0';
i++;
}
else if (str[i] == '\\') {
data[j++] = '/';
i++;
}
else {
data[j++] = str[i++];
}
}
This loop copies the URL into data, replacing each space with %20 and each "\" with "/".
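Because every space grows into three bytes, the output can be up to three times the size of the input, so the destination buffer needs a bounds check. The function below is a stand-alone sketch of the same translation with that check added; rewrite_url and its error convention are assumptions made for this illustration.

```c
#include <stddef.h>

/*
 * Copy in[0..len) to out, replacing ' ' with "%20" and '\\' with '/'.
 * Returns the number of bytes written, or (size_t)-1 if out (capacity cap)
 * is too small. Nothing is NUL-terminated, matching bucket data.
 */
size_t rewrite_url(const char *in, size_t len, char *out, size_t cap)
{
    size_t i, j = 0;

    for (i = 0; i < len; i++) {
        if (in[i] == ' ') {
            if (cap - j < 3) {
                return (size_t)-1;
            }
            out[j++] = '%';
            out[j++] = '2';
            out[j++] = '0';
        }
        else {
            if (cap - j < 1) {
                return (size_t)-1;
            }
            out[j++] = (in[i] == '\\') ? '/' : in[i];
        }
    }
    return j;
}
```

Called on the eight bytes of \foo bar, this writes the ten bytes /foo%20bar.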
d = apr_bucket_transient_create(data, j);
apr_bucket_setaside(d, f->c->pool);
APR_BUCKET_INSERT_AFTER(e, d);
APR_BUCKET_REMOVE(e);
apr_bucket_destroy(e);
return APR_SUCCESS;
}
Finally, we have to put the new URL into a bucket, and insert that bucket
into the brigade in the correct location. This filter cheats a bit, because
one of its goals is to be a teaching filter, so we use a transient bucket
and call apr_bucket_setaside immediately. This is done so
that I have a mechanism for teaching about apr_bucket_setaside.
The bucket insertion is done by inserting after the original URL bucket,
and then removing the original.
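The insert-then-remove order is what keeps the new bucket in the old bucket's position. The sketch below shows the same two steps on a plain circular doubly linked list; toy_bucket and the toy_* functions are hypothetical stand-ins and only approximate what the APR_BUCKET_* macros do.

```c
#include <stddef.h>

/* Hypothetical stand-in for apr_bucket: a node in a circular list. */
typedef struct toy_bucket {
    const char *data;
    struct toy_bucket *prev, *next;
} toy_bucket;

/* Turn a sentinel node into an empty ring (like an empty brigade). */
void toy_ring_init(toy_bucket *head)
{
    head->data = NULL;
    head->prev = head->next = head;
}

/* Link n immediately after e (the APR_BUCKET_INSERT_AFTER step). */
void toy_insert_after(toy_bucket *e, toy_bucket *n)
{
    n->prev = e;
    n->next = e->next;
    e->next->prev = n;
    e->next = n;
}

/* Unlink e from the ring (the APR_BUCKET_REMOVE step). */
void toy_remove(toy_bucket *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = e;
}

/* Replace old with repl: insert first, then remove, so position is kept. */
void toy_replace(toy_bucket *old, toy_bucket *repl)
{
    toy_insert_after(old, repl);
    toy_remove(old);
}
```

If the original were removed first, there would be no neighbor left to insert after, which is why the filter inserts before it removes.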
To try this module, configure your Apache 2.0 server with
--with-module=filters:/path/to/mod_apachecon. This will copy
the module into your Apache source tree and add it to the build system.
This filter is activated automatically and operates on every request.
The easiest way to test it is to telnet to the server and make a request
for a file, such as:
GET \foo bar HTTP/1.0
Just be sure that you have a file named "foo bar" in your DocumentRoot directory.
Next time, we will discuss how to write modules that can be extended by other modules.
Ryan Bloom is a member of the Apache Software Foundation, and the Vice President of the Apache Portable Run-time project.
Copyright © 2009 O'Reilly Media, Inc.