When the Apache developers first began talking about Apache 2.0, one of the major goals was for one module to be able to modify the output of another. This goal was realized earlier this year with the sixth alpha version. The mechanisms used to make these modifications are called filters. Originally it was difficult to write filters, but during the past few releases, the developers have improved the interface so that filters are much easier to create.
This article will cover some of the basic concepts of Apache filters. In my next column, I'll walk you through creating a filter. In the column after that, I will apply the same concepts toward writing an input filter.
Filters work because the Apache developers consider Web pages as chunks of information. In general, we don't care what those chunks look like or how they are stored on the server. In Apache filter terminology, each chunk is stored in a bucket, and lists of buckets form brigades. Lists of brigades can then create a Web document. Filters operate on one brigade at a time, and are called upon repeatedly until the entire document has been processed. This allows the server to stream information to the client.
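To make the bucket and brigade vocabulary concrete, here is a minimal toy model in plain C. It does not use the real Apache or APR data structures; the names toy_bucket, toy_brigade, toy_brigade_length, and toy_upcase_filter are hypothetical and exist only for this sketch. A brigade is modeled as a linked list of buckets, and the filter is called once per brigade, so a document can be processed piece by piece rather than all at once.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* A bucket holds one chunk of data; buckets are chained into a brigade. */
typedef struct toy_bucket {
    char *data;                 /* writable chunk contents */
    size_t len;                 /* number of bytes in data */
    struct toy_bucket *next;    /* next bucket, or NULL at the end */
} toy_bucket;

/* A brigade is simply a list of buckets (hypothetical layout). */
typedef struct toy_brigade {
    toy_bucket *head;
} toy_brigade;

/* Total number of bytes carried by one brigade. */
static size_t toy_brigade_length(const toy_brigade *bb)
{
    size_t total = 0;
    for (const toy_bucket *b = bb->head; b != NULL; b = b->next) {
        total += b->len;
    }
    return total;
}

/* A filter sees one brigade per call. This one upper-cases every
 * bucket in place and returns the number of bytes it touched. */
static size_t toy_upcase_filter(toy_brigade *bb)
{
    assert(bb != NULL);
    size_t touched = 0;
    for (toy_bucket *b = bb->head; b != NULL; b = b->next) {
        for (size_t i = 0; i < b->len; i++) {
            b->data[i] = (char)toupper((unsigned char)b->data[i]);
        }
        touched += b->len;
    }
    return touched;
}
```

Calling toy_upcase_filter once for each brigade of a document mirrors, in a very simplified way, how a filter is invoked repeatedly until the whole response has passed through it.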
The basic Apache distribution includes several standard filters.
The first is the content_length_filter. This filter computes the content length of the response if possible. If the response is not fully available when this filter is first called and the protocol allows the server to send the response without a content-length header, then this filter just passes data to the next filter. It continues to count bytes, however, for logging purposes.
The second standard filter is the header_filter. The first time this filter is called, it formats the header table and sends all of the headers to the next filter before sending the current page. This is important, because if your filter wants to modify headers, it must be inserted before the header_filter and it must buffer the entire page until it has made all of its modifications to the headers. Once your filter passes data to the next filter in the stack, you have effectively told Apache that you are done with that data, and it can be sent to the client.
The final filter is always the core_output_filter. This filter is responsible for writing all data to the network. To provide optimal usage of the available network bandwidth, Apache will buffer as much as 9KB of data before sending it to the client. However, filters can force Apache to send data immediately by flushing the current filter stack.
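The buffering behavior described above can be sketched in a few lines of C. This is a toy model, not Apache's implementation: toy_net_buffer, toy_write, toy_flush, and the TOY_BUFFER_LIMIT constant are hypothetical names, and the limit is set to 9KB only because that is the figure quoted in this article. Sending is modeled by counting bytes, and the rule that data goes out as soon as the limit is reached is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUFFER_LIMIT (9 * 1024)  /* assumed threshold, taken from the text */

typedef struct toy_net_buffer {
    size_t pending;   /* bytes held back, not yet sent */
    size_t sent;      /* bytes already handed to the network */
} toy_net_buffer;

/* Push everything that is pending to the network right away. */
static void toy_flush(toy_net_buffer *nb)
{
    assert(nb != NULL);
    nb->sent += nb->pending;
    nb->pending = 0;
}

/* Queue len bytes; send only once the limit has been reached. */
static void toy_write(toy_net_buffer *nb, size_t len)
{
    assert(nb != NULL);
    nb->pending += len;
    if (nb->pending >= TOY_BUFFER_LIMIT) {
        toy_flush(nb);
    }
}
```

In this model, small writes accumulate quietly, while an explicit call to toy_flush plays the role of a filter forcing the data out immediately.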
Before a filter can be enabled for a given request, it must be registered with the server. This is done using the ap_register_output_filter function. This function is invoked with three arguments: the filter name, the filter function pointer, and the filter type. For example:
ap_register_output_filter("CONTENT_LENGTH", ap_content_length_filter, AP_FTYPE_HTTP_HEADER);
The filter name is a server-wide unique identifier for this filter. No two filters can use the same string as their filter_name. For this reason, it is recommended that filter names have some sort of namespace protection unique to each module. The filter function is the function that should be added to the filter stack whenever this filter is specified. Next month, we will cover this function in more detail. Finally, a filter type must be specified. All filters have a type associated with them; this helps Apache to order filters correctly. The following is a list of filter types with their associated meanings.
AP_FTYPE_CONTENT: This filter type specifies that the filter will be used to modify the content of the Web page itself. Examples of this type of filter are SSI or PHP.
AP_FTYPE_HTTP_HEADER: This is a special filter type to give modules that want to modify headers an opportunity. All filters of this type are run after all AP_FTYPE_CONTENT filters are run. Examples of this filter type are the content_length_filter and the header_filter.
AP_FTYPE_TRANSCODE: This filter type represents filters that will modify how a response is sent to the client, but not the content itself. An example of this type of filter is the chunking_filter, which breaks a response into chunks for the client to interpret. All filters of this type are run after AP_FTYPE_HTTP_HEADER filters.
AP_FTYPE_CONNECTION: This filter type is used to modify how the server interprets HTTP data. These filters should not be used to modify the data in the request or response itself, because these filters are called after AP_FTYPE_TRANSCODE filters, so by this time the server has already created the headers for the request or response. An example of this type of filter is the http_in filter, which parses multiple requests on the same connection into individual requests for processing by the server.
AP_FTYPE_NETWORK: This is the final filter type, and it is always the last filter type to run. This filter type is responsible for reading and writing data to and from the network.
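The registration and ordering rules can be illustrated with a small stand-alone sketch. It is a toy model rather than Apache's code: every toy_ name is hypothetical, the enum has one member per filter type described above with member names and numeric values that are assumptions (chosen only so that a smaller value runs earlier), and the fixed-size tables are a simplification. Registration rejects a duplicate name, and adding a filter keeps the stack sorted by type no matter the order in which filters are added.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One member per filter type in the article; names and values are assumed. */
typedef enum {
    TOY_FTYPE_CONTENT = 10,
    TOY_FTYPE_HTTP_HEADER = 20,
    TOY_FTYPE_TRANSCODE = 30,
    TOY_FTYPE_CONNECTION = 40,
    TOY_FTYPE_NETWORK = 50
} toy_ftype;

typedef struct { const char *name; toy_ftype type; } toy_filter;

#define TOY_MAX 16
static toy_filter toy_registry[TOY_MAX];     /* all registered filters */
static size_t toy_registered = 0;
static const toy_filter *toy_stack[TOY_MAX]; /* filters active for a request */
static size_t toy_stacked = 0;

/* Look up a registered filter by name; NULL if unknown. */
static const toy_filter *toy_lookup(const char *name)
{
    for (size_t i = 0; i < toy_registered; i++) {
        if (strcmp(toy_registry[i].name, name) == 0) {
            return &toy_registry[i];
        }
    }
    return NULL;
}

/* Register a filter. Returns 0 on success, -1 if the name is
 * already taken or the table is full. */
static int toy_register(const char *name, toy_ftype type)
{
    assert(name != NULL);
    if (toy_registered == TOY_MAX || toy_lookup(name) != NULL) {
        return -1;
    }
    toy_registry[toy_registered].name = name;
    toy_registry[toy_registered].type = type;
    toy_registered++;
    return 0;
}

/* Add a registered filter to the stack, keeping it sorted by type. */
static int toy_add(const char *name)
{
    const toy_filter *f = toy_lookup(name);
    if (f == NULL || toy_stacked == TOY_MAX) {
        return -1;
    }
    size_t pos = toy_stacked;
    while (pos > 0 && toy_stack[pos - 1]->type > f->type) {
        toy_stack[pos] = toy_stack[pos - 1];  /* shift later types down */
        pos--;
    }
    toy_stack[pos] = f;
    toy_stacked++;
    return 0;
}
```

Giving each name a module-specific prefix, such as MYMOD_ in a module called mymod, is one simple way to get the namespace protection recommended above.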
Most filter writers will focus exclusively on AP_FTYPE_CONTENT filters. Once a filter is registered with the server, it can be added for a request. This is done using the ap_add_output_filter function, and is usually specified with the SetFilter directive in the httpd.conf file. The ap_add_output_filter function accepts four arguments:
ap_add_output_filter(const char *name, void *ctx, request_rec *r, conn_rec *c);
The first argument is the name that was registered with ap_register_output_filter. The ctx argument is an arbitrary pointer that is passed to the filter each time that it is called. This is useful when a single function implements multiple filters. The final two arguments are a request_rec and a conn_rec that the filter uses each time it is called. If a request_rec is not available, that field can safely be NULL. If the request_rec is NULL, the conn_rec must be provided. This allows a single filter chain to be used on both a request and sub-request, without requiring Apache to determine which request goes with which filter. Associating a request with a filter is done when adding the filter to the filter stack.
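The purpose of the ctx pointer is easier to see with a stand-alone example. The sketch below does not call the real Apache API; toy_filter_fn, toy_slot, toy_case_ctx, toy_case_filter, and toy_call are hypothetical names invented for it. One function is installed in two slots with different context pointers, so it behaves as two different filters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Signature of a toy filter: the context comes first, then the data. */
typedef void (*toy_filter_fn)(void *ctx, char *data, size_t len);

/* One entry in a filter stack: the function plus its private context. */
typedef struct toy_slot {
    toy_filter_fn fn;
    void *ctx;
} toy_slot;

/* Context understood by toy_case_filter. */
typedef struct toy_case_ctx {
    int to_upper;   /* non-zero: upper-case the data; zero: lower-case it */
} toy_case_ctx;

/* A single function that implements two filters, selected by ctx. */
static void toy_case_filter(void *ctx, char *data, size_t len)
{
    const toy_case_ctx *cc = ctx;
    assert(cc != NULL && data != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)data[i];
        data[i] = (char)(cc->to_upper ? toupper(ch) : tolower(ch));
    }
}

/* Invoke a slot, handing the stored context back to the function. */
static void toy_call(const toy_slot *slot, char *data, size_t len)
{
    slot->fn(slot->ctx, data, len);
}
```

Because the context is stored when the filter is added and handed back on every call, the filter function itself needs no global state to know which role it is playing.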
This article has just barely scratched the surface of filters, and we will take the next two months to delve into this topic. Writing filters is a complex topic, but if we take it slowly, filters can become a powerful way to enhance a Web server.
Ryan Bloom is a member of the Apache Software Foundation, and the Vice President of the Apache Portable Run-time project.
Copyright © 2009 O'Reilly Media, Inc.