ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


Apache 2.0 Basics

Apache Modules

09/27/2001

Over the last few articles, I have covered some of the new features in Apache 2.0, and how you can take advantage of them in your web servers. This time, I am going to cover one of the least-discussed features in Apache 2.0.

One of the biggest advantages of Apache over other web servers is how easy it is to write powerful modules. Apache has used modules since version 1.0 to implement everything but serving basic, static files. Because Apache itself uses modules for everything, modules need access to every stage of serving a request. Modules have, however, always suffered from one major flaw.

Modules have always been solitary entities. If two modules both have to do the same operation, they both need to implement the feature. This makes maintaining modules very difficult because if you modify the behavior in one module, you must also modify it in every other location. Perhaps the best examples of this are the mod_include and mod_cgi modules. mod_include implements server-side include (SSI) processing. mod_cgi spawns CGI scripts. However, mod_include also needs to be able to spawn CGI scripts whenever it encounters the following code in an SSI:

<!--#exec cgi="/cgi-bin/printenv" -->

In Apache 1.3, both modules had logic to create a CGI process and execute the correct program in it. This logic was very complex on some platforms -- especially platforms that did not look at the #! line of a script to find the interpreter.

In Apache 2.0, we can remove this maintenance problem by allowing mod_include to call into mod_cgi to run a CGI script. This does have some drawbacks. The first major drawback is that mod_include cannot include the output of CGI scripts unless mod_cgi is loaded into the server.

The Apache developers decided that this was a valid tradeoff because by not including mod_cgi, the administrator has stated that he or she does not want to allow CGI scripts to be run. We could not find a valid reason to allow CGI scripts within SSI files if they were not allowed for direct requests. This addition to mod_include also allows other module authors to easily extend the tags that are allowed in SSI files, without having to modify the base mod_include code.


To demonstrate how easy it is to extend mod_include, let's take a look at how mod_cgi does it. The first step is to retrieve a few functions from mod_include. This is done using optional functions, a new feature in Apache 2.0 that lets a module register functions with the core as optional. When another module wants to use those functions, it queries the core server, asking whether those functions have been registered. The core uses the name of the function as the key to find the pointer that was stored when the function was registered.
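The register-then-look-up-by-name idea can be sketched with a small stand-alone model. This is not APR's implementation: opt_register, opt_retrieve, the fixed-size table, and demo_add are all hypothetical names invented for illustration. The real server does this through the APR optional-function macros such as APR_RETRIEVE_OPTIONAL_FN.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of an optional-function registry.
 * Hypothetical names throughout; this is NOT how APR implements it. */

typedef void (*opt_fn_t)(void);          /* generic function-pointer slot */

#define OPT_MAX 16

static struct {
    const char *name;                    /* lookup key: the function's name */
    opt_fn_t    fn;                      /* pointer stored at registration */
} opt_table[OPT_MAX];
static size_t opt_count = 0;

/* Provider side: publish a function under its name.
 * Returns 0 on success, -1 if the table is full. */
int opt_register(const char *name, opt_fn_t fn)
{
    assert(name != NULL && fn != NULL);
    if (opt_count == OPT_MAX)
        return -1;
    opt_table[opt_count].name = name;
    opt_table[opt_count].fn = fn;
    opt_count++;
    return 0;
}

/* Consumer side: look a function up by name.
 * NULL means the providing module is not loaded. */
opt_fn_t opt_retrieve(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < opt_count; i++) {
        if (strcmp(opt_table[i].name, name) == 0)
            return opt_table[i].fn;
    }
    return NULL;
}

/* A demo function a provider might export, and its real type. */
typedef int (*add_fn_t)(int, int);
int demo_add(int a, int b) { return a + b; }
```

A consumer would call opt_retrieve("demo_add"), cast the result back to add_fn_t, and check it against NULL before calling it. That NULL check is what makes the dependency optional: the consumer still loads and runs when the provider is absent.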

There are three functions that are important for mod_include extensions: ap_register_include_handler, ap_ssi_get_tag_and_value, and ap_ssi_parse_string. The ap_register_include_handler function is used to specify the tag that this module will handle. ap_ssi_get_tag_and_value gets the next attribute and value from the SSI tag that is being parsed. SSI extension functions call this function in a loop, retrieving each of the attributes until the function returns a "null" attribute. This signifies that all attributes have been found. The final function is ap_ssi_parse_string; this function is used to do variable substitution on a string.
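The "loop until a null attribute" pattern can be sketched as follows. The next_attr function below is a hypothetical stand-in written for this example, not mod_include's parser and not the signature of ap_ssi_get_tag_and_value. It assumes attributes are written as name="value" and it signals the end of the attributes by returning 0. collect_cgi is likewise an invented handler-style helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute iterator (NOT the mod_include API).
 * Reads the next name="value" pair starting at *cursor.
 * Returns 1 if a pair was read, 0 when no well-formed attribute
 * remains (the "null" attribute). */
int next_attr(const char **cursor, char *name, size_t nlen,
              char *value, size_t vlen)
{
    const char *p;
    size_t n = 0, v = 0;

    assert(cursor != NULL && *cursor != NULL && nlen > 0 && vlen > 0);
    p = *cursor;

    while (isspace((unsigned char)*p))
        p++;
    /* The attribute name runs up to '='. */
    while (*p != '\0' && *p != '=' && !isspace((unsigned char)*p)) {
        if (n + 1 < nlen)
            name[n++] = *p;
        p++;
    }
    name[n] = '\0';
    if (n == 0 || *p != '=' || p[1] != '"') {
        *cursor = p;
        return 0;
    }
    p += 2;                              /* skip the =" */
    while (*p != '\0' && *p != '"') {
        if (v + 1 < vlen)
            value[v++] = *p;
        p++;
    }
    value[v] = '\0';
    if (*p == '"')
        p++;                             /* skip the closing quote */
    *cursor = p;
    return 1;
}

/* Handler-style loop: walk every attribute, count the "cgi" ones,
 * and copy the last cgi value into `last`. */
int collect_cgi(const char *args, char *last, size_t len)
{
    char name[32], value[256];
    int found = 0;

    assert(args != NULL && last != NULL && len > 0);
    last[0] = '\0';
    while (next_attr(&args, name, sizeof name, value, sizeof value)) {
        if (strcmp(name, "cgi") == 0) {
            strncpy(last, value, len - 1);
            last[len - 1] = '\0';
            found++;
        }
    }
    return found;
}
```

A real handler would act on each attribute as it arrives, for example by running the named script, instead of only recording it. The loop has the same shape either way: fetch, test for the end marker, dispatch on the attribute name.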

Once all these function pointers are retrieved, your module must register a tag with mod_include. This can only be done if all the functions were retrieved successfully. To register a tag, just call the register function with the string that mod_include should look for when processing SSI output, and a function to call when the string is found. The following is the function that mod_cgi uses to do this.

static void cgi_post_config(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *ptemp, server_rec *s)
{
    cgi_pfn_reg_with_ssi = APR_RETRIEVE_OPTIONAL_FN(ap_register_include_handler);
    cgi_pfn_gtv          = APR_RETRIEVE_OPTIONAL_FN(ap_ssi_get_tag_and_value);
    cgi_pfn_ps           = APR_RETRIEVE_OPTIONAL_FN(ap_ssi_parse_string);

    if ((cgi_pfn_reg_with_ssi) && (cgi_pfn_gtv) && (cgi_pfn_ps)) {
        /* Required by mod_include filter. This is how mod_cgi registers
         *   with mod_include to provide processing of the exec directive.
         */
        cgi_pfn_reg_with_ssi("exec", handle_exec);
    }
}

There is one more important detail. If you look at the previous function's name, you will notice that it is called during the post_config phase. The problem is that mod_include uses a hash table to store the string/function pairs, and it allocates that hash table during its own post_config function. If the mod_cgi function is called first, the server will segfault during startup. This is easily solved using the new hook mechanism in Apache 2.0. Below is an example of this: the mod_cgi module's register_hooks function. When mod_cgi registers its post_config function, it must specify that mod_include should run first.

static void register_hooks(apr_pool_t *p)
{
  static const char * const aszPre[] = { "mod_include.c", NULL };
  ap_hook_handler(cgi_handler, NULL, NULL, APR_HOOK_MIDDLE);
  ap_hook_post_config(cgi_post_config, aszPre, NULL, APR_HOOK_REALLY_FIRST);
}
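To see why the predecessor list matters, here is a toy model of predecessor-based hook ordering. It is not Apache's hook implementation. add_hook, run_hooks, the table_ready and registered flags, and the two demo functions are hypothetical names made up for this sketch. The model also assumes that a predecessor that registered no hook counts as satisfied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of predecessor-based hook ordering (all names hypothetical). */

#define MAX_HOOKS 8

typedef void (*hook_fn_t)(void);

struct hook {
    const char *module;          /* module that registered the hook */
    hook_fn_t fn;
    const char *const *pre;      /* NULL-terminated list of modules that
                                    must run first, or NULL for none */
    int done;
};

static struct hook hooks[MAX_HOOKS];
static size_t nhooks = 0;

void add_hook(const char *module, hook_fn_t fn, const char *const *pre)
{
    assert(nhooks < MAX_HOOKS && module != NULL && fn != NULL);
    hooks[nhooks].module = module;
    hooks[nhooks].fn = fn;
    hooks[nhooks].pre = pre;
    hooks[nhooks].done = 0;
    nhooks++;
}

/* True when every hook registered by module `name` has already run.
 * A module with no hooks at all counts as done (a model assumption). */
static int module_done(const char *name)
{
    size_t i;

    for (i = 0; i < nhooks; i++)
        if (strcmp(hooks[i].module, name) == 0 && !hooks[i].done)
            return 0;
    return 1;
}

/* Run hooks in registration order, deferring any hook whose
 * predecessors have not run yet. Returns the number of hooks run. */
size_t run_hooks(void)
{
    size_t ran = 0;
    int progress = 1;

    while (progress) {
        size_t i;

        progress = 0;
        for (i = 0; i < nhooks; i++) {
            const char *const *p;
            int ready = 1;

            if (hooks[i].done)
                continue;
            for (p = hooks[i].pre; p != NULL && *p != NULL; p++)
                if (!module_done(*p))
                    ready = 0;
            if (ready) {
                hooks[i].fn();
                hooks[i].done = 1;
                ran++;
                progress = 1;
            }
        }
    }
    return ran;
}

/* Demo: the provider sets up its table; the consumer needs it to exist. */
static int table_ready = 0;
static int registered = 0;

void include_post_config_demo(void) { table_ready = 1; }
void cgi_post_config_demo(void)     { if (table_ready) registered = 1; }

static const char *const cgi_pre[] = { "mod_include.c", NULL };
```

Even if the mod_cgi hook is added first, passing cgi_pre makes run_hooks defer it until the mod_include hook has run, so the table exists by the time the consumer registers with it. The demo only skips registration when the table is missing. In the real server that case is the startup crash described above.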

Of course, this is just one example of where modules extending modules are useful. mod_log_config is another place where this ability has been added to the server. In the case of mod_log_config, we now allow modules to extend the type of information that can be logged. As more people implement Apache 2.0 modules, more situations like this will be discovered, and the core team will continue to create opportunities for modules to help each other solve problems.

Apache 2.0 has many features that were not possible in Apache 1.3. These features will help to keep Apache the most popular web server on the Internet. As this is my last article for ONLamp.com, I hope that in the last six months I have taught you something about Apache 2.0. I also hope that you will download it and experiment to see where it can provide a better solution than Apache 1.3.

Ryan Bloom is a member of the Apache Software Foundation, and the Vice President of the Apache Portable Run-time project.



Copyright © 2009 O'Reilly Media, Inc.