 Published on ONLamp.com (http://www.onlamp.com/)


Caching Dynamic Content with Apache httpd

by Rich Bowen
11/16/2006

Twenty Things You Didn't Know You Could Do with Your Apache Web Server: Caching

You know that part of your website that you never update? Sure, it's "dynamic"--the content rests in a database and gets loaded with every request--but you haven't updated it since last Christmas. Yet every time someone loads that page, it hits the database, and it's slow.

One thing you may not have known about your Apache web server is that it can fix that problem for you. Content that never changes shouldn't require CPU cycles to generate.

mod_cache has been around for a while, but there are some new features in it that can help you make better use of your server's resources. Although people have traditionally used the caching capabilities of mod_cache with proxied content, that isn't its only use. You can also cache your dynamic content and serve it as rapidly as on-disk files.

Modules and Prerequisites

The examples shown here rely on mod_cache, mod_disk_cache, and mod_mem_cache. Make sure that you have those modules installed before trying to do these things.

You can tell what modules you have loaded by running the httpd binary with the -M flag:

$ /usr/local/apache2/bin/httpd -M

As usual, note that the path to the httpd binary may be different on your system. If you installed from a third-party package, the name of the binary may be something different, such as apache2.

If the output doesn't list cache_module, disk_cache_module, and mem_cache_module, install or enable them.
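
If your httpd was built with loadable module (DSO) support, enabling the modules usually comes down to LoadModule lines like the ones below. Treat this as a sketch: the modules/ path is an assumption about your layout, and packaged builds often provide their own mechanism for switching modules on.

    # Paths are relative to ServerRoot; adjust them to your installation.
    LoadModule cache_module modules/mod_cache.so
    LoadModule disk_cache_module modules/mod_disk_cache.so
    LoadModule mem_cache_module modules/mod_mem_cache.so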

The examples given here need to go in your main server configuration file(s), not in .htaccess files. If you're not the server administrator, and only have access to .htaccess files, these techniques will not work for you. Sorry. Contact your system administrator.

Finally, the modules and techniques covered in this article are available in Apache 2.0 and 2.2. They aren't available in Apache 1.3. If you're still running Apache 1.3, you really should consider upgrading. As of this writing, 2.2 is the latest and most recommended version of the Apache web server. 1.3 is purely in maintenance mode, with bug and security fixes applied but no new development occurring.

Configuration Examples

Here's what to put in your configuration file for the first recipe:

    CacheEnable disk /
    CacheRoot /var/www/cache
    CacheDefaultExpire 3600
    CacheMinExpire 3600

This configuration will cache all of the content on your website, and the cache will persist for at least an hour, regardless of its "freshness."

This technique is most appropriate for dynamic content that you want to cache and serve as static content.

Here's what the recipe does, line by line:

    CacheEnable disk /

The first argument indicates what type of caching to do. Here, disk tells mod_cache to use mod_disk_cache to cache the content to the file system.

The second argument indicates what portion of your website you want to cache. This case tells mod_cache to cache all content, from the root of the site down. The argument is a URI, not a file path.

Of course, there's some of your website that you might not want to cache. I'll show you how to do that in a moment. On the other hand, if there's only one particular part of your website that you do want to cache, you can specify that here:

    CacheEnable disk /blog

The next line specifies where to store the cached content:

    CacheRoot /var/www/cache

The directory /var/www/cache (or whatever you set the CacheRoot to) needs to exist, and the Apache user must be able to write to it--that is, the user configured in the User directive in your server configuration file.

    CacheDefaultExpire 3600
    CacheMinExpire 3600

These two directives control how long to store resources in the cache before retrieving them again from the live source. The CacheDefaultExpire directive sets an expiration time for resources that don't specify an expiry date or a last-modified date. More importantly, CacheMinExpire specifies the minimum expiration time for resources, even if the resource itself specifies a shorter expiry date.

This is how you can force dynamic content, which would otherwise be served dynamically every time, to be cached for a certain period of time before hitting the database again.

The time specified is in seconds, so this case forces at least an hour of caching, regardless of how "fresh" that content is.

Note that this will almost certainly cause your server to serve stale content occasionally. You'll get phone calls from folks saying "I updated my website, but I'm still seeing the old version." Yeah--that's the point.
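
Dynamic pages frequently omit the Last-Modified header, and mod_cache ordinarily declines to cache such responses. If that turns out to be the case for your content, a variation along the following lines may help. It is a sketch rather than part of the recipe above: CacheIgnoreNoLastMod and CacheMaxExpire are standard mod_cache directives, but the values are only illustrative.

    CacheEnable disk /
    CacheRoot /var/www/cache
    # Consider responses for caching even when they carry no Last-Modified header
    CacheIgnoreNoLastMod On
    CacheDefaultExpire 3600
    CacheMinExpire 3600
    # Never keep anything longer than a day without rechecking (illustrative value)
    CacheMaxExpire 86400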

Proxied Content

If you run a reverse proxy server--if your Apache server sits in front of some other back-end server, proxying requests to it--mod_cache will cache the content retrieved from those back-end servers, as well as local content.

This is the configuration that folks seem to be most familiar with. Indeed, caching is often most effective in this scenario. When you have a slower legacy back-end server producing some of your content, this setup is useful to give it a little more pep.
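
A minimal sketch of such a setup follows. It assumes mod_proxy and mod_proxy_http are loaded; the /legacy/ path and the backend.example.com host are placeholders, not names from any real deployment.

    # Forward requests under /legacy/ to the back-end server
    ProxyPass /legacy/ http://backend.example.com:8080/
    ProxyPassReverse /legacy/ http://backend.example.com:8080/

    # Cache the proxied responses on disk
    CacheEnable disk /legacy/
    CacheRoot /var/www/cache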

Configuration Options

There are a few other things that you can specify about your caching. You can see the full list in the documentation for these modules, but here are a few of the most important ones.

Because you'll probably be caching static content as well as dynamic, you may want to put some limits on what you cache. If a file is too small, there's no performance benefit to caching it. If it's too large, you'll consume your cache space too quickly. You can configure these minimum and maximum sizes to whatever makes sense to you.

     CacheMinFileSize 64
     CacheMaxFileSize 64000

Both values are in bytes. The defaults are 1 byte and 1,000,000 bytes (roughly 1MB), respectively, so it's worthwhile to set them to something else if you have a large number of very small, or very large, files on your website.

If you know specific details about your file system and how it operates most efficiently, you may wish to tune the CacheDirLevels and CacheDirLength directives in order to create a directory structure that works best for you.
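
As an illustration, the following would produce a two-level tree with single-character directory names. The values are arbitrary examples rather than recommendations; the one hard rule in the mod_disk_cache documentation is that CacheDirLevels multiplied by CacheDirLength must not exceed 20.

    # Two levels of subdirectories under CacheRoot...
    CacheDirLevels 2
    # ...each with a one-character name
    CacheDirLength 1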

Cache Directory Maintenance

Use the htcacheclean tool to keep your cache directory within certain limits. mod_disk_cache does not prune the cache itself, so without htcacheclean your cache directory will keep growing indefinitely. htcacheclean can run in daemon mode, or you can run it periodically via cron.

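    htcacheclean -d 30 -n -t -p /var/www/cache -l 100M -i

With this set of options, htcacheclean will run in the background in daemon mode, checking the cache every 30 minutes (-d takes the interval in minutes; 30 is only an example value). It will be respectful of other processes (-n, nice mode), remove empty directories (-t), do its work only when the cache has been modified (-i), and keep the cache in /var/www/cache (-p) below a maximum of 100MB (-l).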

If you need to clear the cache, it's perfectly safe to delete all the files in the cache:

    rm -rf /var/www/cache/*

As always, use extreme caution when using rm -rf because a mistyped argument can result in stuff getting deleted that you didn't intend.

Don't Cache That!

There are some things that you just don't want cached, and there's a simple way to tell mod_cache that.

The most important thing to keep in mind here is that mod_cache never caches content requiring authentication. However, that covers only content protected by standard HTTP authentication methods--not content behind your custom homegrown cookie-based authentication scheme. It's very important not to cache "secure" content, so unless you use only standard authentication, you need to exclude that content yourself:

    CacheDisable /secure

If you serve static files straight out of the local file system, there's little to gain from caching them to disk again:

    CacheDisable /images

Sometimes there are particular parts of the response that you want to keep out of the cache--usually one or more of the HTTP headers. For example, you may wish to avoid storing cookies in the cache:

    CacheIgnoreHeaders Set-Cookie
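
Taken together, a site-wide cache with exclusions might look like the sketch below. The /members path is hypothetical, standing in for an area protected by a cookie-based login that mod_cache cannot recognize as authenticated content.

    CacheEnable disk /
    CacheRoot /var/www/cache

    # Standard HTTP authentication area, excluded explicitly for good measure
    CacheDisable /secure
    # Hypothetical area behind a homegrown cookie login
    CacheDisable /members
    # Static files, already cheap to serve
    CacheDisable /images

    # Keep Set-Cookie headers out of stored responses
    CacheIgnoreHeaders Set-Cookie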

Memory Caching

So far I've only discussed caching to the file system. Caching to memory is useful if you have a huge amount of memory and an extremely high-traffic site that serves certain files many times per second. Do this by altering the CacheEnable directive from an earlier recipe:

    CacheEnable mem /

You don't need to set a CacheRoot, as documents will not go into the file system. mod_mem_cache has its own tuning directives, covering the total cache size and the limits on object size and count, which you can find in its documentation.
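
For instance, a memory cache might be bounded along these lines. The directive names come from mod_mem_cache; the numbers are illustrative assumptions to be sized against your available RAM.

    CacheEnable mem /
    # Total cache size, in kilobytes (here about 100MB)
    MCacheSize 102400
    # Upper bound on the number of cached objects
    MCacheMaxObjectCount 1000
    # Only cache objects between 1 byte and 64KB
    MCacheMinObjectSize 1
    MCacheMaxObjectSize 65536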

Other Caching

There are lots of other ways that you can cache and save your CPU for better things. Don't overlook them.

There are plenty of caching engines for use with various programming languages. Notable examples include memcached and APC. These allow you to cache results from common operations, such as calculations, database queries, and compiled code. You should investigate whether your language of choice has such a caching mechanism.

Many database servers have a built-in caching mechanism to serve frequently repeated queries directly from cache. This allows your server to touch the data tables themselves only when the data has changed.

The Squid web cache server is a highly configurable and very fast cache for HTTP content, and may be a better choice than Apache for forward- and reverse-caching in some situations.

The bigger your toolbox, the better chance that you'll know which tool to use in a given scenario.

As usual, the documentation is a great guide for other configuration options. #apache on irc.freenode.net is also full of useful information. Finally, come to ApacheCon!

Rich Bowen is a member of the Apache Software Foundation, working primarily on the documentation for the Apache Web Server. DrBacchus, Rich's handle on IRC, can be found on the web at www.drbacchus.com/journal.



Copyright © 2009 O'Reilly Media, Inc.