ONLamp.com

Caching Dynamic Content with Apache httpd

Proxied Content

If you run a reverse proxy server--that is, if your Apache server sits in front of one or more back-end servers and proxies requests to them--mod_cache will cache the content retrieved from those back-end servers, as well as local content.



This is the configuration that folks seem to be most familiar with. Indeed, caching is often most effective in this scenario. When you have a slower legacy back-end server producing some of your content, this setup is useful to give it a little more pep.
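
As a minimal sketch of such a setup, the following combines a reverse proxy with the disk cache. It assumes mod_proxy, mod_proxy_http, mod_cache, and the disk cache module are loaded; backend.example.com is a hypothetical back-end host, and /var/www/cache is assumed to be the cache directory, so adjust both to your own setup.

    # Hypothetical back-end server; substitute your own host name
    ProxyPass / http://backend.example.com/
    ProxyPassReverse / http://backend.example.com/

    # Store cacheable responses, proxied or local, on disk
    CacheRoot /var/www/cache
    CacheEnable disk /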

Configuration Options

There are a few other things that you can specify about your caching. You can see the full list in the documentation for these modules, but here are a few of the most important ones.

Because you'll probably be caching static content as well as dynamic, you may want to put some limits on what you cache. If a file is too small, there's no performance benefit to caching it. If it's too large, you'll consume your cache space too quickly. You can configure these minimum and maximum sizes to whatever makes sense to you.

     CacheMinFileSize 64
     CacheMaxFileSize 64000

These settings default to 1 byte and 1,000,000 bytes (roughly 1MB), respectively, so it's worthwhile to set them to something else if you have a large number of very small, or very large, files on your website.

If you know specific details about your file system and how it operates most efficiently, you may wish to tune the CacheDirLevels and CacheDirLength directives in order to create a directory structure that works best for you.
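
As a rough illustration of that tuning, the sketch below reduces both the depth of the cache's directory tree and the length of each subdirectory name. The values are arbitrary examples, not recommendations. The one hard rule is that CacheDirLevels multiplied by CacheDirLength must not exceed 20.

    # Example values only: two levels of subdirectories,
    # each with a one-character name
    CacheDirLevels 2
    CacheDirLength 1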

Cache Directory Maintenance

Use the htcacheclean tool to keep your cache directory within certain limits. mod_cache does not prune the disk cache itself, so without htcacheclean your cache directory will keep growing indefinitely. htcacheclean can run in daemon mode, or you can run it periodically via cron.

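    htcacheclean -d30 -n -t -p /var/www/cache -l 100M -i

With this set of options, htcacheclean will run in the background in daemon mode, checking the cache every 30 minutes (-d takes an interval in minutes; 30 is only an example value). It will be respectful of other processes (-n, nice mode), remove empty directories (-t), do its work only when the cache has been modified (-i), and maintain the cache in /var/www/cache (-p) below a maximum of 100MB (-l).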
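
If you prefer cron to daemon mode, a crontab entry along the following lines should have much the same effect. This is only a sketch: the hourly schedule and the /usr/sbin path are assumptions that depend on your needs and on where your installation puts htcacheclean. The -d and -i options are left out because they apply only to daemon mode.

    # Run once an hour, on the hour (path and schedule are assumptions)
    0 * * * * /usr/sbin/htcacheclean -n -t -p /var/www/cache -l 100M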

If you need to clear the cache, it's perfectly safe to delete all the files in the cache:

    rm -rf /var/www/cache/*

As always, use extreme caution when using rm -rf because a mistyped argument can result in stuff getting deleted that you didn't intend.

Don't Cache That!

There are some things that you just don't want cached, and there's a simple way to tell mod_cache that.

The most important thing to keep in mind here is that content requiring authentication will never be cached. However, this applies only to content protected by standard HTTP authentication methods--not to content that uses your custom homegrown cookie-based authentication scheme. Because it's very important not to cache "secure" content, you will need to exclude it from the cache yourself unless you use only standard authentication:

    CacheDisable /secure
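
If cookie-protected pages live under several URL prefixes, one approach is to list each prefix in its own CacheDisable line. The paths below are hypothetical and stand in for wherever your application keeps per-user content. It may also help to have the application send a Cache-Control: private header on such responses, which mod_cache declines to store by default.

    # Hypothetical paths guarded by cookie-based logins
    CacheDisable /members
    CacheDisable /account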

If you serve content out of the local file system, there's no gain in caching it:

    CacheDisable /images

Sometimes there are particular parts of the response that you want to keep out of the cache--usually one or more of the HTTP headers. For example, you may wish to avoid having cookies end up in the cache:

    CacheIgnoreHeaders Set-Cookie

Memory Caching

So far I've only discussed caching to the file system. Caching to memory is useful if you have a huge amount of memory and an extremely high-traffic site that serves certain files many times per second. Do this by altering the CacheEnable directive from an earlier recipe:

    CacheEnable mem /

You don't need to set a CacheRoot, as documents will not go into the file system. There are multiple configuration settings, which you can find in the documentation for mod_mem_cache.
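
To give a feel for those settings, here is a sketch that assumes an httpd 2.2 server with mod_mem_cache loaded (the module is not part of httpd 2.4). The numbers are illustrative only; size them to the memory you can actually spare. Note that MCacheSize is given in kilobytes, while the object size limits are in bytes.

    CacheEnable mem /

    # Total memory for the cache, in KB (64MB here)
    MCacheSize 65536
    # Maximum number of objects held in the cache
    MCacheMaxObjectCount 2000
    # Smallest and largest cacheable objects, in bytes
    MCacheMinObjectSize 1
    MCacheMaxObjectSize 131072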

Other Caching

There are lots of other ways that you can cache and save your CPU for better things. Don't overlook them.

There are plenty of caching engines for use with various programming languages. Notable examples include memcached and APC. These allow you to cache results from common operations, such as calculations, database queries, and compiled code. You should investigate whether your language of choice has such a caching mechanism.

Many database servers have a built-in caching mechanism to serve frequently repeated queries directly from cache. This allows your server to touch the data tables themselves only when the data has changed.

The Squid web cache server is a highly configurable and very fast cache for HTTP content, and may be a better choice than Apache for forward- and reverse-caching in some situations.

The bigger your toolbox, the better chance that you'll know which tool to use in a given scenario.

As usual, the documentation is a great guide for other configuration options. #apache on irc.freenode.net is also full of useful information. Finally, come to ApacheCon!

Rich Bowen is a member of the Apache Software Foundation, working primarily on the documentation for the Apache Web Server. DrBacchus, Rich's handle on IRC, can be found on the web at www.drbacchus.com/journal.

