ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


Making Apache httpd Logs More Useful

by Rich Bowen
02/01/2007

No doubt you're already aware of the standard logfiles that Apache httpd creates for you. There's the access log, which tells you every time a request is made to your server. There's also the error log, which makes a note every time something goes wrong or something of interest happens that you should know about.

There are a few things that you can do to make your access log more useful, such as using the combined, rather than the common, logfile format--but that's another article. Look at the documentation for mod_log_config for more information on that.

You may not know that several additional logging modules and directives record more specialized information: the total bytes on the wire, requests that never complete, the raw request and response data, and the inner workings of URL rewriting.

The modules discussed here are available in 2.0 and 2.2, but not in 1.3.

mod_logio

When mod_log_config makes a log entry, the number of bytes transferred can be (and usually is) logged using the %b variable in the LogFormat. This number is less useful than you might wish, as it logs only the size of the body of the response, and does not include the headers. Because headers make up a significant percentage of the data transferred to the client, this doesn't provide the whole picture of how much data you're transferring. It also doesn't include the request at all, so on a site where file uploads are permitted, you end up seeing only a part of your total bandwidth usage.

mod_logio adds two new variables to those available to the LogFormat directive, which allows you to log the total bytes transferred, including headers, both input and output.

These two variables are %I and %O, which will log the size of the input--the request, including headers and request body--and the output, including all the headers.

For example, you might have a LogFormat directive like:

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio

You'll find this line in your default configuration file if you install 2.2 from source. I don't know whether your particular distro included this line for you.

On the end, you'll see the two additional variables that I mentioned. These result in logfile entries that look like:

    192.168.200.105 - - [24/Nov/2006:11:23:30 -0500] "GET / HTTP/1.1" 200
        8054 "http://wooga.drbacchus.com/index.php?" "Mozilla/5.0 (X11; U; Linux
        i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0" 935 8522

This shows a request for the front page of my website. The response body, not including the headers, came to 8054 bytes (the %b value). With the headers included, the response came to 8522 bytes (the %O value), so the headers added 468 bytes. The request itself was 935 bytes (the %I value).

This gives you a much better picture of how much bandwidth your site is actually using and includes what's coming in, as well as what's going out.
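To turn those extra fields into bandwidth figures, you can total them with a short script. The following is a minimal sketch rather than a finished tool: it assumes log lines in exactly the combinedio format shown above, and it does not handle escaped quotes inside the quoted fields. The names parse_combinedio and total_traffic are made up for this illustration.

```python
import re

# Regex for the "combinedio" format shown above: the combined fields plus %I and %O.
# Assumption: no escaped quotes appear inside the quoted fields.
COMBINEDIO = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<bytes_in>\d+) (?P<bytes_out>\d+)$'
)


def parse_combinedio(line):
    """Return the byte counts for one log line, or None if the line does not match."""
    match = COMBINEDIO.match(line.strip())
    if match is None:
        return None
    body = match.group("body_bytes")
    return {
        "body_bytes": 0 if body == "-" else int(body),  # %b logs "-" when no body was sent
        "bytes_in": int(match.group("bytes_in")),       # %I
        "bytes_out": int(match.group("bytes_out")),     # %O
    }


def total_traffic(lines):
    """Sum %I and %O over every line that parses; unparseable lines are skipped."""
    total_in = total_out = 0
    for line in lines:
        entry = parse_combinedio(line)
        if entry is not None:
            total_in += entry["bytes_in"]
            total_out += entry["bytes_out"]
    return total_in, total_out


# The sample entry from above, joined back into a single line.
SAMPLE = (
    '192.168.200.105 - - [24/Nov/2006:11:23:30 -0500] "GET / HTTP/1.1" 200 '
    '8054 "http://wooga.drbacchus.com/index.php?" "Mozilla/5.0 (X11; U; Linux '
    'i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0" 935 8522'
)

entry = parse_combinedio(SAMPLE)
header_bytes = entry["bytes_out"] - entry["body_bytes"]
print(header_bytes)  # response bytes that %b alone would have missed
```

To run it against a real log, pass an open file object to total_traffic, which returns the input and output totals as a pair.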

mod_log_forensic

mod_log_forensic, added in 2.0, gives a little additional data that may help you troubleshoot problems on your server.

In particular, mod_log_forensic writes a fixed-format logfile that tells you if and when your requests complete. It makes one log entry when the request is initiated, and another when the request is completed.

To start logging, add this directive to your configuration file:

    ForensicLog logs/forensic_log

The initial log entry looks like:

    +5fb1:45671e25:0|GET /wordpress/index.php?feed=rss2&category_name=podcasts
    HTTP/1.1|Host:wooga.drbacchus.com|Accept:*/*|Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7|Accept-Encoding:gzip,identity|User-Agent:Feedreader
    3.07 (Powered by Newsbrain)

That is, it logs the request line and all of the headers of the request, and assigns a unique identifier to the log entry. If you're using mod_unique_id, the mod_unique_id identifier will be used. In Apache 2.0, mod_unique_id is required to use mod_log_forensic; in 2.2 and later, it is optional.

When the request is completed, another log entry is made:

    -5fb1:45671e25:0

The value of this logfile is in determining which requests, if any, have not completed. These are the requests that timed out, or perhaps requests to CGI or PHP files that hang indefinitely.

No, you don't have to look through the file to find these non-matching pairs. A script that comes with Apache--check_forensic--does this for you.

If you installed from source, you won't find it in your bin/ directory; you'll need to get it out of the support/ directory of your source distribution. As usual, I don't know whether your particular distribution of your OS will include this utility in its package releases.

Running check_forensic on your forensic logfile will spit out all of the request log entries that do not have a matching end request entry.
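If you're curious what that matching involves, the idea can be sketched in a few lines. This is not the check_forensic script itself, just an illustration of the same pairing logic, assuming the entry format shown above: a start entry begins with + and the identifier, and an end entry is - followed by the same identifier. The function name unfinished_requests, the second identifier, and the /slow.cgi request are invented for the example.

```python
def unfinished_requests(lines):
    """Return {identifier: request text} for '+' entries that have no matching '-' entry."""
    pending = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("+"):
            # Start entry: "+<id>|<request line>|<header>|..."
            forensic_id, _, rest = line[1:].partition("|")
            pending[forensic_id] = rest
        elif line.startswith("-"):
            # End entry: "-<id>" means the request completed, so forget it.
            pending.pop(line[1:].strip(), None)
    return pending


# Hypothetical log: the first request completes, the second never does.
SAMPLE_LOG = [
    "+5fb1:45671e25:0|GET /index.php HTTP/1.1|Host:wooga.drbacchus.com",
    "+5fb1:45671e25:1|GET /slow.cgi HTTP/1.1|Host:wooga.drbacchus.com",
    "-5fb1:45671e25:0",
]

unfinished = unfinished_requests(SAMPLE_LOG)
for forensic_id, request in unfinished.items():
    print(forensic_id, request)
```

Anything still in the dictionary when the log runs out is a request that started but never finished.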

mod_dumpio

When you're trying to figure out what your browser and server are saying to one another, mod_dumpio might be what you need. However, be warned that its output is somewhat cryptic and voluminous, and you need to be pretty sure of what you're looking for.

mod_dumpio logs the entire request (headers and body) and the entire response (headers and body) to the error logfile when enabled.

To log both the request and the response, set:

    LogLevel debug
    DumpIOInput On
    DumpIOOutput On

Dumping input and output have different directives so that you can choose to have only one or the other, if, for example, you have an idea which side of the conversation contains the data you're looking for.

Don't enable this on production servers--at least, not for very long--as it will generate enormous logfiles in a very short time, and will also degrade performance considerably.

At least in Apache 2.0 and 2.2, you have to turn the LogLevel up to debug for anything to be logged. Apache 2.3 will contain a new directive, DumpIOLogLevel, which sets the LogLevel at which mod_dumpio will dump data to the log.

RewriteLog

Finally, here's something that might be immediately practical to you. The RewriteLog is your very best tool when trying to subdue recalcitrant RewriteRule directives.

There are two directives involved in enabling the RewriteLog:

    RewriteLog logs/rewrite.log
    RewriteLogLevel 9

The important caveat here is that neither of these directives works in a .htaccess file, so if you're troubleshooting RewriteRules on a hosted website where you don't have access to the main config, I recommend that you instead get the rules working on your test machine at home before uploading them to the live site. Of course, testing things on a test server is always a good policy.

The RewriteLogLevel directive may be anywhere from 0 (don't log) to 9 (tell me everything). It controls the volume of information that you receive in the log.

RewriteLog entries tell you what argument a RewriteRule or RewriteCond is being applied to, the success or failure of that match, and the action that will be taken, if any, as a consequence of the match.

For example, consider these lines from my rewrite log:

    192.168.200.1 - - [30/Nov/2006:21:03:18 --0500]
        [wooga.drbacchus.com/sid#9db7eb8][rid#9ecdee8/initial] (3) applying
        pattern '^/blog/(.*)' to uri '/blog/rewrite'
    192.168.200.1 - - [30/Nov/2006:21:03:18 --0500]
        [wooga.drbacchus.com/sid#9db7eb8][rid#9ecdee8/initial] (2) rewrite
        '/blog/rewrite' -> '/index.php?name=rewrite'

The first of these entries, logged at level 3, shows the RewriteRule pattern ^/blog/(.*) being applied to a requested URI of /blog/rewrite. Because it matches, the target of the rewrite rule is invoked, and the URI is rewritten. The second entry, logged at level 2, shows the result: /blog/rewrite becomes /index.php?name=rewrite.
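At RewriteLogLevel 9 the log gets long quickly, so it can help to pull out just the level and the message from each entry. Here is a rough sketch that assumes entries laid out exactly like the two above; the regular expression and the names parse_rewrite_entry and entries_up_to are my own illustration, not anything that ships with Apache.

```python
import re

# Layout assumed from the sample entries:
# client, two dashes, [time], [server/sid][rid/phase], (level), message
REWRITE_ENTRY = re.compile(
    r"^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "
    r"\[(?P<server>[^\]]+)\]\[(?P<request>[^\]]+)\] "
    r"\((?P<level>\d)\) (?P<message>.*)$"
)


def parse_rewrite_entry(entry):
    """Return (level, message) for one rewrite log entry, or None if it does not match."""
    # Collapse any line wrapping into single spaces before matching.
    match = REWRITE_ENTRY.match(" ".join(entry.split()))
    if match is None:
        return None
    return int(match.group("level")), match.group("message")


def entries_up_to(entries, max_level):
    """Keep only the parsed entries logged at max_level or below."""
    parsed = (parse_rewrite_entry(entry) for entry in entries)
    return [item for item in parsed if item is not None and item[0] <= max_level]


# The two sample entries from above, each joined into a single line.
SAMPLE_ENTRIES = [
    "192.168.200.1 - - [30/Nov/2006:21:03:18 --0500] "
    "[wooga.drbacchus.com/sid#9db7eb8][rid#9ecdee8/initial] (3) applying "
    "pattern '^/blog/(.*)' to uri '/blog/rewrite'",
    "192.168.200.1 - - [30/Nov/2006:21:03:18 --0500] "
    "[wooga.drbacchus.com/sid#9db7eb8][rid#9ecdee8/initial] (2) rewrite "
    "'/blog/rewrite' -> '/index.php?name=rewrite'",
]

for level, message in entries_up_to(SAMPLE_ENTRIES, 2):
    print(level, message)
```

Filtering at level 2 leaves only the rewrite actions, which is often enough to see where a request ended up.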

Often, the most useful part of the rewrite log is understanding what string the rewrite rule is actually being compared against. This produces many "aha!" moments, as in, "Aha! No wonder it's not matching! It expected a leading slash there!"
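The leading-slash surprise is easy to reproduce outside Apache. The sketch below uses Python's re module as a stand-in for the regular expression engine that mod_rewrite uses; for a pattern this simple the two behave the same way, but treat that as an assumption rather than a guarantee for more elaborate patterns. The helper explain_match is hypothetical.

```python
import re

# The pattern from the log entries above.
PATTERN = re.compile(r"^/blog/(.*)")


def explain_match(candidate):
    """Describe whether the pattern matches the string it is compared against."""
    match = PATTERN.search(candidate)
    if match is None:
        return "no match for %r" % candidate
    return "match for %r, $1 = %r" % (candidate, match.group(1))


# With the leading slash, as in the log entry above.
print(explain_match("/blog/rewrite"))
# Without it, as happens when the rule is compared against a path
# that has had its leading slash stripped.
print(explain_match("blog/rewrite"))
```

The second string fails only because of the missing slash, and that is exactly the kind of difference the "applying pattern" line in the rewrite log makes visible.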

Conclusion

As we say on #apache, no matter what the problem is, step one is to look at the error log. When the error log doesn't provide quite enough information, Apache has no shortage of logging abilities that will give you that little bit of extra information you need to find the root of your problems.

Rich Bowen is a member of the Apache Software Foundation, working primarily on the documentation for the Apache Web Server. DrBacchus, Rich's handle on IRC, can be found on the web at www.drbacchus.com/journal.



Copyright © 2009 O'Reilly Media, Inc.