Profiling LAMP Applications with Apache's Blackbox Logs

Working with the Blackbox Format

Now that you're logging the data, you'll need to find a way to put it to work.

The Perl Cookbook featured a field extractor regular expression for the common log file format. That same code can be modified to extract all of the fields in the Blackbox format.


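# Our pattern matching format
# A "\s" is used at the end of lines 1 and 2 to catch the space character
# in the format. The /x flag makes Perl ignore the line breaks in the pattern.

my $fmt = qr{
^([^/]+)/(\d+)\ (\S)\ \[([^:]+):(\d+:\d+:\d+)\ ([^\]]+)\]\s
"(\S+)\ (.*?)\ (\S+)"\ (\d+)/(\d+)\ (\d+)/(\d+)\s
(\d+)/(\d+)\ (\d+)/(\d+)/(\d+)$
}x;

# Read a piped log file from standard input

while (<STDIN>) {
    # Skip any line that does not match the Blackbox format
    my @fields = /$fmt/ or next;
    # @fields now holds the 18 captured values, in log order
}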

There are a few things to keep in mind when parsing your log files.

Apache only logs attempted client requests. If someone connects to port 80 and doesn't send any data, it won't be logged. At a minimum, a client needs to send some text and one carriage return. Also, the pattern above assumes that the client transmits a proper request line. If a client transmits only garbage, the regex will fail to match.
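One way to cope with lines the pattern rejects is to set them aside instead of silently dropping them. The sketch below is only an illustration: the helper name `partition_lines` and both sample log lines are hypothetical, and the second line stands in for a client that sent garbage instead of a request line.

```perl
use strict;
use warnings;

# The same field pattern as above, written with qr//x.
my $fmt = qr{
^([^/]+)/(\d+)\ (\S)\ \[([^:]+):(\d+:\d+:\d+)\ ([^\]]+)\]\s
"(\S+)\ (.*?)\ (\S+)"\ (\d+)/(\d+)\ (\d+)/(\d+)\s
(\d+)/(\d+)\ (\d+)/(\d+)/(\d+)$
}x;

# Split log lines into parsed records and lines the pattern rejected.
# (Hypothetical helper, not part of the original article.)
sub partition_lines {
    my @lines = @_;
    my (@parsed, @rejected);
    for my $line (@lines) {
        chomp $line;
        my @f = $line =~ $fmt;
        if (@f) { push @parsed, \@f }
        else    { push @rejected, $line }
    }
    return (\@parsed, \@rejected);
}

# Hypothetical sample data: one normal request and one garbage request line.
my @sample = (
    '192.0.2.10/51234 + [12/Feb/2004:10:15:32 -0600] "GET /index.html HTTP/1.1" 200/200 2345/0 0/1532 412/3021/2800',
    '192.0.2.99/40001 - [12/Feb/2004:10:15:40 -0600] "xyzzy" 400/400 2346/0 0/210 7/520/300',
);

my ($ok, $bad) = partition_lines(@sample);
printf "parsed=%d rejected=%d\n", scalar @$ok, scalar @$bad;
```

Keeping the rejected lines around makes it possible to review them later, since a burst of unparseable requests can itself be worth investigating.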

The timestamps refer to the time the request was started, so it is possible to see log file entries that are out of order, especially if you are dealing with a mix of large files and small files being served.
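If later analysis needs entries in the order the requests started, the lines can be re-sorted on the timestamp. The following is a minimal sketch, assuming the timestamp is in Apache's default bracketed form (for example `[12/Feb/2004:10:15:32 -0600]`); the function names and sample lines are hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 0 .. 11;

# Convert the bracketed timestamp in a log line to epoch seconds (UTC).
# Returns undef if the line has no recognizable timestamp.
sub entry_epoch {
    my ($line) = @_;
    my ($d, $mon, $y, $h, $m, $s, $sign, $oh, $om) =
        $line =~ m{\[(\d+)/(\w+)/(\d+):(\d+):(\d+):(\d+) ([+-])(\d\d)(\d\d)\]}
        or return undef;
    return undef unless exists $MON{$mon};
    # The logged time is local; subtracting the zone offset yields UTC.
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return timegm($s, $m, $h, $d, $MON{$mon}, $y) - $offset;
}

# Return the lines ordered by request start time, dropping unparseable ones.
sub sort_by_start {
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           grep { defined $_->[0] }
           map  { [ entry_epoch($_), $_ ] } @_;
}

# Hypothetical sample: the large download started first but was logged last.
my @log = (
    '192.0.2.10/51234 + [12/Feb/2004:10:15:32 -0600] "GET /logo.png HTTP/1.1" 200/200 2345/0 0/1532 412/3021/2800',
    '192.0.2.11/40222 - [12/Feb/2004:10:15:30 -0600] "GET /big.iso HTTP/1.1" 200/200 2346/0 9/9804000 380/7340000/7339000',
);

print "$_\n" for sort_by_start(@log);
```

Sorting on epoch seconds rather than on the raw text also keeps the order correct if the time zone offset changes partway through a log.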

The %T and %D directives refer to the time it takes to handle the entire transaction, including the amount of time it takes for the client to transmit data. If you see wide variances in the amount of time it takes to serve the same file, it may be related to a web client network problem.
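To spot that kind of variance, the %D values can be summarized per requested URI. This sketch assumes that the `%T/%D` pair is the second-to-last space-separated field and that the last field is a slash-separated triple, as in the pattern above; it also assumes %D is in microseconds, as documented for Apache 2. The function name and sample lines are hypothetical.

```perl
use strict;
use warnings;

# Collect count, min, max, and mean of the %D value for each requested URI.
sub timing_by_uri {
    my %stats;
    for my $line (@_) {
        my ($uri, $usec) =
            $line =~ m{"\S+ (.*?) \S+" .* \d+/(\d+) \d+/\d+/\d+\s*$}
            or next;    # skip lines without a proper request line
        my $s = $stats{$uri} ||= { n => 0, sum => 0, min => $usec, max => $usec };
        $s->{n}++;
        $s->{sum} += $usec;
        $s->{min} = $usec if $usec < $s->{min};
        $s->{max} = $usec if $usec > $s->{max};
    }
    $_->{mean} = $_->{sum} / $_->{n} for values %stats;
    return \%stats;
}

# Hypothetical sample: the same file served three times, once to a slow client.
my @log = (
    '192.0.2.10/51234 + [12/Feb/2004:10:15:32 -0600] "GET /report.pdf HTTP/1.1" 200/200 2345/0 0/1532 412/3021/2800',
    '192.0.2.11/40222 + [12/Feb/2004:10:16:01 -0600] "GET /report.pdf HTTP/1.1" 200/200 2345/0 0/1498 412/3021/2800',
    '192.0.2.12/39110 - [12/Feb/2004:10:16:20 -0600] "GET /report.pdf HTTP/1.1" 200/200 2346/0 9/9804000 412/3021/2800',
);

my $stats = timing_by_uri(@log);
for my $uri (sort keys %$stats) {
    my $s = $stats->{$uri};
    printf "%s n=%d min=%d max=%d mean=%.0f usec\n",
        $uri, @{$s}{qw(n min max mean)};
}
```

A large gap between the minimum and maximum for a file whose size has not changed points toward the client side of the connection rather than the server.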

Listed below are a few ideas for potential analysis tools that you can develop from the Blackbox log.

Performance Graphing

Long-term collection of Blackbox data is better suited to a graphing environment like RRDTool. You can graph metrics including bytes in and out, maximum clients per second, and child process lifetime.

To do this, you need to write a program that scans the Blackbox log at short intervals (every 5 or 10 minutes), grabs all entries added since the previous scan, and imports the data into the RRD file.
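The scanning step could be sketched roughly as follows. This is an illustration under several assumptions: the last log field is taken to be the bytes in/bytes out/body bytes triple, the function names are hypothetical, and the script only prints `timestamp:in:out` strings in the form that `rrdtool update` accepts rather than calling RRDTool itself. A real collector would also need to save the returned offset between runs.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::Local qw(timegm);

my $STEP = 300;    # 5-minute buckets, matching the scan interval
my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 0 .. 11;

# Return the complete lines appended since $offset, plus the offset to save.
sub read_new_lines {
    my ($path, $offset) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    $offset = 0 if $offset > (-s $fh);    # log was rotated or truncated
    seek $fh, $offset, 0;
    my @lines      = <$fh>;
    my $new_offset = tell $fh;
    close $fh;
    # Leave a partially written last line for the next run.
    $new_offset -= length(pop @lines) if @lines && $lines[-1] !~ /\n\z/;
    return (\@lines, $new_offset);
}

# Sum bytes in and out per time bucket; return "timestamp:in:out" strings.
sub bucket_bytes {
    my %bucket;
    for my $line (@_) {
        my ($d, $mon, $y, $h, $m, $s, $sign, $oh, $om, $in, $out) = $line =~
            m{\[(\d+)/(\w+)/(\d+):(\d+):(\d+):(\d+) ([+-])(\d\d)(\d\d)\].* (\d+)/(\d+)/\d+\s*$}
            or next;
        next unless exists $MON{$mon};
        my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
        my $t      = timegm($s, $m, $h, $d, $MON{$mon}, $y) - $offset;
        my $slot   = $t - $t % $STEP;
        $bucket{$slot}[0] += $in;
        $bucket{$slot}[1] += $out;
    }
    return map { join ':', $_, @{ $bucket{$_} } } sort { $a <=> $b } keys %bucket;
}

# Hypothetical sample entries: two in one bucket, one in a later bucket.
my @sample = (
    '192.0.2.10/51234 + [12/Feb/2004:10:15:32 -0600] "GET /a.html HTTP/1.1" 200/200 2345/0 0/1532 412/3021/2800',
    '192.0.2.11/40222 + [12/Feb/2004:10:16:01 -0600] "GET /b.html HTTP/1.1" 200/200 2345/0 0/1498 380/1200/1000',
    '192.0.2.12/39110 - [12/Feb/2004:10:21:05 -0600] "GET /c.html HTTP/1.1" 200/200 2346/0 0/2210 500/9000/8800',
);

# Simulate two scans of a growing log file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "$_\n" for @sample[0, 1];
close $tmp;
my ($first, $pos) = read_new_lines($path, 0);

open my $app, '>>', $path or die "Cannot append to $path: $!";
print {$app} "$sample[2]\n";
close $app;
my ($second, $pos2) = read_new_lines($path, $pos);

print "$_\n" for bucket_bytes(@$first, @$second);
```

Tracking a byte offset keeps each scan proportional to the new data only, and the check against the file size handles the common case where the log has been rotated between scans.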

RRDTool collects time-based sampling data for any duration. One data file can keep data for durations of a week, a month, or even a year. You can merge data files into a single graph if you want to report on the performance of a group of load-balanced servers.

See the RRDTool web site for more information.

Flight Recorder

You have almost enough data in the Blackbox format to see exactly how a single client handled a series of requests. You can extract a full HTTP session by filtering on the remote port and IP.
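Filtering on the remote IP and port can be sketched as a simple grouping step. The sketch assumes each line starts with the `ip/port` pair, as the pattern above expects; the function name and sample lines are hypothetical. Note that a client port can eventually be reused, so a long log may need a time window as well.

```perl
use strict;
use warnings;

# Group log lines by their leading "ip/port" pair, preserving log order.
sub sessions {
    my %by_conn;
    for my $line (@_) {
        my ($conn) = $line =~ m{^([^/]+/\d+)\s} or next;
        push @{ $by_conn{$conn} }, $line;
    }
    return \%by_conn;
}

# Hypothetical sample: one client makes two requests on the same connection.
my @log = (
    '192.0.2.10/51234 + [12/Feb/2004:10:15:32 -0600] "GET /index.html HTTP/1.1" 200/200 2345/0 0/1532 412/3021/2800',
    '192.0.2.11/40222 - [12/Feb/2004:10:15:33 -0600] "GET /other.html HTTP/1.1" 200/200 2346/0 0/1498 380/1200/1000',
    '192.0.2.10/51234 - [12/Feb/2004:10:15:34 -0600] "GET /logo.png HTTP/1.1" 200/200 2345/0 0/2210 390/9000/8800',
);

my $sessions = sessions(@log);
print "$_\n" for @{ $sessions->{'192.0.2.10/51234'} };
```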

You could also program an HTTP client to replay the same requests if you wanted to simulate the client's actions. The downside is that you won't have a record of all of the client headers passed, such as cookies or authentication data. Also, the timestamp resolution is one second, so the replay timing may not exactly match when the original client transmitted each request.

Final Thoughts

People have been logging web server traffic for as long as the web has been around, but the emphasis has always been on the content served, not the server itself. There are tools and modules out there that can monitor performance, but most of them generate reports of data you can already find just by logging it.

The Blackbox format is a simple alternative since it doesn't require additional modules. Further, you can examine the data without a running web server. Getting it up and running requires only the addition of Apache logging directives and, optionally, a patch to the source code.

Chris Josephes works as a system administrator for Internet Broadcasting.
