Analyzing Web Logs with AWStats
by Sean Carlos
A crucial, if often overlooked, aspect of running a successful web site is the study of activity occurring within the site. The information gleaned provides valuable input to continuous improvement initiatives, ranging from site architecture and content enhancements to traffic generation. This is the first of a two-part series exploring how to use the open source tool AWStats to perform web server log file analysis. This first part shows how to prepare a sample web log file, perform a basic installation of AWStats, generate reports, and review web analytics terminology; the second part will focus on report interpretation. My aim is to clear away some of the common misconceptions around hits, pages, and visits. The insight will provide a basis for creating a setup to meet production requirements.
Web log analysis can be resource-intensive and usually takes place on a system different from the production web server(s). This separation also allows for the flexibility inherent in heterogeneous architectures, where web servers might be running Linux while log analysis tools run under Windows or vice versa. I've assumed a minimalist scenario in which you have AWStats installed on a desktop workstation for ad hoc analysis. While AWStats will run on any platform that supports a recent Perl interpreter, this article covers AWStats 6.4 using either Linux or Windows.
Binary executables for Linux (.rpm) and Windows (.exe) are available from the AWStats project home page and the AWStats project on SourceForge. Download and run the executable appropriate for your workstation. In the case of a Windows install, a script will prompt you for information about your web environment. Answer N to skip this step, and press Enter until the command window closes.
Once the installation finishes, you should find the AWStats programs and documentation on your hard drive, likely in /usr/local/awstats/ or C:\Program Files\AWStats\. Now check that Perl is available. From the system command prompt, type:
$ perl -v
You should see version information if you already have Perl installed. AWStats will stop if the version is lower than 5.005_03; the latest version (5.8.x) is recommended, as it offers performance improvements. To install or update Perl, get a version for Linux from Perl for Linux or for Windows from ActiveState's ActivePerl.
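If you would rather script this check, for example on a workstation shared by several analysts, the following Python sketch compares Perl's numeric version (the built-in $] variable) against the 5.005_03 minimum. The function names are hypothetical, and the sketch assumes that perl is on the PATH.

```python
import subprocess

# 5.005_03 as Perl reports it through its numeric version variable $]
MIN_PERL = 5.00503


def perl_is_supported(numeric_version: str) -> bool:
    """Return True if a $]-style version string (e.g. '5.008008') meets the minimum."""
    return float(numeric_version) >= MIN_PERL


def installed_perl_version() -> str:
    """Ask the perl found on PATH for its numeric version string."""
    result = subprocess.run(
        ["perl", "-e", "print $]"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    try:
        version = installed_perl_version()
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("Perl was not found on the PATH")
    else:
        verdict = "ok" if perl_is_supported(version) else "too old for AWStats"
        print(f"Perl {version}: {verdict}")
```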
Preparing Web Server Log File Data
To produce reports, you need at least a day of web server log file data. If you are using an Apache server, ensure that you have set the web server logging format to Combined. In the case of Microsoft's IIS web server, set your format to a modified version of the W3C Extended Log File Format, following the instructions in AWStats IIS configuration Part B, Step 1. These configurations add necessary data elements such as user agent (browser) and referring site to the base log configuration. For other web servers, consult the AWStats LogFormat parameter values to get a list of data elements required for complete reporting.
Restart the web server for the new logging values to take effect (after saving the old logs, if needed). If you have access to data from a production web server that you cannot restart, you can use the data as is, with two caveats. If you are not logging all the required data elements, such as user agent, the relevant AWStats reports will be empty. In addition, you must manually map each field being logged using the LogFormat parameter; otherwise, most of your data file will appear as corrupted to AWStats.
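To check quickly whether an Apache log carries the extra fields before handing it to AWStats, you can test a line against the Combined layout. The following Python sketch is illustrative only: the function names and the sample lines are made up, and the pattern assumes that the referrer and user agent contain no escaped quotes.

```python
import re

# Apache Combined layout: the Common fields plus quoted referrer and user agent.
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def has_combined_fields(line):
    """True only when the line also carries the referrer and user agent fields."""
    fields = parse_line(line)
    if fields is None:
        return False
    return fields["referer"] is not None and fields["agent"] is not None


# Made-up sample lines, for illustration only.
combined = ('192.0.2.10 - - [23/Jun/2005:10:15:32 +0200] '
            '"GET /index.html HTTP/1.1" 200 5120 '
            '"http://www.example.com/" "Mozilla/5.0 (X11; Linux i686)"')
common = ('192.0.2.10 - - [23/Jun/2005:10:15:32 +0200] '
          '"GET /index.html HTTP/1.1" 200 5120')

print(has_combined_fields(combined))
print(has_combined_fields(common))
```

A line that passes only the Common part of the pattern is the case described above: AWStats can still count pages and hits, but the browser and referrer reports have nothing to draw on.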
Once logging has run for at least a calendar day, copy the log file(s) to the system on which you installed AWStats, using one of the following commands:
$ cp /var/log/httpd/access_log /tmp/access.log
# or
> copy C:\WINDOWS\system32\Logfiles\W3SVC1\ex050623.log C:\temp\access.log
Adjust the origin locations as needed based on your web server configuration.
You can also combine multiple logs from different dates using the type (Windows) or cat (Linux) utility (in a production setting, turn the filename into a parameter). Be careful to combine the files in chronological order:
$ cat logfile1 logfile2 logfile3 > access.log
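If the file names do not sort by date, one way to avoid ordering mistakes is to sort the files by the timestamp of their first entry before concatenating them. The Python sketch below does this for Apache-style timestamps such as [23/Jun/2005:10:15:32 +0200]. The function names are hypothetical, and the sketch assumes that each file is internally in order and that English month abbreviations are in use.

```python
import re
import shutil
from datetime import datetime

# Apache writes the request time in square brackets.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def first_timestamp(path):
    """Return the timestamp of the first parsable entry in a log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = TIMESTAMP.search(line)
            if match:
                return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    raise ValueError(f"no timestamp found in {path}")


def combine_logs(paths, target):
    """Concatenate log files into target, oldest first; return the order used."""
    ordered = sorted(paths, key=first_timestamp)
    with open(target, "wb") as out:
        for path in ordered:
            with open(path, "rb") as source:
                shutil.copyfileobj(source, out)
    return ordered
```

Calling combine_logs(["logfile2", "logfile1", "logfile3"], "access.log") would then write the files in date order regardless of the order in which they were listed.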
If you run multiple load-balanced servers, merge their logs with the AWStats logresolvemerge.pl utility.
Creating an AWStats Configuration File
A sample AWStats configuration file, awstats.model.conf, comes with the AWStats installation. Copy the file, changing model to the name of the domain to analyze. While custom dictates the use of a domain name, in reality it can be anything. This example analyzes data from www.antezeta.com, so model becomes antezeta:
$ cp /etc/awstats/awstats.model.conf /etc/awstats/awstats.antezeta.conf
# or
> copy "C:\Program Files\AWStats\wwwroot\cgi-bin\awstats.model.conf" "C:\Program Files\AWStats\wwwroot\cgi-bin\awstats.antezeta.conf"
Open the resulting file in your favorite text editor. Change each of the following values as necessary (where antezeta.com represents your domain):
SiteDomain="www.antezeta.com"
HostAliases="www.antezeta.com localhost 127.0.0.1"
LogType=W
Set the LogFormat parameter to 1 for Apache, 2 for Microsoft IIS < 6.0, or the following string for IIS 6.x:
date time cs-method cs-uri-stem cs-username c-ip cs-version cs(User-Agent) cs(Referer) sc-status sc-bytes
For other web servers, see the documentation in the configuration file.
Set the DNSLookup parameter to 1 unless your web server already performs reverse DNS lookup on hostnames (that is, translating the host IP address 123.456.789.012 to user34.adsl.myisp.com or similar). Because reverse DNS lookup is slow, web servers do not usually perform it, as it would delay user navigation.
Save the file.
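If you maintain configurations for several sites, you may prefer to script these edits rather than make them by hand. The following Python sketch rewrites Name=value lines in the text of a configuration file. The function name is hypothetical, the values shown are the ones from this example, and LogFormat=1 assumes an Apache Combined log.

```python
import re

# Values from this article's example; adjust them for your own site.
SETTINGS = {
    "SiteDomain": '"www.antezeta.com"',
    "HostAliases": '"www.antezeta.com localhost 127.0.0.1"',
    "LogType": "W",
    "LogFormat": "1",  # assumption: Apache Combined log
}


def apply_settings(model_text, settings):
    """Return model_text with each matching Name=value line set from settings."""
    seen = set()
    lines = []
    for line in model_text.splitlines():
        # Comment lines start with '#', so they never match a parameter name.
        match = re.match(r"\s*(\w+)\s*=", line)
        if match and match.group(1) in settings:
            name = match.group(1)
            line = f"{name}={settings[name]}"
            seen.add(name)
        lines.append(line)
    # Append any parameter the model text did not already define.
    lines.extend(
        f"{name}={value}" for name, value in settings.items() if name not in seen
    )
    return "\n".join(lines) + "\n"


# A tiny stand-in for the model file, for illustration only.
sample_model = '# Main setup\nSiteDomain=""\nLogType=W\n'
print(apply_settings(sample_model, SETTINGS), end="")
```

To use it on a real file, read the copied configuration into a string, pass it through apply_settings, and write the result back. Keep a backup of the original, since the sketch does not preserve any trailing comment on the lines it rewrites.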