Part one of this series showed how to perform a basic installation of the web server log analysis tool AWStats, generate sample reports, and understand basic web analytics terminology. This second article delves into the reports, noting metrics worth watching for technical and business staff.
AWStats provides several summary reports that show the overall website trends over different time intervals for the reporting period, usually monthly.
The General Summary report breaks down unique visitors, number of visits, pages, hits and bandwidth by human and non-human visitors for the current reporting interval. Metrics to watch are:
This provides a breakdown of visitors and pages, month by month, for the current year. In month-to-month comparisons, do not forget to adjust for the variable number of days in a month. Keep an eye on the Unique visitors/number of visits and Pages/visitors ratios, which you must calculate manually.
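The day-count adjustment and the two ratios can be worked out in a few lines of code. The following is only a sketch: the monthly figures are invented sample values, and the function names are hypothetical helpers, not part of AWStats.

```python
import calendar

def per_day(total, year, month):
    """Normalize a monthly total by the number of days in that month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return total / days_in_month

def ratios(unique_visitors, visits, pages):
    """Return (unique visitors / visits, pages / unique visitors)."""
    return unique_visitors / visits, pages / unique_visitors

# Invented sample figures, as they might be copied from the monthly table.
months = {
    (2005, 2): {"unique": 2800, "visits": 4200, "pages": 16800},
    (2005, 3): {"unique": 3100, "visits": 4650, "pages": 18600},
}

for (year, month), m in sorted(months.items()):
    unique_per_visit, pages_per_visitor = ratios(m["unique"], m["visits"], m["pages"])
    print(f"{year}-{month:02d}: "
          f"{per_day(m['visits'], year, month):.1f} visits/day, "
          f"{unique_per_visit:.2f} unique visitors/visit, "
          f"{pages_per_visitor:.1f} pages/visitor")
```

In this invented data, March shows more visits than February in absolute terms, yet both months work out to the same number of visits per day once the 28-day and 31-day lengths are taken into account.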
Day-to-day and month-to-month trends are useful in evaluating the impact of marketing initiatives and/or external traffic drivers.
AWStats also supports yearly reporting intervals, using the command-line date syntax -month=all -year=YYYY, or through the AllowFullYearView configuration option for the on-demand CGI interface. Daily reports are available as an unofficial feature. Changes planned for version 6.5 should facilitate reporting on different time intervals.
Qualified traffic building (and monitoring) is an essential marketing activity for most sites, regardless of business model. On the technical side, changes in traffic patterns are useful for technical capacity planning and management. Traffic comes from:
The relevant AWStats reports are in the "Referers" section. The misspelling is due to a historic mistake.
AWStats reports on visitors where a referring URL is missing in the first page request during a visit/session. The Referrers report calls this "Direct address/Bookmarks." Some privacy software, such as Norton Internet Security, can block transfer of referring URL information, meaning that those visitors will appear in this report even if they came from a link on another site or a search engine.
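To make the idea of a missing referrer concrete, here is a minimal sketch that checks a single log line. It assumes the Apache "combined" log format, in which an absent referrer is logged as "-"; the sample lines and the is_direct helper are invented for illustration. AWStats itself applies the test only to the first page request of a visit, which this sketch does not attempt to reproduce.

```python
import re

# Apache "combined" format: host, ident, user, [time], "request",
# status, size, "referer", "user agent".
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def is_direct(line):
    """True when the request carries no referring URL."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("not a combined-format log line")
    return match.group("referer") in ("", "-")

# Invented sample lines.
direct = ('192.0.2.10 - - [10/Mar/2005:13:55:36 +0100] '
          '"GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"')
linked = ('192.0.2.11 - - [10/Mar/2005:13:56:01 +0100] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"http://www.example.org/links.html" "Mozilla/5.0"')

print(is_direct(direct), is_direct(linked))
```

A visitor whose privacy software strips the referrer produces a line like the first one, which is why such visits are indistinguishable from bookmarks or typed addresses.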
Search engine reports show which search engines and queries brought visitors to the site. The main report contains the top listings; each section has a link to the complete listing for the reporting period.
Search activity information is extremely useful in validating and refining merit-based search engine optimization efforts. Keyword phrases identify the language used by site visitors, language that is usually rather colloquial compared to the jargon often prevalent in site copy. Consider revising the site copy to ensure it contains the language used by your target audience while maintaining a professional tone. If synonyms of your keywords are absent from the report, that more likely indicates that these words do not appear in your site's content than that internet users do not search for them.
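As an illustration of where keyword phrases come from, the sketch below pulls the search phrase out of a referring URL's query string and tallies the results. It assumes the phrase travels in a parameter named q, p, or query, which is a simplification; AWStats keeps its own list of search engines and their parameters. The URLs are invented samples and search_phrase is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed parameter names; real search engines vary.
QUERY_PARAMS = ("q", "p", "query")

def search_phrase(referrer):
    """Return the normalized search phrase in a referring URL, or None."""
    query = parse_qs(urlparse(referrer).query)
    for name in QUERY_PARAMS:
        if name in query:
            return query[name][0].strip().lower()
    return None

# Invented sample referrers.
referrers = [
    "http://www.google.com/search?q=web+log+analysis&hl=en",
    "http://search.example.com/find?p=Web%20Log%20Analysis",
    "http://www.example.org/links.html",
]

phrases = Counter(p for p in map(search_phrase, referrers) if p)
print(phrases.most_common())
```

Lowercasing the phrase before counting groups together queries that differ only in capitalization, which makes the colloquial wording used by visitors easier to spot.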
While most indirect traffic comes through search engines, traffic can also come from external site inbound links--links due to compelling content, advertising agreements, etc. Monitor inbound links from external sites to:
By default, AWStats shows the specific page that referred to your site. It is also possible to aggregate the referrers by domain, by taking advantage of AWStats' custom report feature. Simply add the following logic to your AWstats configuration file:
ExtraSectionName1="Referring Sites by domain - Top 25"
ExtraSectionCodeFilter1="200 304"
ExtraSectionFirstColumnTitle1="Site"
ExtraSectionFirstColumnValues1="REFERER,^http:\/\/([^\/]+)\/|^HTTP:\/\/([^\/]+)\/"
ExtraSectionFirstColumnFormat1="<a href='http://%s/' title='http://%s/' target='_blank'>%s</a>"
ExtraSectionStatTypes1=PHBL
ExtraSectionAddAverageRow1=1
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=25
MinHitExtra1=1
This section will appear only for data processed after the configuration file has been updated. To generate the report retroactively, you must delete the AWStats statistics files and regenerate them from the logs, because the reports are built from those files.
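To see what the referrer pattern in that custom section captures, the same idea can be tried outside AWStats. The sketch below is not how AWStats works internally: it folds the two upper- and lowercase alternatives into one case-insensitive pattern, and the sample referrers and the referring_domain helper are invented.

```python
import re
from collections import Counter

# Same idea as the REFERER pattern: capture the host name between
# "http://" and the next slash.
DOMAIN_RE = re.compile(r"^http://([^/]+)/", re.IGNORECASE)

def referring_domain(referrer):
    """Return the lowercased host of a referring URL, or None if no match."""
    match = DOMAIN_RE.match(referrer)
    return match.group(1).lower() if match else None

# Invented sample referrers.
referrers = [
    "http://www.example.org/links.html",
    "http://www.example.org/articles/awstats.html",
    "HTTP://Blog.Example.net/2005/03/",
    "-",
]

by_domain = Counter(d for d in map(referring_domain, referrers) if d)
print(by_domain.most_common())
```

Note that the pattern requires a slash after the host name, so a referrer logged as a bare http://www.example.org would not be counted.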
To the extent it's possible to associate a visitor's host name with a physical location, it is possible to report on the geographic provenance. By default, AWStats offers country-level reporting.
Several marketing reports assist in the understanding of how users behave once they have arrived at your site.
Tip: Consider extending AWStats by using custom reports such as ExtraSection to monitor specific site pages and/or directories. The following example, added to your AWStats configuration file, will track the most-visited first- and second-level site directories. For sites that have placed business content in distinct directories, this type of report provides overall performance at a glance.
ExtraSectionName2="Top 50 first and second level directories"
ExtraSectionCodeFilter2="200 304"
ExtraSectionCondition2="URL,^\/.*"
ExtraSectionFirstColumnTitle2="Directory"
ExtraSectionFirstColumnValues2="URL,(^(\/[\w]+\/[\w]+\/)|^(\/[\w]+\/))"
ExtraSectionStatTypes2=PHB
ExtraSectionAddAverageRow2=0
ExtraSectionAddSumRow2=0
MaxNbOfExtra2=50
MinHitExtra2=1
For each line, change the 2= suffix to 1= if you do not already have an ExtraSection enabled. In addition to the second example here, there are six examples in the AWStats online documentation topic "ExtraSection," and additional samples in the AWStats web analytics resource center.
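Before adding a directory pattern like the one in the second example to a live configuration, it can help to test it against a few URLs. The sketch below uses an equivalent Python regular expression; the sample URLs and the directory_of helper are invented, and AWStats applies its own Perl regular expression, so treat this only as an approximation.

```python
import re
from collections import Counter

# Try the two-level alternative first, then fall back to one level.
DIRECTORY_RE = re.compile(r"^(/\w+/\w+/)|^(/\w+/)")

def directory_of(url):
    """Return the first- or second-level directory of a URL path, or None."""
    match = DIRECTORY_RE.match(url)
    return match.group(0) if match else None

# Invented sample request paths.
urls = [
    "/products/widgets/index.html",
    "/products/widgets/specs.html",
    "/about/team.html",
    "/index.html",
]

by_directory = Counter(d for d in map(directory_of, urls) if d)
print(by_directory.most_common())
```

Because \w matches only letters, digits, and underscores, a directory name containing a hyphen, such as /my-dir/, is not captured by this pattern.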
Several technical reports assist site development and quality control.
Most AWStats reports work from successful requests--status 200 or 304. This report contains the others. Monitor it for potential problems. The most common are:
Tip: Consider creating a custom report on the log field user agent to report on browser and operating system combinations.
We tend to think of interactive activity when we think of requests to our websites, but behind the scenes there is also a lot of automated, non-human traffic. This breaks down into four basic types:
The term robot, implying automation, refers to any of the four types. Crawler or spider refers to the undirected activity typical of search engine indexing tools: they follow links from one site to another, and links within a site, trawling for new content and other sites. Exploit attacks usually try to issue commands in an attempt to gain system access.
The good news is that AWStats can recognize most non-human traffic automatically and separate it from the general interactive activity reports.
Crawler traffic is highly beneficial--it is the ongoing updating of your content in search engine indexes. The ability to monitor this traffic is essential as part of an overall search engine optimization strategy. Many organizations invest in paid inclusion or keywords without first having exploited the greater benefits of organic merit-based search engine optimization (SEO). Monitor this traffic to ensure Google and other bots are updating their indexes on a regular basis. The relevant AWStats report is Robots/Spiders visitors.
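Checking that crawlers return regularly amounts to counting their requests per day. The sketch below does this for (timestamp, user agent) pairs already extracted from an access log. The user-agent fragments are an assumed short list, not the list AWStats uses, and the records are invented samples.

```python
from collections import Counter

# Assumed fragments identifying a few well-known crawlers.
CRAWLER_FRAGMENTS = ("googlebot", "slurp", "msnbot")

def crawler_hits_per_day(records):
    """Count hits per (day, crawler) from (timestamp, user_agent) pairs."""
    counts = Counter()
    for timestamp, user_agent in records:
        agent = user_agent.lower()
        for fragment in CRAWLER_FRAGMENTS:
            if fragment in agent:
                # "10/Mar/2005:13:55:36 +0100" -> "10/Mar/2005"
                day = timestamp.split(":", 1)[0]
                counts[(day, fragment)] += 1
                break
    return counts

# Invented sample records.
records = [
    ("10/Mar/2005:13:55:36 +0100", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("10/Mar/2005:14:02:10 +0100", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("11/Mar/2005:03:12:44 +0100", "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"),
    ("11/Mar/2005:09:30:00 +0100", "Mozilla/5.0"),
]

for (day, crawler), hits in sorted(crawler_hits_per_day(records).items()):
    print(day, crawler, hits)
```

A day with no entries for a crawler that normally visits is the signal worth investigating.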
Off-line downloading tools, such as Wget and HTTrack, will download content within a domain or subdirectory of a domain, as specified by the human user who launches the tool. While your server logs these requests, you do not really know if a user ever will look at all of the pages, nor how many times the user will consult the pages off-line. From a business point of view, off-line downloading could represent monitoring by your competition. The relevant AWStats report is Browsers.
Some site traffic consists of automated attempts to exploit weaknesses in web servers in an attempt to hijack the server. AWStats currently tracks five types of attacks on Microsoft IIS. If you don't use IIS, you can disable the report. The relevant AWStats report is Worm/Virus attacks.
Many sites employ automated virtual transactions to monitor specific processes in their website. The usual practice is to filter this traffic from your web statistics. To this end, AWStats provides two configuration directives. You can use SkipHosts if all of the traffic (and just that traffic) comes from a specific IP address, or SkipUserAgents if the "robot" performing the transaction identifies itself with a particular name.
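The two filtering criteria can be pictured as a simple predicate applied to each request. The sketch below only mimics the effect of the two directives and is not AWStats code; the IP address, the ExampleMonitor agent name, and the helper names are all hypothetical.

```python
# Hypothetical values: the address of a monitoring host and a fragment of
# the name a monitoring robot uses to identify itself.
SKIP_HOSTS = {"192.0.2.50"}
SKIP_USER_AGENT_FRAGMENTS = ("examplemonitor",)

def is_monitoring(host, user_agent):
    """True when a request should be excluded from the statistics."""
    if host in SKIP_HOSTS:
        return True
    agent = user_agent.lower()
    return any(fragment in agent for fragment in SKIP_USER_AGENT_FRAGMENTS)

# Invented sample requests as (host, user agent) pairs.
requests = [
    ("192.0.2.50", "Mozilla/5.0"),
    ("198.51.100.7", "ExampleMonitor/1.2"),
    ("198.51.100.8", "Mozilla/5.0"),
]

kept = [r for r in requests if not is_monitoring(*r)]
print(kept)
```

Filtering by host suits the case where the monitoring traffic, and only that traffic, comes from a known address; filtering by user agent suits a robot that may run from several addresses but always announces the same name.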
One criticism leveled at web server log file data analysis is that the presence of non-human traffic distorts the statistics. The primary alternative method, page tagging, works by including page tags that should call the counting server only when a normal browser, not a robot, visits the page. In theory, this excludes non-human traffic. Page tag vendors tout this as beneficial. Unfortunately, this approach misses information essential to the management of most sites. In particular, visibility of search engine crawler activity is an essential ingredient of an overall search engine strategy. AWStats offers the best of both worlds--it captures automated traffic and reports on it, but maintains this data separate from interactive human user reports. Web log analysis can also report on objects that you cannot readily tag, such as images and binary document files.
These articles have only touched the surface of what is possible with web analytics and AWStats. The following resources may help you integrate web log analysis with AWStats into your website management.
If you decide to use the on-demand CGI interface:
The following provide more exhaustive information on web analytics terminology and its usage.
Improper cache management, all too common, can affect both correct content delivery and web statistics.
There are two significant open source alternatives to AWStats.
None of the leading open source web analytics tools includes clickstream (path) analysis, a feature usually found in "enterprise-class" commercial solutions. StatViz, available for multiple platforms, may help fill this void. I have written rudimentary StatViz installation and configuration instructions for Linux to facilitate StatViz evaluation.
AWStats' principal author is Laurent Destailleur, email@example.com. To ensure that he maximizes his time dedicated to improving AWStats, you should use the community email addresses rather than writing him directly.
Sean Carlos is president of Antezeta, an internet consultancy focusing on Merit-Based™ search engine optimization, search engine marketing, web analytics, and web site usability.
Copyright © 2009 O'Reilly Media, Inc.