Search Engine Crawlers
Crawler traffic is highly beneficial: it is how search engines keep their indexes of your content up to date. The ability to monitor this traffic is an essential part of an overall search engine optimization (SEO) strategy. Many organizations invest in paid inclusion or keywords without first having exploited the greater benefits of organic, merit-based SEO. Monitor this traffic to ensure that Google and other bots are updating their indexes on a regular basis. The relevant AWStats report is Robots/Spiders visitors.
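As a rough illustration of what monitoring crawler traffic involves, the following Python sketch counts hits per day from known crawlers in a web server log. It is not how AWStats works internally. It assumes the log uses the Apache combined log format, and the crawler name fragments in CRAWLER_TOKENS are illustrative choices, not a complete or authoritative list.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# Illustrative user-agent fragments only; real crawler lists are much longer.
CRAWLER_TOKENS = ("googlebot", "msnbot", "slurp")


def crawler_hits_per_day(lines):
    """Return a Counter keyed by (day, crawler token) for matching log lines."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent").lower()
        for token in CRAWLER_TOKENS:
            if token in agent:
                hits[(match.group("day"), token)] += 1
                break  # count each request once
    return hits
```

A day with no entries for a crawler that normally visits regularly is the kind of gap worth investigating.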
Off-line downloading tools, such as Wget and HTTrack, download content within a domain or a subdirectory of a domain, as specified by the human user who launches the tool. While your server logs these requests, you cannot know whether the user will ever look at all of the pages, or how many times the user will consult them off-line. From a business point of view, off-line downloading could represent monitoring by your competition. The relevant AWStats report is Browsers.
Some site traffic consists of automated attempts to exploit weaknesses in web servers in order to hijack the server. AWStats currently tracks five types of attacks on Microsoft IIS. If you don't use IIS, you can disable the report. The relevant AWStats report is Worm/Virus attacks.
Many sites employ automated virtual transactions to monitor specific processes on their websites. The usual practice is to filter this traffic out of your web statistics. To this end, AWStats provides two configuration directives. You can use SkipHosts if all of the traffic (and only that traffic) comes from a specific IP address, or SkipUserAgents if the "robot" performing the transaction identifies itself with a particular name.
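The effect of these two directives can be pictured with a short Python sketch. This is a conceptual illustration only, not AWStats code or AWStats configuration syntax. The IP address, the user-agent pattern, and the record layout are all hypothetical.

```python
import re

# Hypothetical values: the address your monitoring service connects from,
# and a name fragment it sends in its User-Agent header.
SKIP_HOSTS = {"192.0.2.10"}
SKIP_USER_AGENTS = [re.compile(r"site-monitor", re.IGNORECASE)]


def is_monitoring_request(host, user_agent):
    """True if a request should be left out of the statistics."""
    if host in SKIP_HOSTS:
        return True
    return any(pattern.search(user_agent) for pattern in SKIP_USER_AGENTS)


def keep_for_statistics(records):
    """Drop monitoring traffic from a list of {'host', 'user_agent'} records."""
    return [
        record
        for record in records
        if not is_monitoring_request(record["host"], record["user_agent"])
    ]
```

Filtering by host is only safe when no real visitors share the monitoring address, which is why the user-agent check is offered as the alternative.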
A Note on Measuring Non-Human Traffic and Page Tagging
One criticism leveled at web server log file analysis is that the presence of non-human traffic distorts the statistics. The primary alternative method, page tagging, works by embedding tags in each page that call the counting server only when a normal browser, not a robot, visits the page. In theory, this excludes non-human traffic, and page tag vendors tout this as a benefit. Unfortunately, this approach misses information essential to the management of most sites. In particular, visibility of search engine crawler activity is an essential ingredient of an overall search engine strategy. AWStats offers the best of both worlds: it captures automated traffic and reports on it, but keeps this data separate from the reports on interactive human users. Web log analysis can also report on objects that you cannot readily tag, such as images and binary document files.
These articles have only scratched the surface of what is possible with web analytics and AWStats. The following tips and resources may help you integrate web log analysis with AWStats into your website management.
- To facilitate report interpretation by business and technical users, generate separate technical and business reports by maintaining two separate AWStats configuration files: one enabling technical reports, the other business reports.
If you decide to use the on-demand CGI interface:
- Use at least version 6.4. There were security issues in previous versions.
- Consider limiting access to the CGI interface by limiting traffic to internal IPs or by password protecting it.
- Sign up for notification of AWStats updates. New releases may include useful features and resolve bug or security issues.
- AWStats documentation online
- AWStats project page
- AWStats user community support mailing list. Before you post a question, you should search the archives to ensure it hasn't already received an answer.
- AWStats developer community, which includes patches to fix and enhance AWStats
- Commercial support may be worthwhile to jumpstart an installation. You can consult the author of this article, or search a directory of companies offering paid support for open source software, specifying AWStats as the application.
- The author's AWStats web analytics resource center includes additional tips and materials for AWStats.
The following organizations provide more exhaustive information on web analytics terminology and its usage.
- The Interactive Advertising Bureau focuses on standardizing metrics definitions used in internet advertising.
- The recently founded Web Analytics Association has developed a KPI definitions white paper (PDF).