As the internet matured, attention turned from hits to a more sophisticated measure: pages. Unfortunately, this opened a new can of worms, because there is no standard definition of a page. A web server log file simply contains information on objects requested from the web server. It is up to the web server log file analysis software to give semantic meaning to those objects.
- Generally a page is a content object that a user viewed, such as an HTML file, a word processing document, or an Adobe Acrobat PDF file.
AWStats works by exclusion in defining a page. By default, any object accessed by a user on your web server is a page unless it has a filename suffix of css, js, class, gif, jpg, jpeg, png, bmp, or ico. You must explicitly add any other objects you do not want to count as pages in AWStats reports. For example, exclude ZIP archives and Flash animation files by adding their suffixes to the NotPageList directive in the AWStats configuration file:
NotPageList="css js class gif jpg jpeg png bmp ico swf zip tgz gz tar"
Then AWStats will count everything but the following as pages:
|css|Cascading Style Sheet formatting instruction files|
|js, class|JavaScript and Java program files|
|gif, jpg, jpeg, png, bmp|Various image/photo formats|
|ico|An image icon file; many sites have a company logo saved as favicon.ico; many browsers use this in bookmarks (favorites) and tabs|
|swf|ShockWave Flash animation|
|zip, tgz, gz, tar|Archive formats created by PKZip, WinZip, tar, gzip, or similar|
One advantage of this approach is that if you are using a CGI to generate dynamic pages, you do not need any extra configuration for each CGI query to count as a page; this happens automatically.
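The exclusion rule can be illustrated with a short Python sketch. This is not AWStats code (AWStats itself is written in Perl); the function name `is_page` and the case-insensitive suffix comparison are assumptions made for illustration. The suffix list is the one from the NotPageList example above.

```python
import posixpath
from urllib.parse import urlsplit

# Suffixes copied from the NotPageList example in the text.
NOT_PAGE_SUFFIXES = frozenset(
    "css js class gif jpg jpeg png bmp ico swf zip tgz gz tar".split()
)


def is_page(request_url: str) -> bool:
    """Return True unless the URL path ends in an excluded suffix.

    Hypothetical helper: the query string is ignored, and suffixes are
    compared case-insensitively (an assumption, not documented behavior).
    """
    path = urlsplit(request_url).path          # drop ?query and #fragment
    _, ext = posixpath.splitext(path)          # ".gif", ".html", or ""
    return ext.lstrip(".").lower() not in NOT_PAGE_SUFFIXES
```

Under these assumptions, a CGI request such as `/cgi-bin/search.cgi?q=logo.gif` counts as a page because only the path suffix (`cgi`) is checked, while `/img/logo.PNG` does not.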
- Various standards boards, such as the Internet Advertising Bureau, seem to be converging on common definitions after several years of work on the topic. The primary driver is the advertising market. If you are planning to make your data public, you should consider guidelines provided by these organizations when defining your page exclusion list. If you adhere to these standards, you could add a methodology note when publishing your data so your audience will understand the basis for your numbers.
- Your servers may not see all page requests from your users. To understand why and what you can do about it, see the web caching references.
- You may have pages you do not want to track because they serve an internal purpose, are repetitive parts of a frame set, are temporary redirections, or something similar. Use the SkipFiles directive in the AWStats configuration file to list the files to exclude.
While the concept of a page is open to some interpretation, the concept of a visitor (and a visit, also known as a session) is even more difficult to define. Log data neither defines nor tracks a visitor entity. Several heuristic approaches can be used to infer individual visitors from server log data, each adding a further level of refinement.
- Visit
- A visit consists of all activity occurring without a break of more than 30 minutes. Thus, if you request a page and then wait 29 minutes before requesting a new page, both page requests take place during the same visit (or session). However, if you request the subsequent page 30 minutes and 1 second later, that is a new visit. AWStats currently considers a visitor session break to be 60 minutes. Hopefully, this will be configurable in a future version.
- Session
- Synonym for visit.
- Unique visitors
- The count of visitors after removing duplicate visits.
- Authenticated visitors
- Users who have logged in with a username and password. This can be a web server-controlled login or an application server-level login. Web log analysis tools like AWStats track logins at the web server level, even though application-level logins are more common.
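The visit and unique-visitor definitions can be sketched in Python as follows. This is an illustrative heuristic, not the AWStats implementation: the function name `count_visits` is hypothetical, and the visitor key (for example, the client IP address from the log) is an assumption, since log data does not itself identify visitors. The default timeout is the 30-minute break from the definition; pass 60 minutes to mimic the AWStats value mentioned in the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_visits(requests, timeout=timedelta(minutes=30)):
    """Return (visits, unique_visitors) for (visitor_key, timestamp) pairs.

    visitor_key is whatever stands in for a visitor (assumed here to be
    something like a client IP address). A new visit starts whenever the
    gap between consecutive requests from one key exceeds the timeout.
    """
    by_visitor = defaultdict(list)
    for key, stamp in requests:
        by_visitor[key].append(stamp)

    visits = 0
    for stamps in by_visitor.values():
        stamps.sort()
        visits += 1                      # the first request opens a visit
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > timeout:     # break longer than the timeout
                visits += 1
    return visits, len(by_visitor)
```

A request 29 minutes after the previous one stays in the same visit, while one arriving 30 minutes and 1 second later opens a new visit, matching the example in the definition.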