Analyzing Web Logs with AWStats

Some significant problems are inherent in tracking visitors and their visits with web log analysis software such as AWStats.

  • You cannot tell that two people connecting at different points in the day from a shared home PC (or internet cafe) are two unique visitors, not one.
  • You cannot know that somebody who connects from both home and work is one unique visitor, not two.
  • Some ISPs (internet service providers) assign a new IP to each request, so if you view three pages over the span of a few minutes, you will appear as three distinct visitors.
  • An ISP may reassign one IP address to several users over the course of a day. Assume that Giacomo connects to the internet using his dial-up modem at 7:35 a.m. After a few minutes, he disconnects. His host IP address is now free. At 8:10 a.m., Patrizia connects with her modem, and her provider assigns her that same host IP address. If she visits a site, is she the same visitor in the same visit (session) as before?

    The commonly accepted convention is that a visit has ended once 30 minutes pass with no further activity from the visitor. Thus, hers would count as a new visit (session)--but you have no way of knowing that she is a different person from Giacomo. When Giacomo connects later in the day, he will most likely do so from the office, so even if he had a fixed IP at home, he will have a different host IP at the office and will thus appear as a different visitor from the Giacomo who visited at 7:35 a.m.

  • Users in large companies often access the internet through a "proxy," which in effect aggregates thousands of users into one apparent visitor.

Despite these limitations in heuristic approaches, the concept of visitors and sessions (each individual visit) remains a valid tool as an indication of overall user behavior and trends.
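The 30-minute convention described above can be sketched in a few lines of Python. This is only an illustration of the heuristic, not how AWStats itself is implemented; the function name, the sample IP addresses (taken from the reserved documentation ranges), and the choice to start a new visit when the gap is strictly greater than 30 minutes are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed cutoff: a gap longer than this starts a new visit (session).
SESSION_TIMEOUT = timedelta(minutes=30)

def count_visits(hits):
    """Given (ip, timestamp) pairs, return {ip: number of visits}.

    Each distinct IP is treated as one "unique visitor", which is
    exactly the approximation the limitations above describe.
    """
    by_ip = defaultdict(list)
    for ip, ts in hits:
        by_ip[ip].append(ts)

    visits = {}
    for ip, stamps in by_ip.items():
        stamps.sort()
        count = 1  # the first hit always opens a visit
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > SESSION_TIMEOUT:
                count += 1
        visits[ip] = count
    return visits

# Hypothetical hits modeled on the Giacomo and Patrizia scenario.
day = datetime(2005, 10, 10)
hits = [
    ("192.0.2.10", day.replace(hour=7, minute=35)),    # Giacomo, home
    ("192.0.2.10", day.replace(hour=7, minute=38)),    # Giacomo, home
    ("192.0.2.10", day.replace(hour=8, minute=10)),    # Patrizia, same IP
    ("198.51.100.7", day.replace(hour=14, minute=0)),  # Giacomo, office
]

visits = count_visits(hits)
unique_visitors = len(visits)
total_visits = sum(visits.values())
print(visits, unique_visitors, total_visits)
```

With this sample data the heuristic counts two unique visitors, which happens to match the number of people, but it attributes the hits wrongly: Patrizia is folded into Giacomo's home address, and Giacomo at the office is counted as someone new.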

Table 4. Visits and unique visitors
Visitor No.   Visits (sessions)   Unique visitors
1             2                   1
2             1                   1
3             12                  1
Total: 3      15                  3

Bandwidth consumption

Bandwidth consumption is of interest to technical staff, as there is usually an economic cost associated with its use. On a more granular level, large individual files can point to performance problems, especially for dial-up users.

Bandwidth here means the total size of the files sent from the web server to the end user. It does not include HTTP headers in served objects, HTTP request headers from users, or bytes needed by the underlying network protocols.
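As a rough illustration of this definition, the sketch below sums the response-size field of log lines in the Apache combined log format, where that field excludes response headers and is written as "-" when no body was sent. The regular expression, the function name, and the sample lines are assumptions made for the example; a real log may need a more tolerant parser.

```python
import re

# Host, identity, user, [time], "request", status, size.
# Anything after the size field (referrer, user agent) is ignored.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)'
)

def total_bytes(lines):
    """Sum the size field over all parseable lines."""
    total = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        size = match.group(5)
        if size != "-":  # "-" means no response body was sent
            total += int(size)
    return total

# Hypothetical log lines; the third is a 304 with no body.
sample = [
    '192.0.2.10 - - [10/Oct/2005:07:35:02 +0200] '
    '"GET /index.html HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
    '192.0.2.10 - - [10/Oct/2005:07:35:03 +0200] '
    '"GET /logo.png HTTP/1.1" 200 20480 "-" "ExampleBrowser/1.0"',
    '192.0.2.10 - - [10/Oct/2005:07:36:10 +0200] '
    '"GET /index.html HTTP/1.1" 304 - "-" "ExampleBrowser/1.0"',
]

bandwidth = total_bytes(sample)
print(bandwidth)
```

Because the size field counts only the served content, a total computed this way will understate what a network-level meter would report for the same traffic.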

The final part of this series will look at the reports we generated, using the definitions above to identify business and technical metrics to watch.

Sean Carlos is president of Antezeta, an internet consultancy focusing on Merit-Based™ search engine optimization, search engine marketing, web analytics, and web site usability.
