Apache DevCenter


HTTP Wrangler

Log Rhythms

by Rael Dornfest

Logs are the pulse of your web server -- the rhythms produced by the comings and goings of your visitors. In this column I'll give you a gentle introduction to Apache web server logs and their place in monitoring, security, marketing, and feedback.

Before you go running for the hills: no, I won't be talking about those mathematical logarithms that gave you a headache in high school. Your web server records visits to your web site in logs: one or more text files containing an entry for each request (or "hit"). At first glance, logs may look convoluted, but they're actually quite simple. Once you're familiar with the notation, you'll be reading your logs as easily as your daily journal.

"One Hit Wonder" or Lasting Impression?

Before we dive in, let's get our terminology straight.

Hit
When the Web was young, people measured their web sites' effectiveness in terms of hits. A hit is a request made of a web server. A request may correspond to an HTML page, an image, a CGI script, or any other type of file or interactive content. The important thing to remember is that each and every request counts as a hit. Therefore, when a visitor requests a web page containing three embedded images, the log tallies up four hits.

Hit counts lost their effectiveness as people began to add gratuitous images to their pages in order to inflate their sites' perceived popularity. Hit counts are, however, useful to server administrators as a simplistic traffic or server-utilization indicator.
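To make the "one page plus three images equals four hits" arithmetic concrete, here is a minimal Python sketch. It assumes one log entry per line in the Common Log Format; the addresses, paths, and timestamps in `SAMPLE_LOG` are invented for illustration and don't come from any real log.

```python
# Hypothetical log entries (assumed Common Log Format): a single visit to
# index.html that also pulls in three embedded images.
SAMPLE_LOG = """\
192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /images/logo.gif HTTP/1.0" 200 1043
192.0.2.10 - - [10/Oct/2000:13:55:37 -0700] "GET /images/photo.jpg HTTP/1.0" 200 8120
192.0.2.10 - - [10/Oct/2000:13:55:37 -0700] "GET /images/banner.gif HTTP/1.0" 200 4512
"""


def count_hits(log_text):
    """Count hits: every non-blank entry is one request, hence one hit."""
    return sum(1 for line in log_text.splitlines() if line.strip())


print(count_hits(SAMPLE_LOG))
```

Nothing here looks at what was requested, which is exactly why hit counts are so easy to inflate.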

Page View
A page view, as you probably guessed, is a hit in which only the page (and not its embedded elements) is counted. Each visit to an HTML document, whether or not it's crammed full of animated images, sounds, and Java applets, is counted as a single page view.
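As a rough sketch of how page views might be pulled out of a log, the Python below counts only requests whose path looks like a page. Deciding what "looks like a page" is my assumption, not a rule: here it means a path ending in `.html`, `.htm`, or `/`. The sample entries are made up and assume the Common Log Format.

```python
import re
from collections import Counter

# Pulls the requested path out of the quoted request line,
# e.g. "GET /index.html HTTP/1.0" yields /index.html.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+)')

# Assumption: paths ending in one of these are pages; adjust for your own site.
PAGE_SUFFIXES = (".html", ".htm", "/")


def count_page_views(log_lines):
    """Tally page views per path, ignoring images and other embedded files."""
    views = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # skip anything that doesn't look like a log entry
        path = match.group(1).split("?", 1)[0]  # drop any query string
        if path.endswith(PAGE_SUFFIXES):
            views[path] += 1
    return views


# Hypothetical entries: two visits to index.html, one to an article, two images.
SAMPLE_LINES = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /images/logo.gif HTTP/1.0" 200 1043',
    '192.0.2.11 - - [10/Oct/2000:14:02:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.11 - - [10/Oct/2000:14:02:11 -0700] "GET /images/logo.gif HTTP/1.0" 200 1043',
    '192.0.2.11 - - [10/Oct/2000:14:03:45 -0700] "GET /articles/marketing.html HTTP/1.0" 200 5120',
]

for path, total in count_page_views(SAMPLE_LINES).most_common():
    print(path, total)
```

The five sample entries are five hits but only three page views.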

Content providers track page view counts to figure out which content is most interesting to their audience. For example, say an article on Internet marketing generated 1,024 page views, whereas another on door-to-door sales generated only 42. One could reasonably guess the site's audience is far more interested in marketing than sales (at least the door-to-door kind).

As another example, let's assume my article is spread across four pages with "next page" links at the bottom of the first three. A particularly telling page view spread would be: Page 1 (456 views), Page 2 (345 views), Page 3 (93 views), and Page 4 (12 views). I would conclude that my audience, while interested in the topic overall, lost interest in my article somewhere in the second page.
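One way to put a number on that drop-off is to compare each page's views with those of the page before it. The Python sketch below uses the four counts from the example; the function name `retention` is my own label for the ratio, not a standard term.

```python
# Page view counts for pages 1 through 4, taken from the example above.
PAGE_VIEWS = [456, 345, 93, 12]


def retention(views):
    """For each page after the first, the fraction of the previous page's
    views that carried over to it."""
    return [later / earlier for earlier, later in zip(views, views[1:])]


for page, rate in enumerate(retention(PAGE_VIEWS), start=2):
    print(f"Page {page - 1} -> Page {page}: {rate:.0%} carried over")
```

About three quarters of the page 1 views carry over to page 2, but only a little over a quarter of the page 2 views carry over to page 3, which is where the second page shows up as the trouble spot.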

Marketeers like to use page view counts as popularity indicators. But the assumption that each page view equals a unique person is almost certainly incorrect. For example, 100 page views could either signify 100 people visiting the page once, or one person visiting the page 100 times.

Unique Host vs. Unique Visitor
Your web logs, with a little massaging, can tell you the number of unique hosts (or computers) that have paid you a visit. This provides a smidge more information than straight page views, but there's a problem here, too.

A visit from a unique host doesn't necessarily equal a visit from a unique visitor. Perhaps the host in question is a computer sitting in a public library; through the course of a day, several users of that computer may visit the same site or even the same page (think Yahoo).

Then there's the issue of the dynamic host. When you dial into your Internet service provider (ISP) via modem, your computer's unique identifier (IP address) is, in all probability, assigned dynamically. If you hang up and dial in again, there's no guarantee that you'll receive the same identifier. So, what looks like a unique host in your log file may actually be several visitors who just happen to have been allocated the same IP address at different times.
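The "little massaging" needed to count unique hosts can be quite small. This Python sketch assumes the host is the first whitespace-separated field of each entry, as in the Common Log Format; the sample entries are invented. Keep the caveats above in mind: the result counts addresses, not people.

```python
def unique_hosts(log_lines):
    """Collect the distinct hosts, taken as the first field of each entry."""
    return {line.split(None, 1)[0] for line in log_lines if line.strip()}


# Hypothetical entries: three requests from two different addresses.
SAMPLE_LINES = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.11 - - [10/Oct/2000:14:02:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Oct/2000:15:20:01 -0700] "GET /about.html HTTP/1.0" 200 1875',
]

hosts = unique_hosts(SAMPLE_LINES)
print(len(hosts), sorted(hosts))
```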

The bottom line is this: hits, page views, and host visits only give you a general picture of your web site's visitors and traffic patterns. The generally agreed upon way to properly tag and track a unique user is to use cookies (or "magic cookies"), snippets of identifying information that are sent right along with the user's request and server's response.
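To give a feel for how cookie-based tagging might work, here is a small Python sketch using the standard library's `http.cookies` module. The cookie name `visitor_id` and the helper `identify` are hypothetical names of my own, and a real site would also have to decide on expiry, privacy, and what to do when a browser refuses cookies.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name, chosen for this sketch


def visitor_id_from_request(cookie_header):
    """Return the visitor ID carried in a Cookie request header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None


def set_cookie_value(visitor_id):
    """Build the value of a Set-Cookie response header for this visitor."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = visitor_id
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie.output(header="").strip()


def identify(cookie_header):
    """Return (visitor_id, set_cookie_value); the second item is None
    when the visitor already presented an ID."""
    existing = visitor_id_from_request(cookie_header)
    if existing:
        return existing, None
    new_id = uuid.uuid4().hex  # random ID for a first-time visitor
    return new_id, set_cookie_value(new_id)


# First request: no cookie yet, so the server hands one out.
first_id, header_value = identify(None)
# Later request: the browser sends the cookie back and is recognized.
second_id, nothing_to_set = identify(f"{COOKIE_NAME}={first_id}")
print(first_id == second_id, nothing_to_set is None)
```

The same ID comes back on the later request whatever host or IP address it arrives from, which is what makes the cookie a better visitor tag than the address.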

For more information about cookies, visit the Resources section at the end of this article.

Impression
An impression is almost the same thing as a page view -- the difference lies in what's being viewed. While a visit to a web page is generally referred to as a page view, an impression usually refers to a viewing of an advertisement such as a banner, an animated mini-commercial, a button, and the like.
