

System Performance Tuning, 2nd Edition

Application I/O Pattern Analysis
A Case Study in Performance Analysis

by Gian-Paolo D. Musumeci, co-author of System Performance Tuning, 2nd Edition

The 307 Corporation is involved in the industrial production of very high-grade ethanol by means of a novel mechanism. Their operating apparatus consists of about a hundred large production vessels. Because of the nature of their process it is necessary to monitor several parameters -- temperature, for example -- at various points throughout the vessel. This data is then used to optimize the ethanol production process. They are considering scaling up their production process by a factor of four, and want to be sure they will be able to record data fast enough.

In general, one of the top areas to look for performance improvements is disk I/O. "Spinning rust," as one of my friends refers to it, is significantly slower than main memory. And unfortunately, in exchange for capacity and relative permanence, our applications tend to use it. Before we can start tuning our disk subsystems for maximum performance, however, we need to be able to look at how our applications exercise the disk.

As far as our work for 307 Corp. goes, we must first figure out what sort of patterns their data collection mechanism is generating on the disk: only then can we approach the problem of how to configure their I/O subsystem.

They use a Sun Ultra Enterprise 3500 with six 400 MHz processors and six gigabytes of memory running Solaris 8 to record all data concerning the operating parameters of these production vessels, which comes in via very fast serial interconnects. At present, all of their data is being recorded to a single 7200rpm FC-AL disk.

In order to figure out what's going on, we'll use the TNF, or "trace normal form," kernel tracing facility in Solaris, along with some Perl to extract the relevant fields from the (voluminous) TNF output. Then we'll visualize the data to see what sort of patterns present themselves.

We can set up our data collection by using prex:

    # prex -k
    Type "help" for help ...
    prex> buffer alloc 2m
    Buffer of size 2097152 bytes allocated
    prex> enable io
    prex> trace io
    prex> ktrace on

(For a more detailed discussion of prex, consult the man pages, or look in System Performance Tuning, 2nd Edition, pages 179-183.)

We now wait for a little while (a few seconds) to collect some data, and then stop the data collection process and extract the records:

    prex> ktrace off
    prex> quit
    # cd /tmp
    # tnfxtract trace.rawtnf
    # tnfdump trace.rawtnf > trace.tnf

Now we've got a shiny new I/O subsystem trace file. The problem of parsing it, however, still lies before us. To separate the wheat of understanding from the chaff of despair, we will turn to every system engineer's best friend, Perl. A little bit of Perl will let us pull out relevant fields:

    open(FD, $ARGV[0]) or die "Can't open $ARGV[0]: $!";
    $i = 0;
    while (<FD>) {
        if ($_ =~ / physio_start /) {
            my (undef, $time, undef, $pid, $lwp,
                undef, undef, undef, undef, undef, undef,
                $offset, undef, $size, undef) = split(/\s+/);
            print "$i $time $pid $lwp $offset $size\n";
            $i++;
        }
    }
    close(FD);
Now that we have all our data and we can get at it in a reasonable way, the first question we should ask is, "How much data are we writing to disk per I/O operation?" Computing the mean I/O size is pretty straightforward:

    % iotx trace.tnf | cut -f6 -d' ' | mean
    mean of 1164 samples is 512

(Here "mean" is a little piece of Perl that computes the mean of numbers given to it on standard input.)
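The text's "mean" helper is a little piece of Perl; an equivalent filter is easy to sketch in Python (the output format simply mirrors the transcript above, and reading from standard input is left in a separate function):

```python
import sys

def mean(numbers):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(numbers) / float(len(numbers))

def mean_of_stdin():
    """Read whitespace-separated numbers from stdin and report their
    mean, in the same format as the transcript above."""
    values = [float(tok) for line in sys.stdin for tok in line.split()]
    print("mean of %d samples is %g" % (len(values), mean(values)))
```

Piped the same way as in the transcript (`iotx trace.tnf | cut -f6 -d' '`), `mean_of_stdin` would report the mean write size.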

However, we should also check that the mean is a good way of representing the average I/O size. Here we observe a mean of 512 bytes, but a mean of 512 could also arise from a bimodal distribution: say, one cluster of writes at 602 bytes and another at 422 bytes. So we should look at the actual distribution of sizes and make sure it matches our expectations. In this case, the only writes being done to the disk are exactly 512 bytes -- but it was good to check.
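One quick way to make that check is to tally the distinct I/O sizes, sketched here in Python (the sample size lists are illustrative, not from the trace):

```python
from collections import Counter

def size_histogram(sizes):
    """Tally each distinct I/O size so the shape of the distribution
    (one dominant size vs. several clusters) is obvious at a glance."""
    return Counter(sizes)

# A trace whose writes are all 512 bytes has a single mode:
print(size_histogram([512, 512, 512]))
# A bimodal mix can average to 512 without containing any
# 512-byte writes at all:
print(size_histogram([602, 422, 602, 422]))
```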

Two more interesting questions to ask are: How many I/O operations are being issued per second? and, Where on the disk are these I/Os located? Answering the first question is pretty straightforward: our TNF data shows that 1,163 I/Os were completed over the trace's lifetime of 17,530 milliseconds, which works out to about 66 transactions per second.
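The arithmetic is simple enough to sanity-check directly (the counts are the ones from the trace above):

```python
def iops(completed_ios, trace_ms):
    """Convert an I/O count over a trace window measured in
    milliseconds into operations per second."""
    return completed_ios / (trace_ms / 1000.0)

# 1,163 I/Os over 17,530 ms is roughly 66 transactions per second.
print(round(iops(1163, 17530)))
```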

Answering the second question is a bit more difficult, but after some thought, drawing a scatter plot of time (on the horizontal axis) against disk seek position (on the vertical axis) looks promising.
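One way to draw that scatter plot is with gnuplot. Assuming the Perl script's output has been saved to a file (here called trace.dat; the name is illustrative), field 2 is the timestamp and field 5 the disk offset:

```gnuplot
set title "I/O location vs. time"
set xlabel "time"
set ylabel "disk offset"
plot "trace.dat" using 2:5 with points
```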

[Figure: scatter plot of I/O time vs. disk location]

That looks pretty random to me.

After all this data collection and analysis, what have we learned? We can summarize our improved understanding in one sentence:

307 Corporation's production vessel monitoring system produces randomly distributed disk writes of, on average, 512 bytes, and requires approximately 66 I/Os per second in the present environment.

It seems like a lot of work for one sentence, but this sentence is absolutely vital to understanding the environment we're working in. That understanding is what is going to enable us to tune the disk subsystem in order to produce higher performance, which we'll cover in next month's column.

Gian-Paolo D. Musumeci is a research engineer in the Performance and Availability Engineering group at Sun Microsystems, where he focuses on network performance.
