ONLamp.com

A Journey to the Dimension of Pain: Paging Space
A Case Study in Performance Analysis

by Gian-Paolo D. Musumeci, coauthor of System Performance Tuning, 2nd Edition
05/30/2002

In the grain transportation market, one name leaps to mind: Paz, S.p.A. Headquartered in Geneva, this company provides high-efficiency designs for grain transport and assesses the environmental impact of deliveries. Paz has a critical need to provide computational analysis to support its products.

One key application, rtsim, is used to simulate airflow over a high-velocity grain transporter. Most of the rtsim execution occurs on a single Sun Ultra Enterprise 3500 server with six 400 MHz processors, eight internal disks, and 1 GB of memory. It runs the latest version of Solaris 8.

Unfortunately, the scientific staff responsible for interpreting rtsim output are extremely unhappy -- rtsim is performing very poorly (it takes quite a few hours to complete).

One kind scientist explains that rtsim has two basic parts. First, it initializes a very large two-dimensional array: this array represents the surface of the grain transporter. Second, it performs a series of transformations on array elements, in order to find the optimal minimization of the airflow over the surface. The scientist assures us that the code has been very seriously inspected for performance, and they all believe this is likely to be a system problem.

With just the complaint of "rtsim is slow," it's hard to come up with any good theories about what might be wrong. We might be seeing a disk problem when rtsim is writing its output, or a memory constraint problem, or perhaps exhaustion of our CPUs, or something more exotic. So, we'll start by gathering the output of three commands:

  • iostat -zxnPC 10, to report disk activity
  • mpstat 10, to report per-processor statistics
  • vmstat 10, to report virtual memory system statistics

The change in patterns as the application moves from the first phase (initialization) to the second phase (transformation) is very noticeable. Let's start by looking at the mpstat output:

rtsim-svr# mpstat 10
...
CPU minf mjf  xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
10     3   0   508   400  300   10    0    0   25    0     3    0   3  97   0
11  1077   0     0   103  100   19    3    0    2    0     0    5  20  76   0
14     0   0    35   100  100    9    0    0   20    0     0    0  13  87   0
15     0   0  1099   169  169   48    0    0   25    0     0    0   1  99   0
18     2   0  8674   100  100   23    0    0   77    0     1    0   3  62  35
19    46   0 15482   101  101   23    0    0  221    0     0    0  10   0  90
...
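
As a quick way to summarize a sample like the one above, the rows can be parsed by column name and the wt column averaged. This is only a sketch: the helper names (parse_mpstat, mean_column) are made up for illustration, and it assumes the header and rows are whitespace-separated exactly as shown.

```python
# Sample copied from the mpstat output above (a single 10-second interval).
MPSTAT_SAMPLE = """\
CPU minf mjf  xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
10     3   0   508   400  300   10    0    0   25    0     3    0   3  97   0
11  1077   0     0   103  100   19    3    0    2    0     0    5  20  76   0
14     0   0    35   100  100    9    0    0   20    0     0    0  13  87   0
15     0   0  1099   169  169   48    0    0   25    0     0    0   1  99   0
18     2   0  8674   100  100   23    0    0   77    0     1    0   3  62  35
19    46   0 15482   101  101   23    0    0  221    0     0    0  10   0  90
"""


def parse_mpstat(text):
    """Return one dict per CPU row, keyed by the header names (hypothetical helper)."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(int, row))) for row in rows]


def mean_column(rows, name):
    """Average one named column across all CPU rows (hypothetical helper)."""
    return sum(row[name] for row in rows) / len(rows)


rows = parse_mpstat(MPSTAT_SAMPLE)
mean_wt = mean_column(rows, "wt")
mean_idl = mean_column(rows, "idl")
print(f"CPUs: {len(rows)}  mean wt: {mean_wt:.1f}%  mean idl: {mean_idl:.1f}%")
```

For this sample the mean wt comes out at roughly 70%, with only CPU 19 showing no I/O wait at all.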

This looks a little bit grim -- four of the six processors report more than 75% of their time in the wt state, which means they are mostly waiting for I/O to complete. Let's see what the disks are doing by checking out the iostat output:

rtsim-svr# iostat -zxnPC 10
...
                    extended device statistics
  r/s   w/s   kr/s   kw/s   wait actv  wsvc_t asvc_t  %w  %b device
  0.8   0.0    6.4    0.0    0.0  0.0     0.0    9.7   0   1 c0t0d0s1
  0.0  70.9    0.0 9079.5 3245.4 15.0 45752.7  211.5 100 100 c0t1d0s0
...
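
As a sanity check on the c0t1d0s0 row, the queue length and the request rate can be combined using Little's law (time in queue = queue length / throughput). The sketch below is an illustration, not part of the original analysis; the variable names are invented, and it assumes wait is the average number of queued requests and wsvc_t is reported in milliseconds, as the Solaris iostat documentation describes.

```python
# Values copied from the c0t1d0s0 row of the iostat output above.
reads_per_s = 0.0              # r/s
writes_per_s = 70.9            # w/s
kb_written_per_s = 9079.5      # kw/s
wait_queue_len = 3245.4        # wait: average number of requests queued
reported_wsvc_t_ms = 45752.7   # wsvc_t: average time in the wait queue, in ms

# Little's law: average time in queue = average queue length / throughput.
throughput = reads_per_s + writes_per_s
estimated_wait_s = wait_queue_len / throughput

# Average size of each write request, in KB.
avg_write_kb = kb_written_per_s / writes_per_s

print(f"estimated time in wait queue: {estimated_wait_s:.1f} s")
print(f"reported wsvc_t:              {reported_wsvc_t_ms / 1000:.1f} s")
print(f"average write size:           {avg_write_kb:.0f} KB")
```

The estimate (about 45.8 seconds) agrees with the reported wsvc_t to within rounding, so the columns are consistent with each other: roughly 3,200 writes of about 128 KB each are queued behind a device that completes only about 71 of them per second.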

This is starting to look stunningly bad. The mean time a request spends in the wait queue (wsvc_t) is almost 46 seconds, and the device is 100% busy: c0t1d0s0 is horribly oversubscribed. The first order of business is probably to figure out what c0t1d0s0 is used for.

rtsim-svr# grep c0t1d0s0 /etc/vfstab
/dev/dsk/c0t1d0s0       -       -       swap    -       no      -

Uh oh -- c0t1d0s0 is the paging device. Let's check out what vmstat has to say:

rtsim-svr# vmstat 10
procs      memory                   page                  disk       faults     cpu
r b  w    swap   free re   mf pi   po   fr    de    sr dd s2 sd sd  in sy  cs us sy id
...
0 0 28 5440336 282384  0 8368  0  521  552     0 18398  0  0  0  0 817 52  89  6  9 84
0 0 30 5460120  23536  0 1119  5 8554 8772   480  7849  0  0  0  0 834  7 215  1  9 90
0 2 44 5460840  23648  1 1138 10 9082 9116 16456 58661  0  0  0  0 876  7 152  1  7 92
...

In general, with Solaris 8, the key column to look at to detect a memory shortage is the sr column. This reports the rate at which the page daemon is scanning through memory looking for pages that can be stolen to satisfy more immediate needs. Any sustained non-zero value here is indicative of a need for more memory -- that is clearly the case here.
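
The sr check described above can be sketched as a small script that picks a column out of vmstat output by name. This is a minimal illustration under assumptions: the column helper is hypothetical, "sustained" is taken here to mean non-zero in every sample of the window, and several vmstat headings (sy, sd) repeat, so the lookup is only safe for unique names such as sr and free.

```python
# Header and rows copied from the vmstat output above.
VMSTAT_HEADER = "r b w swap free re mf pi po fr de sr dd s2 sd sd in sy cs us sy id"
VMSTAT_ROWS = [
    "0 0 28 5440336 282384 0 8368 0 521 552 0 18398 0 0 0 0 817 52 89 6 9 84",
    "0 0 30 5460120 23536 0 1119 5 8554 8772 480 7849 0 0 0 0 834 7 215 1 9 90",
    "0 2 44 5460840 23648 1 1138 10 9082 9116 16456 58661 0 0 0 0 876 7 152 1 7 92",
]


def column(header, rows, name):
    """Extract one column by heading (hypothetical helper).

    Uses the first matching heading, so it is only reliable for
    names that appear once, such as "sr" or "free".
    """
    index = header.split().index(name)
    return [int(row.split()[index]) for row in rows]


scan_rates = column(VMSTAT_HEADER, VMSTAT_ROWS, "sr")
free_kb = column(VMSTAT_HEADER, VMSTAT_ROWS, "free")

# Assumption: treat the shortage as "sustained" if every sample scans pages.
sustained_scanning = all(rate > 0 for rate in scan_rates)

print("sr per sample:  ", scan_rates)
print("free per sample:", free_kb)
print("sustained page scanning:", sustained_scanning)
```

In these three samples the scan rate never drops to zero, while free memory falls from about 282 MB to about 23 MB, which matches the reading given in the text.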
