ONLamp.com

Eleven Metrics to Monitor for a Happy and Healthy Squid

by Duane Wessels, author of Squid: The Definitive Guide
03/25/2004

In this article, I'll show you how to stay on top of Squid's performance. If you follow this advice, you should be able to discover problems before your users begin calling you to complain.

Squid provides two interfaces for monitoring its operation: SNMP and the cache manager. Each has its own set of advantages and shortcomings.

SNMP is nice because it is familiar to many of us. If you already have some SNMP software deployed in your network, you may be able to easily add Squid to the other services that you already monitor. Squid's SNMP implementation is disabled by default at compile time. To use SNMP, you must pass the --enable-snmp option to ./configure like this:

# ./configure --enable-snmp ...

The downside to SNMP is that you can't use it to monitor all of the metrics that I talk about in this article. Squid's MIB has remained almost unchanged since it was first written in 1997. Some of the things that you should monitor are only available through the cache manager interface.

The cache manager is a set of "pages" that you can request from Squid with a special URL syntax. You can also use Squid's cachemgr.cgi utility to view the information through a web browser. As you'll see in the examples, it is a little awkward to use the cache manager for periodic data collection. I have a solution to this problem, which I'll describe at the end of the article.
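For scripted collection, one option is to wrap the squidclient command used in the examples below in a small function. The following Python sketch is only one way you might do that, and it is not the collection tool described at the end of the article. It assumes squidclient is on your PATH and can reach Squid with its default settings. The fetch_page name and the runner parameter are hypothetical; the latter exists only so that the function can be exercised without a running Squid.

```python
import subprocess

def fetch_page(page, runner=subprocess.run):
    """Return the raw text of a cache manager page such as "info" or "5min"."""
    # Equivalent to typing: squidclient mgr:<page>
    result = runner(
        ["squidclient", "mgr:" + page],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling fetch_page("info") should return the same text that squidclient mgr:info prints, ready to be searched for the lines of interest.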

1. Process Size

Squid's process size has a direct impact on performance. If the process becomes too large to fit entirely in memory, your operating system swaps portions of it to disk. Performance then degrades quickly; in particular, you'll see an increase in response times. Squid's process size can be difficult to control. It depends on the number of objects in your cache, the number of simultaneous users, and the types of objects that they download.

Squid has four ways to determine its process size. One or more of them may not be supported on your particular operating system. They are: getrusage(), mallinfo(), mstats(), and sbrk().

The getrusage() function reports the "Maximum Resident Set Size" (Max RSS). This is the largest amount of physical memory that the process has ever occupied. This is not always the best metric, because once the process becomes larger than your memory's capacity, the Max RSS value stops increasing. In other words, Max RSS can never exceed your physical memory size, no matter how big the Squid process becomes.
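To get a feel for what getrusage() reports, you can call it through Python's standard resource module (Unix only). This sketch inspects the Python process itself, not Squid, and is meant only to show the two fields relevant to this article. The units of ru_maxrss vary by platform (kilobytes on Linux, bytes on macOS). That ru_majflt is the field behind Squid's page-fault counter is an assumption here.

```python
import resource

# Resource usage of the current process (this Python interpreter, not Squid).
usage = resource.getrusage(resource.RUSAGE_SELF)

# Peak resident set size; kilobytes on Linux, bytes on macOS.
print("Max RSS:", usage.ru_maxrss)

# Page faults that required physical I/O ("major" faults).
print("Major page faults:", usage.ru_majflt)
```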

The mallinfo() and mstats() functions are features of some malloc (memory allocation) libraries. They are a good indication of process size, when available. The mstats() function is unique to the GNU malloc library.

The sbrk() function also provides a good indication of process size and seems to work on most operating systems.

Unfortunately, the only metric available as an SNMP object is the getrusage() Max RSS value. You can get it with this OID under the Squid MIB:

enterprises.nlanr.squid.cachePerf.cacheSysPerf.cacheMaxResSize

To get the other process size metrics, you'll need to use the cache manager. Request the "info" page and look for these lines:

# squidclient mgr:info | less
...
Process Data Segment Size via sbrk(): 959398 KB
Maximum Resident Size: 924516 KB
...
Total space in arena:  959392 KB
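If you want these numbers in a script rather than on screen, a few lines of Python can pull them out of the info page. This is a minimal sketch: the labels are copied from the sample output above and may differ or be absent on other Squid versions and platforms, and the process_size_kb function and the short metric names are hypothetical.

```python
import re

SAMPLE_INFO = """\
Process Data Segment Size via sbrk(): 959398 KB
Maximum Resident Size: 924516 KB
Total space in arena:  959392 KB
"""

# Labels copied from the sample output; other Squid builds may word them
# differently or omit some of them.
LABELS = {
    "sbrk": "Process Data Segment Size via sbrk()",
    "max_rss": "Maximum Resident Size",
    "arena": "Total space in arena",
}

def process_size_kb(info_text):
    """Map each metric name to its size in KB, skipping labels not present."""
    sizes = {}
    for name, label in LABELS.items():
        match = re.search(re.escape(label) + r":\s*(\d+) KB", info_text)
        if match:
            sizes[name] = int(match.group(1))
    return sizes

print(process_size_kb(SAMPLE_INFO))
# {'sbrk': 959398, 'max_rss': 924516, 'arena': 959392}
```

Because any of the metrics may be missing on a given platform, the function returns only the ones it finds instead of failing.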

You can also use the high_memory_warning directive in squid.conf to warn you if the process size exceeds a limit that you specify. For example:

high_memory_warning 500

2. Page Fault Rate

As I mentioned in the discussion about memory usage, Squid's performance suffers when the process size exceeds your system's physical memory capacity. A good way to detect this is by monitoring the process's page-fault rate.

A page fault occurs when the program needs to access an area of memory that was swapped to disk. Page faults are blocking operations. That is, the process pauses until the memory area has been read back from disk. Until then, Squid cannot do any useful work. A low page-fault rate, say, less than one per second, may not be noticeable. However, as the rate increases, client requests take longer and longer to complete.

When using SNMP, Squid only reports the page-fault counter, rather than the rate. The counter is an ever-increasing value reported by the getrusage() function. You can calculate the rate by comparing values taken at different times. Programs such as RRDTool and MRTG do this automatically. You can get the page fault count by requesting this SNMP OID:

enterprises.nlanr.squid.cachePerf.cacheSysPerf.cacheSysPageFaults

Alternatively, you can get it from the cache manager's info page:

# squidclient mgr:info | grep 'Page faults'
   Page faults with physical i/o: 2712
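If you are not using RRDTool or MRTG, the counter-to-rate calculation is easy to do yourself: take two samples and divide the difference in the counter by the elapsed time. The sketch below assumes you record a Unix timestamp with each sample. The counter_rate function is hypothetical, and the second sample value is invented for illustration.

```python
def counter_rate(earlier, later):
    """Per-second rate between two (unix_time, counter) samples."""
    (t0, c0), (t1, c1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if c1 < c0:
        # The counter went backwards: Squid probably restarted in between,
        # so no meaningful rate can be computed for this interval.
        return None
    return (c1 - c0) / (t1 - t0)

# 2712 is the count from the example above; 2862 is an invented later reading
# taken 300 seconds afterwards.
print(counter_rate((1000, 2712), (1300, 2862)))  # 150 faults / 300 s = 0.5
```

Returning None when the counter decreases avoids reporting a large negative rate after a restart.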

You can also get the rate, calculated over five- and 60-minute intervals, by requesting other cache manager pages:

# squidclient mgr:5min | grep page_fault
page_faults = 0.146658/sec
# squidclient mgr:60min | grep page_fault
page_faults = 0.041663/sec
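The "name = value/sec" lines on these pages are simple to parse. The following sketch extracts them into a dictionary and compares the page-fault rate against a threshold. The threshold of one fault per second is taken from the rule of thumb earlier in this section and is illustrative only; parse_rates, RATE_LINE, and THRESHOLD are hypothetical names.

```python
import re

# Matches lines such as "page_faults = 0.146658/sec".
RATE_LINE = re.compile(r"^\s*([\w.]+) = (\d+(?:\.\d+)?)/sec\s*$", re.MULTILINE)

def parse_rates(page_text):
    """Return {name: per-second rate} for each "name = value/sec" line."""
    return {name: float(value) for name, value in RATE_LINE.findall(page_text)}

# Sample line from the 5min page shown above.
rates = parse_rates("page_faults = 0.146658/sec\n")

THRESHOLD = 1.0  # faults per second; illustrative, tune for your system

if rates["page_faults"] > THRESHOLD:
    print("page-fault rate is high:", rates["page_faults"])
else:
    print("page-fault rate looks fine:", rates["page_faults"])
```

Lines that do not follow the pattern, such as HTTP headers in the response, are ignored.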

The high_page_fault_warning directive in squid.conf will warn you if Squid detects a high page fault rate. You specify a limit on the mean page-fault rate, measured over a one-minute interval. For example:

high_page_fault_warning 10

3. HTTP Request Rate

The HTTP request rate is a simple metric. It is the rate of requests made by clients to Squid. A quick glance at a graph of request rate versus time can help answer a number of questions. For example, if you notice that Squid suddenly seems slow, you can determine whether or not it is due to an increase in load. If the request rate seems normal, then the slowness must be due to something else.

Once you get to know what your daily load pattern looks like, you can easily identify strange events that may warrant further investigation. For example, a sudden drop in load may indicate some sort of network outage, or perhaps disgruntled users who have figured out how to bypass Squid. Similarly, a sudden increase in load might mean that one or more of your users has installed a web crawler or has been infected with a virus.

As with page faults, SNMP gives you only the HTTP request counter, not the rate. Use this OID:

enterprises.nlanr.squid.cachePerf.cacheProtoStats.cacheProtoAggregateStats.cacheProtoClientHttpRequests

The cache manager reports this information in a variety of ways:

# squidclient mgr:info | grep 'Number of HTTP requests'
Number of HTTP requests received:       535805
# squidclient mgr:info | grep 'Average HTTP requests'
Average HTTP requests per minute since start:   108.4
# squidclient mgr:5min | grep 'client_http.requests'
client_http.requests = 3.002991/sec
# squidclient mgr:60min | grep 'client_http.requests'
client_http.requests = 2.636987/sec
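One crude way to spot a sudden change in load is to compare the five-minute rate with the 60-minute rate. The sketch below does that; it is a stand-in for knowing your real daily pattern, not a replacement for it. The load_change function and the factor of 2 are assumptions chosen for illustration.

```python
def load_change(rate_5min, rate_60min, factor=2.0):
    """Classify short-term load relative to the longer-term average."""
    if rate_60min <= 0:
        return "unknown"  # no baseline to compare against
    ratio = rate_5min / rate_60min
    if ratio >= factor:
        return "spike"
    if ratio <= 1 / factor:
        return "drop"
    return "normal"

# Rates from the 5min and 60min pages shown above.
print(load_change(3.002991, 2.636987))  # normal
```

With the sample values, the five-minute rate is only about 14 percent above the hourly rate, so nothing is flagged.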

4. ICP Request Rate

If you have neighbor caches using ICP, you'll probably want to monitor the ICP request rate as well. While there aren't any significant performance issues related to ICP queries, this will at least tell you if neighbor caches are up and running.

To get the ICP query rate via SNMP, use this OID:

enterprises.nlanr.squid.cachePerf.cacheProtoStats.cacheProtoAggregateStats.cacheIcpPktsRecv

Note that the SNMP counter includes both queries and responses that your Squid cache receives. There is no SNMP object that will give you only the queries. The cache manager, however, does report received queries on their own. For example:

# squidclient mgr:counters | grep icp.queries_recv
icp.queries_recv = 8595602
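Since the counters page reports a running total, you need two snapshots to get a rate. This sketch parses "name = integer" lines and computes the ICP query rate over a 300-second interval. The parse_counters function is hypothetical, and the second counter value is invented for illustration.

```python
def parse_counters(page_text):
    """Return {name: int} for each "name = integer" line; skip everything else."""
    counters = {}
    for line in page_text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters

# Two snapshots of the counters page taken 300 seconds apart.
# The first value is from the example above; the second is invented.
before = parse_counters("icp.queries_recv = 8595602\n")
after = parse_counters("icp.queries_recv = 8595902\n")

queries = after["icp.queries_recv"] - before["icp.queries_recv"]
print(queries / 300, "ICP queries/sec")  # 300 queries in 300 seconds
```

A rate that stays at zero over several intervals may be a sign that your neighbors have stopped sending queries.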
