Eleven Metrics to Monitor for a Happy and Healthy Squid

5. Denied Requests (HTTP and ICP)

You should normally expect to see a small number of denied requests as Squid operates. However, a high rate or percentage of denied requests indicates 1) a mistake in your access control rules, 2) a misconfiguration on the cache client, or 3) someone attempting to attack your server.



If you use very specific address-based access controls, you'll need to carefully track IP address changes on your cache clients. For example, you may have a list of neighbor cache IP addresses. If one of those neighbors gets a new IP address, and doesn't tell you, all of its requests will be refused.

Unfortunately, there is no easy way to get a running total of denied requests from either SNMP or the cache manager. If you want to track this metric, you'll have to write a little bit of code to extract it from either the cache manager client_list page or Squid's access.log file.

The client_list page has counters for each client's ICP and HTTP request history. It looks like this:

Address: xxx.xxx.xxx.xxx
Name: xxx.xxx.xxx.xxx
Currently established connections: 0
   ICP Requests 776
   UDP_HIT                    9   1%
   UDP_MISS                 615  79%
   UDP_MISS_NOFETCH         152  20%
   HTTP Requests 448
   TCP_HIT                    1   0%
   TCP_MISS                 201  45%
   TCP_REFRESH_HIT            2   0%
   TCP_IMS_HIT                1   0%
   TCP_DENIED               243  54%

With a little bit of Perl, you can develop a script that prints out IP addresses of clients having more than a certain number or percentage of TCP_DENIED or UDP_DENIED requests. The primary problem with using this information is that Squid never resets the counters. Thus, the values are not sensitive to short-term variations. If Squid has been running for days or weeks, it may take a while until the denied counters exceed your threshold. To get more immediate feedback, you may want to search for UDP_DENIED and TCP_DENIED in your access.log file and count the number of such requests.
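For example, a quick per-client count of denials can be pulled straight from access.log with standard shell tools. This is only a sketch: the log file location varies by installation, and it assumes Squid's native log format, where the client address is the third field and the result code (such as TCP_DENIED/403) is the fourth:

# awk '$4 ~ /(TCP|UDP)_DENIED/ {n[$3]++} END {for (ip in n) print ip, n[ip]}' \
    /usr/local/squid/var/logs/access.log

A similar loop over the squidclient mgr:client_list output can flag clients whose TCP_DENIED or UDP_DENIED percentage exceeds whatever threshold you choose.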

6. HTTP Service Time

The HTTP service time represents how long it usually takes to complete a single HTTP transaction; in other words, the amount of time that elapses between reading the client's request and writing the last chunk of the response. Response times generally have heavy-tailed distributions, so the median is a more meaningful summary of typical performance than the mean.

In most situations, the median service time should be between 100 and 500 milliseconds. The value that you actually see may depend on the speed of your Internet connection and other factors. Of course, the value varies throughout the day, as well. You'll need to collect this metric for a while to understand what is normal for your installation. A service time that seems too high may indicate that 1) your upstream ISP is congested, or 2) your own Squid cache is overloaded or suffering from resource exhaustion (memory, file descriptors, CPU, etc.). If you suspect the latter, simply look at the other metrics described here for confirmation.

To get the five-minute median service time for all HTTP requests, use this SNMP OID:

enterprises.nlanr.squid.cachePerf.cacheProtoStats.cacheMedianSvcTable.cacheMedianSvcEntry.cacheHttpAllSvcTime.5
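For example, assuming Squid's SNMP module is enabled on its default port (3401), your community string is public, and the Squid MIB file is in its usual place under the install prefix, a query looks something like this (the same pattern works for the other OIDs mentioned in this article, although the exact options depend on your SNMP tools):

# snmpget -v 1 -c public -m /usr/local/squid/share/mib.txt localhost:3401 \
    enterprises.nlanr.squid.cachePerf.cacheProtoStats.cacheMedianSvcTable.cacheMedianSvcEntry.cacheHttpAllSvcTime.5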

By browsing the MIB, you can find separate measurements for cache hits, cache misses, and 304 (Not Modified) replies. To get the median HTTP service time from the cache manager, do this:

# squidclient mgr:5min | grep client_http.all_median_svc_time
client_http.all_median_svc_time = 0.127833 seconds

You can also use the high_response_time_warning directive in squid.conf to have Squid log a warning whenever the median response time exceeds a predefined threshold, specified in milliseconds. For example, to be warned when it exceeds 700 milliseconds:

high_response_time_warning 700

7. DNS Service Time

The DNS service time is a similar metric, although it measures only the amount of time necessary to resolve DNS cache misses. The HTTP service time measurements actually include the DNS resolution time. However, since Squid's DNS cache usually has a high hit ratio, most HTTP requests do not require a time-consuming DNS resolution.

A high DNS service time usually indicates a problem with Squid's primary DNS server. Thus, if you see a large median DNS response time, you should look for problems on the DNS server, rather than Squid. If you cannot fix the problem, you may want to select a different primary DNS resolver for Squid, or perhaps run a dedicated resolver on the same host as Squid.
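For example, to point Squid at a resolver running on the same host (assuming one is already listening on 127.0.0.1), you could add this to squid.conf:

dns_nameservers 127.0.0.1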

To get the five-minute median DNS service time from SNMP, request this OID:

enterprises.nlanr.squid.cachePerf.cacheProtoStats.cacheMedianSvcTable.cacheMedianSvcEntry.cacheDnsSvcTime.5

And from the cache manager:

# squidclient mgr:5min | grep dns.median_svc_time
dns.median_svc_time = 0.058152 seconds

8. Open File Descriptors

File descriptors are one of the finite resources used by Squid. If you don't know how critical file descriptor limits are to Squid's performance, read the first section of Six Things First-Time Squid Administrators Should Know and/or Chapter 3 of Squid: The Definitive Guide.

When you monitor Squid's file descriptor usage, you'll probably find that it is intricately linked to the HTTP connection rate and HTTP service time. An increase in service time or connection rate also results in an increase in file descriptor usage. Nonetheless, it is a good idea to keep track of this metric, as well. For example, if you graph file descriptor usage over time and see a plateau, your file descriptor limit is probably not high enough.

Squid's SNMP MIB doesn't have an OID for the number of currently open file descriptors. However, it can report the number of unused (closed) descriptors. You can either subtract that value from the known limit, or simply monitor the unused number. The OID is:

enterprises.nlanr.squid.cachePerf.cacheSysPerf.cacheCurrentUnusedFDescrCnt
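For example, if you know your file descriptor limit (1024 here is only an illustration), a small shell sketch like the following converts the unused count into a used count. The snmpget options and MIB file location are the same assumptions as above, and the trailing .0 is the instance suffix that snmpget expects for this scalar:

limit=1024
unused=$(snmpget -v 1 -c public -m /usr/local/squid/share/mib.txt localhost:3401 \
    enterprises.nlanr.squid.cachePerf.cacheSysPerf.cacheCurrentUnusedFDescrCnt.0 \
    | awk '{print $NF}')
echo "file descriptors in use: $((limit - unused))"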

To get the number of used (open) file descriptors from the cache manager, search for this line in the "info" page:

# squidclient mgr:info | grep 'Number of file desc currently in use'
Number of file desc currently in use:   88

9. CPU Usage

Squid's CPU usage depends on a wide variety of factors including your hardware, features that you have enabled, your cache size, HTTP and ICP query rates, and others. Furthermore, high CPU usage is not necessarily a bad thing. All other things being equal, it is better to have high CPU usage and a high request rate than low CPU usage and a low request rate. In other words, after removing a disk I/O bottleneck, you may notice that Squid's CPU usage goes up, rather than down. This is good, because it means Squid is handling more requests in the same amount of time.

There are two things to watch for in the CPU usage data. First, any periods of 100 percent CPU usage indicate some kind of problem, perhaps a software bug. Henrik Nordstrom recently uncovered an incompatibility in Linux 2.2 kernels when using poll() while half_closed_clients is enabled. This kernel bug can cause periods of 100 percent CPU utilization. As a workaround, you can disable the half_closed_clients directive.

The second reason to watch the CPU usage is simply to make sure that your CPU does not become a bottleneck. This might happen if you utilize CPU-intensive features such as Cache-Digests, CARP, or large regular expression-based access control lists. If you see Squid approaching 75 percent CPU utilization, you might want to consider a hardware upgrade.

Squid's SNMP MIB provides a CPU usage value with this OID:

enterprises.nlanr.squid.cachePerf.cacheSysPerf.cacheCpuUsage

Unfortunately, it is simply the ratio of CPU time to actual time since the process started. This means that it won't show short-term changes in CPU usage. To get more accurate measurements, you should use the cache manager:

# squidclient mgr:5min | grep cpu_usage
cpu_usage = 1.711396%
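If you want a record of short-term CPU usage over time, one simple approach is to append the five-minute value to a file from cron and graph it later. This is only a sketch: the log file location is arbitrary, squidclient must be on cron's PATH, and the percent sign is escaped because cron treats a bare % specially:

*/5 * * * * echo "$(date +\%s) $(squidclient mgr:5min | grep cpu_usage)" >> /var/log/squid-cpu.log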
