Published on ONLamp.com (http://www.onlamp.com/)


Squid: The Definitive Guide

Six Things First-Time Squid Administrators Should Know

by Duane Wessels, author of Squid: The Definitive Guide
02/12/2004

New users often struggle with the same frustrating set of Squid idiosyncrasies. In this article, I'll detail six things you should know about using Squid from the get-go. Even if you're an experienced Squid administrator, you might want to look at these tips and give your configuration file a sanity check, especially the one about preventing spam.

1. File Descriptor Limits

File descriptor limits are a common problem for new Squid users. This happens because some operating systems have relatively low per-process and system-wide limits. In some cases, you must take steps to tune your system before compiling Squid.

A file descriptor is simply a number that represents an open file or socket. Every time a process opens a new file or socket, it allocates a new file descriptor. These descriptors are reused after the file or socket is closed. Most Unix systems place a limit on the number of simultaneously open file descriptors. There are both per-process and per-system limits.
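To see descriptor numbering in action, here's a quick shell sketch. Descriptors 0, 1, and 2 are already taken by stdin, stdout, and stderr, so the first file a process opens typically lands on descriptor 3, and that number becomes available again once the file is closed:

```shell
# Open a scratch file on descriptor 3, write through it, then close it.
# After the close, descriptor 3 is free for the next open file or socket.
exec 3> /tmp/fd-demo.txt
echo "hello" >&3
exec 3>&-
cat /tmp/fd-demo.txt    # prints "hello"
```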

How many file descriptors does Squid need? The answer depends on how many users you have, the size of your cache, and which particular features you have enabled. Here are some of the things that consume file descriptors in Squid:

Even when Squid is not doing anything, it keeps some number of file descriptors open for log files and helpers. In most cases, this is between 10 and 25, so it's probably not a big deal. If you have a lot of external helpers, that number goes up. The real consumption begins once Squid starts serving requests. In the worst case, each concurrent request requires three file descriptors: the client-side connection, a server-side connection for cache misses, and a disk file for reading hits or writing misses.

A Squid cache with just a few users might be able to get by with a file descriptor limit of 256. For a moderately busy Squid, 1024 is a better limit. Very busy caches should use 4096 or more. One thing to keep in mind is that file descriptor usage often surges above the normal level for brief amounts of time. This can happen during short, temporary network outages or other interruptions in service.
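A back-of-the-envelope sizing rule follows from the worst case described above: roughly three descriptors per concurrent request, plus a couple dozen for logs and helpers. The peak figure below is purely illustrative:

```shell
peak=500      # hypothetical peak number of concurrent requests
overhead=25   # log files and helper processes
echo "$((peak * 3 + overhead)) descriptors"   # prints "1525 descriptors"
```

Round the result up to the next power of two (here, 2048) to leave headroom for the surges mentioned above.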

There are a number of ways to determine the file descriptor limit on your system. One is to use the built-in shell commands limit or ulimit.

For Bourne shell users:

root# ulimit -n
1024

For C shell users:

root# limit desc
descriptors     1024

If you already have Squid compiled and installed, you can just look at the cache.log file for a line like this:

2003/12/12 11:10:54| With 1024 file descriptors available

If Squid detects a file descriptor shortage while it is running, you'll see a warning like this in cache.log:

WARNING! Your cache is running out of file descriptors

If you see the warning, or know in advance that you'll need more file descriptors, you should increase the limits. The technique for increasing the file descriptor limit varies between operating systems.

For Linux Users

Linux users need to edit one of the system include files and twiddle one of the system parameters via the /proc interface. First, edit /usr/include/bits/types.h and change the value for __FD_SETSIZE. Then, give the kernel a new limit with this command:

root# echo 1024 > /proc/sys/fs/file-max

Finally, before compiling or running Squid, execute this shell command to set the process limit equal to the kernel limit:

root# ulimit -Hn 1024

After you have set the limit in this manner, you'll need to reconfigure, recompile, and reinstall Squid. Also note that these two commands do not permanently set the limit. They must be executed each time your system boots. You'll want to add them to your system startup scripts.
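For example, the two limit-raising commands might go in a boot script such as /etc/rc.local (the exact file varies by distribution). This is a fragment to add to an existing script, not a standalone program:

```shell
# Run at boot, as root, before Squid is started:
echo 1024 > /proc/sys/fs/file-max   # system-wide limit
ulimit -Hn 1024                     # per-process hard limit for this shell
```

Note that ulimit affects only the shell that runs it and its children, so Squid should be started from the same script (after the ulimit call) in order to inherit the raised limit.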

For NetBSD/OpenBSD/FreeBSD Users

On BSD-based systems, you'll need to compile a new kernel. The kernel configuration file lives in a directory such as /usr/src/sys/i386/conf or /usr/src/sys/arch/i386/conf. There you'll find a file, possibly named GENERIC, to which you should add a line like this:

options       MAXFILES=8192

For OpenBSD, use option instead of options. Reboot your system after you've finished configuring, compiling, and installing your new kernel. Then, reconfigure, recompile, and reinstall Squid.

For Solaris Users

Add this line to your /etc/system file:

set rlim_fd_max = 1024

Then, reboot the system, reconfigure, recompile, and reinstall Squid.

For further information on file descriptor limits, see Chapter 3, "Compiling and Installing", of Squid: The Definitive Guide or section 11.4 of the Squid FAQ.

2. File and Directory Permissions

Directory permissions are another problem that first-time users often encounter. One reason for this difficulty is that, in the interest of security, Squid refuses to serve requests as root. If you start Squid as root, it switches to a default user ("nobody") that has no special privileges. If you don't want to use the "nobody" userid, you can set your own with the cache_effective_user directive in the configuration file.

Certain files and directories must be writable by the Squid userid. These include the log files, usually found in /usr/local/squid/var/logs, and the cache directories, /usr/local/squid/var/cache by default.

As an example, let's assume that you're using the "nobody" userid for Squid. After running make install, you can use this command to set the permissions for the log files and cache:

root# chown -R nobody /usr/local/squid/var/logs
root# chown -R nobody /usr/local/squid/var/cache

Then, you can proceed to initialize the cache directories with this command:

root# /usr/local/squid/sbin/squid -z

Helper processes are another source of potential permission problems. Squid spawns the helper processes as the unprivileged user (that is, as "nobody"). This usually means that the helper program must have read and execute permissions for everyone (for example, -rwxr-xr-x). Furthermore, any configuration or password files that the helper needs must have appropriate read permissions as well.
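For example, to give a helper the -rwxr-xr-x permissions described above (the path here is a stand-in for demonstration; your real helper likely lives under /usr/local/squid/libexec):

```shell
helper=/tmp/demo_helper        # stand-in for your actual helper program
touch "$helper"
chmod 755 "$helper"            # 755 == rwxr-xr-x
ls -l "$helper" | cut -c1-10   # prints "-rwxr-xr-x"
```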

Note that Unix also requires correct permissions on the parent directories leading to a file. For example, if /usr/local/squid is owned by root with drwxr-x--- permissions, the user nobody will not be able to access any of the directories underneath it. /usr/local/squid should be drwxr-xr-x instead.
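A quick way to audit the whole chain is to print each parent directory and then check each one with ls -ld. This sketch assumes the default install prefix:

```shell
# List every directory from the log directory up to (but not including) /.
# Check each printed path with "ls -ld" to confirm the Squid user can
# traverse it.
path=/usr/local/squid/var/logs
while [ "$path" != "/" ]; do
  echo "$path"
  path=$(dirname "$path")
done
```

Every directory in the chain needs at least the execute (x) bit for the Squid user; a missing x anywhere blocks access to everything below it.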

You may want to debug file or directory permission problems from a shell window. If Squid runs as nobody, then start a shell process as user nobody:

root# su - nobody

(You may have to temporarily change "nobody"'s home directory and shell program for this to work.) Then, try to read, write, or execute the files that are giving you trouble. For example:

nobody$ cd /usr
nobody$ cd local
nobody$ cd squid
nobody$ cd var
nobody$ cd logs
nobody$ touch cache.log

3. Controlling Squid's Memory Usage

Squid tends to be a bit of a memory hog. It uses memory for many different things, some of which are easier to control than others. Memory usage is important because if the Squid process size exceeds your system's RAM capacity, some chunks of the process must be temporarily swapped to disk. Swapping can also happen if you have other memory-hungry applications running on the same system. Swapping causes Squid's performance to degrade very quickly.

An easy way to monitor Squid's memory usage is with standard system tools such as top and ps. You can also ask Squid itself how much memory it is using, through either the cache manager or SNMP interfaces. If the process size becomes too large, you'll want to take steps to reduce it. A good rule of thumb is to not let Squid's process size exceed 60% to 80% of your RAM capacity.
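As a quick worked example of that rule of thumb (the RAM figure below is hypothetical):

```shell
ram_mb=512   # hypothetical machine with 512 MB of RAM
echo "keep Squid below $((ram_mb * 60 / 100))-$((ram_mb * 80 / 100)) MB"
```

For a 512 MB machine, that works out to a ceiling somewhere between 307 and 409 MB of process size, depending on how conservative you want to be.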

One of the most important uses for memory is the main cache index. This is a hash table that contains a small amount of metadata for each object in the cache. Unfortunately, all of these "small" data structures add up to a lot when Squid contains millions of objects. The only way to control the size of the in-memory index is to change Squid's disk cache size (with the cache_dir directive). Thus, if you have plenty of disk space, but are short on RAM, you may have to leave the disk space underutilized.

Squid's in-memory cache can also use significant amounts of RAM. This is where Squid stores incoming and recently retrieved objects. Its size is controlled by setting the cache_mem directive. Note that the cache_mem directive only affects the size of the memory cache, not Squid's entire memory footprint.
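For example, a squid.conf line like the following (the value is illustrative) caps the memory cache at 64 MB:

```
# bounds only the in-memory object cache, not total process size
cache_mem 64 MB
```

The total process size will still exceed this value, sometimes by a large margin, once the cache index and I/O buffers are counted.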

Squid also uses some memory for various I/O buffers. For example, each time a client makes an HTTP request to Squid, a number of memory buffers are allocated and then later freed. Squid uses similar buffers when forwarding requests to origin servers, and when reading and writing disk files. Depending on the amount and type of traffic coming to Squid, these I/O buffers may require a lot of memory. There's not much you can do to control memory usage for these purposes. However, you can try changing the TCP receive buffer size with the tcp_recv_bufsize directive.

If you have a large number of clients accessing Squid, you may find that the "client DB" consumes more memory than you would like. It keeps a small number of counters for each client IP address that sends requests to Squid. You can reduce Squid's memory usage a little by disabling this feature. Simply put client_db off in squid.conf.

Another thing that can help is to simply restart Squid periodically, say, once per week. Over time, something may happen (such as a network outage) that causes Squid to temporarily allocate a large amount of memory. Even though Squid may not be using that memory, it may still be attached to the Squid process. Restarting Squid allows your operating system to truly free up the memory for other uses.

You can use Squid's high_memory_warning directive to warn you when its memory size exceeds a certain limit. For example, add a line like this to squid.conf:

high_memory_warning 400 MB

Then, if the process grows beyond that value, Squid writes warnings to cache.log (and to syslog, if so configured).

4. Rotating the Log Files

Squid writes to various log and journal files as it runs. These files will continually increase in size unless you take steps to "rotate" them. Rotation refers to the process of closing a log file, renaming it, and opening a new log file. It's similar to the way that most systems deal with their syslog files, such as /var/log/messages.

If you don't rotate the log files, they may eventually consume all free space on that partition. Some operating systems, such as Linux, cannot support files larger than 2GB. When a log file hits that limit, you'll get a "File too large" error message and Squid will complain and restart.

To avoid such problems, create a cron job that periodically rotates the log files. It can be as simple as this:

0 0 * * * /usr/local/squid/sbin/squid -k rotate

In most cases, daily log file rotation is the most appropriate. A not-so-busy cache can get by with weekly or monthly rotation.

Squid appends numeric suffixes to rotated log files. Each time you run squid -k rotate, the current log becomes the .0 file and every existing suffix is incremented by one. Thus, cache.log becomes cache.log.0, cache.log.0 becomes cache.log.1, and so on. The logfile_rotate directive specifies the maximum number of old files to keep around.
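A squid.conf fragment to keep ten generations of each log (a common choice; pick a count that fits your disk space and auditing needs):

```
# keep access.log.0 through access.log.9, and likewise for the other logs
logfile_rotate 10
```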

Logfile rotation affects more than just the log files in /usr/local/squid/var/logs. It also generates new swap.state files for each cache directory. However, Squid does not keep old copies of the swap.state files. It simply writes a new file from the in-memory index and forgets about the old one.

5. Understanding Squid's Access Control Syntax

Squid has an extensive, but somewhat confusing, set of access controls. The most important thing to understand is the difference between ACL types, elements, and rules, and how they work together to allow or deny access.

Squid has about 20 different ACL types. These refer to certain aspects of an HTTP request or response, such as the client's IP address (the src type), the origin server's hostname (the dstdomain type), and the HTTP request method (the method type).

An ACL element consists of three components: a type, a name, and one or more type-specific values. Here are some simple examples:

acl Foo src 1.2.3.4
acl Bar dstdomain www.cnn.com
acl Baz method GET

The above ACL element named Foo would match a request that comes from the IP address 1.2.3.4. The ACL named Bar matches a www.cnn.com URL. The Baz ACL matches an HTTP GET request. Note that we are not allowing or denying anything yet.

For most of the ACL types, an element can have multiple values, like this:

acl Argle src 1.1.1.8 1.1.1.28 1.1.1.88
acl Bargle dstdomain www.nbc.com www.abc.com www.cbs.com
acl Fraggle method PUT POST

A multi-valued ACL matches a request when any one of the values is a match. They use OR logic. The Argle ACL matches a request from 1.1.1.8, from 1.1.1.28, or from 1.1.1.88. The Bargle ACL matches requests to NBC, ABC, or CBS web sites. The Fraggle ACL matches a request with the methods PUT or POST.

Now that you're an expert in ACL elements, it's time to graduate to ACL rules. These are where you say that a request is allowed or denied. Access list rules refer to ACL elements by their names and contain either the allow or deny keyword. Here are some simple examples:

http_access allow Foo
http_access deny Bar
http_access allow Baz

It is important to understand that access list rules are checked in order and that the decision is made when a match is found. Given the above list, let's see what happens when a user from 1.2.3.4 makes a GET request for www.cnn.com. Squid encounters the allow Foo rule first. Our request matches the Foo ACL, because the source address is 1.2.3.4, and the request is allowed to proceed. The remaining rules are not checked.

How about a PUT request for www.cnn.com from 5.5.5.5? The request does not match the first rule. It does match the second rule, however. This access list rule says that the request must be denied, so the user receives an error message from Squid.

How about a GET request for www.oreilly.com from 5.5.5.5? The request does not match the first rule (allow Foo). It does not match the second rule, either, because www.oreilly.com is different than www.cnn.com. However, it does match the third rule, because the request method is GET.

Of course, these simple ACL rules are not very interesting. The real power comes from Squid's ability to combine multiple elements on a single rule. When a rule contains multiple elements, each element must be a match in order to trigger the rule. In other words, Squid uses AND logic for access list rules. Consider this example:

http_access allow Foo Bar
http_access deny Foo

The first rule says that a request from 1.2.3.4 AND for www.cnn.com will be allowed. However, the second rule says that any other request from 1.2.3.4 will be denied. These two lines restrict the user at 1.2.3.4 to visiting only the www.cnn.com site. Here's an even more complex example:

http_access deny Argle Bargle Fraggle
http_access allow Argle Bargle
http_access deny Argle

These three lines allow the Argle clients (1.1.1.8, 1.1.1.28, and 1.1.1.88) to access the Bargle servers (www.nbc.com, www.abc.com, and www.cbs.com), but not with PUT or POST methods. Furthermore, the Argle clients are not allowed to access any other servers.

One of the common mistakes often made by new users is to write a rule that can never be true. It is easy to do if you forget that Squid uses AND logic on rules and OR logic on elements. Here is a configuration that can never be true:

acl A src 1.1.1.1
acl B src 2.2.2.2
http_access allow A B

The reason is that a request cannot be from both 1.1.1.1 AND 2.2.2.2 at the same time. Most likely, it should be written like this:

acl A src 1.1.1.1 2.2.2.2
http_access allow A

Then, requests from either 1.1.1.1 or 2.2.2.2 are allowed.

Access control rules can become long and complicated. When adding a new rule, how do you know where it should go? You should put more-specific rules before less-specific ones. Remember that the rules are checked in order. When adding a rule, go through the current rules in your head and see where the new one fits. For example, let's say that you want to deny requests to a certain site, but allow all others. It should look like this:

acl XXX dstdomain www.badsite.net
acl All src 0/0
http_access deny XXX
http_access allow All

Now, what if you need to make an exception for one user, so that she can visit that site? The new ACL element is:

acl Admin src 3.3.3.3

and the new rule should be:

http_access allow Admin XXX

but where does it go? Since this rule is more specific than the deny XXX rule, it should go first:

http_access allow Admin XXX
http_access deny XXX
http_access allow All

If we place the new rule after deny XXX, it will never be checked: the deny XXX rule always matches such a request first, and she will not be able to visit the site.

When you first install Squid, the access control rules deny every request. To get things working, you'll need to add an ACL element and a rule for your local network. The easiest way is to write a src (source IP address) ACL element covering your subnet(s). For example:

acl MyNetwork src 192.168.0.0/24

Then, search through squid.conf for this line:

# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS

After that line, add an http_access line with an allow rule:

http_access allow MyNetwork

Once you get this simple configuration working, feel free to move on to some of the more advanced ACL features, such as username-based proxy authentication.
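As a taste of what proxy authentication looks like, here is a sketch for Squid 2.5's squid.conf using the basic authentication scheme. The helper and password file paths are assumptions for a typical source install; adjust them to match your layout:

```
# the ncsa_auth helper checks credentials against an htpasswd-style file
auth_param basic program /usr/local/squid/libexec/ncsa_auth /usr/local/squid/etc/passwd
auth_param basic realm Squid proxy
acl AuthUsers proxy_auth REQUIRED
http_access allow AuthUsers
```

Requests arriving without valid credentials then receive a 407 response, prompting the browser to ask the user for a username and password.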

6. How to Not Be a Spam Relay

Unless you've been living under a rock, you're aware of the spam problem on the Internet. Spam senders used to take advantage of open email relays. These days, a lot of spam comes from open proxies. An open proxy is one that allows outsiders to make requests through it. If others on the Internet receive spam email from your proxy, your IP address will be placed on one or more of the various blackhole lists. This will adversely affect your ability to communicate with other Internet sites.

Use the following access control rules to make sure this never happens to you. First, always deny all requests that don't come from your local network. Define an ACL element for your subnet:

acl MyNetwork src 10.0.0.0/16

Then, place a deny rule near the top of your http_access rules that matches requests from anywhere else:

http_access deny !MyNetwork
http_access ...
http_access ...

While that may stop outsiders, it may not be good enough. It won't stop insiders who intentionally, or unintentionally, try to forward spam through Squid. To add even more security, you should make sure that Squid never connects to another server's SMTP port:

acl SMTP_port port 25
http_access deny SMTP_port

In fact, there are many well-known TCP ports, in addition to SMTP, to which Squid should never connect. The default squid.conf includes some rules to address this. There, you'll see a Safe_ports ACL element that defines good ports. A deny !Safe_ports rule ensures that Squid does not connect to any of the bad ports, including SMTP.
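The default rules look something like this (abbreviated here; the stock squid.conf lists more ports):

```
acl Safe_ports port 80 21 443 70 210 1025-65535
http_access deny !Safe_ports
```

Because port 25 is absent from Safe_ports, SMTP connections are denied without a separate rule, though an explicit SMTP_port deny like the one above does no harm as a belt-and-suspenders measure.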

Duane Wessels discovered Unix and the Internet as an undergraduate student studying physics at Washington State University.


O'Reilly & Associates published Squid: The Definitive Guide in January 2004.



Copyright © 2009 O'Reilly Media, Inc.