Six Things First-Time Squid Administrators Should Know
by Duane Wessels, author of Squid: The Definitive Guide
New users often struggle with the same frustrating set of Squid idiosyncrasies. In this article, I'll detail six things you should know about using Squid from the get-go. Even if you're an experienced Squid administrator, you might want to look at these tips and give your configuration file a sanity check, especially the one about preventing spam.
File descriptor limits are a common problem for new Squid users. This happens because some operating systems have relatively low per-process and system-wide limits. In some cases, you must take steps to tune your system before compiling Squid.
A file descriptor is simply a number that represents an open file or socket. Every time a process opens a new file or socket, it allocates a new file descriptor. These descriptors are reused after the file or socket is closed. Most Unix systems place a limit on the number of simultaneously open file descriptors. There are both per-process and per-system limits.
How many file descriptors does Squid need? The answer depends on how many users you have, the size of your cache, and which features you have enabled.
Even when Squid is not doing anything, it has some number of file descriptors open for log files and helpers. In most cases, this is between 10 and 25, so it's probably not a big deal. If you have a lot of external helpers, that number goes up. However, the file descriptor count really goes up once Squid starts serving requests. In the worst case, each concurrent request requires three file descriptors: the client-side connection, a server-side connection for cache misses, and a disk file for reading hits or writing misses.
A Squid cache with just a few users might be able to get by with a file descriptor limit of 256. For a moderately busy Squid, 1024 is a better limit. Very busy caches should use 4096 or more. One thing to keep in mind is that file descriptor usage often surges above the normal level for brief amounts of time. This can happen during short, temporary network outages or other interruptions in service.
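The worst case described above lends itself to a quick back-of-the-envelope estimate. The following Python sketch is only an illustration: the function name, the default of 25 idle descriptors (the upper end of the range mentioned earlier), and the headroom factor for surges are assumptions, not something Squid computes itself.

```python
def estimate_fd_need(concurrent_requests, idle_fds=25, headroom=1.5):
    """Rough worst-case file descriptor estimate for a Squid process.

    Each concurrent request may hold three descriptors: the client-side
    connection, a server-side connection, and a disk file.
    idle_fds and headroom are illustrative assumptions.
    """
    worst_case = idle_fds + 3 * concurrent_requests
    # Leave room for brief surges, such as during a short network outage.
    return int(worst_case * headroom)


if __name__ == "__main__":
    for n in (50, 300, 1000):
        print(n, "concurrent requests ->", estimate_fd_need(n), "descriptors")
```

Compare the result with the limit your system actually grants before deciding whether tuning is needed.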
There are a number of ways to determine the file descriptor limit
on your system. One is to use the built-in shell commands ulimit and limit.
For Bourne shell users:
root# ulimit -n
1024
For C shell users:
root# limit desc
descriptors 1024
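The same per-process limit can also be read from a script. This is a minimal sketch using Python's standard resource module (Unix only); the function name is made up for illustration. It reports the limit of the process that runs it, which is not necessarily the limit Squid was compiled or started with.

```python
import resource


def fd_limits():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft)
    print("hard limit:", hard)
```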
If you already have Squid compiled and installed, you can just look at the cache.log file for a line like this:
2003/12/12 11:10:54| With 1024 file descriptors available
If Squid detects a file descriptor shortage while it is running, you'll see a warning like this in cache.log:
WARNING! Your cache is running out of file descriptors
If you see the warning, or know in advance that you'll need more file descriptors, you should increase the limits. The technique for increasing the file descriptor limit varies between operating systems.
Linux users need to edit one of the system include files and
twiddle one of the system parameters via the /proc interface.
First, edit /usr/include/bits/types.h and change the value for
__FD_SETSIZE. Then, give the kernel a new limit with this command:
root# echo 1024 > /proc/sys/fs/file-max
Finally, before compiling or running Squid, execute this shell command to set the process limit equal to the kernel limit:
root# ulimit -Hn 1024
After you have set the limit in this manner, you'll need to reconfigure, recompile, and reinstall Squid. Also note that these two commands do not permanently set the limit. They must be executed each time your system boots. You'll want to add them to your system startup scripts.
On BSD-based systems, you'll need to compile a new kernel. The kernel configuration file lives in a directory such as /usr/src/sys/i386/conf or /usr/src/sys/arch/i386/conf. There you'll find a file, possibly named GENERIC, to which you should add an options line that raises the maximum number of open files.
For OpenBSD, use option instead of options. Reboot your
system after you've finished configuring, compiling, and installing
your new kernel. Then, reconfigure, recompile, and reinstall Squid.
On Solaris, add this line to your /etc/system file:
set rlim_fd_max = 1024
Then, reboot the system, reconfigure, recompile, and reinstall Squid.
For further information on file descriptor limits, see Chapter 3, "Compiling and Installing", of Squid: The Definitive Guide or section 11.4 of the Squid FAQ.
Directory permissions are another problem that first-time users
often encounter. One of the reasons for this difficulty is that,
in the interest of security, Squid refuses to run as root.
Furthermore, if you do start Squid as root, it switches to a
default user ("nobody") that has no special privileges. If you
don't want to use the "nobody" userid, you can set your own with
the cache_effective_user directive in the configuration file.
Certain files and directories must be writable by the Squid userid. These include the log files, usually found in /usr/local/squid/var/logs, and the cache directories, /usr/local/squid/var/cache by default.
As an example, let's assume that you're using the "nobody" userid
for Squid. After running make install, you can use these commands
to set the permissions for the log files and cache:
root# chown -R nobody /usr/local/squid/var/logs
root# chown -R nobody /usr/local/squid/var/cache
Then, you can proceed to initialize the cache directories with this command:
root# /usr/local/squid/sbin/squid -z
Helper processes are another source of potential permission
problems. Squid spawns the helper processes as the unprivileged
user (that is, as "nobody"). This usually means that the helper program
must have read and execute permissions for everyone (for example,
-rwxr-xr-x). Furthermore, any configuration or password files
that the helper needs must have appropriate read permissions as well.
Note that Unix also requires correct permissions on parent
directories leading to a file. For example, if /usr/local/squid
is owned by root with -rwxr-x--- permissions, the user nobody
will not be able to access any of the directories underneath it.
Instead, /usr/local/squid should be readable and searchable (executable) by everyone.
You may want to debug file or directory permission problems from a shell window. If Squid runs as nobody, then start a shell process as user nobody:
root# su - nobody
(You may have to temporarily change "nobody"'s home directory and shell program for this to work.) Then, try to read, write, or execute the files that are giving you trouble. For example:
nobody$ cd /usr
nobody$ cd local
nobody$ cd squid
nobody$ cd var
nobody$ cd logs
nobody$ touch cache.log
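The step-by-step cd sequence can also be approximated with a script that inspects the permission bits of each directory along a path. The sketch below is an illustration, not a Squid tool: the function name is invented, and it looks only at the "other" execute bit, which assumes the Squid user is neither the owner nor in the group of those directories.

```python
import os
import stat
from pathlib import Path


def dirs_blocking_others(path):
    """List directories on the way to `path` that lack the world-execute bit."""
    target = Path(path).resolve()
    # Path.parents runs from the nearest parent up to the root; reverse it
    # so the walk goes top-down, like the cd sequence above.
    candidates = list(reversed(target.parents)) + [target]
    blocked = []
    for d in candidates:
        try:
            mode = os.stat(d).st_mode
        except OSError:
            break  # missing or unreachable from this account; stop descending
        if stat.S_ISDIR(mode) and not mode & stat.S_IXOTH:
            blocked.append(d)
    return blocked


if __name__ == "__main__":
    for d in dirs_blocking_others("/usr/local/squid/var/logs/cache.log"):
        print("no search permission for others:", d)
```

An empty result only means the directory bits look fine; the file itself may still lack read or write permission for the Squid user.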
Squid tends to be a bit of a memory hog. It uses memory for many different things, some of which are easier to control than others. Memory usage is important because if the Squid process size exceeds your system's RAM capacity, some chunks of the process must be temporarily swapped to disk. Swapping can also happen if you have other memory-hungry applications running on the same system. Swapping causes Squid's performance to degrade very quickly.
An easy way to monitor Squid's memory usage is with standard
system tools such as
ps. You can also ask Squid
itself how much memory it is using, through either the cache
manager or SNMP interfaces. If the process size becomes too large,
you'll want to take steps to reduce it. A good rule of thumb is
to not let Squid's process size exceed 60% to 80% of your RAM capacity.
One of the most important uses for memory is the main cache index.
This is a hash table that contains a small amount of metadata for
each object in the cache. Unfortunately, all of these "small" data
structures add up to a lot when Squid contains millions of objects.
The only way to control the size of the in-memory index is to
change Squid's disk cache size (with the cache_dir directive).
Thus, if you have plenty of disk space, but are short on RAM, you
may have to leave the disk space underutilized.
Squid's in-memory cache can also use significant amounts of RAM.
This is where Squid stores incoming and recently retrieved objects.
Its size is controlled by setting the
cache_mem directive. Note that the
cache_mem directive only affects the size of the memory
cache, not Squid's entire memory footprint.
Squid also uses some memory for various I/O buffers. For example,
each time a client makes an HTTP request to Squid, a number of
memory buffers are allocated and then later freed. Squid uses
similar buffers when forwarding requests to origin servers, and
when reading and writing disk files. Depending on the amount and
type of traffic coming to Squid, these I/O buffers may require a
lot of memory. There's not much you can do to control memory
usage for these purposes. However, you can try changing the TCP
receive buffer size with the tcp_recv_bufsize directive.
If you have a large number of clients accessing Squid, you may
find that the "client DB" consumes more memory than you would
like. It keeps a small number of counters for each client IP
address that sends requests to Squid. You can reduce Squid's
memory usage a little by disabling this feature. Simply put
client_db off in squid.conf.
Another thing that can help is to simply restart Squid periodically, say, once per week. Over time, something may happen (such as a network outage) that causes Squid to temporarily allocate a large amount of memory. Even though Squid may not be using that memory, it may still be attached to the Squid process. Restarting Squid allows your operating system to truly free up the memory for other uses.
You can use Squid's
high_memory_warning directive to warn you
when its memory size exceeds a certain limit. For example, add
a line like this to squid.conf:
high_memory_warning 400 MB
Then, if the process grows beyond that value, Squid writes warnings to cache.log and syslog if configured.
Squid writes to various log and journal files as it runs. These files will continually increase in size unless you take steps to "rotate" them. Rotation refers to the process of closing a log file, renaming it, and opening a new log file. It's similar to the way that most systems deal with their syslog files, such as /var/log/messages.
If you don't rotate the log files, they may eventually consume all free space on that partition. Some operating systems, such as Linux, cannot support files larger than 2 GB. When a log file reaches that size, you'll get a "File too large" error message and Squid will complain and restart.
To avoid such problems, create a
cron job that periodically rotates
the log files. It can be as simple as this:
0 0 * * * /usr/local/squid/sbin/squid -k rotate
In most cases, daily log file rotation is the most appropriate. A not-so-busy cache can get by with weekly or monthly rotation.
Squid appends numeric suffixes to rotated log files. Each time
you run squid -k rotate, each file's numeric suffix is incremented
by one. Thus, cache.log.0 becomes cache.log.1, cache.log.1 becomes
cache.log.2, and so on. The
logfile_rotate directive specifies
the maximum number of old files to keep around.
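To make the renaming scheme concrete, here is a small Python model of one rotation pass. It works on a dictionary of file names rather than real files, and it is a sketch of the behavior described above, not Squid's actual code. The keep parameter stands in for logfile_rotate, and the model assumes keep is at least 1.

```python
def rotate(files, base, keep):
    """Model one `squid -k rotate` pass on a dict of {file name: contents}."""
    if keep < 1:
        raise ValueError("this model assumes keep >= 1")
    rotated = dict(files)
    # The oldest copy (suffix keep-1) falls off the end.
    rotated.pop(f"{base}.{keep - 1}", None)
    # Shift the remaining copies up by one, oldest first.
    for i in range(keep - 2, -1, -1):
        old = f"{base}.{i}"
        if old in rotated:
            rotated[f"{base}.{i + 1}"] = rotated.pop(old)
    # The current log becomes suffix 0 and a new, empty log is opened.
    if base in rotated:
        rotated[f"{base}.0"] = rotated.pop(base)
    rotated[base] = ""
    return rotated


if __name__ == "__main__":
    logs = {"cache.log": "day3", "cache.log.0": "day2", "cache.log.1": "day1"}
    print(rotate(logs, "cache.log", keep=2))
```

With keep set to 2, the "day1" contents are discarded, "day2" moves to cache.log.1, and "day3" moves to cache.log.0.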
Logfile rotation affects more than just the log files in /usr/local/squid/var/logs. It also generates new swap.state files for each cache directory. However, Squid does not keep old copies of the swap.state files. It simply writes a new file from the in-memory index and forgets about the old one.
Squid has an extensive, but somewhat confusing, set of access controls. The most important thing to understand is the difference between ACL types, elements, and rules, and how they work together to allow or deny access.
Squid has about 20 different ACL types. These refer to certain
aspects of an HTTP request or response, such as the client's IP
address (the src type), the origin server's hostname (the
dstdomain type), and the HTTP request method (the method type).
An ACL element consists of three components: a type, a name, and one or more type-specific values. Here are some simple examples:
acl Foo src 188.8.131.52
acl Bar dstdomain www.cnn.com
acl Baz method GET
The above ACL element named Foo would match a request that comes from
the IP address 188.8.131.52. The ACL named Bar matches a www.cnn.com URL.
The Baz ACL matches an HTTP GET request. Note that we are not allowing
or denying anything yet.
For most of the ACL types, an element can have multiple values, like this:
acl Argle src 220.127.116.11 18.104.22.168 22.214.171.124
acl Bargle dstdomain www.nbc.com www.abc.com www.cbs.com
acl Fraggle method PUT POST
A multi-valued ACL matches a request when any one of the values
is a match. They use OR logic. The Argle ACL matches a request
from 220.127.116.11, from 18.104.22.168, or from 22.214.171.124. The Bargle ACL
matches requests to NBC, ABC, or CBS web sites. The Fraggle ACL
matches a request with the methods PUT or POST.
Now that you're an expert in ACL elements, it's time to graduate
to ACL rules. These are where you say that a request is allowed
or denied. Access list rules refer to ACL elements by their names
and contain either the allow or deny keyword. Here are some
examples:
http_access allow Foo
http_access deny Bar
http_access allow Baz
It is important to understand that access list rules are checked
in order and that the decision is made when a match is found.
Given the above list, let's see what happens when a user from
188.8.131.52 makes a GET request for www.cnn.com. Squid checks the
allow Foo rule first. Our request matches the
Foo ACL, because the source address is 188.8.131.52, and the request
is allowed to proceed. The remaining rules are not checked.
How about a
PUT request for www.cnn.com from 22.214.171.124? The request
does not match the first rule. It does match the second rule,
however. This access list rule says that the request must be
denied, so the user receives an error message from Squid.
How about a
GET request for www.oreilly.com from 126.96.36.199? The
request does not match the first rule (
allow Foo). It does not
match the second rule, either, because www.oreilly.com is different
from www.cnn.com. However, it does match the third rule, because
the request method is GET.
Of course, these simple ACL rules are not very interesting. The
real power comes from Squid's ability to combine multiple elements
on a single rule. When a rule contains multiple elements, each
element must be a match in order to trigger the rule. In other
words, Squid uses AND logic for access list rules. Consider this
example:
http_access allow Foo Bar
http_access deny Foo
The first rule says that a request from 188.8.131.52 AND for www.cnn.com
will be allowed. However, the second rule
says that any other request from 188.8.131.52 will be denied. These
two lines restrict the user at 188.8.131.52 to visiting only the
www.cnn.com site. Here's an even more complex example:
http_access deny Argle Bargle Fraggle
http_access allow Argle Bargle
http_access deny Argle
These three lines allow the Argle clients (220.127.116.11, 18.104.22.168, and 22.214.171.124)
to access the Bargle servers (www.nbc.com, www.abc.com, and www.cbs.com), but
not with the PUT or POST methods. Furthermore, the Argle clients are not
allowed to access any other servers.
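The interplay of OR logic inside elements and AND logic inside rules is easy to see in a toy evaluator. The Python below is a simplified model for illustration only. It compares values as exact strings, whereas real Squid ACL types such as src and dstdomain also understand subnets and domain suffixes, and it returns None when no rule matches rather than modeling what Squid does in that case.

```python
# ACL elements: a request matches an element if ANY value matches (OR).
ACLS = {
    "Argle": ("src", {"220.127.116.11", "18.104.22.168", "22.214.171.124"}),
    "Bargle": ("dstdomain", {"www.nbc.com", "www.abc.com", "www.cbs.com"}),
    "Fraggle": ("method", {"PUT", "POST"}),
}

# Access rules: checked in order; a rule fires only if ALL elements match (AND).
RULES = [
    ("deny", ["Argle", "Bargle", "Fraggle"]),
    ("allow", ["Argle", "Bargle"]),
    ("deny", ["Argle"]),
]


def acl_matches(name, request):
    """True if the request's field equals any value of the named element."""
    field, values = ACLS[name]
    return request[field] in values


def check(request, rules=RULES):
    """Return the action of the first matching rule, or None if none match."""
    for action, names in rules:
        if all(acl_matches(n, request) for n in names):
            return action
    return None


if __name__ == "__main__":
    req = {"src": "220.127.116.11", "dstdomain": "www.nbc.com", "method": "GET"}
    print(check(req))                          # allow
    print(check({**req, "method": "POST"}))    # deny
    print(check({**req, "dstdomain": "www.oreilly.com"}))  # deny
```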
A common mistake among new users is to write a
rule that can never be true. It is easy to do if you forget that
Squid uses AND logic on rules and OR logic on elements. Here is
a configuration that can never be true:
acl A src 188.8.131.52
acl B src 184.108.40.206
http_access allow A B
The reason is that a request cannot be from both 188.8.131.52
and 184.108.40.206 at the same time. Most likely, it should be written
like this:
acl A src 188.8.131.52 184.108.40.206
http_access allow A
Then, requests from either 188.8.131.52 or 184.108.40.206 are allowed.
Access control rules can become long and complicated. When adding a new rule, how do you know where it should go? You should put more-specific rules before less-specific ones. Remember that the rules are checked in order. When adding a rule, go through the current rules in your head and see where the new one fits. For example, let's say that you want to deny requests to a certain site, but allow all others. It should look like this:
acl XXX dstdomain www.badsite.net
acl All src 0/0
http_access deny XXX
http_access allow All
Now, what if you need to make an exception for one user, so that she can visit that site? The new ACL element is:
acl Admin src 220.127.116.11
and the new rule should be:
http_access allow Admin XXX
but where does it go? Since this rule is more specific than the
deny XXX rule, it should go first:
http_access allow Admin XXX
http_access deny XXX
http_access allow All
If we place the new rule after
deny XXX, it will never even get
checked. The first rule will always match the request and she
will not be able to visit the site.
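This kind of ordering mistake can be caught mechanically. The following Python sketch, with invented names, flags a rule when an earlier rule lists a subset of its ACL names, because any request that matches the later rule has then already matched the earlier one. The check is purely name-based: it does not know, for example, that an ACL like All matches every request, so it can miss shadowed rules but should not report false ones.

```python
def shadowed_rules(rules):
    """Yield (earlier, later) index pairs where the later rule can never fire.

    Each rule is (action, set of ACL names). Negated ACLs such as "!Foo"
    are treated as opaque names, which is a simplification.
    """
    for j, (_, later) in enumerate(rules):
        for i in range(j):
            if rules[i][1] <= later:  # earlier rule is less specific
                yield i, j
                break


if __name__ == "__main__":
    wrong = [("deny", {"XXX"}), ("allow", {"Admin", "XXX"}), ("allow", {"All"})]
    right = [("allow", {"Admin", "XXX"}), ("deny", {"XXX"}), ("allow", {"All"})]
    print(list(shadowed_rules(wrong)))  # [(0, 1)]
    print(list(shadowed_rules(right)))  # []
```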
When you first install Squid, the access control rules will deny every request. To get things working, you'll need to add an ACL element and a rule for your local network. The easiest way is to write a source IP address ACL element for your subnet(s). For example:
acl MyNetwork src 192.168.0.0/24
Then, search through squid.conf for this line:
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
After that line, add an
http_access line with an allow rule:
http_access allow MyNetwork
Once you get this simple configuration working, feel free to move on to some of the more advanced ACL features, such as username-based proxy authentication.
Unless you've been living under a rock, you're aware of the spam problem on the Internet. Spam senders used to take advantage of open email relays. These days, a lot of spam comes from open proxies. An open proxy is one that allows outsiders to make requests through it. If others on the Internet receive spam email from your proxy, your IP address will be placed on one or more of the various blackhole lists. This will adversely affect your ability to communicate with other Internet sites.
Use the following access control rules to make sure this never happens to you. First, always deny all requests that don't come from your local network. Define an ACL element for your subnet:
acl MyNetwork src 10.0.0.0/16
Then, place a deny rule near the top of your http_access rules
that matches requests from anywhere else:
http_access deny !MyNetwork
http_access ...
http_access ...
While that may stop outsiders, it may not be good enough. It won't stop insiders who intentionally, or unintentionally, try to forward spam through Squid. To add even more security, you should make sure that Squid never connects to another server's SMTP port:
acl SMTP_port port 25
http_access deny SMTP_port
In fact, there are many well-known TCP ports, in addition to SMTP,
to which Squid should never connect. The default squid.conf
includes some rules to address this. There, you'll see a
Safe_ports ACL element that defines good ports. A deny
!Safe_ports rule ensures that Squid does not connect to any of
the bad ports, including SMTP.
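To see what the two deny rules for MyNetwork and SMTP_port accomplish together, here is a small Python model built on the standard ipaddress module. It is a sketch for reasoning about the rules, not a replacement for testing your real squid.conf, and the function name is made up. It covers only those two rules and leaves Safe_ports out.

```python
import ipaddress

MY_NETWORK = ipaddress.ip_network("10.0.0.0/16")
SMTP_PORT = 25


def http_access(client_ip, dest_port):
    """Model of the two deny rules; returns "deny" or None.

    None means neither rule matched, so later http_access rules
    (not modeled here) would decide.
    """
    # http_access deny !MyNetwork
    if ipaddress.ip_address(client_ip) not in MY_NETWORK:
        return "deny"
    # http_access deny SMTP_port
    if dest_port == SMTP_PORT:
        return "deny"
    return None


if __name__ == "__main__":
    print(http_access("192.0.2.7", 80))  # deny: outsider
    print(http_access("10.0.3.4", 25))   # deny: insider, SMTP port
    print(http_access("10.0.3.4", 80))   # None: falls through to later rules
```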
Duane Wessels discovered Unix and the Internet as an undergraduate student studying physics at Washington State University.
O'Reilly & Associates published Squid: The Definitive Guide in January 2004.
Chapter 8, "Advanced Disk Cache Topics," is available free online.
Copyright © 2009 O'Reilly Media, Inc.