Linux DevCenter    
 Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Peering Squid Caches

by Jennifer Vesperman
09/17/2001

A Squid cache can be set to check other Squid servers (its peers) for cached web pages before going direct to the requested page. Peering your Squid caches can provide faster responses and lower costs. Cache peers exchange cached objects, returning them to the users faster and reducing upstream bandwidth costs.

Sources

Squid first looks for requested objects in its own cache. If the object is not in the cache, then Squid must retrieve it from another source. To oversimplify the algorithm: Squid creates an ordered list of possible sources.

This list consists of parent(s), sibling(s), and the origin server(s). Squid attempts to fetch the object from each server in order until it is successful, one of the servers responds with a 404 (object not found), or there are no more sources to try.

Access controls and ICP queries modify the list using these rules:

Communicating between peers

Peers need to tell each other which objects are in their caches. ICP and HTCP allow instant queries and responses. Digests are snapshots of what is currently in the cache.

ICP and HTCP

Squid usually uses the ICP protocol to communicate between caches, and can use HTCP if both caches are configured for it. ICP and HTCP are similar enough that, unless noted otherwise, this article uses "ICP" to refer to both.
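As an illustrative sketch, these directives make a cache listen for ICP and HTCP queries from its peers. 3130 and 4827 are the customary ICP and HTCP ports; depending on your Squid version, HTCP support may have to be enabled when Squid is built, so check your installation before relying on it.

```conf
# Listen for ICP queries from peers (3130 is the customary ICP port)
icp_port 3130
# Listen for HTCP queries from peers (4827 is the customary HTCP port)
htcp_port 4827
```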


When Squid receives a request for an object that is not in its cache, it sends an ICP query packet to each of its configured peers. If at least one peer says it has the object, Squid requests the object from the peer that answered first, which is likely to be the fastest. If all the peers answer "no" or fail to respond within the timeout, Squid requests the object from the origin server or fails, depending on the never_direct setting.

ICP runs over UDP, which does not track or acknowledge packets. Because UDP packets may be lost, dropped, or damaged in transit, Squid uses a timeout value. Any query that is not answered within this time is assumed to be lost, and Squid drops that peer from its list for this request.

Squid uses a dynamic ICP timeout by default, but this can be overridden with icp_query_timeout or capped with maximum_icp_query_timeout if you find that the values Squid calculates are suboptimal for your environment.
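A minimal sketch of the two alternatives, assuming you have measured your peers' round-trip times and found the dynamic value unsuitable. Both directives take milliseconds, and the numbers below are placeholders rather than recommendations. Use one approach or the other: a fixed timeout replaces the dynamic calculation, while the maximum only caps it.

```conf
# Option 1: a fixed ICP timeout in milliseconds.
# 0 (the default) means "calculate the timeout dynamically".
icp_query_timeout 500

# Option 2: keep the dynamic timeout, but never wait longer than this (ms).
# maximum_icp_query_timeout 1000
```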

Digests

A cache digest is a compact block of keys which represents the set of objects the cache holds. Each URI is run through a deterministic hashing algorithm that produces the keys, so the digest is much smaller than a list of the URIs themselves. Digests are exchanged with digest-capable peers, and are used to determine whether a peer is likely to have a requested object.

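Digests are subject to false hits and false misses. A digest is a snapshot, so objects cached or evicted since the last exchange are misreported, and the compact encoding can occasionally claim an object that is not there; the more often digests are exchanged, the fewer stale answers they give. Because they avoid sending a query for every request, digests reduce the immediate network traffic and are useful if the bandwidth between peers is narrow or unreliable.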
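A minimal sketch, assuming a Squid built with cache digest support: the first two directives make this cache publish its own digest for peers to fetch, and the cache_peer line (hypothetical hostname) turns off ICP for one sibling so that Squid relies on that sibling's digest instead. The rebuild period is a placeholder value.

```conf
# Publish a digest of this cache's contents for peers to fetch
digest_generation on
# How often to rebuild the local digest (placeholder value)
digest_rebuild_period 1 hour

# Hypothetical sibling: no ICP queries, so lookups use its digest
cache_peer sibling2.example.org sibling 3128 3130 no-query
```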

Configuring a parent cache

Use a parent cache if you want to reduce your upstream costs and make page collection faster for your users, especially if the users are in groups which can have child caches.

Tell your cache's clients its hostname, HTTP and ICP ports, and why they should be using it. A knowledgeable user is more likely to use the cache.
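On the parent, child caches need permission to send both HTTP requests and ICP queries. A minimal sketch, assuming the children sit in a hypothetical 192.168.50.0 network; place the allow rules before any final "http_access deny all" line, because Squid applies the first rule that matches.

```conf
# Hypothetical network containing the child caches
acl children src 192.168.50.0/255.255.255.0
# Let children fetch through this cache (before "http_access deny all")
http_access allow children
# Let children send ICP queries to this cache
icp_access allow children
```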

CAVEAT: The more caches you have downstream of you, the lower your hit-rate as a parent. Caches downstream of you will cache what they can, and pass up requests only for content that they do not have, or that is hard to cache. Every hit is still a benefit, even if the rate is low.

If your bandwidth is paid by the byte, you'll find that even a low hit-rate will cover the hardware and operating expenses.
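To check this for your own site, a break-even comparison helps. The symbols are introduced here for illustration: B is the number of bytes your downstream users request per month, h is the parent's byte hit ratio (the fraction of those bytes served from the cache, since billing is per byte), p is the upstream price per byte, C is the cache's monthly hardware and operating cost, and S is the monthly saving.

```latex
S = h\,B\,p, \qquad
S \ge C \iff h \ge \frac{C}{B\,p} \quad (B > 0,\; p > 0)
```

The larger the traffic volume B or the price p, the smaller the hit ratio needed for the cache to pay for itself.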

Configuring a child cache

Some ISPs give a cost reduction if you use their parent cache. A parent cache may already hold the page you are about to request, providing faster service.

You will need:

For each parent you have, add a line like this to your squid.conf:
cache_peer hostname parent HTTP_port ICP_port [OPTIONS]
For example:
cache_peer proxy.cache.example.org parent 3128 3130 no-query no-digest


If you have one parent, use the no-query and no-digest configuration options. If you're always going to request the object from the parent, there's no need to check for the object.

If you have multiple parents, ICP queries can improve performance. An example with two parents:

  1. One parent says yes, the other says no. We use the one that said yes.

  2. Both parents say yes. They both have the object, so we use the one that answered first. It's likely to be the fastest.

  3. Both parents say no. Again, we use whichever answered first.

  4. One parent fails to respond. We go with the other one. The parent that didn't answer may have failed or be temporarily unreachable.
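A sketch of the child's squid.conf for this two-parent case; the hostnames are hypothetical. Unlike the single-parent example, no-query is left out, so Squid sends ICP queries to both parents and can compare their answers.

```conf
# Two parents, both queried with ICP on port 3130 (hypothetical hostnames)
cache_peer parent1.example.org parent 3128 3130
cache_peer parent2.example.org parent 3128 3130
```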

CAUTION: When a peer fails to respond to ICP queries within dead_peer_timeout seconds, Squid assumes it is unavailable or unreachable until it sees another ICP response from that peer. While a peer is in this "presumed dead" state, Squid still sends it ICP queries but does not wait for its answer, basing its decisions on the responses of the "live" peers.

If all your parents are "dead" according to this test and Squid is not configured to go direct, Squid will not be able to return objects that are not in its cache.
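The two directives behind this behavior are sketched below with a placeholder value: dead_peer_timeout sets the detection window, and never_direct is what stops Squid from falling back to the origin server. Leave the never_direct line out if your cache is allowed to go direct. The example assumes the "all" ACL that the stock squid.conf defines.

```conf
# Presume a peer dead after 10 seconds without ICP replies (placeholder)
dead_peer_timeout 10 seconds
# Never go direct: if every parent is presumed dead,
# objects that are not already cached cannot be fetched.
never_direct allow all
```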

Configuring a sibling cache

Sibling caches work well where you have groups of users. Each group's local proxy shares the cache with other proxies, only "going forward" to a parent or origin server if none of the caches have a fresh copy of the requested object.

Use sibling caches if several caches are behind some sort of bottleneck but have good connections to each other.

Siblings can group together to allow several smaller computers to simulate one expensive computer and serve as a larger proxy.

To allow another cache to use you as a sibling, configure your cache as if it were a parent cache, but instead of giving MISS access, deny it to your siblings.
acl sibling1 src 192.168.44.55/255.255.255.255
miss_access deny sibling1
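Written out in full, the part of the configuration that lets the sibling at 192.168.44.55 use you might look like the sketch below: it may send ICP queries and fetch objects you already hold, but it may not make you fetch misses on its behalf. As with a parent, the allow rules belong before any final "http_access deny all" line.

```conf
acl sibling1 src 192.168.44.55/255.255.255.255
# Accept the sibling's HTTP requests and ICP queries...
http_access allow sibling1
icp_access allow sibling1
# ...but refuse to fetch objects that are not already in this cache
miss_access deny sibling1
```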

To use another cache as a sibling, both caches must support either ICP/HTCP or cache digests. These allow Squid to check for objects in other caches.
The cache_peer entry takes this form:
cache_peer hostname sibling HTTP_port ICP_port [OPTIONS]
For example:
cache_peer sibling1.myinternalnet.org sibling 3128 3130 proxy-only

Note the proxy-only option. Normally, caching objects fetched from a sibling is a waste of disk space, because the sibling already holds a copy that is cheap to fetch again. If the bandwidth to a sibling is narrow, lossy, or expensive, consider leaving the option out and caching objects from that sibling.
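A sketch contrasting the two cases: the first line is the article's nearby sibling, and the second is a hypothetical sibling across a slow or costly link, where keeping a local copy is worth the disk space.

```conf
# Nearby sibling on a fast link: fetch from it, don't store a second copy
cache_peer sibling1.myinternalnet.org sibling 3128 3130 proxy-only
# Hypothetical sibling across a slow or expensive link:
# proxy-only omitted, so objects fetched from it are cached locally
cache_peer sibling3.example.org sibling 3128 3130
```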

Caveats and gotchas

FALSE HITS: (ICP only) Because ICP does not communicate request headers (only the URI is presented in an ICP query), it is possible for a peer to return an affirmative for a given URI but not be able to satisfy the request from cache.

HTCP includes the request headers in the query packet, and is thus almost immune to false hits, although they remain theoretically possible in rare circumstances.
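To take advantage of this, switch the peer from ICP to HTCP. A sketch using the article's sibling: the htcp option makes Squid send HTCP instead of ICP queries to that peer, and the port field then holds the peer's HTCP port (4827 by convention). The sibling must in turn listen for HTCP and accept your queries; the address of your cache below is hypothetical, and htcp_access is only available in Squid versions that provide it.

```conf
# On this cache: query sibling1 with HTCP instead of ICP
cache_peer sibling1.myinternalnet.org sibling 3128 4827 htcp proxy-only

# On sibling1: listen for HTCP and accept queries from this cache
htcp_port 4827
acl peer_cache src 192.168.44.66/255.255.255.255
htcp_access allow peer_cache
```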


Jennifer Vesperman is the author of Essential CVS. She writes for the O'Reilly Network, the Linux Documentation Project, and occasionally Linux.Com.



Copyright © 2009 O'Reilly Media, Inc.