HTTP Proxies
In my previous article, I introduced some of the benefits to be gained by using a proxy. In today's article, I'd like to concentrate on HTTP proxies. We'll take a look at some of the HTTP proxies available in the ports collection and which proxies are suited for which needs.
If you have any familiarity with HTTP proxies, your first thought is probably Squid, the excellent HTTP proxy. Since there are already many fine articles and tutorials on using and configuring Squid, I won't cover that product in this series. For those who are disappointed, I'll leave you with a few URLs:
Squid is an example of a very configurable HTTP proxy that can scale into very large networks. This is great if you are an administrator of a very large network, but overkill if you simply want to surf safely from your FreeBSD box or enforce a policy on a small home network. Thinking as a user, what are some of the irritants that go along with web browsing? The following quickly come to mind:
Depending upon the web browser you use, some of these irritants can be dealt with directly. Others require you to install additional proxy software. Let's start by taking a look at some common browsers, then move on to complementary proxies.
As of this writing, these are the latest (non-forbidden) versions of three popular web browsers:
mozilla-1.3_1,2
opera-6.12
linux-netscape-navigator-4.8
Keep in mind that new features are added with new versions, so features that are missing now may appear in later versions. Also, every web browser has a "Preferences" section, so if your browser isn't listed here, check it out to see what features are available.
For these browsers, the Preferences section is found under the Edit menu of Netscape and Mozilla, and under the File menu of Opera. You'll find a big difference in the number of preferences available in Netscape compared with Mozilla or Opera. This is because the Netscape port is a much older version.
All three browsers have an appropriately named setting that allows you to deal with cookies. Each also allows you to enable or disable Java and JavaScript. Finally, if you have a slow Internet connection and plenty of disk space, you may find a speed improvement by tweaking each browser's cache settings.
Dealing with popup windows is a newer feature, so it is not found in this version of Netscape. In Opera, click on General to find the setting to disallow popups. Mozilla takes this a step further by letting you disable popups either entirely or on a site-by-site basis. To disable popups altogether, go to Privacy & Security->Popup Windows and read the warning on the ramifications. Alternatively, as you encounter a site with an irritating popup, simply right-click the page and choose to "Reject popup windows from this site."
bfilter
Now, let's see what some of the applications in the ports collection can do
to augment the features already provided by your favorite web browser. I'll
start with bfilter.
This HTTP proxy not only controls popup windows, it also stops those annoying
flashing ads and promises to disable webbugs. To build this port, become the
superuser and:
# cd /usr/ports/net/bfilter
# make install clean
The port will install an application to /usr/local/bin/bfilter
and a configuration file to /usr/local/etc/bfilter/config. Once
the build is finished, leave the superuser account and type
bfilter in order to start the proxy. Then verify that the proxy is
listening for requests:
$ sockstat -4
USER COMMAND PID FD PROTO LOCAL ADDRESS FOREIGN ADDRESS
dlavigne bfilter 20336 3 tcp4 127.0.0.1:8080 *:*
You'll note that bfilter listens on port 8080 on the loopback
address. Because 127.0.0.1 is reachable only from the local machine, the proxy
will accept HTTP requests only from programs running on this computer. The
comments in its configuration file describe the listening address; if you wish
to listen on a particular interface, specify its IP address in the
configuration file.
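If you'd rather script the check than read sockstat output, here is a minimal Python sketch that simply tries to open a TCP connection to the proxy. The function name is_listening is my own, and the address and port are the ones from the sockstat output above; adjust them if you changed the configuration file.

```python
import socket

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure
        return sock.connect_ex((host, port)) == 0

# Example: is_listening("127.0.0.1", 8080) should be True while bfilter runs
```

This only tells you that something accepts connections on that port, not that it is bfilter, so sockstat remains the more informative tool.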
bfilter is not a transparent proxy, meaning you will have to
configure your web browser to use the proxy. Go into the Preferences section of
your browser and you should find a setting that deals with Proxies. Type in the
IP address and port number used by bfilter. In my example,
bfilter is running on the same machine as my web browser, so I use
127.0.0.1 as the IP address and 8080 as the port number. If you are running
bfilter on a separate computer, change the IP address in its
configuration file to reflect the IP address of the NIC attached to your
internal network. Then set the browsers on the computers in your network to
use that IP address in their Proxies section of Preferences.
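Browsers aren't the only clients that can be pointed at a proxy. As a rough sketch, assuming the proxy address from my example (127.0.0.1, port 8080), this is how a Python script could be told to send its plain HTTP requests through bfilter using the standard urllib.request module. The helper name build_proxy_opener is hypothetical.

```python
import urllib.request

def build_proxy_opener(host: str = "127.0.0.1", port: int = 8080):
    """Return an opener that sends plain HTTP requests through the proxy."""
    proxy_url = f"http://{host}:{port}"
    # Only the "http" scheme is mapped here; other schemes bypass the proxy
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)

opener = build_proxy_opener()
# opener.open("http://www.example.org/") would now travel via the proxy
```

If the proxy runs on another computer, pass that machine's internal IP address instead of the loopback address.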
bfilter also has a rules file, found in
/usr/local/etc/bfilter/rules. However, I found that the default
rules worked flawlessly at catching popup windows and flashing ads. If you're
looking for an easy-to-use proxy that works out of the box,
bfilter is a very nice solution.
middleman
Another HTTP proxy I enjoy using is middleman.
Like bfilter, it works as is, but what makes this proxy
interesting are the additional features that provide an enticing way to learn
more about HTTP and what is happening behind the scenes every time you visit a
web site.
First, let's build the port:
# cd /usr/ports/www/middleman
# make install clean
Note that the name of the installed application will be
/usr/local/bin/mman. You also need to know the name of the
default configuration file in order to start the application. If you just type
mman, you'll receive the help file. Instead, use the
-c (config-file) switch to start the proxy:
# mman -c /usr/local/etc/mman.xml
I found that the proxy needs to be started as the superuser. Don't forget to
check the port mman is listening on and set the Proxies section of
your browser accordingly:
$ sockstat -4
USER COMMAND PID FD PROTO LOCAL ADDRESS FOREIGN ADDRESS
root mman 575 0 tcp4 127.0.0.1:8080 *:*
If you plan on using middleman, take the time to read
/usr/local/share/doc/middleman/README.html. This is the only
documentation on the product, but it is very thorough and full of interesting
ideas on how to use a proxy.
Although the default configuration will probably suit your needs, you should
check out the included web interface by typing mman into your
browser. This will allow you to view:
If you've never managed an HTTP server or HTTP proxy before, you may be amazed at the amount of interaction that occurs whenever a web browser connects to a web server. I mentioned in the last article that we would be referring to the HTTP RFC (2616). Let's do a very quick rundown on how the HTTP protocol works; I'll leave it to you to refer to the RFC to fill in the details that interest you.
Whenever you browse a web site, your browser must make a separate request for
every item on that page. For example, if I type slashdot.org into
my browser, I'll see the following entries in my mman cache:
Note that every GIF or image is a separate request, as each is stored as a
separate file on the web server. In order for my web browser to display the
main page of Slashdot's site, it had to individually request each of the 11
.gifs, the one .ico, and the HTML page that
explained how to format everything together.
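To make the "one request per item" idea concrete, here is a small Python sketch that scans a page's HTML and counts how many requests a browser would need. The sample HTML and the names ResourceCollector and count_requests are made up for illustration, and the sketch only looks at img and link tags; it ignores scripts, frames, duplicate URLs, and anything already cached.

```python
from html.parser import HTMLParser

class ResourceCollector(HTMLParser):
    """Collect the URLs a browser would have to fetch separately."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.resources.append(attrs["href"])

def count_requests(html: str) -> int:
    """One request for the page itself plus one per referenced resource."""
    collector = ResourceCollector()
    collector.feed(html)
    return 1 + len(collector.resources)

SAMPLE = """<html><head><link rel="icon" href="/favicon.ico"></head>
<body><img src="/greendot.gif"><img src="/pix.gif"></body></html>"""

print(count_requests(SAMPLE))  # the page, one icon, and two images
```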
In HTTP, there are two types of packets: request packets and response packets. The request packet always comes from the web browser. This makes sense, as a web browser is a client and the job of a client is to make requests. Not surprisingly, the response packets always come from the web server.
A web browser's request packet has three components: a method, headers, and a body.
The method indicates what the client is requesting. The methods are all
listed and explained in the RFC and typically are written in uppercase. The
most common method is the GET method, as typically your web
browser wants to "get" a particular page or image from the web server. If you
take a look at your mman log, or for that matter, the log from any
HTTP proxy or HTTP server, you'll see GET requests:
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/greendot.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/greendot.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/pix.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/pix.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/topics/topicgamesrts.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/topics/topicgamesrts.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/topics/topiccomdex.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/topics/topiccomdex.gif
<snip>
Here, mman issued the GET request on behalf of my
browser, then placed a copy of the requested item into its cache.
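If you want to summarize a log like this one, a few lines of Python will pull out the method and URL of each request. This is only a sketch: the pattern is inferred from the excerpt above rather than from mman's documentation, it assumes each entry sits on a single line, and the name requested_urls is my own.

```python
import re

# Field layout guessed from the log excerpt: day, date, time, [pid], entry
LOG_LINE = re.compile(
    r"^(?P<day>\w{3}) (?P<date>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] request: (?P<method>[A-Z]+) (?P<url>\S+)$"
)

def requested_urls(lines):
    """Return (method, url) pairs for every request entry."""
    pairs = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # cache entries and anything else are skipped
            pairs.append((match["method"], match["url"]))
    return pairs

sample = [
    "Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/pix.gif",
    "Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/pix.gif",
]
print(requested_urls(sample))
```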
A web server's response packet also has three components: a status, headers, and a body.
That is, the request packet sends a method, and the web server responds with
a status message. Status messages are numerical, and again are listed in the
RFC. You've probably run across a "404 error," as 404 is the status
number representing "not found." The most common status is 200 or
OK. If a web browser issues a GET request and the
server finds the requested resource, it will send it back along with a status
of 200. If it can't find the requested file, it will instead send
a status of 404.
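You don't have to memorize the status numbers. As a quick illustration, Python's standard http module carries the registered codes and their reason phrases; the helper name describe is hypothetical.

```python
from http import HTTPStatus

def describe(code: int) -> str:
    """Return the code with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"

print(describe(200))  # 200 OK
print(describe(404))  # 404 Not Found
```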
You probably noticed that both request and response packets contain headers
and a body. The body usually contains the requested page or image. So, when my
web browser made a GET request for
http://images.slashdot.org:80/greendot.gif, the web server found
the GIF and sent a response packet with a status of 200 and the
GIF itself in the body of that packet.
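Here is a minimal Python sketch of how a response splits into those three parts. The raw response is invented for illustration (the six bytes "GIF89a" stand in for a real image), the function name parse_response is my own, and the sketch ignores details such as chunked transfer encoding and repeated headers that the RFC covers.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status_code, headers, body)."""
    # A blank line separates the status line and headers from the body
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    version, code, *reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: image/gif\r\n"
       b"Content-Length: 6\r\n"
       b"\r\n"
       b"GIF89a")
status, headers, body = parse_response(raw)
```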
mman
Headers are the interesting part of HTTP packets. They contain useful
information that helps the web browser and web server to communicate
effectively. They also contain sensitive information about both the web server
and web browser. Here are the results of my clicking on Show Headers in
mman's web interface:
Unfiltered
Type              Value
Host              mman
User-Agent        Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030619
Accept            text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Accept-Language   en-us,en;q=0.5
Accept-Encoding   gzip,deflate,compress;q=0.9
Accept-Charset    ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive        300
Proxy-Connection  keep-alive
Referer           http://mman/headers
Filtered
Type              Value
Host              mman
Accept            text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Accept-Language   en-us,en;q=0.5
Accept-Encoding   gzip,deflate,compress;q=0.9
Accept-Charset    ISO-8859-1,utf-8;q=0.7,*;q=0.7
Referer           http://mman/headers
User-Agent        Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)
Remember, every HTTP packet includes headers. Here you are seeing the
values that are sent by my web browser. The Unfiltered section
contains the defaults used by my web browser. It clearly shows my operating
system and the version and type of web browser I am using. The
Filtered section shows that mman changed some of
those headers before sending them to the web server. If I don't like those new
values, I can simply click on Config, select header,
and edit, say, the User-Agent. This configuration section is quite
powerful, as you can add, delete, and modify the contents of headers. Don't do
this just for kicks, however. Make sure you've read the RFC and understand the
ramifications of the particular header value you have the urge to muck about
with.
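To show what this kind of header rewriting amounts to, here is a short Python sketch that reproduces the difference between my Unfiltered and Filtered listings: the User-Agent is replaced, and Keep-Alive and Proxy-Connection are dropped. This is not how mman is implemented; the function name filter_headers and its parameters are my own invention for the example.

```python
def filter_headers(headers, replace=None, drop=()):
    """Return a new header dict with some headers dropped or overwritten.

    Header names are compared case-insensitively, as HTTP requires.
    """
    replace = {k.lower(): v for k, v in (replace or {}).items()}
    drop = {name.lower() for name in drop}
    result = {}
    for name, value in headers.items():
        key = name.lower()
        if key in drop:
            continue  # header is removed entirely
        result[name] = replace.get(key, value)
    return result

unfiltered = {
    "Host": "mman",
    "User-Agent": "Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030619",
    "Keep-Alive": "300",
    "Proxy-Connection": "keep-alive",
}
filtered = filter_headers(
    unfiltered,
    replace={"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)"},
    drop=("Keep-Alive", "Proxy-Connection"),
)
```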
It's also interesting to see the headers being sent by a web server. To do so, I type the following into my browser, remembering to put two periods between the word "headers" and the site name:
headers..www.mp3.com
I'll see this:
*Server header:*
HTTP/1.1 200 OK
Date: Sat, 21 Jun 2003 21:17:43 GMT
Server: Apache/1.3.12m1 (Unix) yasl/2.25 sw/1.7 mod_rdbcookie/1.2 mod_mp3idver/0.12 rwh/1.1 bw/3.37 rewrite/3.3 include/3.6
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
Notice that there aren't any secrets on the server end either. The header clearly indicates the type and version of web server software in use. If you are responsible for maintaining a web server, remember that every HTTP packet leaving your server reveals whether or not you've kept up with your web server patches!
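To see just how much a Server header gives away, here is a small Python sketch that breaks the value into name and version pairs. The value is a shortened copy of the header shown above, the function name server_products is hypothetical, and the regular expression is a simplification of the product-token grammar in the RFC.

```python
import re

# A product token is name/version; parenthesised comments are skipped
PRODUCT = re.compile(r"([^\s/()]+)/([^\s()]+)")

def server_products(server_header: str):
    """Return (name, version) pairs advertised in a Server header value."""
    without_comments = re.sub(r"\([^)]*\)", " ", server_header)
    return PRODUCT.findall(without_comments)

value = "Apache/1.3.12m1 (Unix) yasl/2.25 sw/1.7 mod_rdbcookie/1.2"
for name, version in server_products(value):
    print(name, version)
```

Anyone who can send a request to the server can collect the same list, which is why it pays to know what your own server advertises.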
mman also supports features that can be very useful in a
networked environment. One, it can force users to authenticate before they are
allowed to use the Internet. I'll click on config then select
access and add a policy. I'll then be presented with a form.
If I leave the IP address section empty, the access policy will affect every IP address that connects to the proxy. I can then set values in the username and password fields. Before saving the policy, I need to configure what access users will be allowed once they input the correct username and password. My choices are:
CONNECT requests: CONNECT is an HTTP method that is often disabled due to its associated security risks.
mman has keywords that can be included with a URL to bypass restrictions for a particular site. For example, if I wanted to see the popups for a site, I could type this in my browser: bypass[f]..www.mp3.com. If you don't want users bypassing your filters, don't select this option.
If you decide to create your own policy, remember to create a second policy that will allow you as an administrator to configure mman. If you plan on configuring mman on the same computer that is running the proxy software, keep the default policy, but place it below your new policy that affects your users.
Now, when users open up their web browsers, the browser itself will prompt them for the username and password you created in your policy. If they type it in correctly, they will be able to access the Internet, according to the parameters you set in your policy.
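Behind that prompt, the browser resends its request with a Proxy-Authorization header. I haven't checked which authentication scheme mman uses, so treat the following Python sketch as an assumption: it shows what the header value would look like under the common HTTP Basic scheme, where the username and password are joined by a colon and base64-encoded. The function name is hypothetical, and the credentials are a sample pair.

```python
import base64

def basic_proxy_authorization(username: str, password: str) -> str:
    """Build the value of a Proxy-Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

# Sample credentials, not taken from any real policy
print(basic_proxy_authorization("Aladdin", "open sesame"))
```

Note that base64 is an encoding, not encryption, so under this scheme the password crosses the network in a form that is trivial to decode.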
The last feature I wish to mention is limits. This
configuration allows you to control Internet access according to month, day,
and time. For example, you could configure a policy that limits Internet
access to the hours of 9:00 to 17:00 on Monday to Friday.
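The logic behind such a limit is simple enough to sketch in Python. This is an illustration of the 9:00 to 17:00, Monday to Friday example only, not mman's own configuration format; the function name access_allowed is my own, and I've assumed the end time is exclusive.

```python
from datetime import datetime, time

def access_allowed(now: datetime,
                   start: time = time(9, 0),
                   end: time = time(17, 0)) -> bool:
    """Allow access Monday to Friday from start (inclusive) to end (exclusive)."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start <= now.time() < end

# 21 June 2003 was a Saturday, so this request would be refused
print(access_allowed(datetime(2003, 6, 21, 10, 0)))
```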
It seems that I've barely scratched the surface of the middleman proxy
server. Perhaps I've piqued your interest and you will try this application for
yourself.
In the next article, I'd like to finish the proxy series by taking a look at SMTP proxies.
Dru Lavigne is a network and systems administrator, IT instructor, author and international speaker. She has over a decade of experience administering and teaching Netware, Microsoft, Cisco, Checkpoint, SCO, Solaris, Linux, and BSD systems. A prolific author, she pens the popular FreeBSD Basics column for O'Reilly and is author of BSD Hacks and The Best of FreeBSD Basics.
Copyright © 2009 O'Reilly Media, Inc.