BSD DevCenter

HTTP Proxies

A Bit About HTTP

If you've never managed an HTTP server or HTTP proxy before, you may be amazed at the amount of interaction that occurs whenever a web browser connects to a web server. I mentioned in the last article that we would be referring to the HTTP RFC (2616). Let's do a very quick rundown on how the HTTP protocol works; I'll leave it to you to refer to the RFC to fill in the details that interest you.



Whenever you browse a web site, your browser must make a separate request for every item on that page. For example, if I type slashdot.org into my browser, I'll see the following entries in my mman cache:

  • http://images.slashdot.org:80/topics/topicgamesrts.gif
  • http://images.slashdot.org:80/topics/topicinternet.gif
  • http://images.slashdot.org:80/title.gif
  • http://slashdot.org:80/
  • http://images.slashdot.org:80/topics/topicaposx.gif
  • http://images.slashdot.org:80/topics/topicms.gif
  • http://images.slashdot.org:80/topics/topiccomdex.gif
  • http://images.slashdot.org:80/slc.gif
  • http://images.slashdot.org:80/pix.gif
  • http://images.slashdot.org:80/topics/topicscience.gif
  • http://images.slashdot.org:80/greendot.gif
  • http://slashdot.org:80/favicon.ico
  • http://images.slashdot.org:80/topics/topichardware.gif

Note that every GIF or image is a separate request, as each is stored as a separate file on the web server. In order for my web browser to display the main page of Slashdot's site, it had to individually request each of the 11 .gifs, the one .ico, and the HTML page that explained how to format everything together.
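To make that count concrete, here is a minimal Python sketch that lists the URLs a browser would have to request for a page. The sample HTML is hypothetical (it is not Slashdot's actual markup), and the sketch only looks at img and link tags, so treat it as an illustration rather than a model of a real browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect the URLs a browser would fetch to render a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = [base_url]  # the HTML page itself is the first request

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only <img src> and <link href> are handled in this sketch.
        if tag == "img" and attrs.get("src"):
            ref = attrs["src"]
        elif tag == "link" and attrs.get("href"):
            ref = attrs["href"]
        else:
            return
        url = urljoin(self.base_url, ref)
        if url not in self.urls:  # a repeated image is requested only once
            self.urls.append(url)


def resources_for(base_url, html):
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.urls


# Hypothetical page, loosely modeled on the cache listing above.
SAMPLE_HTML = """
<html><head><link rel="icon" href="/favicon.ico"></head>
<body>
<img src="http://images.slashdot.org/title.gif">
<img src="http://images.slashdot.org/pix.gif">
<img src="http://images.slashdot.org/pix.gif">
</body></html>
"""

urls = resources_for("http://slashdot.org/", SAMPLE_HTML)
for url in urls:
    print(url)
```

Each printed URL corresponds to one request: the page, the icon, and each distinct image.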

In HTTP, there are two types of packets (the RFC calls them messages): request packets and response packets. The request packet always comes from the web browser. This makes sense, as a web browser is a client and the job of a client is to make requests. Not surprisingly, the response packets always come from the web server.

A web browser's request packet has three components:

  • Method
  • Headers
  • Body

The method indicates what the client is requesting. The methods are all listed and explained in the RFC and are written in uppercase. The most common method is GET, as typically your web browser wants to "get" a particular page or image from the web server. If you take a look at your mman log, or for that matter, the log from any HTTP proxy or HTTP server, you'll see GET requests:

Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/greendot.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/greendot.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/pix.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/pix.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/topics/topicgamesrts.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/topics/topicgamesrts.gif
Sat 21 16:04:43 [575] request: GET http://images.slashdot.org:80/topics/topiccomdex.gif
Sat 21 16:04:43 [575] cache: create: http://images.slashdot.org:80/topics/topiccomdex.gif
<snip>

Here, mman issued the GET request on behalf of my browser, then placed a copy of the requested item into its cache.
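If you'd like to see the three components of a request laid out, the following Python sketch builds the text of a GET request in the form a browser sends to a proxy, with the full URL on the method line. The function name build_proxy_get and the Accept value are my own choices for illustration; a real browser sends many more headers.

```python
def build_proxy_get(url, host, extra_headers=None):
    """Return the text of a GET request as a browser would send it to a proxy."""
    headers = {"Host": host}
    headers.update(extra_headers or {})
    # Component 1: the method line (method, full URL, protocol version).
    lines = ["GET {} HTTP/1.1".format(url)]
    # Component 2: the headers, one "Name: value" pair per line.
    lines += ["{}: {}".format(name, value) for name, value in headers.items()]
    # Component 3: the body. A blank line ends the headers, and a GET
    # normally carries no body, so nothing follows the blank line.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_proxy_get(
    "http://images.slashdot.org:80/greendot.gif",
    "images.slashdot.org:80",
    {"Accept": "image/gif"},
)
print(request)
```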

A web server's response packet also has three components:

  • Status
  • Headers
  • Body

That is, the browser's request names a method, and the web server's response carries a status. Status codes are three-digit numbers, and again are listed in the RFC. You've probably run across a "404 error," as 404 is the status code representing "Not Found." The most common status is 200, or OK. If a web browser issues a GET request and the server finds the requested resource, it will send it back along with a status of 200. If it can't find the requested file, it will instead send a status of 404.

You probably noticed that both request and response packets contain headers and a body. The body usually contains the requested page or image. So, when my web browser made a GET request for http://images.slashdot.org:80/greendot.gif, the web server found the GIF and sent a response packet with a status of 200 and the GIF itself in the body of that packet.
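The same three-part layout can be pulled apart on the response side. This Python sketch splits a raw response into status, headers, and body. The sample bytes are made up for illustration (the body is just the first six bytes of a GIF signature, not a real image), and the parser ignores details such as chunked encoding and repeated headers.

```python
def parse_response(raw):
    """Split raw HTTP response bytes into (status_code, reason, headers, body)."""
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# Hypothetical response to the greendot.gif request.
SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: image/gif\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
    b"GIF89a"
)

status, reason, headers, body = parse_response(SAMPLE)
print(status, reason, headers["content-type"], len(body))
```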

Displaying Headers with mman

Headers are the interesting part of HTTP packets. They contain useful information that helps the web browser and web server to communicate effectively. They also contain sensitive information about both the web server and web browser. Here are the results of my clicking on Show Headers in mman's web interface:

Unfiltered
Type		Value
Host		mman
User-Agent	Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030619
Accept		text/xml,application/xml,application/xhtml+xml,text/html;
	q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;
	q=0.2,*/*;q=0.1
Accept-Language	en-us,en;q=0.5
Accept-Encoding	gzip,deflate,compress;q=0.9
Accept-Charset	ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive	300
Proxy-Connection keep-alive
Referer		http://mman/headers

Filtered
Type		Value
Host		mman
Accept		text/xml,application/xml,application/xhtml+xml,text/html;
	q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;
	q=0.2,*/*;q=0.1
Accept-Language	en-us,en;q=0.5
Accept-Encoding	gzip,deflate,compress;q=0.9
Accept-Charset	ISO-8859-1,utf-8;q=0.7,*;q=0.7
Referer		http://mman/headers
User-Agent	Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)

Remember, every HTTP packet includes headers. Here you are seeing the values that are sent by my web browser. The Unfiltered section contains the defaults used by my web browser. It clearly shows my operating system and the version and type of web browser I am using. The Filtered section shows that mman changed some of those headers before sending them to the web server. If I don't like those new values, I can simply click on Config, select header, and edit, say, the User-Agent. This configuration section is quite powerful, as you can add, delete, and modify the contents of headers. Don't do this just for kicks, however. Make sure you've read the RFC and understand the ramifications of the particular header value you have the urge to muck about with.
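The difference between the Unfiltered and Filtered listings boils down to dropping some headers and overriding others. Here is a minimal Python sketch of that idea. It is not mman's code, the function name filter_headers is hypothetical, and the header list is abbreviated from the listing above.

```python
def filter_headers(headers, drop, override):
    """Return a new list of (name, value) pairs.

    Headers named in `drop` are removed; headers named in `override`
    are removed and re-added at the end with the new value.
    """
    hidden = {name.lower() for name in drop}
    hidden |= {name.lower() for name in override}
    kept = [(name, value) for name, value in headers
            if name.lower() not in hidden]
    return kept + list(override.items())


# Abbreviated from the Unfiltered listing above.
UNFILTERED = [
    ("Host", "mman"),
    ("User-Agent",
     "Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030619"),
    ("Accept-Language", "en-us,en;q=0.5"),
    ("Keep-Alive", "300"),
    ("Proxy-Connection", "keep-alive"),
    ("Referer", "http://mman/headers"),
]

filtered = filter_headers(
    UNFILTERED,
    drop=["Keep-Alive", "Proxy-Connection"],
    override={
        "User-Agent":
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)",
    },
)
for name, value in filtered:
    print(name, value, sep="\t")
```

The web server on the other end only ever sees the filtered list, so it has no way of knowing which browser and operating system made the original request.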

It's also interesting to see the headers being sent by a web server. If I type the following into my browser, remembering to use two periods between the word "headers" and the URL:

headers..www.mp3.com

I'll see this:

*Server header:*

HTTP/1.1 200 OK
Date: Sat, 21 Jun 2003 21:17:43 GMT
Server: Apache/1.3.12m1 (Unix) yasl/2.25 sw/1.7 mod_rdbcookie/1.2
	mod_mp3idver/0.12 rwh/1.1 bw/3.37 rewrite/3.3 include/3.6
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html

Notice that there aren't any secrets on the server end either. The header clearly indicates the type and version of web server software in use. If you are responsible for maintaining a web server, remember that every HTTP packet leaving your server reveals whether or not you've kept up with your web server patches!
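To show just how easy that information is to harvest, this short Python sketch splits a Server header value into product and version pairs. It assumes the simple token/version layout seen above, with parenthesized comments such as (Unix) thrown away; the function name server_products is my own.

```python
import re


def server_products(value):
    """Return (product, version) pairs from a Server header value."""
    # Comments such as "(Unix)" carry no version, so strip them first.
    without_comments = re.sub(r"\([^)]*\)", " ", value)
    pairs = []
    for token in without_comments.split():
        name, _, version = token.partition("/")
        pairs.append((name, version))
    return pairs


SERVER = ("Apache/1.3.12m1 (Unix) yasl/2.25 sw/1.7 mod_rdbcookie/1.2 "
          "mod_mp3idver/0.12 rwh/1.1 bw/3.37 rewrite/3.3 include/3.6")

products = server_products(SERVER)
for name, version in products:
    print(name, version)
```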

Controlling Access

mman also supports features that can be very useful in a networked environment. For one, it can force users to authenticate before they are allowed to use the Internet. To set this up, I'll click on Config, then select access and add a policy. I'll then be presented with a form.

If I leave the IP address section empty, the access policy will affect every IP address that connects to the proxy. I can then set values in the username and password fields. Before saving the policy, I need to configure what access users will be allowed once they input the correct username and password. My choices are:

  • Web interface: This will allow users to configure the proxy, so I will leave this option unchecked.
  • Proxy requests: If I check this option, the proxy will accept requests from web browsers that have been manually configured to use the IP address and port number of the proxy.
  • CONNECT requests: CONNECT is an HTTP method that asks the proxy to open a tunnel to another host and port, most commonly for SSL traffic. It is often disabled due to its associated security risks.
  • HTTP requests: I want to remember to select this option, or users won't be able to access HTTP servers.
  • Transparent proxying: If I check this option, the proxy will intercept web requests, even if the web browser hasn't been configured to use the proxy. This is generally a good thing in a network, as it ensures users won't be able to bypass your proxy server.
  • Allow bypassing: mman has keywords that can be included with a URL to bypass restrictions for a particular site. For example, if I wanted to see the popups for a site, I could type this in my browser: bypass[f]..www.mp3.com. If you don't want users bypassing your filters, don't select this option.

If you decide to create your own policy, remember to create a second policy that will allow you as an administrator to configure mman. If you plan on configuring mman on the same computer that is running the proxy software, keep the default policy, but place it below your new policy that affects your users.

Now, when users open up their web browsers, the browser itself will prompt them for the username and password you created in your policy. If they type it in correctly, they will be able to access the Internet, according to the parameters you set in your policy.
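Behind that prompt, HTTP proxy authentication generally works like this: the proxy answers with a status of 407 (Proxy Authentication Required), and the browser repeats the request with a Proxy-Authorization header. The Python sketch below builds that header for the Basic scheme. I'm assuming Basic here for illustration, as I haven't confirmed which scheme mman uses, and the username and password are the well-known example pair from the HTTP authentication RFC, not anything from my policy.

```python
import base64


def proxy_authorization(username, password):
    """Build a Proxy-Authorization header line for the Basic scheme."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Proxy-Authorization: Basic " + token


header = proxy_authorization("Aladdin", "open sesame")
print(header)
```

Keep in mind that Basic credentials are only encoded, not encrypted, so anyone who can sniff the traffic between the browser and the proxy can decode them.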

The last feature I wish to mention is limits. This configuration allows you to control Internet access according to month, day, and time. For example, you could configure a policy that limits Internet access to the hours of 9:00 to 17:00 on Monday to Friday.
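The check behind such a limit is simple. Here is a Python sketch of the 9:00 to 17:00, Monday to Friday example. It is an illustration of the logic only, not how mman implements limits, and it leaves out the month setting.

```python
from datetime import datetime, time


def access_allowed(now, days=range(0, 5), start=time(9, 0), end=time(17, 0)):
    """Return True if `now` is on an allowed weekday and within [start, end).

    Weekdays follow datetime.weekday(): Monday is 0 and Sunday is 6.
    """
    return now.weekday() in days and start <= now.time() < end


print(access_allowed(datetime(2003, 6, 23, 10, 0)))  # a Monday morning
print(access_allowed(datetime(2003, 6, 21, 10, 0)))  # a Saturday morning
```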

Conclusion

It seems that I've barely scratched the surface of the middleman proxy server. Perhaps I've piqued your interest and you will try this application for yourself.

In the next article, I'd like to finish the proxy series by taking a look at SMTP proxies.

Dru Lavigne is a network and systems administrator, IT instructor, author and international speaker. She has over a decade of experience administering and teaching Netware, Microsoft, Cisco, Checkpoint, SCO, Solaris, Linux, and BSD systems. A prolific author, she pens the popular FreeBSD Basics column for O'Reilly and is author of BSD Hacks and The Best of FreeBSD Basics.

