ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)


Another Java Servlet Filter Most Web Applications Should Have
Client-Side Cache Control

by Jayson Falkner
03/03/2004

There are several servlet filters that most web applications should have. A few of these filters are detailed in my previous article, "Two Filters Every Web Application Should Have," but the theme of these filters is always the same: provide some functionality that will seriously help almost any web application. In the previous article, two filters were presented: a cache filter and a compression filter. In this article, we will code a filter that can modify HTTP response headers, with the intention of using it to control how the client's web browser caches content. Client-side caching isn't as obvious as server-side caching, but it can be incredibly helpful, and it's near-trivial to implement.

Note: This article skips a user-friendly introduction to servlet filters, because you can find a perfectly good one in my previous article. Be sure you know what a servlet filter is before continuing, or else the code will make little sense.

HTTP Response Headers

The current HTTP specification is quite large and it is easy to only pay attention to the things that are forced upon you; e.g., you probably know what a URL is for index.jsp at www.jspbook.com (http://www.jspbook.com/index.jsp), but do you know how to type the HTTP request for the same resource? Did you know that a basic HTTP request is nothing more than plain text? Just for fun, here is what a basic HTTP 1.1 request looks like. Notice that it is typed using telnet.

# telnet www.jspbook.com 80
Trying 209.247.227.227...
Connected to www.jspbook.com.
Escape character is '^]'.
GET /index.jsp HTTP/1.1
host: foobar

After typing in the request (be sure to hit the Enter key an extra time), the web site www.jspbook.com returned the contents for /index.jsp, an HTML page.

HTTP/1.1 200 OK
Set-Cookie: JSESSIONID=285168DA4C5ACC1AF04EE3994F88BC15; Path=/
Content-Type: text/html
Transfer-Encoding: chunked
Date: Thu, 08 Jan 2004 03:12:26 GMT
Server: Apache-Coyote/1.1
2000
The HTML page's content was here but is skipped for brevity.

And that is how a web browser gets the content it uses to render a web page. This is interesting if you haven't seen it before, but it is not the greater point. If you haven't seen an HTTP request before, it is likely because you don't have to when coding the average web application. The greater point is that the HTTP specification is full of all sorts of information, and the Servlet/JSP API doesn't necessarily reflect everything in it.
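For readers who would rather see the telnet session as code, here is a minimal sketch of the same idea in Java. The class name RawHttpRequest and its methods are made up for this illustration, and the Connection: close header is an addition that was not typed in the telnet session; it simply asks the server to close the connection so that reading the response can finish.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the article's code.
class RawHttpRequest {

    // Builds the plain-text request. Each line ends with CRLF, and the
    // final empty line tells the server that the headers are complete
    // (the "extra Enter" in the telnet session).
    static String buildGet(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Sends the request over a plain socket and returns the raw response,
    // status line and headers included.
    static String send(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildGet(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                response.write(buffer, 0, read);
            }
            return new String(response.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // e.g. java RawHttpRequest www.jspbook.com /index.jsp
            System.out.println(send(args[0], 80, args[1]));
        } else {
            // Without arguments, just show the text that would be sent.
            System.out.print(buildGet("www.jspbook.com", "/index.jsp"));
        }
    }
}
```

Nothing here depends on the Servlet API; the request really is just text written to a socket.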

Some great examples of helpful things in the HTTP specification that new J2EE developers commonly don't know about are the HTTP headers that aren't abstracted by a method in either the javax.servlet.http.HttpServletRequest or the javax.servlet.http.HttpServletResponse interface. Most developers know that an HTTP response can specify the length of its content and what type of content is being sent (i.e., the MIME type). Both of these things correspond to HTTP response headers (Content-Length and Content-Type) that are set by the javax.servlet.http.HttpServletResponse object's setContentLength() and setContentType() methods, respectively; each of these methods directly abstracts an HTTP response header. But what about all of the headers you can set using the HttpServletResponse object's setHeader() method? You'll have to look at the HTTP specification if you want to know these.

The HTTP 1.1 specification lists all of the valid response headers and describes what each one is used for. We are going to focus specifically on the HTTP response header Cache-Control, which affects client-side content caching. Various uses of this header are helpful for almost any web application, but it is not obvious that you can use them if you only look at the Servlet/JSP API documentation. Here is a key example: most web applications have a common graphic used on every page (say, your site's logo), and every page request to your web site will require the user to download that graphic. Why not have the client cache the graphic and save your server from having to transmit it for every single request? With the Cache-Control header you can do just this.

Caching Content in the Client's Browser

The Cache-Control HTTP header may be used to specify if and when a resource should be cached, and how long to consider a cached copy valid. Continuing the example above, imagine that we have a web site that uses the same logo (say, logo.png) on every page. In most cases, it makes a lot of sense to tell a user's browser that this graphic should be cached, which will benefit the user the next time they visit a page on your site. This is easily done by setting the Cache-Control header to have the value max-age=3600 in the HTTP response that transmits logo.png. The value max-age=3600 specifies that the cached copy should be considered valid for 3600 seconds, which is 60*60 seconds, or an hour. You may also add the value public, which specifies that the content of the response is public information and that it may be cached by anything that can cache it. For all requests within that hour, unless the client forces the cache to be revalidated, the user's browser won't send a request to your server for logo.png; it will use the existing cached file.

For one user, this is pretty neat: the cached image will appear to have downloaded instantaneously. However, imagine the larger picture. If you use this technique as a standard practice, you can seriously cut down on the number of HTTP requests hitting your server(s) just by making sure you only send a client the information it hasn't already seen. Imagine telling your boss you don't need that second server.
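To make the max-age arithmetic concrete, here is a rough sketch of the decision a browser cache makes before reusing a stored response. The class FreshnessCheck and its method names are invented for this illustration, and the logic is deliberately simplified: a real cache also considers the Age and Expires headers, other Cache-Control directives, and heuristics that this sketch ignores.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;
import java.util.OptionalLong;

// Hypothetical, simplified model of a cache's freshness check.
class FreshnessCheck {

    // Extracts the max-age value (in seconds) from a Cache-Control header
    // value such as "public, max-age=3600"; empty if absent or malformed.
    static OptionalLong maxAgeSeconds(String cacheControl) {
        for (String part : cacheControl.split(",")) {
            String directive = part.trim().toLowerCase(Locale.ROOT);
            if (directive.startsWith("max-age=")) {
                try {
                    String number = directive.substring("max-age=".length()).trim();
                    return OptionalLong.of(Long.parseLong(number));
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }

    // True if a response stored at storedAt may still be used at now
    // without contacting the server, judging by max-age alone.
    static boolean isFresh(String cacheControl, Instant storedAt, Instant now) {
        OptionalLong maxAge = maxAgeSeconds(cacheControl);
        if (!maxAge.isPresent()) {
            return false; // no max-age: this sketch always asks the server
        }
        long ageSeconds = Duration.between(storedAt, now).getSeconds();
        return ageSeconds < maxAge.getAsLong();
    }
}
```

With max-age=3600, a copy stored two minutes ago is still fresh, so no request for logo.png is sent; once the copy is a full hour old the check fails and the browser goes back to the server.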

Hopefully, it should be clear that being able to manipulate HTTP headers is helpful; the Cache-Control header alone proves this point. The question now is, how do you do it? As mentioned earlier in the article, a simple servlet filter does the trick nicely. Here is the complete code for such a filter. The code is from the book support site for Servlets and JavaServer Pages; the J2EE Web Tier. You can also deploy a compiled version of the filter with your web app by putting jspbook.jar from http://www.jspbook.com/jspbook.jar in the WEB-INF/lib directory of your web application.

package com.jspbook;

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

// Adds one HTTP response header for each init-param of the filter:
// the param-name is the header name, the param-value is its value.
public class ResponseHeaderFilter implements Filter {
  // filter configuration, supplied by the container in init()
  FilterConfig fc;

  public void doFilter(ServletRequest req,
                       ServletResponse res,
                       FilterChain chain)
                       throws IOException,
                              ServletException {
    HttpServletResponse response =
      (HttpServletResponse) res;
    // set the provided HTTP response parameters
    for (Enumeration e=fc.getInitParameterNames();
        e.hasMoreElements();) {
      String headerName = (String)e.nextElement();
      response.addHeader(headerName,
                 fc.getInitParameter(headerName));
    }
    // pass the request/response on
    chain.doFilter(req, response);
  }

  // called once by the container before the filter is used
  public void init(FilterConfig filterConfig) {
    this.fc = filterConfig;
  }

  // called once by the container when the filter is retired
  public void destroy() {
    this.fc = null;
  }
}

The only thing the above filter does is set HTTP response headers to match whatever initial parameters were provided for the filter. The few lines of code that accomplish this are the following.

for (Enumeration e=fc.getInitParameterNames();
     e.hasMoreElements();) {
   String headerName = (String)e.nextElement();
   response.addHeader(headerName,
                 fc.getInitParameter(headerName));
}

Aside from the above code, the rest of the filter is nothing more than a bare-minimum implementation of the javax.servlet.Filter interface.

To use the above filter in a helpful manner, you will have to set appropriate initial parameters to change the HTTP headers in which you are interested. The following deployment would change the Cache-Control header to match the earlier example. If you are trying the code, add these lines to your WEB-INF/web.xml file.

<filter>
  <filter-name>ResponseHeaderFilter</filter-name>
  <filter-class>com.jspbook.ResponseHeaderFilter</filter-class>
  <init-param>
    <param-name>Cache-Control</param-name>
    <param-value>max-age=3600</param-value>
  </init-param>
</filter>

<filter-mapping>
  <filter-name>ResponseHeaderFilter</filter-name>
  <url-pattern>/logo.png</url-pattern>
</filter-mapping>

A web application using the given code for the filter and the given deployment above would have clients cache the logo.png file, and the web application would not have to serve the image to clients who had already downloaded it within the last hour. The code can be tested by deploying it with a real web application and logging the HTTP requests (if your container doesn't log requests, you could code a filter to do so). Here is a simple demonstration using the latest release of Tomcat (5.0.16), which conveniently provides a way to dump HTTP request/response information. If you have Tomcat installed, uncomment the following line in /conf/server.xml to enable the RequestDumperValve class.

<Valve
 className="org.apache.catalina.valves.RequestDumperValve"/>

With the RequestDumperValve enabled, a new file will appear in Tomcat's /logs directory, named catalina_log.[today's date].txt, where [today's date] is replaced with whatever the current date is. The contents of this file will include all of the information about HTTP requests and responses that Tomcat handles, including the HTTP headers involved.

The final step to testing the response header filter is to browse to a resource in a web application with and without client-side caching and to look at the difference. You can certainly use any resource you like, but for this article, we will use the simplest thing possible. Save the following HTML page in your web application with whatever name you please (we'll assume test.html for this article). Recall that only logo.png is important, since our response header filter is mapped to it.

If you are using Tomcat, a convenient place to save this file is in the base directory of the ROOT web application.

<html>
 <head>
  <title>A Simple Page</title>
 </head>
 <body>
  <p>Some text, with a logo</p>
  <img src="logo.png">
 </body>
</html>

Next, save an image named logo.png in the same directory as the above HTML file. Any image will do. Finally, open your favorite web browser and browse to test.html; e.g., http://127.0.0.1/test.html. The page should render as expected with some text and a graphic. Check your HTTP request/response log to see what went on behind the scenes to make this happen. In Tomcat's log you'll note two instances of an HTTP GET: one for the content of test.html and one for the content of logo.png. For example, Tomcat's logs include the following:

2004-01-15 18:48:20 RequestDumperValve[Catalina]: REQUEST URI       =/test.html
...
===============================================================
2004-01-15 18:48:21 RequestDumperValve[Catalina]: REQUEST URI       =/logo.png
...
2004-01-15 18:48:21 RequestDumperValve[Catalina]: header=Cache-Control=max-age=3600

And, if you look closely at all the headers your container set for the response, you'll find that logo.png has the response header Cache-Control set with a value of max-age=3600.
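If you would rather check the header from code than read through a log, something like the following sketch may help. It uses only classes that ship with the JDK. The class name CacheHeaderProbe is made up for this illustration, and the tiny stub server in probeStubServer() is only a stand-in for a deployed web application with the filter mapped to /logo.png; it is there so the sketch can run without Tomcat.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

// Hypothetical helper, not part of the article's code.
class CacheHeaderProbe {

    // Requests the URL and returns the value of its Cache-Control
    // response header, or null if the header is absent.
    static String cacheControlOf(String url) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.getResponseCode(); // forces the request to be sent
            return conn.getHeaderField("Cache-Control");
        } finally {
            conn.disconnect();
        }
    }

    // Starts a throwaway local server that answers /logo.png with the
    // header the filter would add, probes it, and shuts it down again.
    static String probeStubServer() {
        try {
            HttpServer server =
                HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/logo.png", exchange -> {
                exchange.getResponseHeaders().set("Cache-Control", "max-age=3600");
                exchange.sendResponseHeaders(200, -1); // -1: no response body
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return cacheControlOf("http://127.0.0.1:" + port + "/logo.png");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the URL of your own deployment, e.g. http://127.0.0.1/logo.png
        System.out.println(args.length > 0
            ? cacheControlOf(args[0])
            : probeStubServer());
    }
}
```

Pointed at a real deployment of the filter, cacheControlOf() should report the same value that appears in the container's log.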

So what does the above information mean? When your web browser retrieved the content for test.html there was a link to logo.png in the document, for which your browser then automatically made another HTTP request. In total, your web browser needed both test.html and logo.png to display the page, and it sent HTTP requests for both of those resources to your web application. Recall we are pretending that logo.png is an image that appears on all of the pages in your web application, and we are trying to use an HTTP header to have the web browser cache logo.png so that it is only downloaded once. The previous request will have done exactly that, because we set the Cache-Control header. Now you can test whether the browser successfully cached the image by browsing to another page in the web application that uses logo.png; we'll just reuse test.html. Browse to test.html again. (Don't use the Refresh button!) Again, check the HTTP request/response log your container generates. This time, notice there is only an HTTP request for test.html. In Tomcat's log, some new lines are appended that include the following:

===============================================================
2004-01-15 18:50:10 RequestDumperValve[Catalina]: REQUEST URI       =/test.html
...

No HTTP request is made for logo.png. This is because we told your web browser to cache logo.png locally for an hour. Go ahead, try browsing to the page again within the hour and you'll notice the same results. You can even try browsing to any other page (try making one up) that uses logo.png, and the local cache will still be used until it expires. After an hour passes, you can browse back to test.html and once again you'll see logo.png retrieved from the server once and cached for another hour.

Before moving on, let's tie up one loose end. I explicitly said that you should not use your browser's Refresh button to browse back to test.html. Understanding why I said this is important so that there is no confusion about how things are being cached. HTTP has a good system for caching content. Just as we told the web browser to cache logo.png, the web browser can explicitly choose not to use its cache, and it can even ask the server to refresh its own cache. A web browser's Refresh button is almost always the shortcut that causes this. If you are trying this example and you use the Refresh button to revisit web application resources, you will likely notice that client-side caching just doesn't seem to work.

Ensuring Content is Not Cached by the Client's Browser

The HTTP response filter is not only good for having a client cache content. It is equally helpful for having a web browser not cache content, and the technique is as simple as the previous use of the Cache-Control header. Instead of setting the max-age value to an hour, try setting it to zero. This means the content is immediately stale, which in practice causes a web browser to check with the server before reusing its cached copy. However, the more technically correct method of preventing a client's browser from reusing cached information is to set the Cache-Control header's value to no-cache, which tells a cache not to reuse the response without first revalidating it with the server. You may also use the private value to specify that the HTTP content should not be cached in any public (shared) cache. You can even use the no-store value to specify that caches should not keep a copy at all and should remove the content from memory as quickly as possible, so that there is little chance it would ever appear in something such as a tape backup.

The following deployment would use all of these values to ensure an HTTP-1.1-compliant web browser doesn't cache content.

<filter>
  <filter-name>ResponseHeaderFilter</filter-name>
  <filter-class>com.jspbook.ResponseHeaderFilter</filter-class>
  <init-param>
    <param-name>Cache-Control</param-name>
    <param-value>private,no-cache,no-store</param-value>
  </init-param>
</filter>

Mapping the declaration above to resources in a web application will ensure they are not cached by web browsers. If you are really concerned about HTTP 1.0 browsers, you can look into also setting the Expires header and Pragma header, which are the old way of accomplishing the same thing. If you would like to test the HTTP response filter's ability to prevent caching, try it out using the same techniques we did earlier in this article. Instead of seeing the requests prevented, you'll see the cache-preventing headers set and a request for every resource every time you browse to the page. We won't walk through such a test in this article, but it should be a straightforward exercise if you wish to do it.
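As a way of keeping the cache-preventing values straight, here is a small sketch that reads a Cache-Control value and reports what it forbids. The class CacheDirectives and its method names are invented for this illustration, and it only looks at the directives discussed in this article; it is a simplification, not a complete reading of the HTTP specification.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper covering only the directives used in this article.
class CacheDirectives {

    // Splits a header value into trimmed, lower-cased directives, e.g.
    // "private,no-cache,no-store" becomes [private, no-cache, no-store].
    static Set<String> parse(String cacheControl) {
        Set<String> directives = new LinkedHashSet<>();
        for (String part : cacheControl.split(",")) {
            String directive = part.trim().toLowerCase(Locale.ROOT);
            if (!directive.isEmpty()) {
                directives.add(directive);
            }
        }
        return directives;
    }

    // no-store: no cache, browser or shared, should keep a copy.
    static boolean forbidsStoring(String cacheControl) {
        return parse(cacheControl).contains("no-store");
    }

    // private (or no-store): shared caches such as proxies should not keep a copy.
    static boolean forbidsSharedCaching(String cacheControl) {
        Set<String> d = parse(cacheControl);
        return d.contains("private") || d.contains("no-store");
    }

    // no-cache, max-age=0, or no-store: a stored copy should not be
    // reused without first checking with the server.
    static boolean forbidsReuseWithoutRevalidation(String cacheControl) {
        Set<String> d = parse(cacheControl);
        return d.contains("no-cache")
            || d.contains("max-age=0")
            || d.contains("no-store");
    }
}
```

For the value private,no-cache,no-store used above, all three checks answer true; for max-age=3600 from the earlier deployment, all three answer false.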

Does Client-Side Cache Manipulation Really Help?

Client-side cache manipulation absolutely helps, and it is something that can benefit most web applications you make. In some situations, you need to ensure that a web browser doesn't cache content; say, an instance where sensitive information is being passed to a web browser. Using HTTP headers is your only good method to accomplish this. However, the example of preventing a cache isn't nearly as interesting as having a client cache information. A key part of building an efficient web application is in getting content to a client as quickly as possible and with as little burden on your server(s) as possible. Client-side caching is ideal for this. We looked at the example of caching common graphics (e.g., your company's logo) that appear either at the top or bottom of every web page. However, don't think this technique only works for graphics. You could also have a client's browser cache style sheets, script files, or any other resource your web application uses. Additionally, you can benefit from briefly caching dynamic content. We used an example where the cache was valid for an hour. Why not set the cache to expire in five minutes and apply the filter to every resource in your web application that slowly changes (e.g., news feeds or directory pages)? Being able to control the HTTP caching mechanism is a very helpful tool to have, and it can be as simple to accomplish as using the filter presented in this article.

Before moving from the topic of HTTP caching, it is only fair to point out that a gray area exists between the two extremes of caching on the server side and caching on the client side. In my previous article, the benefits of caching on the server side were introduced. In this article, the benefits of using a client-side cache were introduced. However, HTTP provides several other caching opportunities, some of which are done automatically via HTTP headers. There are two sets of HTTP headers in particular that are worth mentioning, as they fall in this gray area of caching.

The first set has to do with keeping track of when content was generated. By default, most web browsers and web servers take advantage of the If-Modified-Since HTTP request header to keep track of how current content is and to cache content when possible. The process is simple, and works as follows: when a browser first retrieves content from a server, the response carries a timestamp of when the content last changed (the Last-Modified response header). On subsequent requests, the browser requests the content again but sends that timestamp back in the If-Modified-Since header to indicate that an older version of the content is cached. An HTTP server, upon receiving such a request, can then check if the content has changed since the browser last saw it. If so, new content is sent (the HTTP 200 response). If not, the server sends back a short response (HTTP 304) indicating that the old content should be reused. In the end, if the browser's cache is valid, the content is not resent by the server.

However, this scheme does not prevent any HTTP requests from occurring; it merely reduces the amount of information that needs to be sent per request. This differs significantly from the Cache-Control header that was used earlier in the article. Using the Cache-Control header can prevent an HTTP request from ever being needed: the browser already knows the content is good, so there is no need to check a timestamp against the server. This difference is significant because an HTTP server can only process a certain number of requests, regardless of how much content is being returned per request. If you are expecting to get optimal performance from your web server, it is important to avoid unneeded HTTP requests.
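The server's half of the If-Modified-Since exchange can be sketched in a few lines. The class ConditionalGet below is made up for this illustration and is not how any particular container implements the check; it only shows the 200-versus-304 decision described above, using the JDK's built-in parser for HTTP-style dates.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical model of a server's conditional GET decision.
class ConditionalGet {
    static final int OK = 200;
    static final int NOT_MODIFIED = 304;

    // lastModified:    when the content last changed on the server
    // ifModifiedSince: raw request header value, or null if the
    //                  browser did not send the header
    static int statusFor(Instant lastModified, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return OK; // first visit: send the content
        }
        Instant since;
        try {
            since = ZonedDateTime
                .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
        } catch (DateTimeParseException e) {
            return OK; // unreadable date: ignore the header
        }
        // HTTP dates only have one-second resolution.
        Instant modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
        return modified.isAfter(since) ? OK : NOT_MODIFIED;
    }
}
```

Either way the browser had to send a request and wait for an answer, which is exactly the cost a Cache-Control max-age value avoids.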

The second gray area is that of HTTP 1.0 cache control. The Cache-Control header is new as of HTTP 1.1. In HTTP 1.0, you could get the same effect, but you would have to use the Pragma and Expires headers. The Pragma header with a value of no-cache works the same as the Cache-Control header with a value of no-cache. Note that in HTTP 1.1, you should no longer use the Pragma header for this purpose; the Cache-Control header is the official replacement. But it might prove handy to use the Pragma header if you are restricted to HTTP 1.0. The Expires header works in a similar way to the Cache-Control header's max-age directive. In HTTP 1.0, you could set the Expires header with a date (optionally, a date before the current date) to signify whether cached content should still be considered valid or should be explicitly discarded. As with the Pragma header, the Expires header's use for cache control is intended to be replaced by the Cache-Control header, but it is handy to know about the Expires header if you are working with HTTP 1.0.
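To show how the HTTP 1.0 headers line up with Cache-Control, here is a rough sketch that produces both sets side by side. The class LegacyCacheHeaders and its method names are invented for this illustration, and using the start of 1970 as the "date before the current date" is just one possible choice.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper pairing HTTP 1.1 and HTTP 1.0 cache headers.
class LegacyCacheHeaders {

    // HTTP date layout, e.g. "Thu, 08 Jan 2004 03:12:26 GMT"
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // Headers asking a client to cache for maxAgeSeconds, counted from now.
    static Map<String, String> cacheFor(long maxAgeSeconds, Instant now) {
        Map<String, String> headers = new LinkedHashMap<>();
        // HTTP 1.1: a relative lifetime
        headers.put("Cache-Control", "max-age=" + maxAgeSeconds);
        // HTTP 1.0: the same lifetime expressed as an absolute date
        headers.put("Expires", HTTP_DATE.format(now.plusSeconds(maxAgeSeconds)));
        return headers;
    }

    // Headers asking a client not to cache at all.
    static Map<String, String> doNotCache() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "private,no-cache,no-store"); // HTTP 1.1
        headers.put("Pragma", "no-cache");                         // HTTP 1.0
        // HTTP 1.0: a date in the past marks the content as already expired
        headers.put("Expires", HTTP_DATE.format(Instant.EPOCH));
        return headers;
    }
}
```

Note that Expires needs a date computed at response time, so unlike max-age it cannot simply be written as a fixed initial parameter for the filter in this article; a past date, as in doNotCache(), is the exception.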

Summary and Conclusion

HTTP headers are helpful. The Servlet API lets you manipulate any HTTP header, but it is a poor place to learn about all of the HTTP headers you can manipulate. Realize that you can change HTTP headers to make your web application work better, and use the latest HTTP specification to determine what HTTP headers are helpful for you to use. In this article, we took a specific look at the HTTP response Cache-Control header. This header is helpful for caching things on the client side (saving your server some work) and/or ensuring content is not cached on the client side (making sure a web browser has the latest version of your content).

Take the HTTP response filter that was provided in this article, and use it to aid in your web application development. You have the entire source code, and you may modify it as you see fit. Or, if you simply want to drop a .jar file into the WEB-INF/lib directory of your web application and start deploying the filter, you may get the appropriate .jar file at http://www.jspbook.com/jspbook.jar. The code is actively maintained, and if you like the example, be sure to take a look at my book Servlets and JavaServer Pages; the J2EE Web Tier. It covers up to the latest JSP and Servlet specifications and provides many more helpful code examples for you to use.

Jayson Falkner is a J2EE developer, student, and webmaster of JSP Insider.


Copyright © 2009 O'Reilly Media, Inc.