ONJava.com -- The Independent Source for Enterprise Java
oreilly.comSafari Books Online.Conferences.

advertisement

AddThis Social Bookmark Button

Introducing Automatic Data Expiration
Pages: 1, 2, 3, 4

A Second Solution to the Synchronization Bottleneck

Another solution to the synchronization bottleneck in the original version of GlobalCache is to change the role of the background thread slightly. In this solution, the background thread makes a copy of the data structure instead of directly checking the main data structure, and then uses the copy to check for expirations. This is exactly what the Tomcat 4.0 codebase does for session keys. Here's the code from Tomcat (from the classes org.apache.catalina.session.StandardManager and org.apache.catalina.session.ManagerBase).



From org.apache.catalina.session.StandardManager

    /**
     * The background thread that checks for session timeouts and shutdown.
     */
  public void run() {

        // Loop until the termination semaphore is set
    while (!threadDone) {
      threadSleep();
      processExpires();
    }
  }

  private void processExpires() {

    long timeNow = System.currentTimeMillis();
    Session sessions[] = findSessions();

    for (int i = 0; i < sessions.length; i++) {
      StandardSession session = (StandardSession) sessions[i];
      if (!session.isValid())
        continue;
      int maxInactiveInterval = session.getMaxInactiveInterval();
      if (maxInactiveInterval < 0)
        continue;
      int timeIdle = // Truncate, do not round up
        (int) ((timeNow - session.getLastAccessedTime()) / 1000L);
      if (timeIdle >= maxInactiveInterval) {
      try {
        session.expire();
      } catch (Throwable t) {
        log(sm.getString("standardManager.expireException"), t);
      }
    }
  }


From org.apache.catalina.session.ManagerBase


  /**
   * Return the set of active Sessions associated with this Manager.
   * If this Manager has no active Sessions, a zero-length array is returned.
   */

  public Session[] findSessions() {
    Session results[] = null;
    synchronized (sessions) {
      results = new Session[sessions.size()];
      results = (Session[]) sessions.values().toArray(results);
    }
    return (results);
  }

  /**
   * The set of currently active Sessions for this Manager, keyed by
   * session identifier.
   */
  protected HashMap sessions = new HashMap();

This is almost identical to the first version of GlobalCache we wrote. There's only one slight difference -- the method findSessions makes a duplicate copy of the main data structure, copying the values from the HashMap into an array that the background thread then uses to expire objects. Tomcat's approach solves the synchronization bottleneck: by making a second copy of the array and iterating over it, the Tomcat codebase avoids concurrent modification exceptions without locking the cache for an extended period of time.

The Tomcat approach does have some efficiency problems: copying all of the values in a HashMap is not cheap, and Tomcat still iterates over all the sessions. Moreover, making a duplicate copy of the session keys also actually makes the data inconsistency problem slightly worse: it's possible for a request to come in and use a session at the same time that the background thread is expiring the session. The scenario from the point of view of the user looks like this:

  • The user goes to get coffee (or lunch or ...). The time approximately required to expire a session goes by without the user making a request.
  • The user comes back to her desk and clicks a URL. That request succeeds.
  • The user clicks another URL and is told the session has timed out.

This example illustrates a fairly general problem with expiring data. Most approaches force you to choose between a synchronization bottleneck or a data consistency problem. The ideal sequence of operations for the background thread should perform something like the following sequence of operations:

  1. Wake up.
  2. Lock the main data structure for an incredibly brief period of time, in which the thread removes only the expired data.
  3. Unlock the main data structure. Because the main data structure is locked for such a brief period of time, there is no synchronization bottleneck.
  4. Expire the individual pieces of data in the background thread. Because the data was already removed from the main data structure, there is no data consistency problem.
  5. Sleep.

None of our solutions do this (yet), although Tomcat and the TreeSet version of GlobalCache are pretty nice.

Summary and Desiderata for a Better Data Structure

In this article, we've discussed the types of data that need to be expired and gone over several different algorithms for expiring data. As a result of this discussion, we've come up with three candidate solutions:

  • High discipline. A solution based on having complex subclasses of ExpirableObject. This can be made to work, but often just results in brittle and complex code.
  • Using a global cache with a time-based secondary index. In our case, we used an instance of TreeSet as the index. This made expiration fast and easy, and completely eliminated both the data consistency and synchronization bottlenecks. On the other hand, both adding objects and resetting object expiration times were much slower because of the cost of maintaining the index.
  • Using a Tomcat-style global cache. This is pretty nice. The main threads are only blocked for short periods of time and access to the main cache is based on logical keys. It's mildly inefficient and it only has a slight data inconsistency problem, but it still scales to the enterprise quite well.

Looking at our various implementations gives us set of requirements for a solution:

  • Use a single thread. Our first attempt used an unbounded number of threads. We want to avoid that.
  • Use a global cache. We need some way to access the information without already having a pointer to the encapsulating object.
  • The hard part should be in the framework. Expiring objects should be as easy as possible. A programmer should to be able to expire any instance of any class, without needing to adapt the class.
  • Consistency and Speed. We want them both. We want the speed and convenience of hash-based lookup. And we want to get rid of data inconsistency problems.
  • Reasonable indexing. If there isn't an obvious owner for an object, then clients will need to obtain references using an indexing scheme. The indexing used has to be programmatically reasonable, and that means we have to support logical indexing (not just time-based indexing).
  • Universality. Whatever solution we come up with should be useful in all of our example cases (not just for session keys and not just for short-lived information).

Related Reading

Java RMIJava RMI
By William Grosso
Table of Contents
Index
Sample Chapter
Full Description

Looking at our attempted solutions also lets us realize that one of our initial assumptions has been relaxed: using a single background thread that wakes up occasionally implies that the data can stay valid for an "extra" minute or two. Data expiration is often a casual requirement: the important thing is that the data lives for a while and then is expired at approximately the right time.

In the second article in this series, we'll use these requirements to define and build a new data structure for data expiration.

William Grosso is a coauthor of Java Enterprise Best Practices.

Related Articles by William Grosso

Learning Command Objects and RMI

Seamlessly Caching Stubs for Improved Performance

Generics and Method Objects


Return to ONJava.com.