ONJava.com -- The Independent Source for Enterprise Java
oreilly.comSafari Books Online.Conferences.


AddThis Social Bookmark Button

The Hashbelt Data Structure
Pages: 1, 2, 3, 4, 5

The first change we're going to make is simple: instead of session objects that contain an expiration time, we're going to allow any type of object to be stored. The reason is simple: a caching system for time-sensitive information shouldn't require the objects it stores to understand timestamps and know when they are out of date. Tomcat gets away with using session objects (which store their own timestamps) because it's a single application. We need to be more general, because we're building a framework that can be reused to expire all sorts of data. Later on, when we discuss the code, we're going to define an interface, named ExpirationSystem, that encapsulates exactly what our data expiration algorithms are allowed to assume about the data being stored; for now, it suffices to say that programmers must be able to store any type of object.

Since the objects stored in an instance of ExpirationSystem no longer contain any information about when they expire, we need to store the expiration times for the objects somewhere within our framework. The obvious way to do this is to have an instance of Hashtable, which maps objects to some sort of timestamp (for example, to instances of Date). Instead of doing this, however, we're going to break apart the global cache and store objects in separate containers, based on how long they have to live. More precisely, we're going to do the following:

  • Define a container interface called HashbeltContainer. The idea behind the interface is that we'll have different implementations of our basic container, optimized for different uses.

  • Use many instances of HashbeltContainer, each one of which will only contain objects of approximately the same age. Each object stored in the global cache will be stored in exactly one container.

Once nice consequence of doing things this way is that we'll be able to tell how old an object is, or how close it is to being expired, simply by looking at what container it's in.

Pictorially, we've just performed the following transformation:

Figure 1. The cache is a set of containers based on how much time objects have before they expire. This structure is not reflected in the public interface

Of course, using multiple containers instead of a single instance of Hashtable means that the hashbelt's lookup algorithm is going to be slightly different from Tomcat's. In particular, it's going to iterate over all of the instances of HashbeltContainer. Attempting to fetch an object from the expiration system will still look the same to client code; it will involve a call to a get method. But inside the expiration system, instead of being a single lookup, the implementation of get will iterate over all of the containers, starting with the most recent and stopping when the object has been found. This algorithm is illustrated in figure 2.

Diagram - click for full-size view.
Figure 2. Implementing a query (click for full size view).

Now that we've changed how objects are stored, we also need to change the way objects are expired. Hashbelt still uses a background thread for object expiration. However, the actual expiration algorithm is quite different from Tomcat's. In a hashbelt system, the background thread performs the following tasks:

  1. The thread wakes up.
  2. The thread creates a new container.
  3. The thread locks the main data structure and then "rotates" the set of containers. The newly-created container goes into the first position, and the last container is removed from the data structure entirely. This "rotation" is the "conveyor belt" I mentioned earlier.
  4. The thread unlocks the main data structure and then iterates over the removed container, performing some sort of expiration operation for each object it finds.
  5. The thread goes to sleep for a brief period of time.

This sequence of operations is illustrated in figure 3.

Figure 3. How containers are expired.

At this point, we've covered how a hashbelt stores objects (in multiple containers, sorted by expiration time) and we've covered the basic "rotate and expire" algorithm. However, in the hashbelt algorithm as defined so far, there are some very compelling questions that we don't know the answer for. Namely:

  • What do we do with an object once it's been retrieved?
  • What do we do with an object when it's been removed from the data structure?
  • How do we expire an arbitrary object anyway?

Unfortunately, there are no universal answers to these questions. For example, consider session keys in a servlet engine. Session keys answer these questions in the following way:

  • What do we do with a session key once it's been retrieved? We remove it from the container we found it in, and stick it in the container with the most recent timestamp (this effectively postpones the expiration of the session).
  • What do we do with an object when it's been removed from the data structure? We actively expire it.
  • How do we expire a session? We remove any resources associated to the session that may have been allocated. In addition, we probably record the end of the session in some persistent storage mechanism.

On the other hand, consider weather reports from Half Moon Bay. The answers are completely different:

  • What do we do with a weather report once it's been retrieved? We leave it in the container in which we found it. Under no circumstances should we postpone the expiration. The whole point is that once weather information gets old, it ought to replaced regardless of whether someone has accessed it recently.

  • If you're not familiar with strategy objects, or it's been a while since you've seen the strategy pattern in action, I highly recommend Design Patterns: Elements of Reusable Object-Oriented Software (Addison Wesley, 1995). It's a great book.

  • What do we do with an object when it's been removed from the data structure? We don't do anything. Garbage collection handles getting rid of the instance.

  • How do we expire a weather report? We don't. If a user requests a weather report that isn't in the ExpirationSystem, we'll fetch it at the time it's requested. That is, the appropriate re-fetch is when the information is requested, not when the old information has expired.

Because these sets of answers are so different, our implementation of the Hashbelt data structure will have to be flexible and easily customized. We'll achieve the flexibility by using strategy objects to handle application-specific behavior.

Pages: 1, 2, 3, 4, 5

Next Pagearrow