ONJava.com -- The Independent Source for Enterprise Java
oreilly.comSafari Books Online.Conferences.

advertisement

AddThis Social Bookmark Button

The Hashbelt Data Structure
Pages: 1, 2, 3, 4, 5

The ExpirationHandler Interface

After the background thread rotates the containers, it still has a reference to the just-expired container (the container that has just been removed from the _containers array). Before going to sleep, the background thread needs to expire all of the objects in the just-expired container. It does this by using a strategy object that implements the ExpirationHandler interface. The idea is that the code that expires objects is almost always application-specific; the task is best left to code written by the application programmers.

The ExpirationHandler interface is very short: it contains a single method. Here's the entire interface:

  package grosso.expiration.hashbelt;
  import grosso.expiration.*;

  import java.util.*;

  public interface ExpirationHandler<K,V> {
    public void handleExpiredContainer(HashbeltContainer<K,V> expiredContainer);
  }

In spite of the simplicity of this interface, there are three implementations of ExpirationHandler in the downloadable source code: NullExpirationHandler, SimpleExpirationHandler, and ReinsertingExpirationHandler. NullExpirationHandler is by far the simplest implementation: it does absolutely nothing. Strangely enough, this is often the correct strategy, because the expiration system doesn't have any references to either the container or the objects inside the container. As long as objects outside the expiration system aren't holding any references to the expired objects, using NullExpirationHandler is a convenient way to let the garbage collector perform cleanup.

SimpleExpirationHandler is an abstract class that loops through all of the objects in the container and calls the timeExpired method for each one. Application programmers can subclass this object and override this method to perform expiration. Here's the code for SimpleExpirationHandler:

  package grosso.expiration.hashbelt.handlers;
  import grosso.expiration.hashbelt.*;
  import grosso.expiration.*;

  import java.util.*;

  /*
    SimpleExpirationHandler.

    Does the obvious. Calls timeExpired for each object in the container. At the end,
    the objects in the container have been expired and there's no record of
    them in the AbstractHashbeltExpirationSystem.
  */

  public abstract class SimpleExpirationHandler<K,V>implements ExpirationHandler<K,V> {
    public void handleExpiredContainer(HashbeltContainer<K,V> expiredContainer) {
      Iterator<V> i = expiredContainer.getValues();
      while (i.hasNext()) {
        V nextValue = i.next();
        timeExpired(nextValue);
      }
    }

    protected abstract void timeExpired(V object);
  }

The final expiration handler, ReinsertingExpirationHandler, is an abstract class that loops through all of the objects in the container, calls the timeExpired method for each one, and then reinserts the expired objects back into the front of the hashbelt.

This might seem a little strange at first glance: this entire series has been about expiring objects, and it's not clear why an already expired object would be reinserted into the system. There are cases where this behavior can be useful. For example, consider the much-loved "stock ticker" example. Stock tickers require that prices be pushed to clients periodically. The hashbelt framework can be used to perform this task quite efficiently for large numbers of clients. You simply define a class that encapsulates how to push information to a person and create a subclass of ReinsertingExpirationHandler, which pushes the information to the client. Because ReinsertingExpirationHandler puts the instance back into the hashbelt, the client will have information pushed to them on a regular basis.

Suppose, for example, you use five containers with a rotation time of 10 seconds each. Then you have the following:

  • From the server's point of view. Every 10 seconds, a bucket is rotated. One container is expired, which means that approximately 20% of the clients are going to receive updated information (and then get reinserted into the hashbelt).
  • From the client's point of view. Updated information is pushed out every 50 seconds or so.

The nice part about this is that the only code the application programmer has to write is the code that's application-specific (the actual remote method calls, and the code that obtains the stock market data).

Here's the code for ReinsertingExpirationHandler:

  package grosso.expiration.hashbelt.handlers;
  import grosso.expiration.hashbelt.*;
  import grosso.expiration.*;

  import java.util.*;
  /*
    ReinsertingExpirationHandler.

    Calls timeExpired for the object and then reinserts it in the front of the
    expiration system.

    Useful for alerts and announcements. E.g. suppose you're supposed to
    send someone an update every 15 minutes. Use this one and an object
    that sends the message inside its "expire" method.
  */

  public abstract class ReinsertingExpirationHandler<K,V>implements ExpirationHandler<K,V> {
    private AbstractHashbeltExpirationSystem<K,V> _owner;
    public ReinsertingExpirationHandler(AbstractHashbeltExpirationSystem<K,V> owner) {
      _owner = owner;
    }

    public void handleExpiredContainer(HashbeltContainer<K,V> expiredContainer) {
      Iterator<K> i = expiredContainer.getKeys();
      while (i.hasNext()) {
        K key = i.next();
        V nextValue = expiredContainer.get(key);
        timeExpired(nextValue);
        if (null!=_owner) {
          _owner.put(key, nextValue);
        }
      }
    }

    protected abstract void timeExpired(V object) ;
  }

There is one further interesting behavioral difference between SimpleExpirationHandler and ReinsertingExpirationHandler. SimpleExpirationHandler iterates using the values stored in the container while ReinsertingExpirationHandler iterates using the keys. All of the implementations of HashbeltContainer discussed below are more efficient using values rather than using keys.

Warning: Don't get confused by the superficial similarity between UpdatingHashbeltExpirationSystem and ReinsertingExpirationHandler. UpdatingHashbeltExpirationSystem extends the lifetime (e.g., the time until expiration) of objects that have been recently accessed. ReinsertingExpirationHandler reinserts objects into the system after they've been expired. These are very different behaviors.

The HashbeltContainer Interface

The final piece of the puzzle is the HashbeltContainer interface. Just like the ExpirationSystem interface, the HashbeltContainer interface is designed to look a lot like java.util.Map. It uses keyed access, has get and put operations, and can be iterated over, either using keys or values.

However, implementations of HashbeltContainer don't need to be quite as general as implementations of java.util.Map. We know more about how HashbeltContainer will be used. We know that get, remove, and put are mostly used while the container is active (e.g., before the entire container is removed from the _containers array). And we know that getValues and getKeys are only used, if they are used at all, by the expiration handler during expiration. This information helps application programmers choose the right implementation of HashbeltContainer.

Here's the HashbeltContainer interface:

  package grosso.expiration.hashbelt;
  import grosso.expiration.*;

  import java.util.*;

  /*
    HashbeltContainer
  
    Holds some of the objects for a Hashbelt. Conceptually, this is
    "one bucket on the conveyor belt."
  
    Note: implementations of this interface need to be highly synchronized-- the
    implementations of the hashbelt rely on this object to be threadsafe (at
    least in the get/remove/put block. getIterator is only called once the
    container is dead and is called from within a single thread).
  */

  public interface HashbeltContainer<K,V>{
    public V get(K key);
    public V remove(K key);
    public void put(K key, V value);
    public void clear();
    public Iterator<V> getValues();
    public Iterator<K> getKeys();
    public int size();
  }

The simplest version of HashbeltContainer, StandardHashbeltContainer, is implemented as a wrapper around an instance of Hashmap. Here's the code for StandardHashbeltContainer:

  package grosso.expiration.hashbelt.containers;
  import grosso.expiration.hashbelt.*;
  import grosso.expiration.*;

  import java.util.*;

  /*
      

    A basic implementation of the HashbeltContainer interface. It
    uses a Hashmap to store key/value pairs. This is optimized for
    lots of gets with a fair number of removes. It there aren't going
    to be many removes, it's better to use FastIteratingHashbeltContainer
    (which also has a vector of instances to speed up iteration).
  */

  public class StandardHashbeltContainer<K,V>implements HashbeltContainer<K,V>{
    private HashMap<K,V> _keysToExpirableObjects = new HashMap<K,V>();

    public void clear() {
      _keysToExpirableObjects.clear();
    }

    public synchronized V get (K key) {
      return _keysToExpirableObjects.get(key);
    }

    public synchronized V remove (K key) {
      return _keysToExpirableObjects.remove (key);
    }

    public synchronized void put(K key, V value) {
      _keysToExpirableObjects.put (key, value);
    }

    public Iterator<V> getValues() {
      ArrayList<V> values = new ArrayList<V>( _keysToExpirableObjects.values());
      return values.iterator();
    }
  
    public Iterator<K> getKeys() {
      ArrayList<K> keys = new ArrayList<K>( _keysToExpirableObjects.keySet());
      return keys.iterator();
    }

    public int size(){
      return _keysToExpirableObjects.size();
    }
  }

In many cases, StandardHashbeltContainer is exactly the right data structure to use. For example, recall our discussion of weather reports from Half Moon Bay. We said:

  • What do we do with a weather report once it's been retrieved? We leave it in the container in which we found it. Under no circumstances should we postpone the expiration. The whole point is that once weather information gets old, it ought to replaced, regardless of whether someone has accessed it recently.
  • What do we do with an object when it's been removed from the data structure? We don't do anything. Garbage collection handles getting rid of the instance.
  • How do we expire a weather report? We don't. If a user requests a weather report that isn't in the ExpirationSystem, we'll fetch it at the time it's requested. That is, the appropriate re-fetch is when the information is requested, not when the old information expired.

This is, essentially, the statement that we will insert objects into the container and then fetch them using keys. But we will never remove them from the containers, and we will never need to iterate over the container during expiration (since expiration really just requires the objects to be garbage-collected). In which case, we ought to be using a container that is optimized for insertions and retrievals, but that doesn't handle iteration very well at all. The best container for this purpose is StandardHashbeltContainer.

Consider, on the other hand, our answers for session keys:

  • What do we do with a session key once it's been retrieved? We remove it from the container in which we found it, and stick it in the container with the most recent timestamp (this effectively postpones the expiration of the session).
  • What do we do with an object when it's been removed from the data structure? We actively expire it.
  • How do we expire a session? We remove any resources associated to the session that may have been allocated. In addition, we probably record the end of the session in some persistent storage mechanism.

These answers indicate that we'll be doing a lot more removes (as we move sessions keys between containers) and that we'll also be doing some iterating during expiration (since we actually do need to expire each object). One possible solution is to use a combined hashmap and linked list, as in HashlistBasedHashbeltContainer. This class uses hashing for fast insertion and removal, and linked lists for fast iteration.

  package grosso.expiration.hashbelt.containers;
  import grosso.expiration.hashbelt.*;
  import grosso.expiration.*;
  import grosso.collections.*;

  import java.util.*;

  /*
    HashlistBasedHashbeltContainer

    A slight variation on the standard hashbelt container. This maintains
    three data structures:

      a hashmap from keys to values
      a hashmap from keys to nodes
      a linked list (of nodes)

    The idea is that this lets us have fast updates and removals, but also
    fairly fast iterations at the end.

    We can't use the java.util linked list class for this because,
    unfortunately, they don't expose the node class (and the get
    methods are insanely painful).

  */

  public class HashlistBasedHashbeltContainer<K,V>implements HashbeltContainer<K,V>{
    private  HashMap<K,V> _keysToExpirableObjects = new HashMap<K,V>();
    private  HashMap<K,Node<V>> _keysToNodes = new HashMap<K,Node<V>>();
    private DoublyLinkedList<V> _expirableObjects = new DoublyLinkedList<V>();

    public void clear() {
      _keysToExpirableObjects.clear();
    }

    public synchronized V get (K key) {
      return _keysToExpirableObjects.get(key);
    }

    public synchronized V remove (K key) {
      V returnValue = _keysToExpirableObjects.remove (key);
      if (null == returnValue) {
        return null;
      }
      Node<V> node = _keysToNodes.get(key);
      node.remove();
      return returnValue;
    }

    public synchronized void put(K key, V value) {
      _keysToExpirableObjects.put (key, value);
      Node<V> node = _expirableObjects.add(value);
      _keysToNodes.put(key, node);
    }

    public Iterator<V> getValues() {
      return _expirableObjects.iterator();
    }

    public Iterator<K> getKeys() {
      ArrayList<K> keys = new ArrayList<K>( _keysToExpirableObjects.keySet());
      return keys.iterator();
    }

    public int size(){
      return _keysToExpirableObjects.size();
    }
  }

Note: I first ran across the idea of using a hashlist in an IBM white paper on Java performance. While the paper I read appears to have vanished from the face of the earth, a similar paper is available here.

I've also included a third container implementation in the source code for this article. FastIteratingHashbeltContainer is essentially the same as HashlistBasedHashbeltContainer. The only difference is that it uses an instance of java.util.Vector instead of a linked list. This yields faster iterations, and makes insertions a little faster than with HashlistBasedHashbeltContainer. On the other hand, because removing an object from the middle of a vector involves performing an array copy on the remaining references, removing an object is now much more expensive than with HashlistBasedHashbeltContainer.

Related Reading

Java RMIJava RMI
By William Grosso
Table of Contents
Index
Sample Chapter
Full Description

You should also note that all three containers are explicitly in-memory structures. If you need your containers to use a persistent storage medium (such as a file or an external Javaspace), you will have to write your own container.

Summary

After reading the first article in this series, you should have a good understanding of data expiration, of when data expiration is necessary, and what problems can arise when you attempt to expire data in a scalable and robust way. This article covered the basics of the hashbelt algorithm and explained a particular implementation in detail; you should now feel comfortable with the source code for this article and feel comfortable with its basic design.

In the final article in this series, we'll take the examples from the first article and the hashbelt implementation from this one, and show how to combine them. In particular, we'll discuss how to implement session keys using hashbelts. We'll also reimplement RemoteStubCache from my earlier series on command objects in distributed computing.

William Grosso is a coauthor of Java Enterprise Best Practices.

Related Articles by William Grosso

Introducing Automatic Data Expiration

Learning Command Objects and RMI

Seamlessly Caching Stubs for Improved Performance

Generics and Method Objects


Return to ONJava.com.