Published on ONJava.com (http://www.onjava.com/)


Memoization in Java Using Dynamic Proxy Classes

by Tom White
08/20/2003

Memoization is a technique, borrowed from functional programming and also found in languages such as Lisp, Python, and Perl, for giving functions a memory of previously computed values. Memoizing a function adds a transparent caching wrapper to it, so that values which have already been calculated are returned from a cache rather than being recomputed on each call. Memoization can provide significant performance gains for computationally intensive calls, and it is a reusable way to add caching to arbitrary routines.

This article looks at memoization in Java and presents a general Memoizer class that can be used like this:
Foo foo = (Foo) Memoizer.memoize(new FooImpl());

Here, Foo is the interface whose method calls are to be memoized, and FooImpl is the concrete implementation of Foo. The foo variable is an instance of the Foo interface; it behaves just like FooImpl, except that return values are cached. The beauty of the single Memoizer class is that adding memoization to any class is trivial: define an interface that comprises the methods to be memoized, then just call memoize with an implementation of the interface.

To understand how Memoizer is implemented, it is useful to build up to it in steps. First, I explore how caching can be built into the class that needs it. Then I examine how to add a caching wrapper to a specific class. Finally, I explain how a caching wrapper can be generalized to work for any class.

Adding Caching to a Computationally Intensive Routine

As an example of a computationally intensive routine, consider a class called PiBinaryDigitsCalculator that calculates the binary digits of pi. The only public method, calculateBinaryDigit, takes an integer identifying the binary digit required -- for example, -1000000 will return the one-millionth binary digit after the point -- and returns a byte value that is either 0 or 1. (The novel algorithm used is the BBP algorithm, named after Bailey, Borwein, and Plouffe, the trio who discovered it; consult their paper for the details.)

public class PiBinaryDigitsCalculator {
  /**
   * Returns the coefficient of 2^n in the binary
   * expansion of pi.
   * @param n the binary digit of pi to calculate.
   * @throws ValidityCheckFailedException if the validity
   * check fails, this means the implementation is buggy
   * or n is too large for sufficient precision to be
   * retained.
   */
  public byte calculateBinaryDigit(final int n) {
      return runBBPAlgorithm(n);
  }

  private byte runBBPAlgorithm(final int n) {
      // Lengthy routine goes here ...
  }

}

The simplest and most direct way to cache return values is to modify the class to have a Map instance variable to store previously computed values:

import java.util.HashMap;

public class PiBinaryDigitsCalculator {

  private HashMap cache = new HashMap();

  public synchronized byte calculateBinaryDigit(
  final int n) {

      final Integer N = new Integer(n);
      Byte B = (Byte) cache.get(N);
      if (B == null) {
          byte b = runBBPAlgorithm(n);
          cache.put(N, new Byte(b));
          return b;
      } else {
          return B.byteValue();
      }
  }

  private byte runBBPAlgorithm(final int n) {
      // Lengthy routine goes here ...
  }

}

The calculateBinaryDigit method first checks a HashMap cache for a key, based on the input value (n), and simply returns the value if found. Otherwise, the lengthy calculation is performed and the result stored in the cache. There is also a little extra work in translating between primitive values and object wrappers to allow primitives to be placed in the map.

While this approach works, it has several drawbacks. First, there is no separation of concerns between caching code and algorithm code. One class is responsible for performing the calculation and maintaining the cached values. As a result, it is hard to test the code cleanly. For example, you cannot write a test that checks that the algorithm consistently returns the same value each time it is called, since from the second call onwards, the value is retrieved from the cache.

Second, it is hard to remove the caching code if it is not needed, since it is tightly intertwined with the code that performs the underlying calculation. Indeed, it is difficult to know if the caching is yielding significant performance improvements, as you cannot test the class with caching turned off. If you improve the algorithm, the caching might not be helping performance any longer -- but you would not know.

Third, the caching code cannot be reused. Although the code conforms to a common pattern, it is dedicated to the PiBinaryDigitsCalculator class.

The first two objections can be dealt with by introducing a caching wrapper.

A Caching Wrapper

By using the Decorator pattern, it is simple to separate the code that performs the calculation from the code that provides caching. First, define an interface that abstracts out the essential method call:

public interface BinaryDigitsCalculator {

  public byte calculateBinaryDigit(final int n);
}

Then create two implementations, one for each separate responsibility:

public class PiBinaryDigitsCalculator
  implements BinaryDigitsCalculator {

  public byte calculateBinaryDigit(final int n) {
      return runBBPAlgorithm(n);
  }

  private byte runBBPAlgorithm(final int n) {
      // Lengthy routine goes here ...
  }

}

import java.util.HashMap;

public class CachingBinaryDigitsCalculator implements
BinaryDigitsCalculator {

  private BinaryDigitsCalculator binaryDigitsCalculator;
  private HashMap cache = new HashMap();

  public CachingBinaryDigitsCalculator(
  BinaryDigitsCalculator calculator) {
      this.binaryDigitsCalculator = calculator;
  }

  public synchronized byte calculateBinaryDigit(int n) {
      final Integer N = new Integer(n);
      Byte B = (Byte) cache.get(N);
      if (B == null) {
          byte b =
            binaryDigitsCalculator.calculateBinaryDigit(n);
          cache.put(N, new Byte(b));
          return b;
      } else {
          return B.byteValue();
      }
  }
}

This listing is simply a refactored version of the previous implementation of PiBinaryDigitsCalculator. CachingBinaryDigitsCalculator wraps any instance of BinaryDigitsCalculator and adds caching for calls to the calculateBinaryDigit method. For a few more lines of code, the classes have improved in readability and maintainability. Since clients can now use the algorithm through the BinaryDigitsCalculator interface, it is easy to switch a caching implementation in or out as required.

In addition, proper tests for the caching code can be written. Imagine a mock implementation of BinaryDigitsCalculator that returns a different value each time the calculateBinaryDigit method is called with the same argument. By using this mock object, we can test that the caching wrapper is working, since it will return the same value each time. Such a test was impossible to write with the previous implementation that mixed caching and calculation.
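The mock-based test described above can be sketched in a few lines. The mock and test class names below are my own, and the caching decorator is condensed using modern generics; the article's pre-generics listings behave the same way:

```java
import java.util.HashMap;
import java.util.Map;

public class CachingWrapperTest {

    interface BinaryDigitsCalculator {
        byte calculateBinaryDigit(int n);
    }

    // Mock: deliberately returns a *different* value on every call,
    // even for the same argument.
    static class CountingCalculator implements BinaryDigitsCalculator {
        private byte next = 0;
        public byte calculateBinaryDigit(int n) {
            return next++;
        }
    }

    // Condensed caching decorator, equivalent to CachingBinaryDigitsCalculator.
    static class CachingCalculator implements BinaryDigitsCalculator {
        private final BinaryDigitsCalculator delegate;
        private final Map<Integer, Byte> cache = new HashMap<>();

        CachingCalculator(BinaryDigitsCalculator delegate) {
            this.delegate = delegate;
        }

        public synchronized byte calculateBinaryDigit(int n) {
            Byte cached = cache.get(n);
            if (cached == null) {
                cached = delegate.calculateBinaryDigit(n);
                cache.put(n, cached);
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        BinaryDigitsCalculator calc =
            new CachingCalculator(new CountingCalculator());
        byte first = calc.calculateBinaryDigit(42);
        byte second = calc.calculateBinaryDigit(42);
        // The bare mock would have returned two different values; through
        // the wrapper, the second call must be a cache hit.
        if (first != second) {
            throw new AssertionError("expected a cache hit");
        }
        System.out.println("cache hit verified: " + first + " == " + second);
    }
}
```

If the caching were removed from the wrapper, the test would fail immediately, which is exactly the separation of concerns the Decorator refactoring buys.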

Creating a General-Purpose Caching Wrapper Using Dynamic Proxy Classes

The one remaining drawback of these caching wrappers is that they are not reusable: every time I wish to add caching to a class, I must write a specialized caching wrapper for the target interface -- a tedious and error-prone task.

Version 1.3 of the Java™ 2 SDK, Standard Edition, introduced support for dynamic proxy classes: special classes that can choose which interfaces they implement when created at runtime. (Contrast this to normal Java classes, whose implemented interfaces are known -- and fixed -- at compile time.) By using this feature, it is possible to write a generic caching wrapper class, called Memoizer, that determines at runtime which interface it implements. There is no longer any need for the CachingBinaryDigitsCalculator wrapper; calls such as this one:

BinaryDigitsCalculator calculator =
  new CachingBinaryDigitsCalculator(
    new PiBinaryDigitsCalculator()
  );

can be rewritten to use Memoizer as follows:

BinaryDigitsCalculator calculator =
  (BinaryDigitsCalculator) Memoizer.memoize(
    new PiBinaryDigitsCalculator()
  );

The listing for Memoizer appears below:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Memoizer implements InvocationHandler {
  public static Object memoize(Object object) {
      return Proxy.newProxyInstance(
        object.getClass().getClassLoader(),
        object.getClass().getInterfaces(),
        new Memoizer(object)
      );
  }

  private Object object;
  private Map caches = new HashMap();

  private Memoizer(Object object) {
      this.object = object;
  }

  public Object invoke(Object proxy, Method method,
  Object[] args) throws Throwable {

      if (method.getReturnType().equals(Void.TYPE)) {
          // Don't cache void methods
          return invoke(method, args);
      } else {
          Map cache    = getCache(method);
          List key     = Arrays.asList(args);
          Object value = cache.get(key);

          if (value == null && !cache.containsKey(key)) {
              value = invoke(method, args);
              cache.put(key, value);
          }
          return value;
      }
  }

  private Object invoke(Method method, Object[] args)
  throws Throwable {
      try {
          return method.invoke(object, args);
      } catch (InvocationTargetException e) {
          throw e.getTargetException();
      }
  }

  private synchronized Map getCache(Method m) {
      Map cache = (Map) caches.get(m);
      if (cache == null) {
          cache = Collections.synchronizedMap(
            new HashMap()
          );
          caches.put(m, cache);
      }
      return cache;
  }
}

A call to the static method memoize with an object creates a new proxy instance -- that is, an instance of java.lang.reflect.Proxy -- that implements the same set of interfaces as the passed-in object. The set of interfaces is determined by object.getClass().getInterfaces(). Each proxy instance has an invocation handler (java.lang.reflect.InvocationHandler) that handles all method invocations that are made on the instance. In this case, Memoizer is the invocation handler.

When a method is invoked on the proxy instance (for example, calculateBinaryDigit), the invoke method is called on the Memoizer instance owned by the proxy instance. The invoke method is passed information that allows it to determine which method was called on the proxy instance, along with the array of arguments. In this example, the java.lang.Method object given to the Memoizer instance represents calculateBinaryDigit, and the array of Object arguments encapsulate the int parameter n (wrapped as an Integer). At this point, Memoizer can step in to do its caching.

The key used for the cache is essentially the method signature: a combination of the Method object and the array of Object arguments passed to the invoke method. For simplicity of implementation, Memoizer has a map of caches, one for each method, and then combines the array of Object arguments into a java.util.List object for the key to the relevant cache. It is convenient to use an implementation of java.util.List for the key, since it makes strong guarantees about the equals method, thus ensuring that method signatures match only if all of the corresponding pairs of arguments are equal.
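The guarantee that makes a java.util.List suitable as a cache key can be seen directly in a standalone sketch (not part of Memoizer itself):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListKeyDemo {
    public static void main(String[] args) {
        Map<List<Object>, String> cache = new HashMap<>();

        // Two distinct argument arrays, as invoke() would receive them
        // on two separate calls made with equal arguments.
        Object[] firstCall  = { Integer.valueOf(5), "pi" };
        Object[] secondCall = { Integer.valueOf(5), "pi" };

        cache.put(Arrays.asList(firstCall), "cached value");

        // List.equals compares corresponding elements, so the second
        // call's key matches the first call's entry: a cache hit.
        System.out.println(cache.get(Arrays.asList(secondCall))); // prints "cached value"

        // The raw arrays would NOT work as keys: array equals() is
        // identity-based, so equal contents do not make equal keys.
        System.out.println(firstCall.equals(secondCall)); // prints "false"
    }
}
```

This is why Memoizer wraps the argument array in Arrays.asList rather than using the array itself as the key.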

Once the key has been created, the cache is checked for a previously computed value. If one is found, it is simply returned. Otherwise, the method has not been called with these arguments before, so it is invoked reflectively by calling the invoke method on java.lang.Method. In this example, the invocation will call calculateBinaryDigit on the concrete class PiBinaryDigitsCalculator. The return value is stored in Memoizer's cache before it is returned. The invocation handler has now finished its work.

When To Use Memoizer

As a general rule, Memoizer can be used whenever a more traditional cache -- such as those developed above -- is suitable. More specifically, each method on the interface to be memoized should meet all of the following criteria:

  1. Its return value depends only on its arguments: calling it twice with the same arguments must yield the same result.
  2. It has no intentional side effects (void methods are the one exception, explained below).
  3. Its arguments are immutable.

Clearly, it is futile to memoize methods that can change every time you invoke them. Equally, it is important not to memoize methods that have intentional side effects (methods that update state in some way), since the side effects will not be repeated for subsequent calls. The single exception to this advice is void methods. The implementation of Memoizer will always invoke a void method, since the only reason to invoke it is for its side effects.

It can be dangerous to memoize methods with mutable arguments for the same reason that it can be dangerous to store mutable classes in hash tables. As the doc comment for java.util.Map states, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map." If you call a memoized method with an argument once, mutate the argument object, then call the method a second time, Memoizer might see this as a cache hit and not recompute the value. In short, the wrong value could be returned.
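The hazard can be demonstrated with the kind of List key that Memoizer builds. This is a standalone sketch; in a real memoized call, the mutable object would be a method argument:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyDemo {
    public static void main(String[] args) {
        Map<List<Object>, String> cache = new HashMap<>();

        // A mutable argument used as (part of) a cache key.
        List<String> argument = new ArrayList<>(Arrays.asList("a"));
        cache.put(Arrays.asList((Object) argument), "value for [a]");

        argument.add("b"); // mutate the argument after it was cached

        // The stored key's hash code has changed behind the map's back,
        // so the entry may now be unreachable -- yet it still occupies a
        // slot in the map. As the Map docs say, behavior is unspecified.
        System.out.println(cache.get(Arrays.asList((Object) argument)));
        System.out.println("entries still in map: " + cache.size());
    }
}
```

Whether a later call is wrongly served from the cache or wrongly recomputed (stranding a stale entry), the memoized method no longer behaves like the unmemoized one.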

Performance

The whole point of caching is to improve the speed of your application, yet reflection is notorious for its poor performance. (The gap is closing, though: Bloch reports that in Sun's implementation of J2SE 1.4, reflective calls are typically only twice as slow as normal calls, compared to a forty-fold difference in J2SE 1.3.) Since Memoizer makes heavy use of reflection, it might seem a poor candidate for achieving performance gains. That conclusion is not necessarily valid: if the cost of calling the method to be memoized is high compared to the cost of a reflective method call (that is, the call to the invoke method on the InvocationHandler interface that Memoizer implements), then memoizing the method can still yield a speedup.

Running the PiBinaryDigitsCalculator example on a recent J2SE 1.4 JVM shows that the cost of a memoized method call is comparable to the cost of a normal call for very small values of n (for instance, -10). The overhead of the memoization is quickly swamped by the cost of performing the calculation for larger magnitudes of n. Therefore, clients making frequent use of PiBinaryDigitsCalculator could certainly benefit from a memoized version.
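You can see the trade-off concretely with a compact, self-contained memoizer in the same style as Memoizer (a single shared cache rather than one per method, which suffices for a one-method interface), timed against a deliberately slow stand-in calculation; the class and interface names here are illustrative:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MemoizeTiming {

    interface Slow {
        long compute(int n);
    }

    // Stand-in for an expensive calculation such as the BBP algorithm.
    static class SlowImpl implements Slow {
        public long compute(int n) {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % (n + 1);
            }
            return sum;
        }
    }

    // One-cache memoizer; InvocationHandler is a functional interface,
    // so a lambda can serve as the handler.
    static Object memoize(final Object target) {
        final Map<List<Object>, Object> cache = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            List<Object> key = Arrays.asList(args);
            if (!cache.containsKey(key)) {
                cache.put(key, method.invoke(target, args));
            }
            return cache.get(key);
        };
        return Proxy.newProxyInstance(
            target.getClass().getClassLoader(),
            target.getClass().getInterfaces(),
            handler);
    }

    public static void main(String[] args) {
        Slow slow = (Slow) memoize(new SlowImpl());

        long t0 = System.nanoTime();
        long cold = slow.compute(7);          // first call: runs the loop
        long coldNanos = System.nanoTime() - t0;

        t0 = System.nanoTime();
        long warm = slow.compute(7);          // second call: served from cache
        long warmNanos = System.nanoTime() - t0;

        System.out.println("values equal: " + (cold == warm));
        System.out.println("cold=" + coldNanos + "ns, warm=" + warmNanos + "ns");
    }
}
```

On a typical JVM the warm call costs only the reflective dispatch and a map lookup, which is what makes memoization a net win whenever the underlying computation dominates.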

Unfortunately, the only way to tell if your code can benefit from memoization is to profile it. Nevertheless, it is very easy to apply a memoizing wrapper to existing code, and, importantly, it is just as easy to remove a wrapper. This suggests the following simple optimization strategy:

  1. Choose a class as a candidate for memoization.
  2. Profile your class (and the system as a whole).
  3. If the performance is acceptable, go to step 6; otherwise add a memoizing wrapper to the class.
  4. Profile your memoized class (and the system as a whole).
  5. If there is little or no significant performance gain, remove the memoizing wrapper.
  6. Repeat as necessary.

Ideally, you should analyze the impact of adding memoization to a class on the system as a whole, since only then can you be sure that it is worth adding. Some methods, even though they are computationally expensive, may not benefit from being memoized, simply because they are never called with the same arguments more than once. To gauge this, I am making available a more fully featured version of Memoizer that implements an interface called CacheStatistics that allows you to retrieve the number of cache hits and cache misses. You can use this interface during performance testing to measure whether memoization is "pulling its weight."

Extending Memoizer

It is straightforward to modify the Memoizer class to support different cache implementation strategies. A common type of cache is the Least-Recently-Used (LRU) cache, which holds a bounded number of entries: it ensures that it never exceeds its maximum size by discarding the entry that was least recently accessed. A class might choose an LRU cache to prevent memory from filling up in an unbounded manner in long-running applications. To select the type of cache used, you simply pass an extra argument to the memoize method that specifies the CacheFactory implementation to use. Here, the LRU cache can hold a maximum of 1000 entries:

BinaryDigitsCalculator calculator =
  (BinaryDigitsCalculator) Memoizer.memoize(
      new PiBinaryDigitsCalculator(),
      new LruCacheFactory(1000)
  );
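CacheFactory and LruCacheFactory belong to the extended version of Memoizer mentioned above, which is not listed here. The underlying bounded cache, however, is easy to build on the standard library: an access-ordered java.util.LinkedHashMap evicts its least-recently-used entry when you override removeEldestEntry. A sketch, independent of the article's code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheDemo {

    // Bounded LRU map: the third constructor argument selects access
    // order, and removeEldestEntry enforces the size limit on each put.
    static <K, V> Map<K, V> lruCache(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = lruCache(2);
        cache.put(1, "one");
        cache.put(2, "two");
        cache.get(1);          // touch 1, so 2 becomes least recently used
        cache.put(3, "three"); // exceeds the bound: evicts 2
        System.out.println(cache.keySet()); // prints "[1, 3]"
    }
}
```

A CacheFactory along these lines would simply hand Memoizer one such map per method in place of the plain HashMap.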

Even in its simplest form, the Memoizer class is another useful tool that should be in every Java programmer's arsenal. But remember -- profile your code before and after using it.

Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He has written numerous articles for O'Reilly, java.net and IBM's developerWorks, and has spoken at several conferences, including at ApacheCon 2008 on Hadoop.



Copyright © 2009 O'Reilly Media, Inc.