Optimizing Hash Functions For a Perfect Map
Comparing speeds
What is the speed benefit of this perfect hashtable compared to HashMap? I'll test the two by comparing PerfectHashMaps to HashMaps that have the benefit of the same amount of memory, as well as measuring against a HashMap sized to just contain the key set. I've optimized the Key.hashCode() and Key.equals() methods so that the HashMaps are not at a disadvantage from these methods. And I've included a test against a KeyHashMap implementation, a specialized implementation of HashMap which is optimized for Key object keys (see "Optimizing a query on a Map" in the resources section below).
It is worth noting that the HashMap.put() method performs at different speeds depending on whether or not the put() is overwriting an existing entry. With no existing entry for a key, the HashMap needs to create a node element before inserting the value into the collection, whereas the node already exists when the entry is being overwritten. So I test both these scenarios too. All measured times have been scaled by the same value for simpler comparison.
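The two put() scenarios can be timed separately with a loop like the following. This is a sketch, not the article's actual harness: the key set, map size, and the absence of the scaling constant are all assumptions made here for illustration.

```java
import java.util.HashMap;

public class PutBenchSketch {
    // Time a put() call for every key; returns elapsed nanoseconds.
    static long timePuts(HashMap<Integer, Object> map, Integer[] keys, Object value) {
        long start = System.nanoTime();
        for (Integer k : keys)
            map.put(k, value);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Integer[] keys = new Integer[1000];
        for (int i = 0; i < keys.length; i++)
            keys[i] = i * 7;        // arbitrary distinct keys

        HashMap<Integer, Object> map = new HashMap<>(keys.length);
        Object value = new Object();

        // Scenario 1: no previous element at each key (nodes must be created)
        long freshPut = timePuts(map, keys, value);

        // Scenario 2: overwriting existing entries (nodes already exist)
        long overwritePut = timePuts(map, keys, value);

        System.out.println("fresh=" + freshPut + "ns, overwrite=" + overwritePut + "ns");
    }
}
```

In a real test the loops would be repeated many times and warmed up first, so that a JIT-compiling JVM measures compiled rather than interpreted code.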
Table 5: Comparing the performance of various Maps using the put() method on an empty table

    put(), no previous element at key             1.2 JVM   1.3 JVM   HotSpot 2.0
  1 HashMap, size of key set                       100%      49.5%     57.6%
  2 HashMap, same size as perfect hash map         66.5%     35.8%     27.8%
  3 KeyHashMap, size of key set                    73.3%     46.1%     44.0%
  4 KeyHashMap, same size as perfect hash map      42.4%     34.2%     26.6%
  5 PerfectKeyHashMap                              15.1%     15.7%     12.2%
  6 MinimalPerfectKeyHashMap (size of key set)     32.2%     35.0%     27.7%
Table 6: Comparing the performance of various Maps using the put() method on a full table

    put(), overwriting previous element at key    1.2 JVM   1.3 JVM   HotSpot 2.0
  1 HashMap, size of key set                       17.4%     21.9%     27.7%
  2 HashMap, same size as perfect hash map         18.3%     24.1%     18.6%
  3 KeyHashMap, size of key set                    13.4%     17.0%     15.4%
  4 KeyHashMap, same size as perfect hash map      14.6%     16.1%     15.5%
  5 PerfectKeyHashMap                              15.1%     14.0%     11.4%
  6 MinimalPerfectKeyHashMap (size of key set)     32.4%     34.3%     27.2%
Table 7: Comparing the performance of various Maps using the get() method

    get()                                         1.2 JVM   1.3 JVM   HotSpot 2.0
  1 HashMap, size of key set                       16.6%     25.1%     20.6%
  2 HashMap, same size as perfect hash map         17.1%     20.4%     16.8%
  3 KeyHashMap, size of key set                    13.4%     11.7%     12.6%
  4 KeyHashMap, same size as perfect hash map      13.2%     15.1%     11.7%
  5 PerfectKeyHashMap                               7.3%      8.7%      6.6%
  6 MinimalPerfectKeyHashMap (size of key set)     16.0%     18.2%     14.9%
The comparisons show that PerfectKeyHashMap can be significantly faster than any of the other implementations, except in the case where a put() is overwriting an existing entry. The Maps based on the HashMap hash function have a severely decreased worst-case performance, which occurs when many of the keys map to the same index (not tested here), whereas the PerfectKeyHashMap has constant performance independent of the key set.
Reducing excessive memory requirements
Our hash function may require a large table in the worst case. A useful concept is the memory factor requirement of a perfect map, which I define as the ratio of the table size to the number of elements being stored in the map. For the examples previously listed, the set {keys 1} using the hash function %7 produced a memory factor requirement of 1.75 (four elements requiring seven indexes); {keys 2} using %3 produced a memory factor requirement of 1; and {keys 3} using %14 produced a memory factor requirement of 3.
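The memory factor falls out directly once you have found the smallest workable modulus. The following sketch searches for the smallest modulus m for which k % m is collision-free over a key set; the key values used here are hypothetical, since the article's {keys 1} through {keys 3} sets are not reproduced on this page.

```java
import java.util.HashSet;
import java.util.Set;

public class PerfectModulusSearch {
    // Find the smallest modulus m such that k % m is unique for every key k,
    // i.e. the smallest table size for which % m is a perfect hash function.
    static int smallestPerfectModulus(int[] keys) {
        for (int m = keys.length; ; m++) {
            Set<Integer> seen = new HashSet<>();
            boolean perfect = true;
            for (int k : keys) {
                if (!seen.add(k % m)) {   // collision at this modulus
                    perfect = false;
                    break;
                }
            }
            if (perfect)
                return m;
        }
    }

    public static void main(String[] args) {
        int[] keys = {3, 17, 40, 56};   // hypothetical key set
        int m = smallestPerfectModulus(keys);
        double memoryFactor = (double) m / keys.length;
        System.out.println("modulus=" + m + " memoryFactor=" + memoryFactor);
        // prints "modulus=5 memoryFactor=1.25"
    }
}
```

For this set, % 5 maps the keys to the distinct indexes 3, 2, 0, and 1, so four elements need a five-slot table.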
I ran a test of thousands of randomly generated key sets of between 5 and 500 elements, where each key value was restricted to the range 1 to 10,000. This test produced an average memory factor requirement of 12 and a maximum factor of 20. Whether these memory requirements are reasonable for your application depends on how many large perfect maps you need to create and on the importance of speed compared to memory. Can we gain the benefits of the perfect map without the large memory overhead?
For some datasets it may be possible to produce a perfect hash function that is fast and has a small memory factor requirement (for example the final map implementation in the article "Optimizing a query on a Map"). In general, however, reducing the memory overhead of hash functions usually requires the use of a table, thus reducing the speed of the hash function.
For the hash functions considered in this article, we have the required tables available already; the perfect maps provide us with tables that can produce minimal perfect maps. Since any minimal perfect map requires a perfect map as well, you may think we have increased the memory requirements rather than reduced them. But multiple minimal perfect map instances for the same set of keys can all point to the same perfect map for their memory map table; therefore, depending on the application, this might be a feasible solution. The optimized implementation of the MinimalPerfectKeyHashMap looks like this:
class MinimalPerfectKeyHashMap
{
  Key[] keys;       // perfect hash table of keys (can be shared between instances)
  int[] map;        // maps perfect-hash indexes to compact value indexes
  Object[] values;  // compact value array, one slot per key
  int hfRemainder;  // modulus of the perfect hash function

  MinimalPerfectKeyHashMap(int size, Key[] keys, int[] map,
                           int remainder)
  {
    this.keys = keys;
    values = new Object[size];
    this.map = map;
    this.hfRemainder = remainder;
  }

  public void put(Key key, Object value)
  {
    int index = key.id % hfRemainder;
    keys[index] = key;
    values[map[index]] = value;
  }

  public Object get(Key key)
  {
    return values[map[key.id % hfRemainder]];
  }
}
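To make the map table concrete, here is a sketch of how the index-compaction table could be built from a key set and its perfect hash remainder. The Key class below is a minimal stand-in (assuming the public int id field the article's Key uses), and the key values are hypothetical; the article's associated source builds the table its own way.

```java
public class MinimalMapDemo {
    // Minimal stand-in for the article's Key class.
    static class Key {
        final int id;
        Key(int id) { this.id = id; }
    }

    // Build the map[] table: each occupied perfect-hash slot is assigned
    // the next free slot in the compact values array, so the values array
    // needs only keySet.length entries instead of `remainder` entries.
    static int[] buildCompactionTable(Key[] keySet, int remainder) {
        int[] map = new int[remainder];
        int next = 0;
        for (Key k : keySet)
            map[k.id % remainder] = next++;
        return map;
    }

    public static void main(String[] args) {
        Key[] keySet = { new Key(3), new Key(17), new Key(40), new Key(56) };
        int remainder = 5;   // k.id % 5 is collision-free for these ids
        int[] map = buildCompactionTable(keySet, remainder);
        System.out.println(java.util.Arrays.toString(map));
        // prints "[2, 3, 1, 0, 0]"
    }
}
```

Several MinimalPerfectKeyHashMap instances over the same key set can be handed the same keys and map arrays, which is exactly the sharing that makes the memory overhead tolerable.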
Examples of minimal perfect map usage can be found in the source code associated with this article. However, the speed tests applied to the minimal perfect map (the rows numbered 6 in Tables 5 through 7) show that there may be no advantage to using this implementation of MinimalPerfectKeyHashMap. The table lookup produces enough of an overhead that an optimized KeyHashMap using the HashMap hash function can produce superior performance.
Further resources
- The source code is available here.
- The Java Performance Tuning website
- "Optimizing a query on a Map" by Jack Shirazi, JavaWorld, November 2000
- "Minimal Perfect Hashing" by Carlo Pescio, Dr. Dobb's Journal, July 1996, p. 101
Jack Shirazi is the author of Java Performance Tuning. He was an early adopter of Java, and for the last few years has consulted mainly for the financial sector, focusing on Java performance.