ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)
 See this if you're having trouble printing code examples


Java Design and Performance Optimization

The Performance of Java's Lists

05/30/2001

The SDK has several implementations of the ordered collection interface java.util.List. The three best known are Vector, ArrayList, and LinkedList.

One question that frequently reaches the Java Performance Tuning site asks about the performance differences between these List classes. In this article I'll take a look at the performance differences between the LinkedList implementation and the Vector/ArrayList implementations.

To consider fully the performance ramifications of these classes, we need to know how they are implemented. So I'll start with brief descriptions of the most important implementation aspects from the point of view of performance.

The Vector and ArrayList implementations

Vector and ArrayList are both implemented with an underlying Object[] array that stores the elements. Accessing elements by index is a simple matter of accessing elements in the internal array by index:


public Object get(int index) {
  //check the index is valid first .. code not shown here
  return elementData[index];
}

The internal array can be bigger than the number of elements held by the Vector/ArrayList object: the difference is kept as extra capacity for efficiently adding elements. Adding elements is then very simply achieved by assigning the element to the first empty location in the internal array, and incrementing the index (size) for the new empty location:

public boolean add(Object o) {
  ensureCapacity(size + 1); //explained soon
  elementData[size++] = o;
  return true;  //List.add(Object) signature support
}

Inserting elements into the collection at any location other than the end is slightly more tricky: the array elements above the insertion point must all be moved up by one, then the assignment can occur:

public void add(int index, Object element) {
  //check the index is valid first .. code not shown here
  ensureCapacity(size+1); //explained soon
  System.arraycopy(elementData, index, elementData, index + 1, 
        size - index);
  elementData[index] = element;
  size++;
}

When the spare capacity is used up, the Vector/ArrayList object must replace its internal Object[] array with a new larger array when more elements need to be added, copying all the elements to the new array. The new array is 50% to 100% bigger than the old one depending on the SDK version (the code shown here makes it 100% bigger):

public void ensureCapacity(int minCapacity) {
  int oldCapacity = elementData.length;
  if (minCapacity > oldCapacity) {
    Object oldData[] = elementData;
    int newCapacity = Math.max(oldCapacity * 2, minCapacity);
    elementData = new Object[newCapacity];
    System.arraycopy(oldData, 0, elementData, 0, size);
  }
}

Also in Java Design and Performance Optimization:

Micro-Tuning Step-by-Step

Tuning JDBC: Measuring JDBC performance

Faster List Iteration with RandomAccess Interface

Multiprocess JVMs

Catching OutOfMemoryErrors to Preserve Monitoring and Server Processes

The main difference between the Vector class and the ArrayList class is the use of synchronization. Apart from two methods used only during serialization, none of the ArrayList methods are synchronized; in contrast most of the Vector methods are synchronized directly or indirectly. Consequently Vector is thread-safe, and ArrayList isn't. This makes ArrayList faster than Vector. For some of the latest JVMs the difference in speed between the two classes is negligible: strictly speaking, for these JVMs the difference in speed between the two classes is less than the variation in times obtained from tests comparing the performance of these classes.

The Vector and ArrayList implementations have excellent performance for indexed access and update of elements, since there is no overhead beyond range checking. Adding elements to, or deleting elements from, the end of the list also gives excellent performance except when the capacity is exhausted and the internal array has to be expanded. Inserting and deleting elements always require an array copy (two copies when the internal array must be grown first). The number of elements to be copied is proportional to [size-index], i.e. to the distance between the insertion/deletion index and the last index in the collection. For insertions, inserting to the front of the collection (index 0) gives the worst performance, and inserting at the end of the collection (after the last element) gives the best. The array copying overhead grows significantly as the size of the collection increases because the number of elements that need to be copied with each insertion increases.


The LinkedList implementation

LinkedList is implemented using a list of doubly-linked nodes. To access elements by index you need to traverse all the nodes until the indexed node is reached:

public Object get(int index) {
  //check the index is valid first .. code not shown here
  Entry e = header;  //starting node
  //go forwards or backwards depending on which is closer
  if (index < size/2) {
    for (int i = 0; i <= index; i++)
      e = e.next;
  } else {
    for (int i = size; i > index; i--)
      e = e.previous;
  }
  return e;
}

Inserting elements into the list is straightforward: traverse to the node at the index and insert a node immediately before that node:

public void add(int index, Object element) {
  //check the index is valid first .. code not shown here
  Entry e = header;  //starting node
  //go forwards or backwards depending on which is closer
  if (index < size/2) {
    for (int i = 0; i <= index; i++)
      e = e.next;
  } else {
    for (int i = size; i > index; i--)
      e = e.previous;
  }


  Entry newEntry = new Entry(element, e, e.previous);
  newEntry.previous.next = newEntry;
  newEntry.next.previous = newEntry;
  size++;
}

Thread-safe LinkedLists and other collections

If you need a thread-safe LinkedList from the Java SDK, you can obtain one using a synchronized wrapper from the Collections.synchronizedList(List) method. However, synchronized wrappers add a level of indirection which can have a high performance cost. A synchronized wrapped method can be up to two or three times slower than the unwrapped method; every method requires an extra method call as the wrapper redirects the call to the wrapped method. This indirection is less significant with complex methods, such as searches, but with simpler methods such as accessors and updaters the extra overhead can really slowdown performance.

This means that a synchronized wrapped LinkedList is at a significant performance disadvantage to Vector, as Vector does not require any extra indirection for thread-safety. If you require a thread-safe LinkedList, you will obtain a faster implementation if you copy the LinkedList class and make the required methods synchronized. This is also true of all the other collection classes: only List and Map have efficient thread-safe implementations (the Vector and Hashtable classes respectively). Amusingly these two efficient thread-safe classes exist purely because of backwards compatibility and not for performance considerations.

The LinkedList implementation has a performance overhead for indexed access and update of elements, since access to any index requires you to traverse multiple nodes. Adding elements to the collection suffers from the index traversal access performance drawback and, in addition, has a further overhead in requiring the creation of a node object. On the plus side, there is no further overhead to insertions and deletions, so insertions-deletion overhead is really mostly dependent on how far away the insertion-deletion index is from the ends of the collection.

Performance tests

There are many different functions of the classes that could be tested. LinkedLists are frequently used because of their supposed better performance for random index insertion and deletion, so I decided to focus on insertion performance, i.e. building collections. I've tested LinkedList against ArrayList since both are unsynchronized (see the "Thread-safe LinkedLists and other collections" sidebar).

The insertion speed is critically dependent on the size of the collection and the position where the element is to be inserted. All the best and worst performance cases arise when inserting either at one of the ends or at the exact middle point of the collection. Consequently, I've chosen three insertion locations (start, middle, end of the collection), and three representative collection sizes of medium (100 elements), large (10,000 elements), and very large (1,000,000 elements).

For my tests I used the Sun JVMs from the Java SDK 1.2.0 and 1.3.0 release. In addition I also tested with HotSpot JVM 2.0, the version available with the 1.3.0 SDK. All measured times in each table have been normalized to one of the SDK 1.2 VM tests (the cell showing 100% in each table). The default JVM configuraton with JIT enabled was used, though heap space needed to be increased for all JVMs to avoid out-of-memory errors. Times recorded were the averages over several test runs. In order to avoid garbage collection interference, I forced a full scavenge of all memory between each test (see the source code for details). The disk was monitored to ensure that disk paging did not occur (any test runs which showed significant paging were discarded). Each test that was too slow to give a multiple second response time was looped repeatedly until a significantly long enough time was recorded.


Table 1: Building a medium sized collection (100 elements). Figures in brackets are for pre-sized collections

 

1.2 JVM

1.3 JVM

HotSpot 2.0 JVM

Always inserting at the start of the ArrayList

100% (48.0%)

184.9% (152.0%)

108.0% (66.7%)

Always inserting at the start of the LinkedList

135.5%

109.1%

85.3%

Always inserting at the mid-point of the ArrayList

130.0% (40.6%)

187.4% (158.0%)

84.7% (46.0%)

Always inserting at the mid-point of the LinkedList

174.0%

135.0%

102.3%

Always inserting at the end of the ArrayList

63.3% (20.7%)

65.9% (25.0%)

60.3% (29.3%)

Always inserting at the end of the LinkedList

106.7%

86.3%

80.3%

For short collections, ArrayList and LinkedList are pretty close in performance. ArrayLists have the edge when inserting to the end of the collection, i.e. appending elements. But then appending elements is the single operation that ArrayList is most optimized for: if you just want a statically sized collection, a Java array (e.g. Object[]) gives better performance than any collection object. Beyond the append operation, measured timings are very mixed and reflect various JVM optimizations more than anything else.

For example, for insertions to the start of the collection (first two rows of table 1), the HotSpot 2.0 JVM with LinkedLists giving the fastest time (85.3%), and the second fastest time is the 1.2 JVM with ArrayList (100%). These two results illustrate that the simple JIT compiler in 1.2 is very efficient at doing simple things like iterating over and copying arrays. The more sophisticated JVM with optimizing compiler in HotSpot is able to improve more complex activities such as object creation (of LinkedList nodes) and take advantage of code-inlining. The 1.3 JVM results seem to indicate a severe underperformance in simple operations which is likely to be improved in later JVM versions.

What I have only partially measured here are other advantages that ArrayList has over LinkedList, namely the ability to pre-size collections, and the reduced garbage collection overhead of ArrayLists. Specifically, ArrayLists can be created with a particular size (e.g. in this test the ArrayList could be created with a capacity of 100 elements), thus avoiding all the growth overheads. The figures in brackets in table 1 show the improvements possible with pre-sizing collections. LinkedLists (up to SDK 1.3) cannot be pre-sized.

Additionally, the ArrayList generates only a few extra objects for garbage collection, i.e. the internal array object which holds the elements, and one extra internal array object each time the ArrayList capacity is exhausted and the ArrayList needs to be grown. The LinkedList generates one node object for every insertion, irrespective of any deletions that might take place. Consequently, LinkedLists can make considerably more work for the garbage collector. Taking these added factors into account, my inclination would be to use an ArrayList rather than a LinkedList for any small to medium sized collection.

Table 2: Building a large collection (10 000 elements).

 

1.2 JVM

1.3 JVM

HotSpot 2.0 JVM

Always inserting at the start of the ArrayList

7773%

7537%

7500%

Always inserting at the start of the LinkedList

100%

90.34%

65.6%

Always inserting at the mid-point of the ArrayList

3318%

3412%

3121%

Always inserting at the mid-point of the LinkedList

26264%

14315%

14209%

Always inserting at the end of the ArrayList

41.4%

41.2%

37.5%

Always inserting at the end of the LinkedList

66.4%

73.9%

61.7%

Moving into the range of larger sized collections, we can see from table 2 that we begin to get a severe performance penalty when we encounter large insertion overheads. The worst case for LinkedList is, as we predicted from the analysis of the implementation, inserting to the middle of the collection. We can also see that this has worse performance than the ArrayList worst case of insertion to the start of the collection. Insertion to the middle of the ArrayList has significantly better performance than those two worst cases.

Overall, ArrayList again gives better performance for most cases, including index insertion to random locations. If you always need to insert toward the beginning of the collection, LinkedList provides a better performance but you can achieve even better performance using a reversed ArrayList, i.e. either with a dedicated implementation or by flipping indexes using a [size-index] mapping.

Table 3: Building a very large collection (1 000 000 elements).

 

1.2 JVM

1.3 JVM

HotSpot 2.0 JVM

Always inserting at the start of the ArrayList

too long

too long

too long

Always inserting at the start of the LinkedList

100%

179.5%

144.1%

Always inserting at the mid-point of the ArrayList

too long

too long

too long

Always inserting at the mid-point of the LinkedList

too long

too long

too long

Always inserting at the end of the ArrayList

38.3%

47.7%

42.9%

Always inserting at the end of the LinkedList

65.1%

161.5%

139.9%

Table 3, showing the results for very large collections, indicates conclusions very similar to those of table 2. However table 3 serves to emphasize that very large collections need particularly close matches between data, collection types, and data manipulation algorithms. Otherwise you can end up with performance that is essentially unusable. For optimum performance you should build a dedicated collection class implementation specific to the problem. For very large collections, it is often necessary to do this just in order to obtain adequate performance.

Querying performance

Querying is most efficiently achieved by implementing the query inside the class (see the two articles about optimizing queries, listed in the resources). The time needed to iterate over all the elements is the limiting factor for queries on these lists. A query implemented in the ArrayList/Vector classes will iterate over the array elements. The following example counts the number of null elements:

int count = 0;
for (int i = 0; i < size; i++)
  if(elementData[i] == null)
    count++;

A query implemented in the LinkedList class will traverse all the nodes. The following example counts the number of null elements:

node = header.next;
count = 0;
for (int i = 0; i < repeat; i++, node = node.next)
  if (node.element == null)
    count++;

Table 4 shows the ArrayList providing significantly superior performance to that of the LinkedList, once again indicating that ArrayList is the recommended class to use. Table 5 shows the time taken to iterate over all the elements using a ListIterator object obtained from the List.listIterator(int) method. These iterators would be necessary if the query could not be implemented in the List class. Once again, ArrayList shows superior performance, though not as dramatically as with table 4. Note that the absolute times in table 5 are about ten times longer than the absolute times in table 4, i.e. ArrayList internal traversal is about ten times faster than ArrayList iteration using a ListIterator.

ble 4: Iterating through all the elements of the collection using internal access

 

1.2 JVM

1.3 JVM

HotSpot 2.0 JVM

ArrayList internal traversal

100%

106%

197%

LinkedList internal traversal

470%

493%

448%


Table 5: Iterating through all the elements of the collection using a ListIterator

 

1.2 JVM

1.3 JVM

HotSpot 2.0 JVM

ArrayList iteration using ListIterator

100%

118%

75.2%

LinkedList iteration using ListIterator

117%

186%

156%

Conclusions

Related Reading

Java Performance TuningJava Performance Tuning
By Jack Shirazi
Table of Contents
Index
Sample Chapter
Full Description
Read Online -- Safari

The measurements and the other factors we've considered clearly indicate that ArrayLists and Vectors usually provides better performance than LinkedLists and synchronized wrapped LinkedLists. Even in cases where you might have thought that the LinkedList would provide better performance, you may be able to coax superior performance from ArrayList by altering how elements are added, for example by reversing the collection order.

There are situations where LinkedLists will provide better performance; with, for example, very large collections where many elements need to be added to both the beginning and end of the collection. But in general, I recommend using ArrayList/Vector as the default and using LinkedList only where there is an identified performance problem in which a LinkedList improves the performance.

Further resources

Jack Shirazi is the author of Java Performance Tuning. He was an early adopter of Java, and for the last few years has consulted mainly for the financial sector, focusing on Java performance.


Return to ONJava.com.

Copyright © 2009 O'Reilly Media, Inc.