

Java Design and Performance Optimization

Micro-Tuning Step-by-Step

by Jack Shirazi, author of Java Performance Tuning
03/20/2002

Micro-tuning is a term often used to mean speeding up small sections of code out of context, by profiling and analyzing that code and using some of the many techniques available to make code run faster.

In contrast, macro-tuning looks at the program in context, and tries to improve performance by altering the algorithms, data structures, or interactions between components or subsystems. There is not always a clear-cut distinction between these tuning methodologies; micro-tuning is often a part of macro-tuning and the alteration of algorithms and data structures is also a normal part of micro-tuning. In this article, I will step through micro-tuning a short method, using some standard techniques.

The Problem

A colleague of mine recently reminded me about an old discussion in one of the performance forums. The originator of the discussion was talking about a comparison between two programs with the same functionality, one written in C and one in Java. The full program was never given, but one function was provided:


public boolean checkInteger(String testInteger)
{
  try
  {
    Integer theInteger = new Integer(testInteger);//fails if not  a number
    return
      (theInteger.toString() != "") && //not empty
      (theInteger.intValue() > 10) && //greater than ten
      ((theInteger.intValue() >= 2) && 
           (theInteger.intValue() <= 100000)) && //2 <= X <= 100000
      (theInteger.toString().charAt(0) == '3'); //first digit is 3               
  }
  catch (NumberFormatException err)
  {
    return false;
  }     
}

This function checks that a given string:

  1. Is an integer. (If it is not, the NumberFormatException exception gets thrown and caught, and the catch block returns false.)
  2. Is not the empty string.
  3. Is an integer greater than 10.
  4. Is an integer between 2 and 100,000, inclusive.
  5. Has a first digit of 3.

It looks like a silly test, and the implementation is clearly a naive port into Java. Nevertheless, my colleague wondered just how much it could be optimized. Optimizing this test is an interesting exercise. Anyone who has done even a small amount of performance tuning in Java will immediately see several huge optimizations they could apply. However, I'm going to proceed as if I can't immediately see any potential optimizations, and work out what I can do to speed up the method. If you like a challenge, write down the optimizations you expect to be applied before proceeding with the article, and see whether you are proved right by the end.

The actual optimizations applied are quite dependent on the data. The data for the test was never discussed, so I'll use several data sets, to give you an idea of how performance is dependent on data. I'll use one data set with numbers that would only return true from the method, a second data set with numbers that would return true and false equally often, and a third data set that includes non-number elements.

Where to Start

Micro-tuning should always start with a baseline measurement. The most basic tool in the tuner's armory is the System.currentTimeMillis() method, which returns the current time in milliseconds, as a long data item. This method returns the time offset from some starting time, but we don't actually care what that starting time is because we will use the currentTimeMillis() method to measure a time difference, the elapsed time between the start and end of our test:

long time = System.currentTimeMillis();
  checkInteger("34567");
  time = System.currentTimeMillis() - time;
  System.out.println(
    "The time taken to execute the checkInteger method was "
     + time + " milliseconds");

When executed, this measurement says that the method took zero milliseconds. The method runs fast enough that the resolution of the timer we have available is not fine enough to give a reliable or useful measurement. In practice, the method must have been previously identified as a bottleneck in the application (presumably because it is called very often), otherwise there would be no further need to tune the method.

For our micro-tuning exercise to proceed, we will need to repeat the test a sufficient number of times to make it measurable. It is important that the repeated measurement is representative of how the method is used in the application, i.e., that the data used to repeatedly test the method represents the data used in the application. Many methods execute at different speeds depending on the data passed to them, and our method is one such method. Obviously, that is not clear from the first measurement above. But we can see that, for example, the method first converts a string into an integer and the time taken for that task will inevitably depend on the number of characters in the string. Any general-purpose conversion algorithm should be faster converting string "1" into the integer 1, than converting the string "789123456" into the integer 789123456, because of the number of characters that need to be iterated over. Our method also has other speed variations that depend on the data; for example, different integers will cause different numbers of tests to be performed by the method.

To give a concrete example, if we repeat the test a number of times using the string "1" and again using the string "30298," the latter test takes twice as long as the former.

time = System.currentTimeMillis();
  for (int j = 0; j < REPEAT; j++)
    checkInteger("1"); //comment out one
//    checkInteger("30298"); //or the other
  time = System.currentTimeMillis() - time;
  System.out.println("checkInteger() " + REPEAT + 
         " times: " + time);

As we discussed earlier, I'll use three data sets to obtain our baseline:

  1. Dataset 1 contains only numbers for which the method returns true.
  2. Dataset 2 contains numbers for which the method returns true and false equally often.
  3. Dataset 3 also includes elements that are not numbers at all.

When taking measurements, I like to repeat test runs three times and take average results. I usually make the repeat factor in the individual looped tests high enough so that I'm measuring at least a few seconds. I also make sure that there are no other significant processes running on the computer, and that I leave the test in the foreground the whole time it is running. These steps minimize the possibility that CPU time will be allocated to anything other than the test while it is running, making the test results more consistent. In addition, I personally prefer to use normalized results: I divide all result measurements by the first result and multiply by 100 to give a percentage result. Naturally, this means that the first result is always 100%. My baseline results are shown in Table 1 (all tests were run with SDK 1.4.0 JVM in server mode).
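
A minimal sketch of such a harness, repeating each timed loop, averaging, and normalizing against the first result, might look like the following. The REPEAT count, the dataset contents, and the class name are placeholders of my choosing rather than the exact code used for the published measurements.

//A minimal measurement harness sketch: REPEAT and DATASETS are
//illustrative placeholders, not the article's actual test data.
public class TuningHarness
{
  static final int REPEAT = 1000000;

  static final String[][] DATASETS = {
    {"34567", "30298", "31000"},        //only "true" results
    {"1", "34567", "999999", "30298"},  //"true" and "false" results
    {"34567", "abc", "", "30298"}       //includes non-numbers
  };

  public static void main(String[] args)
  {
    double first = -1;
    for (int d = 0; d < DATASETS.length; d++)
    {
      //repeat each timed run three times and average the results
      long total = 0;
      for (int run = 0; run < 3; run++)
      {
        long time = System.currentTimeMillis();
        for (int j = 0; j < REPEAT; j++)
          for (int k = 0; k < DATASETS[d].length; k++)
            checkInteger(DATASETS[d][k]);
        total += System.currentTimeMillis() - time;
      }
      double average = total / 3.0;
      //normalize against the first measurement, as in the tables below
      if (first < 0) first = average;
      System.out.println("Dataset " + (d + 1) + ": " +
        (100.0 * average / first) + "%");
    }
  }

  public static boolean checkInteger(String testInteger)
  {
    //paste the version of checkInteger() being measured here
    return false;
  }
}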

Table 1: Baseline measurements

            Dataset 1   Dataset 2   Dataset 3
  Baseline       100%       84.1%      540.0%

As well as giving us a baseline, these measurements immediately produce something interesting: the Dataset 3 result is much larger than the other two results. We'll follow up on this point shortly. Baseline measurements normally show some useful information, if only to provide the average execution time for one call to the method. That average, multiplied by the number of iterations you expect to execute when running the application for real, gives you an idea of how expensive the method currently is. For example, if a million iterations take one second, and you expect 100,000 calls to the method every second at peak application usage, then this one method already takes one tenth of the time available to execute the method and all other methods. This may provide sufficient performance, or it may be a serious bottleneck, depending on the application.

After the Baseline

We now have three routes we can follow in proceeding with the micro-tuning:

  1. Analyze the cause of the variation in the measurements (here, why the test using Dataset 3 takes so much longer than the others).
  2. Analyze the algorithm for inefficiencies.
  3. Profile the method to find exactly where the time is being spent.

These three routes are generic; micro-tuning generally has these options, though any one option may become unavailable when the code becomes more efficient. Usually, some of the options are easier to proceed with than others, but for this article it is instructive to look at all three.

Dataset 3

The baseline measurement shows that using Dataset 3 required a much longer time to execute than using the other two datasets. The major difference in this dataset is the existence of strings that do not hold valid numbers. We can easily follow the logic for this case: the method tries to create an Integer object by parsing the string using the Integer constructor. In cases where the string is not a number, the constructor throws a NumberFormatException. This is caught in our catch block, and false is returned. So it seems that this procedure is extremely expensive.

In fact, it is fairly well known that creating an exception is an expensive procedure. The actual cost comes from filling out the stack trace that the exception holds, and even though we never care about the stack trace in this method, we cannot avoid its cost. We can, however, avoid the exception completely by checking that we have a valid number before creating the Integer. This is simply done by checking all of the characters to make sure they are all digits:

for (int i = 0; i < testInteger.length(); i++)
  if (!Character.isDigit(testInteger.charAt(i)))
    return false;

Note that negative numbers will return false using this loop, but that is acceptable, since the method in any case always returns false when passed a negative number. Re-running our test with this code added to the beginning of the method produces the results listed in Table 2.

Table 2: After adding the looped digit test

                          Dataset 1   Dataset 2   Dataset 3
  Baseline                     100%       84.1%      540.0%
  With looped digit test     114.8%       66.1%       51.4%

So we've improved the test using Dataset 3 to be faster than the tests using both the other datasets, in line with what we expected. The test using Dataset 2 is also improved, which I didn't expect.

The Algorithm

Let's look at the algorithm for any inefficiencies. The algorithm has one obvious redundancy: we check that the integer is greater than 10 and, if it is, we subsequently check that it is greater than or equal to 2.

(theInteger.intValue() > 10) && ((theInteger.intValue() >= 2)

Clearly, we can dispense with the second test completely. If the first (>10) test fails, then the following (>=2) will not be executed, since we are using the shortcut Boolean AND operator (&&), which only executes its right-hand side if the left-hand side is true. If the "greater than 10" test succeeds, then we know definitely that the "greater than or equal to 2" test will succeed. Are there further inefficiencies? Let's go line-by-line. For convenience, here's the full function again.

 1 public boolean checkInteger(String testInteger)
 2 {
 3   try
 4   {
 5     Integer theInteger = new Integer(testInteger);//fails if not  a number
 6     return
 7       (theInteger.toString() != "") && //not empty
 8       (theInteger.intValue() > 10) && //greater than ten
 9       ((theInteger.intValue() >= 2) &&
10            (theInteger.intValue() <= 100000)) && //2 <= X <= 100000
11       (theInteger.toString().charAt(0) == '3'); //first digit is 3
12   }
13   catch (NumberFormatException err)
14   {
15     return false;
16   }
17 }

First we create an Integer (5). Then we test for an empty string (7). But checking the Integer class shows that an empty string is not a valid number for the Integer constructor: if an empty string is passed to the method, the constructor throws an exception and the catch block is activated immediately, so the empty-string test is never executed. We should therefore either eliminate the empty-string test, or move it to before the Integer creation. Since a simple empty-string test is almost certainly faster than the Integer creation procedure, we move the test to before the Integer creation. (Note that the test was, in any case, incorrect in using the identity operator != rather than the equals() method, and I'll correct that here.)

Moving on, the "greater than 10" test (8) is necessary, and we've already eliminated the "greater than or equal to 2" test (9). The "less than or equal to 100000" test (10) is necessary, and the test for the first digit being 3 (11) is necessary.

If the first digit is 3, however, the smallest possible valid number is 30 (since the number must be greater than 10); thus the "greater than 10" test should become a "greater than 29" test. Similarly, the largest number up to 100000 that starts with a 3 is 39999, so the "less than or equal to 100000" test should become "less than 40000."

Finally, the test for the first digit being 3 is surely cheaper to execute than creating the Integer: creating the Integer requires, at a minimum, examining every character to check that it is a digit. So it makes sense to move the "first digit is 3" test to before the creation of the Integer.

public boolean checkInteger(String testInteger)
{
  try
  {
    if (testInteger.equals("")) return false;
    if (testInteger.charAt(0) != '3') return false;
    Integer theInteger = new Integer(testInteger);//fails if not  a number
    return
      (theInteger.intValue() > 29) &&
      (theInteger.intValue() <= 40000);
  }
  catch (NumberFormatException err)
  {
    return false;
  }     
}

Re-running our test using this code produces the results listed in Table 3.

Table 3: After restructuring the algorithm

                          Dataset 1   Dataset 2   Dataset 3
  Baseline                     100%       84.1%      540.0%
  Restructured algorithm      46.6%       39.9%       29.8%

Profiling

There are a large number of commercial, shareware, and open source profilers available, including several that come with the Sun SDK (with version 1.4 I've seen -prof, -Xprof, -Xrunhprof, -Xaprof, and -Xhprof). A list of profilers is available.

I won't cover how to use any particular profiler in this article. Each profiler has its own documentation, and the Sun profilers are documented in my book, Java Performance Tuning. Ideally, you should use a profiler that can take a snapshot of the test loop itself, and not include all of the setup for the test in the profile. If you use a profiler that can't take snapshots (e.g., the SDK profilers), you should ensure that the test loop takes significantly longer than the time to set up the test (e.g., loading the data sets), so that the profile is dominated by the test loop.
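
As a concrete illustration, one plausible way to invoke the SDK's hprof profiler on a test class (TuningHarness is the placeholder harness from earlier, and the exact options accepted vary between JVM versions) is:

  java -Xrunhprof:cpu=samples,depth=8 TuningHarness

The hprof profiler normally writes its results to a file (java.hprof.txt by default) when the JVM exits; running java -Xrunhprof:help lists the options your particular JVM supports.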

Applying a profiler to the original test shows that for Dataset 1 and Dataset 2, the largest amount of time taken by the test -- more than a third of the time -- is in creating strings in the Integer.toString() method. For Dataset 3, the largest amount of time is spent in creating exceptions, taking some two-thirds of the runtime.

In fact, the toString() calls are completely redundant since we have the integer in string format already available as the argument to the method, and we can avoid creating the exception by testing for non-digits in the string (as explained in the earlier "Dataset 3" section). Making these changes gives us the following method:

public boolean checkInteger(String testInteger)
{
  for (int i = 0; i < testInteger.length(); i++)
    if (!Character.isDigit(testInteger.charAt(i)))
      return false;
  try
  {
    Integer theInteger = new Integer(testInteger);//fails if not  a number
    return
      (testInteger != "") && //not empty
      (theInteger.intValue() > 10) && //greater than ten
      ((theInteger.intValue() >= 2) && 
           (theInteger.intValue() <= 100000)) && //2 <= X <= 100000
      (testInteger.charAt(0) == '3'); //first digit is 3               
  }
  catch (NumberFormatException err)
  {
    return false;
  }     
}

Re-running our test using this code produces the results listed in Table 4.

Table 4: After eliminating toString() calls

                                Dataset 1   Dataset 2   Dataset 3
  Baseline                           100%       84.1%      540.0%
  Eliminating toString() calls      51.9%       40.7%       34.7%

Comparing Table 4 with the measurements from the restructured algorithm in Table 3, we can see that most of the speedup from restructuring the algorithm actually came from eliminating the toString() calls. That is one of the advantages of profiling: the actual bottlenecks are determined, rather than having to guess at possible speedups.

Round Two

Tuning is an iterative process. You normally find one bottleneck, make changes that improve performance, test those changes, and then start again. In the last section, we independently applied three different methodologies for eliminating bottlenecks from the application. The second methodology, analyzing the algorithm, produced the best result. This is quite typical. Algorithm changes usually provide the best speedup. Unfortunately, it can be quite difficult to analyze an algorithm and determine what approach provides the same functionality and is generically faster, i.e., faster in every applicable case. Algorithm analysis can provide the best speedup, but is the most difficult tuning methodology to apply.

The first methodology we tried, examining the code for the causes of the differences in speed between two variations of test runs (which in our case was caused by different input parameters), is not as difficult as full algorithm analysis. (Note that this methodology is not restricted to analyzing speed with different data sets; you can also use it to analyze small variations in code that produce different speeds). This methodology is very useful in reducing unexpected speed variations in applications. The difficulty comes more in developing the alternative tests than in analyzing the results for the differences. Had I not tested for non-numeric strings as parameters, there would have been no useful analysis possible.

The last methodology we tried, profiling, is the easiest to use. Profiling is the "brute force" methodology. You can always turn to it in the absence of any alternative, and it almost always provides something that can be speeded up. The law of diminishing returns kicks in after a while, however, leaving you with bottlenecks that are not worth speeding up, because the potential speedup is too small for the effort required.

Our three methodologies in the last section gave us three different changes. The last, the profile-suggested changes, were completely subsumed by the changes obtained using the other two methodologies. The method resulting from algorithm analysis is almost complete on its own, as it handles most non-numeric strings efficiently. Unfortunately, it doesn't handle cases where a non-numeric string like "3xyz" is passed as a parameter, so we also need the looped digit test. (I didn't include any non-numeric strings starting with "3" in the test data, so the effect did not show up in the tests). Combining the two improvements efficiently requires that I move the first two tests before the looped digit test and, since we've now already tested the first character, the subsequent loop only needs to start at the second character of the string:

public boolean checkInteger(String testInteger)
{
  if (testInteger.equals("")) return false;
  if (testInteger.charAt(0) != '3') return false;
  for (int i = 1; i < testInteger.length(); i++)
    if (!Character.isDigit(testInteger.charAt(i)))
      return false;
  try
  {
    Integer theInteger = new Integer(testInteger);//fails if not  a number
    return
      (theInteger.intValue() > 29) &&
      (theInteger.intValue() <= 40000);
  }
  catch (NumberFormatException err)
  {
    return false;
  }     
}

Re-running our test using this code produces the results in Table 5, which summarizes all measurements to this point.

Table 5: Combining the first round tunes

                                                 Dataset 1   Dataset 2   Dataset 3
  Baseline                                            100%       84.1%      540.0%
  With looped digit test                            114.8%       66.1%       51.4%
  Restructured algorithm                             46.6%       39.9%       29.8%
  Eliminating toString() calls                       51.9%       40.7%       34.7%
  Restructured algorithm with looped digit test      54.2%       41.6%       33.2%

Table 5 seems to show that we are now worse off than simply using the code that eliminated the toString() calls! But that code didn't handle the empty string correctly: its != identity test is ineffective, so an empty string still falls through to the Integer constructor and triggers the expensive exception path. Moving the empty-string test to the top of the method and using the equals() method takes us back to preferring our most recent code.

Profiling Again

Now on with the tuning. Algorithm analysis is no longer useful, as we've already fully analyzed the algorithm, and the variations between the tests using different data sets can be explained by the different number of Boolean tests executed for the different types of string parameters passed to the method. So we proceed with profiling. Profiling the latest version of the method indicates that the next bottleneck is the Integer creation, specifically in parsing the string to create the numeric value. For our method, the parse is actually very simple, and we are probably paying for the more general parsing algorithm that Integer has to use. In fact, we already iterate over the string once to make sure that the characters are all digits, so it is quite a small step to change the looped digit test to extract the integer value as it goes:

  //The character for digit 0 is (char) 48, etc.
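  //charAt(0) has already been checked to be '3' by the earlier test, so it needs no isDigit() check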
  int val = testInteger.charAt(0) - 48;
  for (int i = 1; i < testInteger.length(); i++)
    if (Character.isDigit(testInteger.charAt(i)))
      val = (val*10) + testInteger.charAt(i) - 48;
    else
      return false;

And we will obviously change the Integer creation method to use the int value rather than the String. The test measurements produce the best results so far, as shown in Table 6.

Table 6: Adding number parsing

                                                 Dataset 1   Dataset 2   Dataset 3
  Baseline                                            100%       84.1%      540.0%
  With looped digit test                            114.8%       66.1%       51.4%
  Restructured algorithm                             46.6%       39.9%       29.8%
  Eliminating toString() calls                       51.9%       40.7%       34.7%
  Restructured algorithm with looped digit test      54.2%       41.6%       33.2%
  As last, with number parsing                       36.2%       28.6%       23.0%
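
For reference, the intermediate version measured in the last row of Table 6 is not shown in full above. A plausible reconstruction (my assumption, keeping the Integer object but building it from the already-parsed int value, which also removes the need for the try/catch) is:

public boolean checkInteger(String testInteger)
{
  if (testInteger.equals("")) return false;
  if (testInteger.charAt(0) != '3') return false;
  //The character for digit 0 is (char) 48, etc.
  int val = testInteger.charAt(0) - 48;
  for (int i = 1; i < testInteger.length(); i++)
    if (Character.isDigit(testInteger.charAt(i)))
      val = (val*10) + testInteger.charAt(i) - 48;
    else
      return false;
  //no NumberFormatException is possible now, so no try/catch is needed
  Integer theInteger = new Integer(val);
  return
    (theInteger.intValue() > 29) &&
    (theInteger.intValue() <= 40000);
}

This version still creates, and then discards, an Integer object on every call, which is why garbage collection shows up in the next profile.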

Round Three and More

The next profile, using the latest version of the method, shows that Character.isDigit() takes the most significant time, followed by String.equals() and garbage collection. The latest JVM (1.4.0) has possibly the most efficient garbage collection mechanisms of the JVMs released so far. Earlier JVM versions would have probably indicated garbage collection as being a bottleneck much earlier in tuning. My inclination would be to leave Character.isDigit() alone, since it should already be a simple test in the Character class, and it is a static method, so if it can be inlined efficiently the JIT compiler should manage it.

The String.equals() method is overkill to test for an empty string. It is quicker to test if the length of the string is 0. And the garbage objects being reclaimed are all of those Integer objects we are generating each time through the method. We don't even need the Integer objects now, since we are parsing the string directly and have the int value. So we can change our method to:

public boolean checkInteger(String testInteger)
{
  if (testInteger.length() == 0) return false;
  if (testInteger.charAt(0) != '3') return false;
  //The character for digit 0 is (char) 48, etc.
  int val = testInteger.charAt(0) - 48;
  for (int i = 1; i < testInteger.length(); i++)
    if (Character.isDigit(testInteger.charAt(i)))
      val = (val*10) + testInteger.charAt(i) - 48;
    else
      return false;
  return
      (val > 29) &&
      (val <= 40000);
}

Table 7: The final version

                                                 Dataset 1   Dataset 2   Dataset 3
  Baseline                                            100%       84.1%      540.0%
  With looped digit test                            114.8%       66.1%       51.4%
  Restructured algorithm                             46.6%       39.9%       29.8%
  Eliminating toString() calls                       51.9%       40.7%       34.7%
  Restructured algorithm with looped digit test      54.2%       41.6%       33.2%
  As last, with number parsing                       36.2%       28.6%       23.0%
  Final version                                      26.5%       19.6%       16.6%

The measured times for all of the tests are listed in Table 7. Hands up: who predicted these or equivalent optimizations in the challenge at the beginning of the article? Well done if you did; I didn't. The equals() method turning up as a bottleneck came as a complete surprise to me.

When to Stop Tuning

I've labeled Table 7 as "the final version." Why am I stopping now? Why don't I carry on profiling, looking for more speedups? No reason at all. In fact I should carry on, because I have set no target speed to reach. This is crucial to all forms of tuning: with no target, tuning can carry on for much longer than is needed. I could carry on until the profile shows nothing except the checkInteger() method itself. Then I could re-analyze the code, try different datasets, search for literature that may suggest further speedups, analyze the bytecode to see if it is the most efficient, analyze the generated native machine code on several platforms to see if it could be improved, and look at why the method is being called, to see whether the method could be eliminated completely. (Now that would terminate the tuning!) Always set a target before starting tuning, so that you know when to stop.

The Procedure

The procedure I've followed in this article is one that you can copy for any micro-tuning exercise. The difficulties often come more from trying to create reliable measurements and useful datasets, and in interpreting profile results, than in the step-by-step flow of micro-tuning. In macro-tuning, there is an extra step in each round, to determine which method or area of the application is the current bottleneck. Macro-tuning can lead you to focus on a different method in each round of tuning, rather than repeatedly attacking one method. To help you with your own tuning, here is a summary of the steps suggested by this article.

  1. Identify the bottleneck: Identify a method that needs to be speeded up.
  2. Set a performance target: Decide how much performance gain you need the method to achieve. With macro-tuning, it usually makes more sense to decide on an acceptable execution time for the task that uses the method.
  3. Use representative data: Identify a dataset that is representative of how the method is used within the application. The dataset should cover all of the data needed by the method.
  4. Measure the baseline: Take one or more baseline measurements as a reference. You may need to execute the method repeatedly to obtain acceptable measurements. With macro-tuning, the task that executes the method may need to be started repeatedly.
  5. Analyze the method: Use one of the following analysis methodologies to determine potential speedups for the method: profiling, algorithm analysis, and analysis of the causes of any variations in measurements. If in doubt, use profiling.
  6. Test the change: Test that the change determined by step 5 provides a speedup. If it doesn't, repeat step 5. Otherwise retain the change.
  7. Repeat: Repeat from step 5 until the target speedup is achieved. For macro-tuning, repeat from step 1 until the target performance is achieved.

Jack Shirazi is the author of Java Performance Tuning. He was an early adopter of Java, and for the last few years has consulted mainly for the financial sector, focusing on Java performance.

