Java hardware accelerators, or Java chips, are the ultimate option for speeding up code execution. They're emerging in two fundamental configurations. Chips of the first type, such as Chicory Systems' HotShot and Nazomi Communications' JSTAR, operate as Java coprocessors in conjunction with a general-purpose microprocessor, in much the same way that graphics accelerators are used. Java chips in the other category, like Patriot Scientific's PSC1000 and aJile's aJ-100, replace the general-purpose CPU.
Clearly, the latter are limited to applications that can be written entirely in Java. As for the first type, adding components raises costs, so this type is a viable option only when the extra cost is acceptable. Indeed, Java chips have been expensive because of relatively low production volumes. A high-volume solution, however, may be forthcoming in the form of the ARM940 general-purpose processor with an integrated Java accelerator called Jazelle.
The Prechelt study determined that the average memory requirement of a program written in Java is two to three times greater than for one written in C/C++. Even the compact nature of bytecode, usually about 50 percent smaller than compiled C/C++ machine code, can't offset that overhead. Recognizing that trying to drop Java in its original, desktop-oriented form into embedded systems won't work, Sun Microsystems, Java's originator, took the language through several evolutionary steps in an effort to tailor it to the embedded environment. Today, the Java 2 Platform, Micro Edition (J2ME), represents the latest, most evolved, and slimmest version of Java for the embedded space.
You can trim J2ME by eliminating classes and code components that your application doesn't need. The JVM, native libraries, core classes, and application bytecode go into ROM. JVMs for embedded applications are generally under 500 kB, and J2ME class libraries typically don't exceed 1.5 MB. The Java components that affect RAM requirements are the JVM itself (for bytecode execution), any dynamic compiler, the Java heap, and the threads (the heap size and the number of threads obviously depend on the application). Executing as much of the application as possible in an interpreter, while keeping execution performance acceptable, helps contain the memory footprint.
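As a rough way to watch two of those RAM consumers, the heap and the threads, from inside a running application, the following minimal sketch uses only standard java.lang calls: Runtime.totalMemory(), freeMemory(), maxMemory(), and Thread.activeCount(). It assumes a J2SE-style class library; a J2ME configuration such as CLDC may expose only a subset of these (for example, no maxMemory()). The HeapSnapshot class name is hypothetical, and the figures are approximate because they shift whenever the garbage collector runs.

```java
// Minimal sketch: observe heap usage and live thread count at run time.
// HeapSnapshot is a hypothetical name; Runtime and Thread are standard java.lang classes.
public final class HeapSnapshot {

    /** Bytes of heap currently in use: heap allocated so far minus the free part of it. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Upper bound on the heap the JVM will try to use (for example, as set by -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Estimate of active threads in the current thread's group and its subgroups. */
    static int liveThreads() {
        return Thread.activeCount();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        // Allocate something application-like so the heap has a reason to grow.
        byte[][] buffers = new byte[16][];
        for (int i = 0; i < buffers.length; i++) {
            buffers[i] = new byte[64 * 1024];
        }
        long after = usedHeapBytes();
        System.out.println("heap used before: " + before + " bytes");
        System.out.println("heap used after:  " + after + " bytes");
        System.out.println("max heap:         " + maxHeapBytes() + " bytes");
        System.out.println("live threads:     " + liveThreads());
        System.out.println("buffers held:     " + buffers.length);
    }
}
```

Taking readings before and after a representative workload gives a first estimate of heap growth; a collection between the two readings can make the second figure smaller than expected, so repeat the measurement a few times.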
Selecting a highly scalable operating system and C run-time package allows you to tune these software components for optimal memory efficiency. Scaling the Java environment can be complex, however, and usually involves two stages. First, you can use the command-line verbose option, java -v, to see which classes an application uses and then manually extract the needed libraries and classes. Second, if that doesn't save enough space, you can use filtering tools, such as JavaFilter from Sun's EmbeddedJava platform.
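The first stage, turning verbose class-loading output into a list of classes, can be partly automated. The sketch below assumes HotSpot-style output from java -verbose:class, in either the older "[Loaded name from source]" form or the JDK 9+ unified-logging "[class,load] name source: ..." form. Other JVMs, including older ones driven with java -v, format this output differently, so the two patterns are assumptions to adjust for your JVM. ClassListExtractor is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: collect the distinct class names from verbose class-loading output.
// The two line formats below are assumptions based on HotSpot JVMs.
public final class ClassListExtractor {

    // Older form: "[Loaded java.lang.Object from /path/to/rt.jar]"
    private static final Pattern OLD_STYLE =
            Pattern.compile("^\\[Loaded (\\S+) from .*\\]$");
    // JDK 9+ unified logging: "[0.012s][info][class,load] java.lang.Object source: jrt:/java.base"
    private static final Pattern UNIFIED_STYLE =
            Pattern.compile("\\[class,load\\]\\s+(\\S+)\\s+source:");

    /** Returns the class name on a class-loading line, or null for any other line. */
    static String classNameFrom(String line) {
        Matcher m = OLD_STYLE.matcher(line.trim());
        if (m.find()) {
            return m.group(1);
        }
        m = UNIFIED_STYLE.matcher(line);
        if (m.find()) {
            return m.group(1);
        }
        return null;
    }

    /** Distinct class names, sorted alphabetically, found in the given log lines. */
    static Set<String> extract(Iterable<String> lines) {
        Set<String> classes = new TreeSet<>();
        for (String line : lines) {
            String name = classNameFrom(line);
            if (name != null) {
                classes.add(name);
            }
        }
        return classes;
    }

    public static void main(String[] args) throws IOException {
        // Reads the verbose log from standard input; the application's own output is ignored.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        for (String name : extract(lines)) {
            System.out.println(name);
        }
    }
}
```

A typical run would be java -verbose:class MyApp | java ClassListExtractor > classes.txt, where MyApp stands for your application's main class. Treat the list as a starting point: classes that are loaded only on code paths your test run never exercised will not appear in it.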
If you're using Java, expect to need more memory and CPU resources than you would with C/C++ (see Table 2).
Choosing the Right Java Platform
Of course, your choice of JVM is one key to optimizing Java performance for your application. Obviously, you need a JVM designed for embedded applications.
Embedded JVMs are highly user-configurable to match different embedded system requirements, but which embedded JVM should you use? Java benchmarks are meant to help you evaluate JVMs and Java performance, but you need to be careful about which ones you use and about the conclusions you draw from them. A good benchmark score for a particular JVM doesn't necessarily mean that using it will make your application go faster.
Consequently, before evaluating a JVM, you have to evaluate any benchmark to determine how meaningful it may be to your application, taking into account the whole Java environment that's associated with it. Some benchmarks are very application-specific (a chat server benchmark like VolanoMark, for instance) and may not apply to the kind of Java applications you're developing. Additionally, because JVM vendors commonly optimize their products to achieve good benchmark scores, the scores can be misleading about how much a given JVM will improve the performance of your particular application. Conversely, if your application has specific problems in certain areas, an environment that's optimized to improve general processing won't solve those specific processing problems.
Measuring Application Performance
When considering a benchmark to determine the overall performance of a Java application, bear in mind that bytecode execution, native code execution, and graphics each play a role. Their impact varies depending on the nature of the specific application: what the application does, how much of it is bytecode versus native code, and how much use it makes of graphics. How well a JVM will perform for a given application depends on how the unique mix of these three functional areas maps onto its capabilities. Given these variables, the best way to benchmark a JVM is against your own application. Since that's not possible before the application has been written, you must find those benchmarks that are most relevant to the application you intend to write.
Sorting through Java benchmarks to find the ones that are relevant for embedded applications can be confusing. SpecJVM98, for example, provides a relatively complete set of benchmarks that test diverse aspects of the JVM. That sounds good, but SpecJVM98 runs in a client/server environment and requires a minimum of 48 MB of RAM on the client side for the JVM, which rules it out for most embedded applications. In addition, it can't be used with precompiled classes.
Other benchmarks have different pitfalls. VolanoMark, for example, is a chat server implementation and is therefore relevant only for benchmarking applications that have the same requirements as chat servers. The JMark benchmark assumes that the application includes the applet viewer and a full implementation of Java's Abstract Window Toolkit (AWT). That can make it irrelevant for the many embedded applications that have no graphics, or only limited graphics that don't require full AWT support, such as devices running a PersonalJava minimal-AWT implementation.
Embedded CaffeineMark (ECM), the embedded version of the CaffeineMark benchmark from Pendragon Software (it has no graphics tests), is easy to run on any embedded JVM, since it requires support for basic Java core classes only, and it doesn't require a large amount of memory. More importantly, there's a high correlation between good scores on this benchmark and improved bytecode performance in embedded applications.
To get the most meaningful results from ECM, you must use exactly the same hardware when testing different JVMs. You must also pay attention to implementation differences among the JVMs you're testing. If, for example, you're comparing a JVM with a JIT compiler against a JVM without one, it's important to run the JVM that has the JIT with the java -nojit option on the command line to ensure an apples-to-apples comparison.
ECM will typically make any JVM that uses compilation look good, whatever the type of compilation, because it includes a very small set of classes and repeats the same small set of instructions over and over. Dynamic compilers simply cache the complete translation of the Java code in RAM and execute subsequent iterations of the tests in native code. Ahead-of-time compilers can easily optimize the loops and algorithms used in ECM, too.
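The effect is easy to observe with a harness that times each pass over one small loop separately. Run the sketch below once normally and once in interpreter-only mode (the -nojit option mentioned above, on JVMs that support it, or -Xint on HotSpot-based JVMs). With a dynamic compiler, later iterations typically run much faster than the first, while an interpreter stays roughly flat. The kernel is an arbitrary stand-in, not code from ECM, and the class name WarmupProbe is hypothetical; the sketch assumes Java 7 or later for System.nanoTime() and underscore literals.

```java
// Sketch: per-iteration timing of one small, repeated kernel, to show warm-up effects.
public final class WarmupProbe {

    /** Small loop-heavy kernel: sum of squares modulo a prime. */
    static long kernel(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc = (acc + (long) i * i) % 1_000_003L;
        }
        return acc;
    }

    /** Runs the kernel repeatedly and returns elapsed nanoseconds for each iteration. */
    static long[] timeIterations(int iterations, int n) {
        long[] times = new long[iterations];
        long sink = 0; // keeps results live so the work cannot be discarded as dead code
        for (int k = 0; k < iterations; k++) {
            long start = System.nanoTime();
            sink += kernel(n);
            times[k] = System.nanoTime() - start;
        }
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // effectively never taken; only consumes sink
        }
        return times;
    }

    public static void main(String[] args) {
        long[] times = timeIterations(10, 200_000);
        for (int k = 0; k < times.length; k++) {
            System.out.println("iteration " + k + ": " + times[k] / 1_000 + " us");
        }
    }
}
```

The steep drop after the first few iterations under a JIT, compared with the flat interpreted profile, is the same caching advantage that inflates compiled JVMs' ECM scores.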
Although the industry abounds with other Java benchmarks, like Java Grande, SciMark, jBYTEmark, Dhrystone benchmark in Java, and UCSD Benchmarks for Java, there is no "ultimate" benchmark that can give you certainty about Java and JVM performance in embedded applications. The best strategy is to identify a suite of benchmarks that seem most relevant to your application and use the combined results of those benchmarks to help predict Java performance in a particular system environment.
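One way to combine a suite's results is a weighted geometric mean of each benchmark's score ratio against a reference JVM, sketched below; SPEC's composite scores, for example, are built from a geometric mean of ratios. The weights, benchmark keys, and numbers in main() are hypothetical placeholders, not measurements; you would choose weights according to how closely each benchmark resembles your application. The sketch assumes Java 9 or later (for Map.of), scores where higher means faster, and that every weighted benchmark has a score in both maps.

```java
import java.util.Locale;
import java.util.Map;

// Sketch: combine several benchmark results into one figure of merit.
public final class SuiteScore {

    /**
     * Weighted geometric mean of per-benchmark ratios (candidate / reference).
     * 1.0 means "same as the reference JVM"; higher is better.
     */
    static double weightedGeoMean(Map<String, Double> candidate,
                                  Map<String, Double> reference,
                                  Map<String, Double> weights) {
        double logSum = 0.0;
        double weightSum = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            String name = e.getKey();
            double w = e.getValue();
            double ratio = candidate.get(name) / reference.get(name);
            logSum += w * Math.log(ratio); // averaging logs gives the geometric mean
            weightSum += w;
        }
        if (weightSum <= 0.0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        return Math.exp(logSum / weightSum);
    }

    public static void main(String[] args) {
        // Placeholder scores for illustration only.
        Map<String, Double> ref = Map.of("ecm", 100.0, "regexp", 10.0);
        Map<String, Double> cand = Map.of("ecm", 400.0, "regexp", 10.0);
        Map<String, Double> w = Map.of("ecm", 1.0, "regexp", 3.0);
        System.out.printf(Locale.ROOT, "combined ratio: %.3f%n", weightedGeoMean(cand, ref, w));
    }
}
```

With these placeholders, a 4x advantage on ECM and parity on regexp, weighted 1:3, combine to 4^(1/4), so the program prints "combined ratio: 1.414". A geometric mean keeps one flattering result from dominating the way an arithmetic mean of ratios would.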
Furthermore, the existing benchmarks may not measure other aspects of your application code. Tuning Java applications to meet performance goals may require addressing many program functions besides bytecode execution. Some of those functions -- for example, thread management, synchronization, method-to-method calls, class resolution, object allocation and heap management (including garbage collection), calls to native methods, bytecode verification, and exception handling -- occur within the JVM. Because few if any benchmarks address such functions, it falls to you to conduct an in-depth study of a JVM's internals to understand how its design may affect crucial aspects of your application. Writing special programs that exercise critical aspects of a JVM can help you evaluate it for the application. If, for example, your application uses a heavy mix of Java and C code, you can benefit by writing a program that tests native method call performance. Other functions, including native code execution and such factors as network latency, may occur outside the JVM.
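A probe program of the kind described above might look like the following minimal sketch, which times four of the JVM-internal functions listed: method-to-method calls, uncontended synchronization, small-object allocation, and exception handling. A native-method probe would follow the same pattern but needs a JNI library written in C, so it is left out. The class and method names are hypothetical, and the sketch assumes Java 7 or later for System.nanoTime() and underscore literals. Treat the per-operation figures as relative indicators for comparing JVMs on the same hardware, not as absolute costs.

```java
// Sketch: per-operation timings for a few JVM-internal functions.
public final class JvmInternalsProbe {

    private static final Object LOCK = new Object();
    private static long sink; // accumulates results so the loops are not discarded as dead code

    static int addOne(int x) {
        return x + 1;
    }

    /** Method-to-method call cost, in nanoseconds per call. */
    static double callNanos(int n) {
        long start = System.nanoTime();
        int x = 0;
        for (int i = 0; i < n; i++) {
            x = addOne(x);
        }
        sink += x;
        return (System.nanoTime() - start) / (double) n;
    }

    /** Uncontended monitor enter/exit cost, in nanoseconds per synchronized block. */
    static double syncNanos(int n) {
        long start = System.nanoTime();
        long x = 0;
        for (int i = 0; i < n; i++) {
            synchronized (LOCK) {
                x++;
            }
        }
        sink += x;
        return (System.nanoTime() - start) / (double) n;
    }

    /** Small-object allocation cost, including any garbage collection it triggers. */
    static double allocNanos(int n) {
        long start = System.nanoTime();
        long x = 0;
        for (int i = 0; i < n; i++) {
            int[] small = new int[4];
            small[0] = i;
            x += small[0];
        }
        sink += x;
        return (System.nanoTime() - start) / (double) n;
    }

    /** Cost of creating, throwing, and catching an exception. */
    static double exceptionNanos(int n) {
        long start = System.nanoTime();
        long x = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new IllegalStateException("probe");
            } catch (IllegalStateException e) {
                x++;
            }
        }
        sink += x;
        return (System.nanoTime() - start) / (double) n;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.printf("method call:  %.2f ns/op%n", callNanos(n));
        System.out.printf("synchronized: %.2f ns/op%n", syncNanos(n));
        System.out.printf("allocation:   %.2f ns/op%n", allocNanos(n));
        System.out.printf("exception:    %.2f ns/op%n", exceptionNanos(n / 10));
        System.out.println("(sink " + sink + ")");
    }
}
```

An optimizing JIT may shrink or remove some of this work (through inlining, escape analysis, or lock coarsening), so it is worth running the probe both with and without compilation, in the configuration you actually plan to ship.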
What if your application includes graphics? Two major factors affect graphics performance in Java applications: Does the application's graphics display driver use graphics coprocessor hardware acceleration? Is the application configured with a lightweight (faster) or a heavyweight (slower) implementation of the Abstract Window Toolkit? (See the figure.) In addition, as with any other high-level Java service, graphics performance depends on how the graphics services integrate with lower-level native libraries.
Wind River's Personal JWorks includes a good benchmark for evaluating graphics performance in embedded systems. The benchmark targets the PersonalJava AWT with a set of 39 tests of images, buttons, scrolling, text, and basic 2-D graphics.
Finally, you need to consider the performance of your CPU. To help you identify CPU-bound performance, supplement simple benchmarks by running real-world applications that exercise large amounts of varied, complex Java code. Such test code must meet several requirements: It should contain enough classes to approximate the real application (20-plus is a good ballpark). It must also be large (at least thousands of lines) and have no file system access and no graphics. Some existing programs meet all those criteria. The GNU regular expression package, regexp, for example, comprises about 3,000 lines of code and more than 21 classes, and it provides a large number of expressions to parse and match. Another is the Bean Shell interpreter, which has 70 classes and several thousand lines of code, running a simple prime number sieve. JavaCodeCompact, Sun's PersonalJava ROMizing tool, would also make a good test program.
The result of running these programs as test cases illustrates the wide variance in the meaning of benchmark scores. For example, a JVM using a JIT compiler may run Embedded CaffeineMark up to 30 times faster than when the nojit option is turned on (thus running in pure interpretation mode), but the same JVM runs the Bean Shell and regexp tests only about one and a half times faster when using the JIT compiler. (The apparently impressive thirtyfold speedup on a simple benchmark like Embedded CaffeineMark is achieved through caching techniques that the compiler uses on the small amount of code and classes in ECM.) The difference in results clearly demonstrates that high benchmark scores may not translate into a commensurate level of performance improvement in real-world applications.
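One illustrative way to read the gap between 30x on ECM and about 1.5x on regexp and Bean Shell is an Amdahl's-law estimate; this is a model, not something the measurements themselves establish. Assume a fraction f of a program's interpreted run time is spent in code the JIT accelerates by a factor k, while the rest (class loading, allocation, garbage collection, native calls, and so on) runs at the same speed either way. The overall speedup S is then:

$$S = \frac{1}{(1 - f) + \dfrac{f}{k}}$$

ECM spends nearly all its time in a few hot loops, so f is close to 1 and S approaches k. If one assumes, purely for illustration, that the JIT achieved the same k = 30 on the code it accelerates in regexp, an overall S = 1.5 would imply

$$(1 - f) + \frac{f}{30} = \frac{1}{1.5} = \frac{2}{3} \quad\Longrightarrow\quad \frac{29}{30}\,f = \frac{1}{3} \quad\Longrightarrow\quad f = \frac{10}{29} \approx 0.34$$

that is, only about a third of the run time benefiting. A smaller per-code gain than 30x would produce the same overall figure with a larger f, so the model shows the shape of the effect rather than pinning down its cause.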
Actually, SpecJVM98 and JMark yield the results that most closely approximate those for real-world applications. They do suffer, though, from the limitations discussed above. In particular, the usefulness of SpecJVM98 in the embedded space depends greatly on your ability to overcome the problems posed by its test infrastructure requirements.
Vincent Perrier is Product Manager, Java Platforms, for the Wind River Platforms Business Unit in Alameda, and is a contributing editor with the Embedded Developers Journal.
Return to ONJava.com.