ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)


Java Design and Performance Optimization

Multiprocess JVMs

09/25/2001

Combining multiple Java processes into a single Java virtual machine (JVM) is one way to reduce the JVM memory overhead. This article discusses the techniques required to achieve a multiprocess JVM.

One frequent complaint about JVMs is their large memory size. JVMs typically have a several-megabyte overhead. This is acceptable if your application will itself require tens of megabytes or more, but can be annoying when you have a small application to run which should not require much in the way of memory. If you have many small applications written in Java that need to be running concurrently, such as several small monitoring or service processes, then the JVM memory overhead can rapidly add up to deplete system memory.

Several vendors of specialized JVMs, such as application server providers, have addressed this issue by creating multiprocess JVMs. These JVMs appear to run several Java processes within one system process. There is also a free, pure Java library for multiprocessing JVMs, Echidna. If your interest is purely in using multiprocess JVMs, without needing to know the technology behind them, you need read no more of this article; instead, go to the Echidna site, download the library and documentation, and make your JVM multiprocess.

The benefits of running a multiprocess JVM can be seen by running several small applications. For example, in one test, I ran four small Java applications. The multiprocess JVM required a total of four megabytes of system memory to run all four applications simultaneously. Running each application in a separate JVM required at least three megabytes per JVM, totalling over twelve megabytes of system memory. This example illustrates the advantages gained from combining small applications in one JVM when the JVM overhead is significant compared to the application memory requirements. In addition, as the JVM is already running when you start a Java process, the application startup time can be hugely reduced. Note, however, that combining applications with larger memory requirements will not gain such a large memory advantage.

Starting the main() class

To start creating our multiprocess library, the first thing our code needs to do is start up the class it will run. Typically, you start a class using the java executable:

java class [args...]

For the class to run, it must have a main(String[]) method defined, and the JVM will use that as the entry point to the application.

From Java code, you can start a class in the same way as the JVM does, using the reflection package (java.lang.reflect) to identify and start the class' main(String[]) method. The following method obtains a class object for the given named class, finds the main(String[]) method of that class, and starts running that main(String[]) method.

//Requires: import java.lang.reflect.Method;
public static void startAnotherClass(
       String classname, String[] args)
throws Exception
{
  //Get the class
  Class<?> classObject = Class.forName(classname);

  //Find the main(String[]) method of that class.
  //main has one parameter, a String array. Set
  //that argument type (using String[].class avoids
  //a NullPointerException when args is null)
  Class<?>[] mainParamType = {String[].class};
  //Search for the main(String[]) method
  Method main = classObject.getMethod("main", mainParamType);
  //Create an object array of the arguments to pass
  //to the method
  Object[] mainParams = {args};

  //start the real class. main is static, so the
  //target object is null
  main.invoke(null, mainParams);
}
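
One detail of reflective invocation is worth illustrating: if the started class' main(String[]) method throws, Method.invoke() reports the failure wrapped in an InvocationTargetException. The following is a minimal sketch, not part of the library developed in this article, of how a caller might recover the application's own exception. The names FailingApp, ReflectiveStarter and runAndCaptureFailure are hypothetical and exist only for this example.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical target application whose main() fails immediately.
class FailingApp {
  public static void main(String[] args) {
    throw new IllegalStateException("startup failed: " + args[0]);
  }
}

// Hypothetical helper; the class and method names are assumptions.
class ReflectiveStarter {
  // Runs the named class' main(String[]) and returns the exception the
  // application itself threw, or null if main() returned normally.
  static Throwable runAndCaptureFailure(String classname, String[] args)
      throws ReflectiveOperationException {
    Class<?> classObject = Class.forName(classname);
    Method main = classObject.getMethod("main", String[].class);
    try {
      // Cast to Object so the whole array is passed as ONE argument
      main.invoke(null, (Object) args);
      return null;
    } catch (InvocationTargetException e) {
      // getCause() is the exception thrown inside the target's main()
      return e.getCause();
    }
  }
}
```

A starter could log the returned exception against the pseudo-process that failed instead of letting the wrapper exception propagate.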

Starting another class in its own thread

The last section showed how to start a class, but on its own doesn't give us a multiprocess JVM. We need to start the class in a separate thread, so that each time we start a class we do not block our own multiprocess starter. Once again, this is straightforward: starting the class in its own thread merely requires the creation of a new thread before calling the previously defined startAnotherClass() method. First I'll define the method that starts the separate thread:

public static Process startAnotherClassInItsOwnThread(
           String classname, String[] args)
{
  Process process = new Process(classname, args);
  Thread thread = new Thread(process);
  thread.start();
  return process;
}

I'm using an extra class called Process, which is used to hold the information needed to start the class, i.e. the classname and command line arguments (the argument list passed to the main(String[]) method). Process defines a run() method, which calls the class startup method defined in the last section. The Process class is defined as follows:

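class Process implements Runnable {
  String theClassName;
  String[] theStartupArguments;

  public Process(String classname, String[] args)
  {
    theClassName = classname;
    theStartupArguments = args;
  }

  //Thread needs a Runnable, so Process implements it
  public void run()
  {
    try {
      //startAnotherClass() is the static method defined
      //in the last section
      startAnotherClass(theClassName, theStartupArguments);
    } catch (Exception e) {
      //run() cannot throw checked exceptions
      e.printStackTrace();
    }
  }
}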
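
To show the pieces working together, here is a self-contained sketch of starting a class in its own thread without blocking the caller. It is an illustration under assumptions rather than the library's actual code: SleepyApp is a made-up application, and the holder class is named PseudoProcess (a hypothetical name) to avoid confusion with java.lang.Process.

```java
import java.lang.reflect.Method;
import java.util.concurrent.CountDownLatch;

// Hypothetical small application used only for this sketch.
class SleepyApp {
  static final CountDownLatch STARTED = new CountDownLatch(1);

  public static void main(String[] args) throws InterruptedException {
    STARTED.countDown();                    // signal that main() is running
    Thread.sleep(Long.parseLong(args[0]));  // pretend to do some work
  }
}

// Hypothetical name, chosen to avoid confusion with java.lang.Process.
class PseudoProcess implements Runnable {
  private final String className;
  private final String[] startupArguments;

  PseudoProcess(String className, String[] startupArguments) {
    this.className = className;
    this.startupArguments = startupArguments;
  }

  @Override
  public void run() {
    try {
      Class<?> classObject = Class.forName(className);
      Method main = classObject.getMethod("main", String[].class);
      main.invoke(null, (Object) startupArguments);
    } catch (Exception e) {
      // run() cannot declare checked exceptions, so report and return
      e.printStackTrace();
    }
  }

  // Returns as soon as the new thread has been started.
  static Thread startInItsOwnThread(String className, String[] args) {
    Thread thread = new Thread(new PseudoProcess(className, args));
    thread.start();
    return thread;
  }
}
```

The caller gets control back while SleepyApp is still sleeping, and can call startInItsOwnThread() again to launch further pseudo-processes.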

What about stopping processes?

We've quickly reached the stage where we can start up multiple pseudo-processes in a JVM, but what happens when you have a runaway process that you want to stop? Killing the JVM process will kill all the processes running in it, which is not ideal, so we need to have a way to kill only one specific pseudo-process. The Thread.stop() method gives us this capability. We already have a Process object, so it is straightforward to add a method to kill it: we add an instance variable to hold its thread, and a terminate() method to kill that thread. The Process class now looks as follows (new lines of code emphasized):

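class Process implements Runnable {
  String theClassName;
  String[] theStartupArguments;
  //volatile: written by the process thread, read by
  //whichever thread calls terminate()
  volatile Thread mainThread;                    //new

  public Process(String classname, String[] args)
  {
    theClassName = classname;
    theStartupArguments = args;
  }

  public void run()
  {
    mainThread = Thread.currentThread();         //new
    try {
      startAnotherClass(theClassName, theStartupArguments);
    } catch (Exception e) {
      //run() cannot throw checked exceptions
      e.printStackTrace();
    }
  }

  public void terminate()                        //new
  {
    if (mainThread != null)
      mainThread.stop();
  }

}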

But Thread.stop() is deprecated. Why is this, and does it affect our usage of it? Well, several methods in the Thread class are deprecated, because of potential corruption to shared data if you use those methods. Specifically, suppose you have a synchronized method which is updating multiple shared variables:

synchronized void assumedAtomicUpdate()
{
  updateSomeVariables();
  anotherVariable += 42;
  yetAnotherVariable++;
}

Normally, this method will update all the variables in an atomic way with respect to other threads using the same monitor, i.e. every thread using the same monitor will see either all the variables updated, or none of them updated. But Thread.stop() can violate this atomicity, because the thread can be terminated right in the middle of the method, while some of the variables have a new value and others still have the old value. This can lead to a corrupt state for the application. In our case, we can assume that the pseudo-process is a separate set of classes which should not be sharing any state with other applications, so using Thread.stop() to stop the pseudo-process is relatively safe. And, in any case, there is no other reliable way to terminate pseudo-processes.
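
The kind of corruption described above can be reproduced deterministically with a small simulation. This is only a sketch: instead of calling Thread.stop(), it throws an Error in the middle of a synchronized method, which stands in for the asynchronous error that Thread.stop() would deliver at an arbitrary point. The Account class and its invariant are hypothetical.

```java
// Hypothetical class illustrating a torn update.
class Account {
  private int balance = 100;
  private int transactionCount = 0;

  // Intended to be atomic: both fields change, or neither does.
  synchronized void withdraw(int amount, boolean dieInTheMiddle) {
    balance -= amount;
    if (dieInTheMiddle) {
      // Stands in for the asynchronous error that Thread.stop()
      // would raise at an arbitrary point inside this method.
      throw new Error("simulated asynchronous termination");
    }
    transactionCount++;
  }

  // Invariant for this sketch: every withdrawal of 10 is counted once.
  synchronized boolean isConsistent() {
    return balance == 100 - 10 * transactionCount;
  }
}
```

When the Error propagates out of withdraw(), the monitor is released with balance already changed but transactionCount not yet incremented, so any other thread that then takes the monitor sees the half-finished update.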

What about terminating processes which spawn threads?

The last section showed us how to terminate our pseudo-process. But it applied only to a single-threaded process. If a pseudo-process spawns threads, the terminate() method will only terminate the main thread, leaving other rogue threads still running. Fortunately, Java provides a simple way to handle all the threads the application will spawn: the ThreadGroup class. When a Thread is created, it is automatically a member of a ThreadGroup, and new ThreadGroups are normally created as members of the ThreadGroup holding the current thread. So if we create a new ThreadGroup for the pseudo-process' startup thread, all of the threads spawned by the pseudo-process will be in that ThreadGroup. Calling ThreadGroup.stop() will stop all the threads in that ThreadGroup, and all subgroups. The changes needed are relatively simple. We start the class with its own ThreadGroup (new lines of code emphasized):

public static Process startAnotherClassInItsOwnThread(
           String classname, String[] args)
{
  Process process = new Process(classname, args);
  ThreadGroup threadgroup = new ThreadGroup("main");    //new
  Thread thread = new Thread(threadgroup, process);     //changed
  thread.start();
  return process;
}

We also need to terminate the ThreadGroup rather than our starting thread, so the Process class now looks as follows (new lines of code emphasized):

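class Process implements Runnable {
  String theClassName;
  String[] theStartupArguments;
  //Holds the ThreadGroup of the pseudo-process, not a Thread
  volatile ThreadGroup mainThreadGroup;          //changed

  public Process(String classname, String[] args)
  {
    theClassName = classname;
    theStartupArguments = args;
  }

  public void run()
  {
    mainThreadGroup =
      Thread.currentThread().getThreadGroup();   //changed
    try {
      startAnotherClass(theClassName, theStartupArguments);
    } catch (Exception e) {
      //run() cannot throw checked exceptions
      e.printStackTrace();
    }
  }

  public void terminate()
  {
    //Stops every thread in the group and its subgroups
    if (mainThreadGroup != null)
      mainThreadGroup.stop();                    //changed
  }
}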
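
The approach relies on spawned threads inheriting the ThreadGroup of the thread that creates them. The short sketch below checks that behavior directly; the class name ThreadGroupInheritanceDemo and the group name are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; not part of the multiprocess library.
class ThreadGroupInheritanceDemo {
  // Starts a thread in a fresh ThreadGroup, lets it create a child
  // thread, and reports whether the child landed in the same group.
  static boolean childInheritsGroup() throws InterruptedException {
    ThreadGroup group = new ThreadGroup("pseudo-process-1");
    AtomicReference<ThreadGroup> childGroup = new AtomicReference<>();

    Thread starter = new Thread(group, () -> {
      // No ThreadGroup is given here, so the child joins the
      // group of the thread that creates it
      Thread child = new Thread(() -> { });
      childGroup.set(child.getThreadGroup());
    });
    starter.start();
    starter.join();

    return childGroup.get() == group;
  }
}
```

Because the child thread never names a group explicitly, stopping the pseudo-process' ThreadGroup reaches it as well as the startup thread.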

There are two unlikely but possible problems with our new ability to terminate a multithreaded pseudo-process. Firstly, it may be possible for the pseudo-process to access ThreadGroups that do not belong to it, thus allowing the creation of threads that would not be terminated. Currently, I believe this could only be achieved from a custom SecurityManager. In any case, if it is possible, it would be very difficult to achieve, and the vast majority of applications can be reliably assumed to contain only threads that will be stopped when the main thread's ThreadGroup is stopped.

The second problem is associated with the deprecation of Thread.stop(). Now that we are considering multiple threads, it is possible that while terminating the threads, some of the threads will be terminated, leaving the pseudo-process application with a corrupt state just before the remaining threads are terminated. This is unlikely, and even if it did occur, we can assume that it doesn't matter if the application, in its last few milliseconds prior to forced termination, is in a corrupt state. In fact, the application is probably already in an unexpected state, since we are being forced to terminate it.

What if an application calls System.exit()?

We still have one big hole in our multiprocess library. If any application calls System.exit(), the JVM terminates, and all the pseudo-processes will be destroyed with no warning. Fortunately, Java's design once again comes to our aid. Any call to System.exit() is first checked by the SecurityManager to see if the application has permission to terminate the JVM. We can install our own SecurityManager to catch the System.exit() call, disallow it, and terminate the pseudo-process instead. The SecurityManager is actually quite simple to define:

class ExitCatchingSecurityManager extends SecurityManager
{
  public void checkExit(int status)
  {
    Process.terminateProcessWithThreadGroup(getThreadGroup());
    throw new SecurityException();
  }
}

In addition, the SecurityManager should define all other checks so that they do not block pseudo-processes from running. Overriding each check* method with an empty body will work. We install our own SecurityManager by calling System.setSecurityManager(), i.e., by adding the following line near the startup of the multiprocess library:

System.setSecurityManager(new ExitCatchingSecurityManager());

The Process.terminateProcessWithThreadGroup() method is simple to define, by holding a collection of Process objects in the Process class, searching the collection to find the Process with the identical ThreadGroup, then terminating that Process.
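
A minimal sketch of that lookup follows. It is an assumption about how the method might be written, not the article's actual code: TrackedProcess is a simplified, hypothetical stand-in for Process, and its terminate() only records the request rather than stopping any threads. The sketch also matches subgroups through ThreadGroup.parentOf(), which goes slightly beyond the identical-ThreadGroup search described above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified stand-in for the article's Process class.
class TrackedProcess {
  // Every live pseudo-process; safe to modify while iterating
  private static final List<TrackedProcess> ALL =
      new CopyOnWriteArrayList<>();

  final ThreadGroup group;
  private volatile boolean terminated;

  TrackedProcess(ThreadGroup group) {
    this.group = group;
    ALL.add(this);
  }

  void terminate() {
    // The article's version stops the ThreadGroup here; this sketch
    // only records the request and forgets the process.
    terminated = true;
    ALL.remove(this);
  }

  boolean isTerminated() {
    return terminated;
  }

  // Finds the process owning the given ThreadGroup and terminates it.
  // Returns false if no registered process owns that group.
  static boolean terminateProcessWithThreadGroup(ThreadGroup caller) {
    for (TrackedProcess process : ALL) {
      // parentOf() is true for the group itself and for any of its
      // subgroups, so threads in nested groups are matched too
      if (process.group.parentOf(caller)) {
        process.terminate();
        return true;
      }
    }
    return false;
  }
}
```

Returning a boolean lets the SecurityManager distinguish a System.exit() call made by a pseudo-process from one made by the multiprocess library itself.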

Classpaths, class names, and class versions

Our multiprocess library now seems to provide everything needed to handle multiple Java processes in one JVM. However, JVMs are usually started with a particular classpath, which gives access to all classes. Are we restricted to having only one classpath for our multiprocess JVM? If so, all the classes will have to be in that classpath, and any name clashes from different applications will cause huge problems, as will new versions of a class created after an older version has already been loaded.

This particular problem has been discussed many times for server and development environments. The solution is to dedicate a separate ClassLoader for each pseudo-process. Having a dedicated ClassLoader avoids any name clashes and allows classpaths to be defined at pseudo-process startup. Classes are identified in a JVM by their full name, and also by the ClassLoaders that loaded them. Classes loaded by different ClassLoader instances are separate classes, even if they have the same name. In fact, the same class file loaded by two different ClassLoaders can result in two different classes in the JVM.

We need to make the following changes to our multiprocess library in order to support dedicated per-process ClassLoaders. First we need to define a ClassLoader. Then we need to change the code that looks for the class to use our custom ClassLoader. The latter change is quite simple; we go back to the startAnotherClass() method, the very first method defined in this article, and change the line calling Class.forName():

Class classObject = Class.forName(classname);

to use the ClassLoader.loadClass() method instead:

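//Declare the variable as MyClassLoader: loadClass(String, boolean)
//is protected in ClassLoader and public only in MyClassLoader
MyClassLoader myLoader = new MyClassLoader(...);
Class classObject = myLoader.loadClass(classname, true);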

The custom ClassLoader could be more challenging to implement. But in fact, we don't have to work hard to define a proprietary ClassLoader, since Java 2 comes with several useful ClassLoaders. In particular, the java.net.URLClassLoader allows for a complete specification of the classpath. Converting a standard classpath to a list of URLs is easier than defining a new ClassLoader that supports file systems and jar files. The following is a simple implementation of the MyClassLoader class. For simplicity, I've restricted this ClassLoader to allow only one extra directory to be added to the classpath:

//Requires: import java.io.File;
//          import java.net.MalformedURLException;
//          import java.net.URLClassLoader;
class MyClassLoader extends URLClassLoader
{
  public MyClassLoader (File additionalClassPath)
  {
    super(new java.net.URL[0]);
    try{
      //toURI() escapes special characters in the path and
      //appends the trailing "/" for an existing directory
      addURL(additionalClassPath.toURI().toURL());
    }catch(MalformedURLException e){e.printStackTrace();}
  }

  //Change loadClass to public access
  public Class<?> loadClass(String name, boolean resolve)
    throws ClassNotFoundException
  {
    return super.loadClass(name, resolve);
  }
}
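
For the more general case of a full classpath rather than one extra directory, the conversion to a list of URLs mentioned above can be sketched as follows. The class name ClasspathUrls and the method toUrls() are hypothetical helpers introduced for this example.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the article's library.
class ClasspathUrls {
  // Converts a classpath such as "classes:lib/util.jar" (":" on Unix,
  // ";" on Windows) into the URL array that URLClassLoader expects.
  static URL[] toUrls(String classpath) throws MalformedURLException {
    List<URL> urls = new ArrayList<>();
    String separator = Pattern.quote(File.pathSeparator);
    for (String entry : classpath.split(separator)) {
      if (entry.isEmpty()) {
        continue; // skip empty entries such as "a::b"
      }
      // toURI() makes the path absolute and escapes special characters
      urls.add(new File(entry).toURI().toURL());
    }
    return urls.toArray(new URL[0]);
  }
}
```

The result can be passed straight to the URLClassLoader constructor, for example new URLClassLoader(ClasspathUrls.toUrls(classpath)), giving each pseudo-process its own classpath.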

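Note that there are now two types of classes loaded by the multiprocess library: shared classes, which are loaded once through the JVM's normal classpath and are visible to every pseudo-process, and unshared classes, which are loaded by a pseudo-process' dedicated ClassLoader and are private to that pseudo-process.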

Having a distinction between shared and unshared classes requires some management, but has the advantage that you can avoid loading the same class multiple times for each process, thus saving more memory. If the management overhead is an issue, you can restrict the shared classes to the SDK classes, and have all application classes unshared.

Echidna

When I first started researching this article, I hadn't encountered Echidna. So I was hugely interested to find this free, open source library which already covered all the issues presented in this article, and many more. Other issues considered by Echidna include:

Further Resources

Jack Shirazi is the author of Java Performance Tuning. He was an early adopter of Java, and for the last few years has consulted mainly for the financial sector, focusing on Java performance.


Copyright © 2009 O'Reilly Media, Inc.