Published on ONJava.com (http://www.onjava.com/)

Using Jini to Build a Catastrophe-Resistant System

by Dave Sag

The events of September 11 were a wake-up call to software developers the world over. Of the many large firms that held critical data or even entire data centers in the World Trade Center towers, few had timely offsite backups in place. One firm even did its offsite backup from Tower 1 into Tower 2, reasoning -- like the people who decided to only insure one tower -- that the chance of both towers vanishing was slim in the extreme. Who would have guessed such a thing could ever happen?

Once the shock of the event had passed, the implications for the work we at Pronoic Ltd. had been doing became evident. As I will go on to discuss, we had been very focused on using Jini to build ultra-high-availability applications for the insurance industry. Well, Sept. 11 killed the insurance sector, but it brought the advantages of self-healing, distributed applications into clear focus. (For more on self-healing, distributed systems, see the O'Reilly Network's interview with IBM's Robert Morris.)

Early in 2001, I wrote an article, "Jini as an Enterprise Solution," that outlined some of Pronoic's experiences in developing a simple XML application server that used Apache Tomcat, Apache Cocoon, a JavaSpace, and consequently, Jini. We called this app server Crudlet, standing for "create, retrieve, update, delete, lifecycle, exist, and template." Crudlet grew out of our need for a simple Web interface to objects in a JavaSpace. Of course, that is not enough, and the Crudlet app server grew to include a set of agents that ran independently and used Jini's lookup and discovery protocol to discover the space and perform simple functions.

Eventually, the Crudlet work turned into something we call the Corporate Operating System (COS), which I'll describe later in this article. But first, let's look at the original Crudlet. Logically, the flow went something like this: an underwriter (in our case) would enter the details of a risk they wished to shop out. We would capture the form elements, which Cocoon would parse using our Crudlet tag library, inserting the values into JavaBeans. These beans would fire property-change events that a virtual bean box would pick up to populate one or more primitive JavaSpace entries, which would then be written to the space. Agents would race to the new entry, looking for their signature flags. Herein lies the key design flaw in Crudlet 1.0: each primitive was coupled to its expected services by a series of internal flags.

See MailEntry for an example: it has an mMailableFlag boolean that the mailer service ("pat the postman") matches against. JavaSpaces work only by associative mapping; one side effect of this is that you can't match against an interface -- you need a publicly accessible field to search on, since you can't instantiate an interface to make a template.
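The matching rule itself can be sketched without any Jini infrastructure: a template matches an entry when every non-null public field of the template equals the corresponding field of the entry, and null fields act as wildcards. The classes below are illustrative stand-ins for this idea, not the real Crudlet or Jini classes (in Jini, entries implement net.jini.core.entry.Entry and the space itself performs the match).

```java
// Illustrative sketch of JavaSpaces-style associative matching.
// MailEntry and matches() are stand-ins, not the Crudlet/Jini classes;
// in Jini the space performs this comparison for you.
public class MatchSketch {

    // Entries expose public fields so the space can match on them.
    static class MailEntry {
        public String recipient;
        public String body;
        public Boolean mMailableFlag; // the flag "pat the postman" matches on

        MailEntry(String recipient, String body, Boolean mailable) {
            this.recipient = recipient;
            this.body = body;
            this.mMailableFlag = mailable;
        }
    }

    // A null template field is a wildcard; non-null fields must be equal.
    static boolean matches(MailEntry template, MailEntry candidate) {
        if (template.recipient != null
                && !template.recipient.equals(candidate.recipient)) return false;
        if (template.body != null
                && !template.body.equals(candidate.body)) return false;
        if (template.mMailableFlag != null
                && !template.mMailableFlag.equals(candidate.mMailableFlag)) return false;
        return true;
    }

    public static void main(String[] args) {
        MailEntry entry = new MailEntry("bob@example.com", "hello", Boolean.TRUE);
        // Only the flag is set; the other fields are wildcards.
        MailEntry template = new MailEntry(null, null, Boolean.TRUE);
        System.out.println(matches(template, entry)); // true: the mailer would take this entry
    }
}
```

Note that this is exactly why the flag must be a public Boolean rather than a primitive boolean: only an object field can be null, and hence act as a wildcard.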

This is not a good design pattern: adding a new service that should act on an entry means updating the entry class, which in most cases means flushing the system of all the old entries. During development, that can happen a lot.

We were at the Jini Community Conference in Amsterdam in December 2000, arguing this very point over a few beers, when suddenly the solution became obvious. We devised a scheme whereby ultra-lightweight ServiceFlag entries notify services of the entry they need to act on. This idea quickly led to a napkin-sketch design of a system in which, when create(entry) is called, we write a flag into the space. The flag is collected by a distribution service whose job it is to know which services need to scoop up the newly created entry. The distribution service simply keeps an internal hashtable of known ensemble configurations, obtained on startup from an XML file. This lets whole ensembles run asynchronously yet remain coordinated and well structured. Persistence is maintained via transactions, and the JavaSpace acts as the focus of coordination.
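The scheme above can be sketched in a few lines. The class and method names here are invented for illustration -- they are not the Crudlet API -- but the shape is the same: a configuration table maps an operation to the services that must see it, and one lightweight flag is written per interested service rather than baking per-service flags into the entry itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of flag-based distribution; names are invented
// for illustration, not the Crudlet/COS API.
public class DistributionSketch {

    // Ultra-lightweight flag telling one service which entry to act on.
    static class ServiceFlag {
        public String serviceName;
        public String entryId;

        ServiceFlag(String serviceName, String entryId) {
            this.serviceName = serviceName;
            this.entryId = entryId;
        }
    }

    // Maps an operation (e.g. "create") to the services that must see it;
    // in the real system this table is loaded from an XML file at startup.
    private final Map<String, List<String>> ensembleConfig = new HashMap<>();

    void configure(String operation, List<String> services) {
        ensembleConfig.put(operation, services);
    }

    // On an operation, emit one flag per interested service instead of
    // embedding per-service flags in the entry itself.
    List<ServiceFlag> flagsFor(String operation, String entryId) {
        List<ServiceFlag> flags = new ArrayList<>();
        for (String service
                : ensembleConfig.getOrDefault(operation, Collections.emptyList())) {
            flags.add(new ServiceFlag(service, entryId));
        }
        return flags;
    }

    public static void main(String[] args) {
        DistributionSketch distributor = new DistributionSketch();
        distributor.configure("create", Arrays.asList("Karen", "Kirk"));
        for (ServiceFlag f : distributor.flagsFor("create", "entry-42")) {
            System.out.println(f.serviceName + " -> " + f.entryId);
        }
    }
}
```

The key property is that adding a service to an ensemble now means editing the configuration, not the entry classes, so old entries in the space stay valid.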

So Many Issues

This leads to the second issue: the Crudlet doesn't have a well-defined set of actual interfaces. This was addressed to some extent in the last build of the Crudlet and Tennis source code before the effects of Sept. 11 finally closed down the Risk2Risk project development team. However, it became rapidly evident that we needed a concise interface to the entire ensemble.

Thirdly, the XML tag library itself required urgent review, in light of the significant update to the Cocoon XML processing system (from version 1 to version 2), and the emergence of new utilities, such as JAXB and Castor, for converting Java objects to XML and back again. By embracing the new, retrofitting the old, and following good eXtreme Programming techniques, we were bound to scrap a majority of our code in favor of a new architecture.

Lastly, although Crudlet and Tennis had been open-sourced, we as developers had signed away the rights to the use of the name and as such, in combination with the realization of the many faults in the architecture, we decided to scrap it all and start again, building on what we had learned.

A Night with Rio

At the Jini nerd-off Discovery 01, held in a small hotel somewhere in Princeton, N.J., we met Dennis Reedy and Jim Clarke from Sun's professional services team. We were impressed with Dennis' presentation on their Rio project. At Dennis' invitation, Phil Blythe from Pronoic gave a presentation about Crudlet at JavaOne 2001. They huddled for a few days in a hotel room, where Dennis filled Phil's head with radical ideas.

Rio's clever idea is that you specify the services you need in a structure called an operational string. The operational string is an object graph that captures what it takes to provision and instantiate services. You can chain operational strings together, creating graphs of inter-related services that are then delivered through the network, offering a capability that an enterprise or organization provides. Operational strings can be created as a set of nested XML documents and fed into a service called a provisioning monitor.
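Conceptually, such a declaration might look something like the fragment below. Be warned that the element and attribute names here are invented to show the shape of the idea; they are not Rio's actual opstring schema, which you should take from the Rio distribution itself.

```xml
<!-- Illustrative only: element and attribute names are invented to
     show the shape of the idea, not Rio's actual opstring schema. -->
<OperationalString name="RiskEnsemble">
  <ServiceBean name="Karen" codebase="http://host:8080/karen.jar">
    <QoS cpu="low" memory="64m"/>
  </ServiceBean>
  <!-- Nested opstrings chain together into graphs of related services. -->
  <Include file="archive-opstring.xml"/>
</OperationalString>
```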

The provisioning monitor provides a ServiceUI -- a UI that lets you load the operational string. It's neat, and impresses suits and techies alike because of the funky graphics. Rio also provides a structure called a CyberNode, which is like a VM with quality-of-service tags and Jini's ability to automatically link up with other services.

The provisioning monitor loads the operational string (an XML file), which tells it which Jini Service Beans (JSBs) to instantiate and allocate to the various CyberNodes. The provisioning monitor is also charged with watching the JSBs it has provisioned, and reprovisioning any that fall over for whatever reason. Via ServiceUI, the provisioning monitor -- and, in fact, any of Rio's ensemble visualization tools -- you can call up a UI for any of the services running anywhere. Because CyberNodes carry QoS information, services can be routed to machines intelligently and efficiently, giving you all the automagic load-balancing and failover you need. JSBs take most of the work out of writing Jini services, leaving you with only the core service code. They can be persistent because the CyberNode itself is an activatable, persistent service.


Rio offers additional benefits to the ensemble application developer. It provides a peer-to-peer event model, watchable objects, and resource pools such as thread pools, or connection pools. It talks to JavaSpaces nicely with its own space implementation and also provides the capability for a JSB to implement support for both RMI/JRMP and RMI/IIOP objects. JSBs contain a Servant class providing support for CORBA, using the Portable Object Adaptor (POA).

(Now, since I have never gone down the CORBA route, when I read that Rio's JSBs, and hence COS's JSBs, contain such a Servant, I figured I should find out what one is. It turns out that a Servant is a bit like a stupid Jini proxy: calls to the CORBA objects are routed to the Servant. Anyway, for those connecting to legacy systems, Rio does it. I am so glad I have been living in a Jini world for the last two years. I skipped Cobol at university, too.)

Rio also provides a tunnelling service, called Lincoln, that enables dynamic discovery of Jini Lookup Services across networks that are out of multicast range, or do not forward multicast packets.

Announcement and Request packets are tunnelled either to a remote peer that re-multicasts the forwarded packet using the original multicast group, or to a remote subnet whose router supports directed broadcasts. The Rio team has also extended Ant and the Buildtool projects to address some of the most common issues in service development surrounding the assembly of JAR files, which include service implementation, download, and user interface functionality.

As if that weren't enough, Rio provides Dynamic Web Application Archive (WAR) Support. This means you can attach a WAR as an attribute, describing entry points corresponding to JSP responder types (HTML, XML, WML). This capability includes a controller servlet adhering to the MVC (Model-View-Controller) pattern, focusing on the provisioning of the WAR to the Web container, and providing the navigation to direct requests to the appropriate JSP. With this capability, JSBs can deliver Web capabilities on demand.

Rio thus provides a family of tools: the CyberNode, the provisioning monitor and its ServiceUI, the ensemble visualization tools, the Lincoln discovery-tunnelling service, and the extended Ant build tasks described above.

You can download and read up on Rio at jini.org (free registration required). The project is very active and gathering quite a bit of excitement.


Enter COS

Rio's underlying technology provided many of the building blocks we needed to realize our design goals, which by now had been formalized: enable applications that keep running, with no loss or corruption of data, despite the physical loss of one or even several of the computers running the application -- in other words, applications that are Sept. 11-proof.

We began development of a Rio-based version of the system we had sketched out on a napkin in Amsterdam many months before. We dubbed this system the Corporate Operating System (COS). COS builds on Rio to provide an environment where multiple ensemble applications can coexist using shared resources and operate securely in a distributed computing environment. A COS application is a named collection of clients and shared data, together with an ensemble of services, that can be submitted to a resource provision grid for execution.

Rio goes a long way towards eliminating single points of failure from the system, but our reliance on a single JavaSpace reintroduced exactly such a point. Various strategies were discussed at Discovery 01, and the model we chose reflects those discussions: we built a simple space-federator JSB called Kirk, whose job it is to keep two spaces in sync.
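The essence of the federation idea can be shown with a toy model. Here a "space" is just an in-memory set of entries and one sync pass copies whatever the replica is missing; the real Kirk is a JSB that reads entries from one JavaSpace and writes them to another under a transaction. The class and method names below are illustrative, not the COS API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of the Kirk idea: keep a replica space in sync with a
// primary. "Space" here is an in-memory set standing in for a JavaSpace.
public class FederatorSketch {

    static class Space {
        final Set<String> entries = new LinkedHashSet<>();

        void write(String entry) {
            entries.add(entry);
        }
    }

    // One sync pass: copy anything the replica is missing.
    // Returns the number of entries replicated.
    static int replicate(Space primary, Space replica) {
        int copied = 0;
        for (String entry : primary.entries) {
            if (!replica.entries.contains(entry)) {
                replica.write(entry);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Space primary = new Space();
        Space replica = new Space();
        primary.write("risk-1");
        primary.write("risk-2");
        System.out.println(replicate(primary, replica)); // 2
        System.out.println(replica.entries.equals(primary.entries)); // true
    }
}
```

Because a sync pass only reads from the primary and writes what is missing, repeating it is harmless -- which is what makes it safe to run such a federator continuously alongside live applications.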

Like any good application, COS applications can be started, stopped, and removed from the environment, upgraded, and redeployed without interfering with other running applications.

This is essential for production systems and perfect for concurrency in development environments where many developers wish to run overlapping versions of the same code.

COS applications provide 100% continuity of service based on the contiguity of services running in the ensemble. A contiguous component can start, stop, and resume operation from exactly where it left off. This means any service can be taken off-line, upgraded, and restarted with nothing worse than a performance degradation to active applications.


COS applications are specified using an XML file that holds a named ServiceEnsemble and a Rio operational string.

An example service ensemble is presented here.

Listing 1. COS service ensemble.

<ServiceEnsemble name="ArchiveEnsemble">
  <ServiceMode id="KarenArchiveCreate"/>
  <ServiceMode id="KirkReplicateCreate"/>
</ServiceEnsemble>
This describes two service modes, KarenArchiveCreate and KirkReplicateCreate. This, in effect, tells the distribution service that for this named application, whenever an entry is created, issue a service flag for Karen (an Archivist service that talks to a JDBC database) to read the entry from the space and run the entry's archive() method. Also, issue a flag that instructs Kirk (a space federator that keeps two spaces in sync) to replicate that object in Kirk's replication space. Both services only involve a read of the entry from the space, and so can run asynchronously.

COS also allows you to specify dependencies between service runs. This is useful when one service depends on the result of another; in this case, the distribution model is extended to notify dependent services in the ensemble once all of their dependencies have completed.

COS provides a clean interface, CosConnectable, that enforces the methods used by third-party code (i.e., the Crudlet taglib) to access any named service ensemble. It maps closely to the original Crudlet specification itself, with create, retrieve, update, delete, lifecycle, and exists methods. There was no need for a T (template), however; instead, we have added a notifyEvent() method that allows client apps to listen for remote events from the space.

The facilities for adding and removing lifecycle events, which are new to the COS system, were an idea that never really worked in the original Crudlet code.

Lifecycle events provide persistent, guaranteed service ensemble events that are not executed immediately on delegation to the COS. They are instead submitted along with a trigger condition object, which determines when and how often the ensemble event should be executed. These events are placed into the COS as service ensemble events when a matching trigger event arrives in the space.
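The trigger-condition idea can be sketched as follows. The names here are hypothetical, not the COS API: a lifecycle event is held back until its trigger condition reports that it is due, then submitted as an ordinary service ensemble event.

```java
// Sketch of the trigger-condition idea; TriggerCondition and TimeTrigger
// are hypothetical names, not the COS API.
public class TriggerSketch {

    // Decides when, and how often, a held lifecycle event should fire.
    interface TriggerCondition {
        boolean shouldFire(long nowMillis);
    }

    // A simple repeating time trigger, in the spirit of cron.
    static class TimeTrigger implements TriggerCondition {
        private final long periodMillis;
        private long nextFire;

        TimeTrigger(long startMillis, long periodMillis) {
            this.nextFire = startMillis;
            this.periodMillis = periodMillis;
        }

        public boolean shouldFire(long nowMillis) {
            if (nowMillis >= nextFire) {
                nextFire += periodMillis; // schedule the next occurrence
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        // Fires at time 0 and then every hour.
        TimeTrigger hourly = new TimeTrigger(0, 3_600_000);
        System.out.println(hourly.shouldFire(0));         // true: due now
        System.out.println(hourly.shouldFire(1_000));     // false: next fire in an hour
        System.out.println(hourly.shouldFire(3_600_000)); // true again
    }
}
```

Because the condition object, not the scheduler, owns the "when and how often" logic, the same delivery machinery serves one-shot and repeating events alike.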

Triggers, created by factory methods, can be of two varieties:

COS provides Ben, a reference implementation of a service that spits time triggers into the space. These can be consumed, and time-triggerable events can be deployed, much in the manner of a Unix cron system. Indeed, cron is a key component in a good operating system.

The state of the lifecycle service is stored in the JavaSpace, allowing multiple competing instances of the lifecycle service in scalable designs.

There is a full discussion of the COS event model on the Pronoic Web site.

Crudlet's role in all of this is simply as an XML transformation mechanism: it provides some of the patterns used to build Web interfaces, and perhaps transforms incoming data. In this context, it is really just a taglib that acts as a client of the COS. Matching JSP taglibs are trivial to develop.

The COS provides reference implementations of Jini Service Beans to handle JDBC archiving (Karen), scheduling (Robin), watchmaking (Ben), distributing (Max), space federation (Kirk), cleaning/garbage-collection (Otto), and http-posting (Roger). It also provides a suite of tests and utilities.

It provides a ClientAppInterface that enforces connect(), disconnect(), and isConnected() methods, and an AbstractClientApp that provides default implementations of those methods. As such, connecting to the COS is trivially easy for a client.

A simple CosClientApp would connect like this:

public class SimpleClient extends AbstractClientApp {

	static public void main(String[] args) {
		System.setSecurityManager(new java.rmi.RMISecurityManager());

		ClientApp app = new ClientApp();

		try {
			// parse the command-line arguments here
		} catch (IOException ie) {
			Messenger.fatal("IO exception while parsing arguments.");
		}

		try {
			// connect the app to the COS here
		} catch (GeneralCosException e) {
			String msg = "A general cos exception was thrown"
				+ " during initialisation."
				+ " Message = \"" + e.getMessage() + "\"";
			Messenger.fatal(msg);
		}
	}
}
Internal to the app/JSP/XSP page you can use app.getCosConnector() to return a CosConnectable that you use to interact with the COS.

The methods are those of CosConnectable described above: create, retrieve, update, delete, lifecycle, exists, and notifyEvent().

I will go into the use of these in the next article, where I discuss the building, testing, deploying, and running of a simple COS application that provides peer-to-peer task allocation and tracking.

Dave Sag is a skilled Object Modeler and Software Designer, using Java as well as other programming environments.


Copyright © 2009 O'Reilly Media, Inc.