ONJava.com -- The Independent Source for Enterprise Java

Java RMI: Serialization
Pages: 1, 2, 3, 4, 5, 6

Making DocumentDescription Serializable

To make this more concrete, we now turn to the DocumentDescription class from the RMI version of our printer server, which we implemented in Chapter 4. The code for the first nonserializable version of DocumentDescription was the following:



import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class DocumentDescription implements PrinterConstants {
    private InputStream _actualDocument;
    private int _length;
    private int _documentType;
    private boolean _printTwoSided;
    private int _printQuality;

    public DocumentDescription(InputStream actualDocument) throws IOException {
        this(actualDocument, DEFAULT_DOCUMENT_TYPE, DEFAULT_PRINT_TWO_SIDED,
            DEFAULT_PRINT_QUALITY);
    }

    public DocumentDescription(InputStream actualDocument, int documentType,
            boolean printTwoSided, int printQuality) throws IOException {
        _documentType = documentType;
        _printTwoSided = printTwoSided;
        _printQuality = printQuality;
        // Copy the whole document into memory so that its length is known.
        BufferedInputStream buffer = new BufferedInputStream(actualDocument);
        DataInputStream dataInputStream = new DataInputStream(buffer);
        ByteArrayOutputStream temporaryBuffer = new ByteArrayOutputStream( );
        _length = copy(dataInputStream, new DataOutputStream(temporaryBuffer));
        _actualDocument = new DataInputStream(
            new ByteArrayInputStream(temporaryBuffer.toByteArray( )));
    }

    public int getDocumentType( ) {
        return _documentType;
    }

    public boolean isPrintTwoSided( ) {
        return _printTwoSided;
    }

    public int getPrintQuality( ) {
        return _printQuality;
    }

    // Copies source to destination one byte at a time until end of stream
    // and returns the number of bytes copied.
    private int copy(InputStream source, OutputStream destination)
            throws IOException {
        int nextByte;
        int numberOfBytesCopied = 0;
        while (-1 != (nextByte = source.read( ))) {
            destination.write(nextByte);
            numberOfBytesCopied++;
        }
        destination.flush( );
        return numberOfBytesCopied;
    }
}

We will make this into a serializable class by following the steps outlined in the previous section.

Implement the Serializable interface

This is easy. All we need to do is change the class declaration:

public class DocumentDescription implements Serializable, PrinterConstants

Make sure that instance-level, locally defined state is serialized properly

We have five fields to take care of:

private InputStream _actualDocument;
private int _length;
private int _documentType;
private boolean _printTwoSided;
private int _printQuality;

Of these, four are primitive types that serialization can handle without any problem. However, _actualDocument is a problem. InputStream is not a serializable class. And the contents of _actualDocument are very important; _actualDocument contains the document we want to print. There is no point in serializing an instance of DocumentDescription unless we somehow serialize _actualDocument as well.

If we have fields that serialization cannot handle, and they must be serialized, then our only option is to implement readObject( ) and writeObject( ). For DocumentDescription, we declare _actualDocument to be transient and then implement readObject( ) and writeObject( ) as follows:

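private transient InputStream _actualDocument;

// Writes the four nontransient fields first, then the raw document bytes.
// Note that this reads _actualDocument through to its end.
private void writeObject(java.io.ObjectOutputStream out) throws IOException {
    out.defaultWriteObject( );
    copy(_actualDocument, out);
}

// Reads the fields back in the same order, then rebuilds the document
// from the next _length bytes of the stream.
private void readObject(java.io.ObjectInputStream in) throws IOException,
        ClassNotFoundException {
    in.defaultReadObject( );
    ByteArrayOutputStream temporaryBuffer = new ByteArrayOutputStream( );
    copy(in, temporaryBuffer, _length);
    _actualDocument = new DataInputStream(
        new ByteArrayInputStream(temporaryBuffer.toByteArray( )));
}

// Copies exactly length bytes; fails if the stream ends early instead of
// silently writing -1 values into the destination.
private void copy(InputStream source, OutputStream destination, int length)
        throws IOException {
    for (int counter = 0; counter < length; counter++) {
        int nextByte = source.read( );
        if (nextByte == -1) {
            throw new java.io.EOFException("Stream ended before " + length
                + " bytes were read");
        }
        destination.write(nextByte);
    }
    destination.flush( );
}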

Note that we declare _actualDocument to be transient and call defaultWriteObject( ) in the first line of our writeObject( ) method. Doing these two things allows the standard serialization mechanism to serialize the other four instance variables without any extra effort on our part. We then simply copy _actualDocument to the stream.

Our implementation of readObject( ) simply calls defaultReadObject( ) and then reads _actualDocument from the stream. In order to read _actualDocument from the stream, we used the length of the document, which had previously been written to the stream. In essence, we needed to encode some metadata into the stream, in order to correctly pull our data out of the stream.

This code is a little ugly. We're using serialization, but we're still forced to think about how to encode some of our state when we write it to the stream. In fact, the code for writeObject( ) and readObject( ) is remarkably similar to the marshalling code we implemented directly for the socket-based version of the printer server. This is, unfortunately, often the case. Serialization's default implementation handles simple objects very well. But, every now and then, you will want to send a nonserializable object over the wire, or improve the serialization algorithm for efficiency. Doing so amounts to writing the same code you would write if you implemented all the socket handling yourself, as in our socket-based version of the printer server.

TIP:   There is also an order dependency here. The first value written must be the first value read. Since we start writing by calling defaultWriteObject( ), we have to start reading by calling defaultReadObject( ). On the bright side, this means we'll have an accurate value for _length before we try to read _actualDocument from the stream.
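The pattern described here (default fields first, then hand-written data whose length was recorded by the default fields) can be tried without the printer server. The following is only a minimal sketch: StreamHolder and RoundTripDemo are hypothetical names, not part of the printer server, and readAllBytes( ) assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RoundTripDemo {
    // Serializes a StreamHolder to an in-memory buffer and reads the copy back.
    static byte[] roundTrip(byte[] document) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(new StreamHolder(document));
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return ((StreamHolder) in.readObject()).contentBytes();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new String(roundTrip("page one".getBytes())));
    }
}

// Hypothetical stand-in for DocumentDescription: one primitive field plus a
// stream that default serialization cannot handle.
class StreamHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    private int length;                    // written by defaultWriteObject()
    private transient InputStream content; // written by hand below

    StreamHolder(byte[] bytes) {
        length = bytes.length;
        content = new ByteArrayInputStream(bytes);
    }

    byte[] contentBytes() throws IOException {
        return content.readAllBytes();
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();          // length goes into the stream first
        out.write(content.readAllBytes()); // then the raw document bytes
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();            // length comes back first
        byte[] bytes = new byte[length];
        in.readFully(bytes);               // so we know how many bytes to read
        content = new ByteArrayInputStream(bytes);
    }
}
```

Swapping the two statements in readObject( ) would leave length at zero when the byte array is allocated, which is the order dependency the tip warns about.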

Make sure that superclass state is handled correctly

This isn't a problem. The superclass, java.lang.Object, doesn't actually have any important state that we need to worry about. Since it also already has a zero-argument constructor, we don't need to do anything.
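When the superclass is something other than java.lang.Object, the zero-argument constructor starts to matter. The sketch below, using hypothetical Base and Derived classes, illustrates what appears to happen: state declared in a nonserializable superclass is not written to the stream, and the superclass's zero-argument constructor runs again when the instance is read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SuperclassDemo {
    // Not Serializable: its state is never written to the stream.
    static class Base {
        String label;
        Base() { label = "set by Base()"; } // runs again during deserialization
    }

    static class Derived extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
        Derived(int value) {
            this.value = value;
            label = "changed later";        // superclass state; lost in transit
        }
    }

    // Serializes the argument in memory and returns the deserialized copy.
    static Derived copyOf(Derived original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Derived) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Derived copy = copyOf(new Derived(7));
        System.out.println(copy.value + " / " + copy.label);
    }
}
```

If the superclass state matters, the usual remedies are to make the superclass serializable or to write that state by hand in the subclass's writeObject( ).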

Override equals() and hashCode( ) if necessary

In our current implementation of the printer server, we don't need to do this. The server never checks for equality between instances of DocumentDescription. Nor does it store them in a container object that relies on their hash codes.

Did We Cheat When Implementing Serializable for DocumentDescription?

It may seem like we cheated a bit in implementing DocumentDescription. Three of the five steps in making a class serializable didn't actually result in changes to the code. Indeed, the only work we really did was implementing readObject( ) and writeObject( ). But it's not really cheating. Serialization is just designed to be easy to use. It has a good set of defaults, and, at least in the case of value objects intended to be passed over the wire, the default behavior is often good enough.

The Serialization Algorithm

By now, you should have a pretty good feel for how the serialization mechanism works for individual classes. The next step in explaining serialization is to discuss the actual serialization algorithm in a little more detail. This discussion won't handle all the details of serialization (though we'll come close). Instead, the idea is to cover the algorithm and protocol, so you can understand how the various hooks for customizing serialization work and how they fit into the context of an RMI application.

The Data Format

The first step is to discuss what gets written to the stream when an instance is serialized. Be warned: it's a lot more information than you might guess from the previous discussion.

An important part of serialization involves writing out class-related metadata associated with an instance. Most instances are instances of more than one class. For example, an instance of String is also an instance of Object. Any given instance, however, is an instance of only a few classes. These classes can be written as a sequence: C1, C2, ..., CN, in which C1 is a superclass of C2, C2 is a superclass of C3, and so on. This is actually a linear sequence because Java is a single inheritance language for classes. We call C1 the least superclass and CN the most-derived class. See Figure 10-4.

Figure 10-4. Inheritance diagram

After writing out the associated class information, the serialization mechanism stores out the following information for each instance:

  • A description of the most-derived class.
  • Data associated with the instance, interpreted as an instance of the least superclass.
  • Data associated with the instance, interpreted as an instance of the second least superclass.

And so on until:

  • Data associated with the instance, interpreted as an instance of the most-derived class.

So what really happens is that the type of the instance is stored out, and then all the serializable state is stored in discrete chunks that correspond to the class structure. But there's a question still remaining: what do we mean by "a description of the most-derived class?" This is either a reference to a class description that has already been recorded (e.g., an earlier location in the stream) or the following information:

  • The version ID of the class (its serialVersionUID, a 64-bit long), which is used to validate the .class files
  • A boolean stating whether writeObject( )/readObject( ) are implemented
  • The number of serializable fields
  • A description of each field (its name and type)
  • Extra data produced by ObjectOutputStream's annotateClass( ) method
  • A description of its superclass if the superclass is serializable

This should, of course, immediately seem familiar. The class descriptions consist entirely of metadata that allows the instance to be read back in. In fact, this is one of the most beautiful aspects of serialization; the serialization mechanism automatically, at runtime, converts class objects into metadata so instances can be serialized with the least amount of programmer work.
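Much of this metadata can be inspected directly through java.io.ObjectStreamClass, which is the runtime's own class description. The sketch below is only an illustration: Job is a hypothetical class, and the text format built by describe( ) is invented here, not the wire format.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;

class DescriptorDemo {
    // Hypothetical serializable class used only for this illustration.
    static class Job implements Serializable {
        private static final long serialVersionUID = 42L;
        int copies;
        boolean twoSided;
        transient Object scratch; // transient, so not a serializable field
    }

    // Builds a one-line summary of the class description for a type.
    static String describe(Class<?> type) {
        // lookup() returns null for classes that are not serializable
        ObjectStreamClass desc = ObjectStreamClass.lookup(type);
        StringBuilder text = new StringBuilder();
        text.append(desc.getName())
            .append(" uid=").append(desc.getSerialVersionUID());
        for (ObjectStreamField field : desc.getFields()) {
            text.append(' ')
                .append(field.getType().getSimpleName())
                .append(':')
                .append(field.getName());
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(Job.class));
    }
}
```

The summary lists the version ID and each serializable field's name and type, while the transient field is absent.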

A Simplified Version of the Serialization Algorithm

In this section, I describe a slightly simplified version of the serialization algorithm. I then proceed to a more complete description of the serialization process in the next section.

Writing

Because the class descriptions actually contain the metadata, the basic idea behind the serialization algorithm is pretty easy to describe. The only tricky part is handling circular references.

The problem is this: suppose instance A refers to instance B, and instance B refers back to instance A. Completely writing out A requires you to write out B. But writing out B requires you to write out A. Because you don't want to get into an infinite loop, or even write out an instance or a class description more than once, you need to keep track of what's already been written to the stream. (Writing anything twice is wasteful: in addition to consuming bandwidth, serialization is a slow process that uses the reflection API quite heavily.)

ObjectOutputStream does this by maintaining a mapping from instances and classes to handles. When writeObject( ) is called with an argument that has already been written to the stream, the handle is written to the stream, and no further operations are necessary.

If, however, writeObject( )is passed an instance that has not yet been written to the stream, two things happen. First, the instance is assigned a reference handle, and the mapping from instance to reference handle is stored by ObjectOutputStream. The handle that is assigned is the next integer in a sequence.

TIP:   Remember the reset( ) method on ObjectOutputStream? It clears the mapping and resets the handle counter to 0x7E0000. RMI also automatically resets its serialization mechanism after every remote method call.

Second, the instance data is written out as per the data format described earlier. This can involve some complications if the instance has a field whose value is also a serializable instance. In this case, the serialization of the first instance is suspended, and the second instance is serialized in its place (or, if the second instance has already been serialized, the reference handle for the second instance is written out). After the second instance is fully serialized, serialization of the first instance resumes. The contents of the stream look a little bit like Figure 10-5.

Figure 10-5. Contents of Serialization's data stream.
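The effect of the handle table is easy to observe. In the following sketch (Node and HandleDemo are hypothetical names), two instances refer to each other, and the first one is written to the same stream twice, optionally with a call to reset( ) in between.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class HandleDemo {
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Node other;
        Node(String name) { this.name = name; }
    }

    // Writes the same instance twice and returns the two objects read back.
    static Node[] writeTwice(boolean resetBetween) {
        Node a = new Node("A");
        Node b = new Node("B");
        a.other = b;
        b.other = a; // circular reference
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(a);      // full data for a and b
                a.name = "A (renamed)";  // change state after the first write
                if (resetBetween) {
                    out.reset();         // forget every handle assigned so far
                }
                out.writeObject(a);      // a handle only, unless reset
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                Node first = (Node) in.readObject();
                Node second = (Node) in.readObject();
                return new Node[] {first, second};
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node[] shared = writeTwice(false);
        System.out.println(shared[0] == shared[1]);             // true
        System.out.println(shared[0].other.other == shared[0]); // true
        System.out.println(shared[1].name);                     // A
        Node[] fresh = writeTwice(true);
        System.out.println(fresh[0] == fresh[1]);               // false
        System.out.println(fresh[1].name);                      // A (renamed)
    }
}
```

Without reset( ), the second write is just a handle, so the reader gets the same object back and never sees the renamed state. This is one reason a long-lived stream that sends mutable objects needs to be reset from time to time.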

Reading

From the description of writing, it's pretty easy to guess most of what happens when readObject( ) is called. Unfortunately, because of versioning issues, the implementation of readObject( ) is actually a little bit more complex than you might guess.

When it reads in an instance description, ObjectInputStream gets the following information:

  • Descriptions of all the classes involved
  • The serialization data from the instance

The problem is that the class descriptions that the instance of ObjectInputStream reads from the stream may not be equivalent to the class descriptions of the same classes in the local JVM. For example, if an instance is serialized to a file and then read back in three years later, there's a pretty good chance that the class definitions used to serialize the instance have changed.

This means that ObjectInputStream uses the class descriptions in two ways:

  • It uses them to actually pull data from the stream, since the class descriptions completely describe the contents of the stream.
  • It compares the class descriptions to the classes it has locally and tries to determine if the classes have changed, in which case it throws an exception. If the class descriptions match the local classes, it creates the instance and sets the instance's state appropriately.

RMI Customizes the Serialization Algorithm

RMI doesn't actually use ObjectOutputStream and ObjectInputStream. Instead, it uses custom subclasses so it can modify the serialization process by overriding some protected methods. In this section, we'll discuss the most important modifications that RMI makes when serializing instances. RMI makes similar changes when deserializing instances, but they follow from, and can easily be deduced from, the description of the serialization changes.

Recall that ObjectOutputStream contained the following protected methods:

protected void annotateClass(Class cl)
protected void annotateProxyClass(Class cl)
protected boolean enableReplaceObject(boolean enable)
protected Object replaceObject(Object obj)
protected void drain(  )
protected void writeObjectOverride(Object obj)
protected void writeClassDescriptor(ObjectStreamClass classdesc)
protected void writeStreamHeader(  )

These all have default implementations in ObjectOutputStream. That is, annotateClass( ) and annotateProxyClass( ) do nothing. enableReplaceObject( ) returns false, and so on. However, these methods are still called during serialization. And RMI, by overriding these methods, customizes the serialization process.

The three most important methods from the point of view of RMI are:

protected void annotateClass(Class cl)
protected boolean enableReplaceObject(boolean enable)
protected Object replaceObject(Object obj)

Let's describe how RMI overrides each of these.

annotateClass( )

ObjectOutputStream calls annotateClass( ) when it writes out class descriptions. Annotations are used to provide extra information about a class that comes from the serialization mechanism and not from the class itself. The basic serialization mechanism has no real need for annotations; most of the information about a given class is already stored in the stream.

TIP:   RMI's dynamic classloading system uses annotateClass( )to record where .class files are stored. We'll discuss this more in Chapter 19.

RMI, on the other hand, uses annotations to record codebase information. That is, RMI, in addition to recording the class descriptions, also records information about the location from which it loaded the class's bytecode. Codebases are often simply locations in a filesystem. Incidentally, locations in a filesystem are often useless information, since the JVM that deserializes the instances may have a very different filesystem than the one from where the instances were serialized. However, a codebase isn't restricted to being a location in a filesystem. The only restriction on codebases is that they have to be valid URLs. That is, a codebase is a URL that specifies a location on the network from which the bytecode for a class can be obtained. This enables RMI to dynamically load new classes based on the serialized information in the stream. We'll return to this in Chapter 19.
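The annotation hook can be tried outside RMI. The sketch below is not RMI's implementation; it only shows the mechanism, under the assumption that a single codebase string is written for every class. CODEBASE is a made-up URL, and AnnotatingOutput, RecordingInput, and Ticket are hypothetical names. The reading side uses resolveClass( ), the ObjectInputStream hook that corresponds to annotateClass( ).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class AnnotationDemo {
    // Made-up codebase URL, for illustration only.
    static final String CODEBASE = "http://example.com/classes/";

    static class Ticket implements Serializable {
        private static final long serialVersionUID = 1L;
        int id = 7;
    }

    static class AnnotatingOutput extends ObjectOutputStream {
        AnnotatingOutput(OutputStream out) throws IOException { super(out); }

        @Override
        protected void annotateClass(Class<?> cl) throws IOException {
            writeUTF(CODEBASE); // extra data stored with the class description
        }
    }

    static class RecordingInput extends ObjectInputStream {
        final List<String> seen = new ArrayList<>();

        RecordingInput(InputStream in) throws IOException { super(in); }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            // Read the annotation back in the order it was written.
            seen.add(desc.getName() + " from " + readUTF());
            return super.resolveClass(desc); // this sketch still loads locally
        }
    }

    // Serializes one Ticket and returns the annotations seen while reading it.
    static List<String> annotationsSeen() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new AnnotatingOutput(buffer)) {
                out.writeObject(new Ticket());
            }
            try (RecordingInput in = new RecordingInput(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                in.readObject();
                return in.seen;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(annotationsSeen());
    }
}
```

A stream that loaded classes dynamically would use the recorded URL in resolveClass( ) instead of simply delegating to the default implementation.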

replaceObject( )

The idea of replacement is simple; sometimes the instance that is passed to the serialization mechanism isn't the instance that ought to be written out to the data stream. To make this more concrete, recall what happened when we called rebind( ) to register a server with the RMI registry. The following code was used in the bank example:

Account_Impl newAccount = new Account_Impl(serverDescription.balance);
Naming.rebind(serverDescription.name, newAccount);
System.out.println("Account " + serverDescription.name + " successfully launched.");

This creates an instance of Account_Impl and then calls rebind( ) with that instance. Account_Impl is a server that implements the Remote interface, but not the Serializable interface. And yet, somehow, the registry, which is running in a different JVM, is sent something.

What the registry actually gets is a stub. The stub for Account_Impl, which was automatically generated by rmic, begins with:

public final class Account_Impl_Stub extends java.rmi.server.RemoteStub

java.rmi.server.RemoteStub is a class that implements the Serializable interface. The RMI serialization mechanism knows that whenever a remote server is "sent" over the wire, the server object should be replaced by a stub that knows how to communicate with the server (e.g., a stub that knows on which machine and port the server is listening).

Calling Naming.rebind( ) actually winds up passing a stub to the RMI registry. When clients make calls to Naming.lookup( ), as in the following code snippet, they also receive copies of the stub. Since the stub is serializable, there's no problem in making a copy of it:

_account = (Account)Naming.lookup(_accountNameField.getText( ));

In order to enable this behavior, ObjectOutputStream provides enableReplaceObject( ) and replaceObject( ). A subclass turns replacement on by calling enableReplaceObject(true), which returns the previous setting. From then on, when an instance is about to be serialized, ObjectOutputStream does the following:

  1. It checks whether instance replacement has been enabled.
  2. If instance replacement is enabled, it calls replaceObject( ), passing in the instance it was about to serialize, to find out which instance it should really write to the stream.
  3. It then writes the appropriate instance to the stream.
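The replacement steps can be sketched with a small ObjectOutputStream subclass. This is not RMI's code: Server and Stub are hypothetical stand-ins for a remote object and its stub, and the sketch assumes that the subclass turns replacement on by calling enableReplaceObject(true) in its constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

class ReplacementDemo {
    // Stand-in for a remote server object: deliberately not Serializable.
    static class Server {
        final String host;
        final int port;
        Server(String host, int port) { this.host = host; this.port = port; }
    }

    // Stand-in for a stub: serializable, and knows where the server listens.
    static class Stub implements Serializable {
        private static final long serialVersionUID = 1L;
        final String host;
        final int port;
        Stub(String host, int port) { this.host = host; this.port = port; }
    }

    static class ReplacingOutput extends ObjectOutputStream {
        ReplacingOutput(OutputStream out) throws IOException {
            super(out);
            enableReplaceObject(true); // without this, replaceObject() is never called
        }

        @Override
        protected Object replaceObject(Object obj) {
            if (obj instanceof Server) {
                Server server = (Server) obj;
                return new Stub(server.host, server.port); // write the stub instead
            }
            return obj; // everything else is written unchanged
        }
    }

    // "Sends" an object through the replacing stream and returns what arrives.
    static Object send(Object original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ReplacingOutput(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object received = send(new Server("printhost", 1099));
        System.out.println(received.getClass().getSimpleName());
    }
}
```

Because replacement happens before the stream checks whether the instance is serializable, the nonserializable Server never causes a NotSerializableException; only the Stub reaches the stream.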

Maintaining Direct Connections

A question that frequently arises as distributed applications get more complicated involves message forwarding. For example, suppose that we have three communicating programs: A, B, and C. At the start, A has a stub for B, B has a stub for C, and C has a stub for A. See Figure 10-6.

Figure 10-6. Communication between three applications.

Now, what happens if A calls a method, for example, getOtherServer( ), on B that "returns" C? The answer is that A gets a deep copy of the stub B uses to communicate with C. That is, A now has a direct connection to C; whenever A tries to send a message to C, B is not involved at all. This is illustrated in Figure 10-7.

Figure 10-7. Improved communication between three applications.

This is very good from a bandwidth and network latency point of view. But it can also be somewhat problematic. Suppose, for example, B implements load balancing. Since B isn't involved in the A to C communication, it has no direct way of knowing whether A is still using C, or how heavily. We'll revisit this in later chapters, when we discuss the distributed garbage collector and the Unreferenced interface.
