XSLT Processing with Java
Feeding JDOM Output into JAXP
The DOM API is tedious to use, so many Java programmers opt for
JDOM instead. The typical usage pattern is to generate XML dynamically using
JDOM and then somehow transform that into a web page using XSLT. This presents
a problem because JAXP does not provide any direct implementation of the javax.xml.transform.Source interface that integrates with JDOM.
As this is being written, members of the JDOM community are writing a JDOM implementation of this interface. In the meantime, three approaches are available:
- Use org.jdom.output.SAXOutputter to pipe SAX 2 events from JDOM to JAXP.
- Use org.jdom.output.DOMOutputter to convert the JDOM tree to a DOM tree, and then use javax.xml.transform.dom.DOMSource to read the data into JAXP.
- Use org.jdom.output.XMLOutputter to serialize the JDOM tree to XML text, and then use javax.xml.transform.stream.StreamSource to parse the XML back into JAXP.
JDOM to SAX approach
The SAX approach is generally preferable to other approaches. Its primary advantage is that it does not require an intermediate transformation to convert the JDOM tree into a DOM tree or text. This offers the lowest memory utilization and potentially the fastest performance.
In support of SAX, JDOM offers the org.jdom.output.SAXOutputter class. The following code fragment demonstrates its usage:
TransformerFactory transFact = TransformerFactory.newInstance( );
if (transFact.getFeature(SAXTransformerFactory.FEATURE)) {
    SAXTransformerFactory stf = (SAXTransformerFactory) transFact;
    // the 'stylesheet' parameter is
    // an instance of JAXP's
    // javax.xml.transform.Templates interface
    TransformerHandler transHand = stf.newTransformerHandler(stylesheet);
    // result is a Result instance
    transHand.setResult(result);
    SAXOutputter saxOut = new SAXOutputter(transHand);
    // the 'jdomDoc' parameter is an instance
    // of JDOM's org.jdom.Document class. It contains
    // the XML data
    saxOut.output(jdomDoc);
} else {
    System.err.println("SAXTransformerFactory is not supported");
}
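The JAXP half of this pipeline can be tried without JDOM on the classpath. The following is a minimal sketch, not the author's code: a standard SAX parser stands in for SAXOutputter as the source of SAX events, and the class name SaxPipelineSketch, the inline stylesheet, and the sample XML are all invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical demo class; a SAX parser plays the role of SAXOutputter here.
class SaxPipelineSketch {
    // Illustrative stylesheet that emits plain text
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello, "
      + "<xsl:value-of select='greeting/@to'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        TransformerFactory transFact = TransformerFactory.newInstance();
        if (!transFact.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException(
                "SAXTransformerFactory is not supported");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) transFact;

        // compile the stylesheet and obtain a handler that consumes SAX events
        Templates stylesheet =
            stf.newTemplates(new StreamSource(new StringReader(XSLT)));
        TransformerHandler transHand = stf.newTransformerHandler(stylesheet);
        StringWriter out = new StringWriter();
        transHand.setResult(new StreamResult(out));

        // any producer of SAX 2 events can drive the handler; with JDOM
        // this would be new SAXOutputter(transHand).output(jdomDoc)
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(transHand);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<greeting to='JDOM'/>"));
    }
}
```

The transformation runs as the events arrive, so no intermediate DOM tree or XML string is built on the JAXP side.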
JDOM to DOM approach
The DOM approach is generally a little slower and will not work
if JDOM uses a different DOM implementation than JAXP. JDOM, like JAXP, can
utilize different DOM implementations behind the scenes. If JDOM refers to a
different version of DOM than JAXP, you will encounter exceptions when you try
to perform the transformation. Since JAXP uses Apache's Crimson parser by
default, you can configure JDOM to use Crimson with the org.jdom.adapters.CrimsonDOMAdapter class. The following
code shows how to convert a JDOM Document into a DOM Document:
org.jdom.Document jdomDoc = createJDOMDocument( );
// add data to the JDOM Document
...
// convert the JDOM Document into a DOM Document
org.jdom.output.DOMOutputter domOut = new org.jdom.output.DOMOutputter(
        "org.jdom.adapters.CrimsonDOMAdapter");
org.w3c.dom.Document domDoc = domOut.output(jdomDoc);
The adapter class name passed to the DOMOutputter constructor deserves attention because it is likely to give you
the most problems. When JDOM converts its internal object tree into a DOM
object tree, it must use some underlying DOM implementation. In many respects,
JDOM is similar to JAXP because it delegates many tasks to underlying
implementation classes. The DOMOutputter
constructors are overloaded as follows:
// use the default adapter class
public DOMOutputter( )
// use the specified adapter class
public DOMOutputter(String adapterClass)
The first constructor shown here will use JDOM's default DOM
parser, which is not necessarily the same DOM parser that JAXP uses. The
second constructor allows you to specify the name of an adapter class, which must
implement the org.jdom.adapters.DOMAdapter
interface. JDOM includes standard adapters for all of the widely used DOM
implementations, or you could write your own adapter class.
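Once a DOM Document exists, handing it to JAXP is straightforward. The sketch below assumes nothing about JDOM: it builds the DOM tree with JAXP's own DocumentBuilder, which stands in for the tree that DOMOutputter would produce, and then wraps it in a DOMSource. The class name DomSourceSketch and the sample stylesheet are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; the DOM tree is built by hand instead of
// being produced by JDOM's DOMOutputter.
class DomSourceSketch {
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello, "
      + "<xsl:value-of select='greeting/@to'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String name) throws Exception {
        // stand-in for: org.w3c.dom.Document domDoc = domOut.output(jdomDoc);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document domDoc = dbf.newDocumentBuilder().newDocument();
        Element root = domDoc.createElementNS(null, "greeting");
        root.setAttributeNS(null, "to", name);
        domDoc.appendChild(root);

        // read the DOM tree into JAXP through DOMSource
        Transformer trans = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        trans.transform(new DOMSource(domDoc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("DOM"));
    }
}
```

Because the DOM tree and the transformer both come from JAXP here, the implementation mismatch described above cannot occur. With JDOM in the picture, the adapter class is what keeps the two sides aligned.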
JDOM to text approach
In the final approach listed earlier, you can utilize java.io.StringWriter and java.io.StringReader. First create the JDOM data as
usual, then use org.jdom.output.XMLOutputter to
convert the data into a String of XML:
StringWriter sw = new StringWriter( );
org.jdom.output.XMLOutputter xmlOut
        = new org.jdom.output.XMLOutputter("", false);
xmlOut.output(jdomDoc, sw);
The parameters for XMLOutputter allow you to specify the amount of indentation for the output along with a boolean flag indicating whether or not linefeeds should
be included in the output. In the code example, no spaces or linefeeds are
specified in order to minimize the size of the XML that is produced. Now that
the StringWriter contains your XML, you can use a
StringReader along with javax.xml.transform.stream.StreamSource to read the data
into JAXP:
StringReader sr = new StringReader(sw.toString( ));
Source xmlSource = new javax.xml.transform.stream.StreamSource(sr);
The transformation can then proceed just as it did in Example 5-4. The main drawback to this approach is that the XML, once converted to text form, must then be parsed back in by JAXP before the transformation can be applied.
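For completeness, here is a rough end-to-end sketch of the text approach that runs without JDOM. The XML is written to the StringWriter by hand where XMLOutputter would normally do the work. The class name TextRoundTripSketch and the stylesheet are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class illustrating the serialize-then-reparse approach.
class TextRoundTripSketch {
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>items: "
      + "<xsl:value-of select='count(/list/item)'/></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xmlText) throws Exception {
        // stand-in for: xmlOut.output(jdomDoc, sw);
        StringWriter sw = new StringWriter();
        sw.write(xmlText);

        // JAXP must parse the text again before it can transform it
        StringReader sr = new StringReader(sw.toString());
        Source xmlSource = new StreamSource(sr);

        Transformer trans = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        trans.transform(xmlSource, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<list><item/><item/></list>"));
    }
}
```

The extra parse is the cost of this approach: the whole document exists as a String before the transformer sees any of it.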
Stylesheet Compilation
XSLT is a computer-programming language, expressed using XML syntax. This is not for the benefit of the computer, but rather for human interpretation. Before the stylesheet can be processed, it must be converted into some internal machine-readable format. This process should sound familiar, because it is the same process used for every high-level programming language. You, the programmer, work in terms of the high-level language, and an interpreter or compiler converts this language into some machine format that can be executed by the computer.
Interpreters analyze source code and translate it into machine code with each execution. In the case of XSLT, this requires that the stylesheet be read into memory using an XML parser, translated into machine format, and then applied to your XML data. Performance is the obvious problem, particularly when you consider that stylesheets rarely change. Typically, the stylesheets are defined early on in the development process and remain static, while XML data is generated dynamically with each client request.
A better approach is to parse the XSLT stylesheet into memory once, compile it to machine format, and then preserve that machine representation in memory for repeated use. This is called stylesheet compilation and is no different in concept from the compilation of any programming language.
Templates API
Different XSLT processors implement stylesheet compilation differently, so JAXP includes the javax.xml.transform.Templates interface to provide
consistency. This is a relatively simple interface with the following API:
public interface Templates {
    java.util.Properties getOutputProperties( );
    javax.xml.transform.Transformer newTransformer( )
        throws TransformerConfigurationException;
}
The getOutputProperties( ) method
returns a clone of the properties associated with the <xsl:output> element, such as method="xml", indent="yes",
and encoding="UTF-8". You might recall that java.util.Properties (a subclass of java.util.Hashtable) provides key/value mappings from
property names to property values. Since a clone, or deep copy, is returned,
you can safely modify the Properties instance and
apply it to a future transformation without affecting the compiled stylesheet
that the instance of Templates represents.
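A small experiment can confirm this behavior. The sketch below is illustrative only: the class name OutputPropertiesSketch and the one-line stylesheet are assumptions. It edits the returned Properties object and then asks the Templates instance for its properties again.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class for getOutputProperties( ).
class OutputPropertiesSketch {
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>ok</xsl:template>"
      + "</xsl:stylesheet>";

    static Templates compile() throws Exception {
        return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    // Edits a copy of the properties, then re-reads the output method
    // from the Templates to show that the stylesheet is unaffected.
    static String methodAfterEditingCopy(Templates templates) {
        Properties copy = templates.getOutputProperties();
        copy.setProperty(OutputKeys.METHOD, "html"); // changes only the copy
        return templates.getOutputProperties().getProperty(OutputKeys.METHOD);
    }

    public static void main(String[] args) throws Exception {
        Templates templates = compile();
        System.out.println(methodAfterEditingCopy(templates));

        // a modified copy can be applied to one particular transformation
        Properties custom = templates.getOutputProperties();
        custom.setProperty(OutputKeys.ENCODING, "ISO-8859-1");
        Transformer trans = templates.newTransformer();
        trans.setOutputProperties(custom);
    }
}
```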
The newTransformer( ) method is more
commonly used and allows you to obtain a new instance of a class that
implements the Transformer interface. It is this
Transformer object that actually allows you to
perform XSLT transformations.
Since the implementation of the Templates interface is hidden by JAXP, it must be created
by the following method on javax.xml.transform.TransformerFactory:
public Templates newTemplates(Source source)
    throws TransformerConfigurationException
As in earlier examples, the Source
may obtain the XSLT stylesheet from one of many locations, including a
filename, a system identifier, or even a DOM tree. Regardless of the original
location, the XSLT processor is supposed to compile the stylesheet into an
optimized internal representation.
Whether the stylesheet is actually compiled is up to the implementation, but a safe bet is that performance will continually improve over the next several years as these tools stabilize and vendors have time to apply optimizations.
Figure 5-6 illustrates the relationship between Templates and Transformer instances.
Thread safety is an important issue in any Java application,
particularly in a web context where many users share the same stylesheet. As
Figure 5-6 illustrates, an instance of Templates is
thread-safe and represents a single stylesheet. During the transformation
process, however, the XSLT processor must maintain state information and
output properties specific to the current client. For this reason, a separate
Transformer instance must be used for each
concurrent transformation.
Transformer is an abstract class in
JAXP, and implementations should be lightweight. This is an important goal
because you will typically create many copies of Transformer, while the number of Templates is relatively small. Transformer instances are not thread-safe, primarily
because they hold state information about the current transformation. Once the
transformation is complete, however, these objects can be reused.
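The division of labor between the two types might look like the following sketch, in which one Templates instance is shared by a small thread pool and every task asks it for a private Transformer. The class name SharedTemplatesSketch, the pool size, and the sample data are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: one shared Templates, one Transformer per task.
class SharedTemplatesSketch {
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello, "
      + "<xsl:value-of select='greeting/@to'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> transformAll(List<String> names) throws Exception {
        // compile once; the Templates object is safe to share across threads
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : names) {
                futures.add(pool.submit(() -> {
                    // each concurrent transformation gets its own Transformer
                    Transformer trans = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    String xml = "<greeting to='" + name + "'/>";
                    trans.transform(new StreamSource(new StringReader(xml)),
                                    new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // preserves submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformAll(List.of("Ann", "Bob", "Cy")));
    }
}
```

The names are spliced into the XML without escaping, which is acceptable only because the sample data is trusted.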
A Stylesheet Cache
XSLT transformations commonly occur on a shared web server with
a large number of concurrent users, so it makes sense to use Templates whenever possible to optimize performance.
Since each instance of Templates is thread-safe, it
is desirable to maintain a single copy shared by many clients. This reduces
the number of times your stylesheets have to be parsed into memory and
compiled, as well as the overall memory footprint of your application.
The code shown in Example 5-10 illustrates a custom XSLT stylesheet cache that automates the mundane
tasks associated with creating Templates instances
and storing them in memory. This cache has the added benefit of checking the
lastModified flag on the underlying file, so it
will reload itself whenever the XSLT stylesheet is modified. This is highly
useful in a web-application development environment because you can make
changes to the stylesheet and simply click on Reload on your web browser to
see the results of the latest edits.
Example 5-10: StylesheetCache.java
package com.oreilly.javaxslt.util;

import java.io.*;
import java.util.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

/**
 * A utility class that caches XSLT
 * stylesheets in memory.
 *
 */
public class StylesheetCache {
    // map xslt file names to MapEntry instances
    // (MapEntry is defined below)
    private static Map cache = new HashMap( );

    /**
     * Flush all cached stylesheets from
     * memory, emptying the cache.
     */
    public static synchronized void flushAll( ) {
        cache.clear( );
    }

    /**
     * Flush a specific cached stylesheet from memory.
     *
     * @param xsltFileName the file name of
     * the stylesheet to remove.
     */
    public static synchronized void flush(String xsltFileName) {
        cache.remove(xsltFileName);
    }

    /**
     * Obtain a new Transformer instance for the
     * specified XSLT file name.
     * A new entry will be added to the
     * cache if this is the first request
     * for the specified file name.
     *
     * @param xsltFileName the file name
     * of an XSLT stylesheet.
     * @return a transformation context
     * for the given stylesheet.
     */
    public static synchronized Transformer newTransformer(String xsltFileName)
            throws TransformerConfigurationException {
        File xsltFile = new File(xsltFileName);
        // determine when the file was last modified on disk
        long xslLastModified = xsltFile.lastModified( );
        MapEntry entry = (MapEntry) cache.get(xsltFileName);
        if (entry != null) {
            // if the file has been modified more recently than the
            // cached stylesheet, remove the entry reference
            if (xslLastModified > entry.lastModified) {
                entry = null;
            }
        }
        // create a new entry in the cache if necessary
        if (entry == null) {
            Source xslSource = new StreamSource(xsltFile);
            TransformerFactory transFact = TransformerFactory.newInstance( );
            Templates templates = transFact.newTemplates(xslSource);
            entry = new MapEntry(xslLastModified, templates);
            cache.put(xsltFileName, entry);
        }
        return entry.templates.newTransformer( );
    }

    // prevent instantiation of this class
    private StylesheetCache( ) {
    }

    /**
     * This class represents a value in the cache Map.
     */
    static class MapEntry {
        long lastModified; // when the file was modified
        Templates templates;

        MapEntry(long lastModified, Templates templates) {
            this.lastModified = lastModified;
            this.templates = templates;
        }
    }
}
Because this class is a static utility that is never instantiated, it has a private constructor
and uses only static methods. Furthermore, each method is declared as synchronized in an effort to avoid potential threading
problems.
The heart of this class is the cache itself, which is
implemented using java.util.Map:
private static Map cache = new HashMap( );
Although HashMap is not thread-safe,
the fact that all of our methods are synchronized
basically eliminates any concurrency issues. Each entry in the map contains a
key/value pair, mapping from an XSLT stylesheet filename to an instance of the
MapEntry class. MapEntry
is a nested class that keeps track of the compiled stylesheet along with when
its file was last modified:
static class MapEntry {
    long lastModified; // when the file was modified
    Templates templates;

    MapEntry(long lastModified, Templates templates) {
        this.lastModified = lastModified;
        this.templates = templates;
    }
}
Removing entries from the cache is accomplished by one of two methods:
public static synchronized void flushAll( ) {
    cache.clear( );
}

public static synchronized void flush(String xsltFileName) {
    cache.remove(xsltFileName);
}
The first method merely removes everything from the Map, while the second removes a single stylesheet.
Whether you use these methods is up to you. The flushAll method, for instance, should probably be called
from a servlet's destroy( ) method to ensure proper
cleanup. If you have many servlets in a web application, each servlet may wish
to flush specific stylesheets it uses via the flush(...) method. If the xsltFileName parameter is not found, the Map implementation silently ignores this request.
The majority of interaction with this class occurs via the newTransformer method, which has the following signature:
public static synchronized Transformer newTransformer(String xsltFileName)
        throws TransformerConfigurationException
The parameter, an XSLT stylesheet filename, was chosen to
facilitate the auto-reload feature. We use the java.io.File class to determine when the file was last
modified, which allows the cache to automatically reload itself as edits are
made to the stylesheets. Had we used a system identifier or InputStream instead of a filename, the auto-reload
feature could not have been implemented. Next, the File object is created and its lastModified flag is checked:
File xsltFile = new File(xsltFileName);
// determine when the file was last modified on disk
long xslLastModified = xsltFile.lastModified( );
The compiled stylesheet, represented by an instance of MapEntry, is then retrieved from the Map. If the entry is found, its timestamp is compared
against the current file's timestamp, thus allowing auto-reload:
MapEntry entry = (MapEntry) cache.get(xsltFileName);
if (entry != null) {
    // if the file has been modified more
    // recently than the cached stylesheet,
    // remove the entry reference
    if (xslLastModified > entry.lastModified) {
        entry = null;
    }
}
Next, we create a new entry in the cache if the entry object
reference is still null. This is accomplished by
wrapping a StreamSource around the File object, instantiating a TransformerFactory instance, and using that factory to
create our Templates object. The Templates is then stored in the cache so it can be reused
by the next client of the cache:
// create a new entry in the cache if necessary
if (entry == null) {
    Source xslSource = new StreamSource(xsltFile);
    TransformerFactory transFact = TransformerFactory.newInstance( );
    Templates templates = transFact.newTemplates(xslSource);
    entry = new MapEntry(xslLastModified, templates);
    cache.put(xsltFileName, entry);
}
Finally, a brand new Transformer is
created and returned to the caller:
return entry.templates.newTransformer( );
Returning a new Transformer is
critical because, although the Templates object is
thread-safe, the Transformer implementation is not.
Each caller gets its own copy of Transformer so
multiple clients do not collide with one another.
One potential improvement on this design could be to add a lastAccessed timestamp to each MapEntry object. Another thread could then execute every
couple of hours to flush map entries from memory if they have not been
accessed for a period of time. In most web applications, this will not be an
issue, but if you have a large number of pages and some are seldom accessed,
this could be a way to reduce the memory usage of the cache.
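One way this idea might be sketched is shown below. It is not part of Example 5-10: the class name ExpiringStylesheetCache, the Entry class, and the evictIdle and startEvictor methods are all hypothetical, and the code uses generics and java.util.concurrent, which are newer than the original example.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

// Hypothetical variant of the stylesheet cache with idle-entry eviction.
class ExpiringStylesheetCache {
    private static final Map<String, Entry> cache = new HashMap<>();

    static synchronized Transformer newTransformer(String xsltFileName)
            throws TransformerConfigurationException {
        File xsltFile = new File(xsltFileName);
        long lastModified = xsltFile.lastModified();
        Entry entry = cache.get(xsltFileName);
        // compile on first use, or again if the file changed on disk
        if (entry == null || lastModified > entry.lastModified) {
            Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(xsltFile));
            entry = new Entry(lastModified, templates);
            cache.put(xsltFileName, entry);
        }
        entry.lastAccessed = System.currentTimeMillis();
        return entry.templates.newTransformer();
    }

    // Removes entries not accessed within maxIdleMillis; returns the count.
    static synchronized int evictIdle(long maxIdleMillis) {
        long cutoff = System.currentTimeMillis() - maxIdleMillis;
        int before = cache.size();
        cache.values().removeIf(e -> e.lastAccessed < cutoff);
        return before - cache.size();
    }

    static synchronized int size() {
        return cache.size();
    }

    // Starts a daemon thread that calls evictIdle at a fixed period.
    static ScheduledExecutorService startEvictor(long maxIdleMillis,
                                                 long periodMillis) {
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "stylesheet-evictor");
                t.setDaemon(true); // do not keep the JVM alive
                return t;
            });
        timer.scheduleAtFixedRate(() -> evictIdle(maxIdleMillis),
            periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }

    private ExpiringStylesheetCache() {
    }

    static class Entry {
        final long lastModified;   // file timestamp when compiled
        final Templates templates;
        long lastAccessed;         // updated on every newTransformer call

        Entry(long lastModified, Templates templates) {
            this.lastModified = lastModified;
            this.templates = templates;
        }
    }
}
```

A web application might call startEvictor once at startup, for example with a two-hour idle limit, and shut the returned executor down when the application is unloaded. Because evictIdle is synchronized on the same lock as newTransformer, the background thread never sees the map in the middle of an update.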
Another potential modification is to allow javax.xml.transform.Source objects to be passed as a
parameter to the newTransformer method instead of
a filename. However, this would make the auto-reload feature impossible to
implement for all Source types.

