ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)


XML Processing with TRaX

by Craig Pfeifer
07/02/2001

Currently, we have standard APIs for representing an XML document as a tree of objects through the W3C's DOM specification, and as a series of events through the SAX API. JAXP 1.0 gave us a standard Java API for XML parsers, and JAXP 1.1 expands on this to include a standard API for XSLT engines. This standard API is the Transformation API for XML, or TRaX for short. I will cover basic TRaX usage and explain the top-level interfaces to show how powerful this API is. The specific TRaX implementation that I am working from is the Xalan-Java 2 XSLT processor from the Apache project.

This article assumes an awareness of the major facilities for processing and representing XML documents (DOM, SAX and XSLT), but it is not specific to these technologies.

Purpose of TRaX

The TRaX API extends the original JAXP mission to include XML transformations: provide a vendor- and implementation-agnostic standard Java API for specifying and executing XML transformations. This is important to note, because TRaX is more than just a standard interface for XSLT engines -- it is designed to be used as a general-purpose transformation interface for XML documents. The TRaX specification is part of the JAXP 1.1 API, which was developed as Java Specification Request #63.

TRaX isn't a competitor to the existing DOM, JDOM and SAX APIs used to represent and process XML, but a common Java API to bridge the various XML transformation methods (a la JDBC, JNDI, etc.) including SAX Events and XSLT Templates. In fact, TRaX relies upon a SAX2- and DOM-level-2-compliant XML parser/XSLT engine. JAXP 1.0 allows the developer to change XML parsers by setting a property, and TRaX provides the same functionality for XSLT engines.
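As a minimal sketch of the property-based lookup described above: the standard JAXP system property javax.xml.transform.TransformerFactory names the factory implementation class, and TransformerFactory.newInstance() resolves it at run time. The class name FactoryLookup and its helper method are hypothetical names used only for illustration.

```java
import javax.xml.transform.TransformerFactory;

class FactoryLookup {
    // Standard JAXP system property that names the TransformerFactory implementation.
    static final String FACTORY_PROPERTY = "javax.xml.transform.TransformerFactory";

    // Returns the class name of whichever factory JAXP resolves at run time.
    static String resolvedFactoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null means no override was given, so JAXP uses its default lookup.
        System.out.println("override:       " + System.getProperty(FACTORY_PROPERTY));
        System.out.println("factory in use: " + resolvedFactoryClassName());
    }
}
```

Assuming Xalan-Java 2 is on the classpath, running this with -Djavax.xml.transform.TransformerFactory=org.apache.xalan.processor.TransformerFactoryImpl should select Xalan's factory without any change to the code.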

Code Example

Here is a sample of how to apply an XSLT stylesheet to an XML document and write the results out to a file. In this example, both the stylesheet and the XML document exist as files, but they could just as easily have come from any Java InputStream or Reader class. The same follows for the results of the transformation; I could've just as easily written the results out to any Java OutputStream or Writer class.

// (the classes used here come from java.io, javax.xml.transform,
// and javax.xml.transform.stream)
// create the XML content input source:
// can be a DOM node, SAX stream, or any
// Java input stream/reader
String xmlInputFile = "myXMLinput.xml";
Source xmlSource = new StreamSource(new FileInputStream(xmlInputFile));

// create the XSLT stylesheet input source:
// can be a DOM node, SAX stream, or any
// Java input stream/reader
String xsltInputFile = "myXsltStylesheet.xsl";
Source xsltSource = new StreamSource(new FileInputStream(xsltInputFile));

// create the result target of the transformation:
// can be a DOM node, SAX stream, or any
// Java output stream/writer
String xmlOutputFile = "result.html";
Result transResult = new StreamResult(new FileOutputStream(xmlOutputFile));

// create the TransformerFactory & Transformer instance
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer(xsltSource);

// execute transformation & fill result target object
t.transform(xmlSource, transResult);

The first three stanzas simply establish our inputs and result targets, and aren't that interesting, with one exception. Notice that the XSLT stylesheet isn't handled via a different class in TRaX. It's treated just like any other XML source document, because that's exactly what it is. We use the stream implementations of the Source and Result interfaces from the javax.xml.transform.stream package to handle reading the data from our file streams.

In the fourth stanza, we use the TransformerFactory to get an instance of a Transformer, and then use the Source instance for the XSLT stylesheet we created in the second stanza to define the transformation that this transformer will perform. A Transformer actually executes the transformation and assembles the result. A single Transformer instance can be reused, but it is not thread-safe.
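Because the Source and Result implementations accept any Reader or Writer, the same four stanzas can run entirely in memory. The following is a self-contained sketch under that assumption; the class name InMemoryTransform, the sample stylesheet, and the sample document are made up for illustration and are not part of the example above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InMemoryTransform {
    // Hypothetical sample stylesheet: turns <greeting name="..."/> into plain text.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/greeting'>Hello, <xsl:value-of select='@name'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws TransformerException {
        // The stylesheet is just another XML Source, read here from a String.
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer(new StreamSource(new StringReader(XSLT)));

        // The Result wraps a Writer, so the output ends up in a String.
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<greeting name='TRaX'/>"));
    }
}
```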

With this approach, the XSLT stylesheet is reprocessed every time a new Transformer is created from it. A very common case is that the same transformation is applied multiple times to different Sources, perhaps in different threads. A more efficient approach in this case is to process the transformation stylesheet once, and save this object for successive transformations. This is achieved through the use of the TRaX Templates interface.

Templates Code Example

// we've already set up our content Source instance, 
// XSLT Source instance, TransformerFactory, and 
// Result target from the previous example

// process the XSLT stylesheet into a Templates instance
// with our TransformerFactory instance
Templates t = tf.newTemplates(xsltSource);

// whenever you need to execute this transformation, create
// a new Transformer instance from the Templates instance
Transformer trans = t.newTransformer();

// execute transformation & fill result target object
trans.transform(xmlSource, transResult);

While the Transformer performs the transformation, a Templates instance is the actual run-time representation of the processed transformation instructions. Templates instances may be reused to increase performance, and they are thread-safe. It might seem odd that an interface has a plural name, but it stems from the fact that an XSLT stylesheet consists of a collection of one or more xsl:template elements. Each template element defines a transformation in that stylesheet, so it follows that the simplest name for a representation of a collection of template elements is Templates.
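To make the threading rules concrete, here is a sketch of one Templates instance shared by several worker threads, with each task creating its own Transformer. The class name SharedTemplates, the sample stylesheet, the input documents, and the pool size of four are all assumptions chosen for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SharedTemplates {
    // Hypothetical sample stylesheet: wraps the text of <item> in brackets.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/item'>[<xsl:value-of select='.'/>]</xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> transformAll(List<String> docs) throws Exception {
        // Process the stylesheet once; the Templates object is shared by all tasks.
        final Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final String doc : docs) {
                futures.add(pool.submit(() -> {
                    // Each task gets its own Transformer; Transformers are not shared.
                    Transformer t = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    t.transform(new StreamSource(new StringReader(doc)), new StreamResult(out));
                    return out.toString();
                }));
            }
            // Collect results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformAll(
            Arrays.asList("<item>a</item>", "<item>b</item>", "<item>c</item>")));
    }
}
```

The design choice here is to treat the Templates object as the long-lived, shareable artifact and the Transformer as a cheap, per-task object.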


Basic TRaX Pieces - The Interfaces

One of the main reasons the TRaX API is so clean and simple is its interface-driven design. The highest-level interfaces define the essential entities that are being modeled, and the interactions are left to the implementations. The interfaces themselves aren't very interesting; they are essentially marker interfaces.

public interface Source {

    public void setSystemId(String systemId);
    public String getSystemId();
}

Source

Implementations of the Source interface provide access to the XML document to be processed. TRaX defines Source implementations for DOM trees (DOMSource); SAX 2.0 InputSources (SAXSource); and Java InputStreams, Readers and any of their derived classes (StreamSource).
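As a rough illustration of the three Source implementations, the sketch below feeds the same document to a Transformer as a StreamSource, a SAXSource, and a DOMSource. It relies on the no-argument TransformerFactory.newTransformer(), which returns a Transformer that copies its input unchanged. The class and method names (SourceKinds, copy, fromStream, fromSax, fromDom) and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SourceKinds {
    // Hypothetical sample document.
    static final String XML = "<note>hi</note>";

    // Copies any Source to a String with a copy-through (identity) Transformer.
    static String copy(Source source) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    // A Reader (or InputStream) wrapped as a StreamSource.
    static Source fromStream() {
        return new StreamSource(new StringReader(XML));
    }

    // A SAX InputSource wrapped as a SAXSource.
    static Source fromSax() {
        return new SAXSource(new InputSource(new StringReader(XML)));
    }

    // An already-parsed DOM tree wrapped as a DOMSource.
    static Source fromDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        return new DOMSource(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copy(fromStream()));
        System.out.println(copy(fromSax()));
        System.out.println(copy(fromDom()));
    }
}
```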

public interface Result {

    public void setSystemId(String systemId);
    public String getSystemId();
}

Result

Implementations of the Result interface provide access to the transformed XML document. TRaX defines Result implementations for DOM trees (DOMResult); SAX 2.0 ContentHandlers (SAXResult); and Java OutputStreams, Writers and any of their derived classes (StreamResult).
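The Result side can be sketched the same way. Assuming a copy-through Transformer obtained from the no-argument newTransformer(), the example below delivers the same input once as a DOM tree (DOMResult) and once as SAX events to a ContentHandler (SAXResult). The names ResultKinds, toDom, and countElementsViaSax, and the sample document, are hypothetical.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ResultKinds {
    // Hypothetical sample document.
    static final String XML = "<order><item/><item/></order>";

    static Transformer identity() throws Exception {
        // With no stylesheet, the Transformer copies the Source to the Result.
        return TransformerFactory.newInstance().newTransformer();
    }

    // Collects the output as a DOM tree.
    static Document toDom() throws Exception {
        DOMResult result = new DOMResult();
        identity().transform(new StreamSource(new StringReader(XML)), result);
        return (Document) result.getNode();
    }

    // Receives the output as SAX events and counts the start-element events.
    static int countElementsViaSax() throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        identity().transform(new StreamSource(new StringReader(XML)), new SAXResult(handler));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("root element: " + toDom().getDocumentElement().getNodeName());
        System.out.println("elements seen: " + countElementsViaSax());
    }
}
```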

public interface Templates {
    Transformer newTransformer() throws TransformerConfigurationException;
    Properties getOutputProperties();
}

Templates

A Templates implementation is the optimized, in-memory representation of an XML transformation that is processed and ready to be executed. Templates objects are safe to use in concurrent threads. To reuse a single Templates instance in multiple concurrent threads, create a separate Transformer instance for each thread via the Templates.newTransformer() factory method. Each Transformer instance may then be used completely independently in its own thread, and both the Templates and the Transformer instances can be reused for subsequent transformations.

Basic TRaX Pieces - The Abstract Classes

Resources

Xalan-J 2 Design Document, Scott Boag, included in Xalan2 distribution

Xalan-j TRaX Fundamentals, Apache Software Foundation

JSR 63, Java Specification Request: Java(TM) API for XML Processing 1.1

DOM Level 2 Core Specification, W3C

SAX 2.0: The Simple API for XML, David Megginson

Transformer

A Transformer is the object that actually applies the transformation to the source document and creates the result document. However, it is not responsible for outputting, or serializing, the result of the transformation. That is the responsibility of the transformation engine's serializer, whose behavior can be modified via the setOutputProperty(java.lang.String name, java.lang.String value) method. The configurable output properties are defined in the OutputKeys class, and are described in the XSLT 1.0 Specification. A Transformer is bound to its transformation instructions when it is created: it cannot change which Templates instance gets applied to the Source.
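Here is a small sketch of adjusting the serializer through setOutputProperty and the constants in OutputKeys. It uses a copy-through Transformer (the no-argument newTransformer()) so that only the serialization settings vary; the class name OutputProps and the method serialize are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputProps {
    // Serializes the given XML, with or without the leading XML declaration.
    static String serialize(String xml, boolean omitDeclaration) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();

        // These keys correspond to attributes of xsl:output in XSLT 1.0.
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");

        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(serialize("<a><b/></a>", false));
        System.out.println(serialize("<a><b/></a>", true));
    }
}
```

Properties set this way on a Transformer take precedence over the values given in the stylesheet's xsl:output element.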

TransformerFactory

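The TransformerFactory is primarily responsible for creating new Transformer and Templates objects. A factory instance is obtained via the static newInstance() method. New instances of Transformer are then created via the factory's newTransformer(Source source) method; the no-argument newTransformer() returns a Transformer that simply copies the Source to the Result. Processing Source instances into Templates objects is handled by the newTemplates(Source source) method.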

Xalan History

Xalan 1 started off as the LotusXSL project at Lotus Corporation. Lotus contributed the code to the Apache XML Project to create and maintain Xalan, Apache's XSLT engine. Lotus employees are still heavily involved in the Xalan project and are currently its primary developers. This is consistent with the commitment of their parent company, IBM, to open source software, shown by IBM's donations of its XML4J XML parser (better known as Xerces Java) and its LOG4J logging package to the Apache Project.

Xalan 2 is a complete refactoring of Xalan 1. The goal of the refactoring is to create a more easily understandable and maintainable code base through a more modular design approach. Consequently, the API for executing transformations is completely different in Xalan 2 from Xalan 1. Xalan 2 does provide a compatibility package to allow existing applications to move to Xalan 2 with no code changes.

Conclusion

JAXP 1.0 unified Java-based XML development efforts by providing a single, powerful interface to all participating vendors' XML parsers. Adding TRaX in JAXP 1.1 is a natural and necessary next step to ensure the success of XML on the Java platform.

Copyright © 2009 O'Reilly Media, Inc.