ONJava.com -- The Independent Source for Enterprise Java

XSLT Processing with Java

Input and Output

XSLT processors, like other XML tools, can read their input data from many different sources. In the most basic scenario, you will load a static stylesheet and XML document using the java.io.File class. More commonly, the XSLT stylesheet will come from a file, but the XML data will be generated dynamically as the result of a database query. In this case, it does not make sense to write the database query results to an XML file and then parse that file back into the XSLT processor. Instead, it is desirable to pipe the XML data directly into the processor using SAX or DOM. In fact, we will even see how to read non-XML data and transform it using XSLT.

System Identifiers, Files, and URLs

The simple examples presented earlier in this chapter introduced the concept of a system identifier. As mentioned before, system identifiers are nothing more than URIs and are used frequently by XML tools. For example, javax.xml.transform.Source, one of the key interfaces in JAXP, has the following API:

public interface Source {
    String getSystemId(  );
    void setSystemId(String systemId);
}

The second method, setSystemId( ), is crucial. By providing a URI to the Source, the XSLT processor can resolve URIs encountered in XSLT stylesheets. This allows XSLT code like this to work:

<xsl:import href="commonFooter.xslt"/>
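To make the resolution step concrete, the following minimal sketch uses java.net.URI to show how a relative href combines with a base system ID. The class name ResolveHref and the directory /home/user/styles are hypothetical, and the code only illustrates standard URI resolution; an XSLT processor performs this step internally and may implement it differently.

```java
import java.net.URI;

// Hypothetical demo class; not part of JAXP.
class ResolveHref {

    // Resolve a relative href against the system ID of the
    // stylesheet that contains it.
    static String resolve(String baseSystemId, String href) {
        return URI.create(baseSystemId).resolve(href).toString();
    }

    public static void main(String[] args) {
        // assumed location of the importing stylesheet
        String base = "file:/home/user/styles/main.xslt";
        // prints file:/home/user/styles/commonFooter.xslt
        System.out.println(resolve(base, "commonFooter.xslt"));
    }
}
```

Without a base system ID there is nothing to resolve "commonFooter.xslt" against, which is why setSystemId( ) matters.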

When it comes to XSLT programming, you will use methods in java.io.File and java.net.URL to convert platform-specific file names into system IDs, which can then be passed to any method that expects a system ID. For example, you would write the following code to convert a platform-specific filename into a system ID:

public static void main(String[] args) throws MalformedURLException {
    // assume that the first command-line arg 
    // contains a file name
    // - on Windows, something like 
    //   "C:\home\index.xml"
    // - on Unix, something like 
    //   "/usr/home/index.xml"
    String fileName = args[0];
    File fileObject = new File(fileName);
    // go through toURI() so that spaces and other special
    // characters are escaped; File.toURL() is deprecated
    URL fileURL = fileObject.toURI().toURL();
    String systemID = fileURL.toExternalForm();
}

This code was written on several lines for clarity; it can be consolidated as follows:

String systemID = new File(fileName).toURI().toURL().toExternalForm();

Converting from a system identifier back to a filename or a File object can be accomplished with this code:

// new File(URI) decodes escapes such as %20, which
// URL.getFile() would leave in the file name
URI uri = new URI(systemID);
File fileObject = new File(uri);
String fileName = fileObject.getPath();

And once again, this code can be condensed into a single line as follows:

File fileObject = new File(new URI(systemID));
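The round trip between a file and its system ID can be checked with a short program. This is only a sketch: the class name SystemIdRoundTrip and the path "my docs/index.xml" are made up for illustration, and it relies on java.net.URI (rather than java.net.URL alone) so that the space in the directory name is escaped on the way out and decoded on the way back.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper class; not part of JAXP.
class SystemIdRoundTrip {

    // File -> system ID; special characters are percent-encoded
    static String toSystemId(File file) {
        return file.toURI().toString();
    }

    // system ID -> File; only works for file: URIs
    static File toFile(String systemId) {
        return new File(URI.create(systemId));
    }

    public static void main(String[] args) {
        // assumed example path containing a space
        File original = new File("my docs/index.xml").getAbsoluteFile();
        String systemId = toSystemId(original);
        File restored = toFile(systemId);

        System.out.println("system ID: " + systemId);
        System.out.println("round trip ok: " + restored.equals(original));
    }
}
```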

JAXP I/O Design

The Source and Result interfaces in javax.xml.transform provide the basis for all transformation input and output in JAXP 1.1. Regardless of whether a stylesheet is obtained via a URI, filename, or InputStream, its data is fed into JAXP via an implementation of the Source interface. The output is then sent to an implementation of the Result interface. The implementations provided by JAXP are shown in Figure 5-3.

Figure 5-3. Source and Result interfaces

As you can see, JAXP is not particular about where it gets its data or sends its results. Remember that two instances of Source are always specified: one for the XML data and another for the XSLT stylesheet.

JAXP Stream I/O

As shown in Figure 5-3, StreamSource is one of the implementations of the Source interface. In addition to the system identifiers that Source provides, StreamSource allows input to be obtained from a File, an InputStream, or a Reader. The SimpleJaxp class in Example 5-3 showed how to use StreamSource to read from a File object. There are also four constructors that allow you to construct a StreamSource from either an InputStream or Reader. The complete list of constructors is shown here:

public StreamSource(  )
public StreamSource(File f)
public StreamSource(String systemId)
public StreamSource(InputStream byteStream)
public StreamSource(InputStream byteStream, String systemId)
public StreamSource(Reader characterStream)
public StreamSource(Reader characterStream, String systemId)

For the constructors that take InputStream and Reader as arguments, the first argument provides either the XML data or the XSLT stylesheet. The second argument, if present, is used to resolve relative URI references in the document. As mentioned before, your XSLT stylesheet may include the following code:

<xsl:import href="commonFooter.xslt"/>

By providing a system identifier as a parameter to the StreamSource, you are telling the XSLT processor where to look for commonFooter.xslt. Without this parameter, you may encounter an error when the processor cannot resolve this URI. The simple fix is to call the setSystemId( ) method as follows:

// construct a Source that reads from an InputStream
Source mySrc = new StreamSource(anInputStream);
// specify a system ID (a String) so the 
// Source can resolve relative URLs
// that are encountered in XSLT stylesheets
mySrc.setSystemId(aSystemId);

The documentation for StreamSource also advises that InputStream is preferred to Reader because this allows the processor to properly handle the character encoding as specified in the XML declaration.
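The system-identifier fix described above can be tried end to end with a small program. The sketch below is not from the original example set: the class name ImportViaSystemId, the stylesheet contents, and the temporary directory are assumptions chosen for illustration. It writes commonFooter.xslt to a temporary directory, supplies the importing stylesheet as an InputStream, and sets a system ID that points into the same directory so the relative href can be resolved.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class showing setSystemId() with an InputStream.
class ImportViaSystemId {

    private static final String XSL_NS =
        "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";

    // the imported stylesheet, which will live on disk
    private static final String FOOTER_XSLT =
        "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
        + "<xsl:template name='footer'>Copyright notice</xsl:template>"
        + "</xsl:stylesheet>";

    // the importing stylesheet, which exists only in memory
    private static final String MAIN_XSLT =
        "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
        + "<xsl:import href='commonFooter.xslt'/>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:call-template name='footer'/>"
        + "</xsl:template></xsl:stylesheet>";

    static String run() throws Exception {
        // write commonFooter.xslt into a fresh temporary directory
        File dir = Files.createTempDirectory("xslt").toFile();
        dir.deleteOnExit();
        File footer = new File(dir, "commonFooter.xslt");
        footer.deleteOnExit();
        Files.write(footer.toPath(),
            FOOTER_XSLT.getBytes(StandardCharsets.UTF_8));

        // the main stylesheet arrives as a stream with no location
        InputStream in = new ByteArrayInputStream(
            MAIN_XSLT.getBytes(StandardCharsets.UTF_8));
        Source xsltSource = new StreamSource(in);
        // pretend the stream came from main.xslt in the same directory
        xsltSource.setSystemId(
            new File(dir, "main.xslt").toURI().toString());

        Transformer trans =
            TransformerFactory.newInstance().newTransformer(xsltSource);
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader("<doc/>")),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

If the setSystemId( ) call is removed, the processor has no base location for commonFooter.xslt and will typically fail or look in the wrong place, depending on the implementation.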

StreamResult is similar in functionality to StreamSource, although it is not necessary to resolve relative URIs. The available constructors are as follows:

public StreamResult(  )
public StreamResult(File f)
public StreamResult(String systemId)
public StreamResult(OutputStream byteStream)
public StreamResult(Writer characterStream)
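As a brief illustration of the Writer-based constructor, the following sketch sends the result to a StringWriter so that the output ends up in a String rather than a file. The class name InMemoryResult and the sample XML are assumptions; the Transformer is created without a stylesheet, which makes it copy its input unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of JAXP.
class InMemoryResult {

    // Copy XML text through JAXP and capture the output in memory.
    static String copy(String xml) throws TransformerException {
        Transformer trans =
            TransformerFactory.newInstance().newTransformer();
        // leave out the <?xml ...?> declaration to keep output short
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader(xml)),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(copy("<name>Eric</name>"));
    }
}
```

Passing System.out to the OutputStream constructor works the same way when the goal is simply to print the result.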

Let's look at a simple example to see some of the other options for StreamSource and StreamResult. Example 5-4 is a modification of the SimpleJaxp program that was presented earlier. It downloads the XML specification from the W3C web site and stores it in a temporary file on your local disk. To download the file, the program constructs a StreamSource with a system identifier as a parameter. The stylesheet is a simple one that merely performs an identity transformation, copying the unmodified XML data to the result tree. The result is then sent to a StreamResult using its File constructor.


Example 5-4: Streams.java

package chap5;
 
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
 
/**
* A simple demo of JAXP 1.1 StreamSource and
* StreamResult. This program downloads the
* XML specification from the W3C and prints
* it to a temporary file.
*/
public class Streams {
 
  // an identity copy stylesheet
  private static final String IDENTITY_XSLT =
    "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
    + " version='1.0'>"
    + "<xsl:template match='/'><xsl:copy-of select='.'/>"
    + "</xsl:template></xsl:stylesheet>";
 
  // the XML spec in XML format
  // (using an HTTP URL rather than a file URL)
  private static String xmlSystemId =
      "http://www.w3.org/TR/2000/REC-xml-20001006.xml";
 
  public static void main(String[] args) throws IOException,
      TransformerException {
 
    // show how to read from a system identifier and a Reader
    Source xmlSource = new StreamSource(xmlSystemId);
    Source xsltSource = new StreamSource(
        new StringReader(IDENTITY_XSLT));
 
    // send the result to a file
    File resultFile = File.createTempFile("Streams", ".xml");
    Result result = new StreamResult(resultFile);
 
    System.out.println("Results will go to: "
        + resultFile.getAbsolutePath( ));

    // get the factory
    TransformerFactory transFact = TransformerFactory.newInstance( );

    // get a transformer for this particular stylesheet
    Transformer trans = transFact.newTransformer(xsltSource);

    // do the transformation
    trans.transform(xmlSource, result);
  }
}


The "identity copy" stylesheet simply matches "/", which is the document itself. It then uses <xsl:copy-of select='.'/> to select the document and copy it to the result tree. In this case, we coded our own stylesheet. You can also omit the XSLT stylesheet altogether as follows:

// construct a Transformer 
// without any XSLT stylesheet
Transformer trans = transFact.newTransformer(  );

In this case, the processor will provide its own stylesheet and do the same thing that our example does. This is useful when you need to use JAXP to convert a DOM tree to XML text for debugging purposes because the default Transformer will simply copy the XML data without any transformation.
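The debugging use case might look like the following sketch. The class name DomDebug and the sample document are hypothetical; the only JAXP feature it depends on is the stylesheet-free Transformer described above, combined with a DOMSource as input and a StreamResult as output.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical debugging helper; not part of JAXP.
class DomDebug {

    // Serialize any DOM node to XML text with the default Transformer.
    static String toXmlText(Node node) throws TransformerException {
        Transformer trans =
            TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        trans.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Build a tiny sample document: <person><firstName>Eric</firstName></person>
    static Document sampleDocument() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();
        Element person = doc.createElementNS(null, "person");
        Element firstName = doc.createElementNS(null, "firstName");
        firstName.setTextContent("Eric");
        person.appendChild(firstName);
        doc.appendChild(person);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXmlText(sampleDocument()));
    }
}
```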

JAXP DOM I/O

In many cases, the fastest way to perform a transformation is to feed an instance of org.w3c.dom.Document directly into JAXP. Although the transformation is fast, it does take time to generate the DOM; DOM is also memory intensive, and may not be the best choice for large documents. In most cases, the DOM data will be generated dynamically as the result of a database query or some other operation (see Chapter 1). Once the DOM is generated, simply wrap the Document object in a DOMSource as follows:

org.w3c.dom.Document domDoc = createDomDocument( );
Source xmlSource = new javax.xml.transform.dom.DOMSource(domDoc);

The remainder of the transformation looks identical to the file-based transformation shown in Example 5-4. JAXP needs only the alternate input Source object shown here to read from DOM.
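A complete DOM-based transformation might be sketched as follows. Everything here is illustrative: the class name DomInput, the people/person element names, and the small text-producing stylesheet are assumptions, and createDomDocument( ) simply builds a document in memory where a real application would use its database query results.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of JAXP.
class DomInput {

    // assumed stylesheet: lists each person followed by a semicolon
    private static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/people'>"
        + "<xsl:for-each select='person'>"
        + "<xsl:value-of select='.'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template></xsl:stylesheet>";

    // stand-in for data that would normally come from a query
    static Document createDomDocument() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();
        Element people = doc.createElementNS(null, "people");
        doc.appendChild(people);
        for (String name : new String[] {"Eric", "Jennifer", "Aidan"}) {
            Element person = doc.createElementNS(null, "person");
            person.setTextContent(name);
            people.appendChild(person);
        }
        return doc;
    }

    static String transform(Document domDoc) throws TransformerException {
        // the only DOM-specific line: wrap the Document in a DOMSource
        Source xmlSource = new DOMSource(domDoc);
        Source xsltSource = new StreamSource(new StringReader(XSLT));

        Transformer trans =
            TransformerFactory.newInstance().newTransformer(xsltSource);
        StringWriter out = new StringWriter();
        trans.transform(xmlSource, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // prints Eric;Jennifer;Aidan;
        System.out.println(transform(createDomDocument()));
    }
}
```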

JAXP SAX I/O

XSLT is designed to transform well-formed XML data into another format, typically HTML. But wouldn't it be nice if we could also use XSLT stylesheets to transform non-XML data into HTML? For example, most spreadsheets can export their data in Comma Separated Values (CSV) format, as shown here:

Burke,Eric,M
Burke,Jennifer,L
Burke,Aidan,G

One approach is parsing the file into memory, using DOM to create an XML representation of the data, and then feeding that information into JAXP for transformation. This approach works but requires an intermediate programming step to convert the CSV file into a DOM tree. A better option is to write a custom SAX parser, feeding its output directly into JAXP. This avoids the overhead of constructing the DOM tree, offering better memory utilization and performance.
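The idea of a custom SAX parser can be sketched in a few dozen lines. This is an assumption-laden illustration rather than a finished design: the class names CsvXmlReader and CsvToText, the csvFile/line/value element names, and the naive comma splitting (no quoting or escaping) are all invented here, and XMLFilterImpl is used only as a convenient base class that already implements the XMLReader bookkeeping. The stylesheet produces plain text instead of HTML to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical reader: reports CSV text as SAX events, as if it had
// parsed <csvFile><line><value>...</value></line></csvFile>.
class CsvXmlReader extends XMLFilterImpl {
    private static final Attributes NO_ATTS = new AttributesImpl();

    // accept and ignore any feature or property the processor sets
    @Override public void setFeature(String name, boolean value) { }
    @Override public void setProperty(String name, Object value) { }

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        ContentHandler handler = getContentHandler();
        // this sketch only supports character-stream input
        BufferedReader in = new BufferedReader(input.getCharacterStream());
        handler.startDocument();
        handler.startElement("", "csvFile", "csvFile", NO_ATTS);
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) continue;   // skip blank lines
            handler.startElement("", "line", "line", NO_ATTS);
            for (String value : line.split(",")) { // naive CSV split
                handler.startElement("", "value", "value", NO_ATTS);
                char[] ch = value.trim().toCharArray();
                handler.characters(ch, 0, ch.length);
                handler.endElement("", "value", "value");
            }
            handler.endElement("", "line", "line");
        }
        handler.endElement("", "csvFile", "csvFile");
        handler.endDocument();
    }
}

// Hypothetical driver that feeds the CSV reader into JAXP.
class CsvToText {
    // assumed stylesheet: prints "first last" for each line
    private static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='line'>"
        + "<xsl:value-of select='value[2]'/><xsl:text> </xsl:text>"
        + "<xsl:value-of select='value[1]'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:template></xsl:stylesheet>";

    static String transform(String csv) throws Exception {
        SAXSource csvSource = new SAXSource(new CsvXmlReader(),
            new InputSource(new StringReader(csv)));
        Transformer trans = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        trans.transform(csvSource, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(transform(
            "Burke,Eric,M\nBurke,Jennifer,L\nBurke,Aidan,G\n"));
    }
}
```

No DOM tree and no intermediate XML file are created; the processor receives the SAX events directly from the reader.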
