ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)


XML to PDF? Oh, FOP It.

by Vikram Goyal
10/16/2002

Formatting Objects Processor (FOP) is an open source Java API that can convert your XML data into reports in PDF format, as well as such other relevant formats as TXT, SVG, AWT, MIF, and PS. The software is developed under the Apache XML project and is free to use.

This article shows you how to get started with FOP. The primary advantage of FOP is its ability to convert XML data into reports in the PDF format, using a formatting tree. Most of the examples we'll cover concentrate on this particular conversion, but we will also cover rendering XML data to the Java AWT format.

This article is aimed at developers who are comfortable with XML and XSLT. For more information on XML head over to XML.com.

Setup

FOP can be downloaded from the FOP distribution directory. It is available in a bundle as a .gzip file in two distributions. The fop-0.20.4-src distribution contains the source code, so that you can do a build yourself using Ant. The fop-0.20.4-bin distribution contains only the binary distribution, without the source code and the Javadocs.

Extract the source distribution into a directory of your choice. The extraction will create a main directory named fop-0.20.4 and subdirectories build, conf, docs, hyph, lib, and src.

FOP Basics


Figure 1. FOP Architecture.

FOP is a tool that understands formatting objects as specified by the World Wide Web Consortium in the XSL specification. The first part of this specification deals with XSLT transformations. We are interested in the second part, which deals with what we call formatting objects (FO). This part of the spec defines output-independent formatting objects, which compose a vocabulary for style and layout of a document. For example, one of the formatting objects is fo:simple-page-master, which specifies a page template and its relevant properties (margins, headers, etc.). This way, tools like FOP can read this information and render it to the desired output (PDF/TXT). The main point is that the same styling information can be used to produce different outputs.

An FO document is simply an XML document. Its namespace is defined at the W3C Web site. It may contain any of the elements from this namespace. You can manually create this document and specify exact values for each and every element that should be in the output. The more common approach, however, is to write an XSLT stylesheet to take your XML data file, transform it according to your stylesheet rules, and produce the final FO document. Dynamically-generated data can be combined with an existing stylesheet to produce the FO document.

Although the main idea of FOP is to work on the FO document, it can take over the task of transforming the existing data (XML) using a stylesheet. Let's say you have your business data in XML format and stylesheet information in the form of an XSL file. If you supply these two to FOP, FOP will convert this information to a temporary FO document and render it to your desired output.

A Simple Example

Example Files

Download the example files for this article. This .zip file contains the following:
krusty.fo, krusty.pdf, krusty.xml, and krusty.xsl.

Enough with the theory. Let's get our hands dirty by running FOP. Open a command window and navigate to the directory where you installed FOP. The root FOP directory contains two launch scripts for running FOP from the command line: a shell script for Unix systems and a batch file for Windows. Run the one for your system. FOP will complain that no input was specified and give you some example usage scenarios. Good -- this means that you can now start playing with it.

Let us start by creating a simple FO file. If you want to look at the end result, look at krusty.fo.

As I said earlier, an FO file is simply a well-formed XML file. So open up your text editor and start with the XML declaration as the first line:

<?xml version="1.0" encoding="utf-8"?>

Every FO file must have fo:root as its outermost element:

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

The root element is followed by the <fo:layout-master-set>, which specifies the layout of the pages in our document.

<fo:layout-master-set>
<fo:simple-page-master master-name="simple"
page-height="29.7cm" 
page-width="21cm"
margin-top="1cm" 
margin-bottom="2cm" 
margin-left="2.5cm" 
margin-right="2.5cm">
<fo:region-body margin-top="3cm"/>
<fo:region-before extent="3cm"/>
<fo:region-after extent="1.5cm"/>
</fo:simple-page-master>
</fo:layout-master-set>

As you can see, the layout master set contains the definitions of the simple page layouts on which content can be placed. In our case, we have defined a single simple page master. Its master-name attribute gives it the name simple, which is how we will reference it later, and the remaining attributes give it a page height of 29.7 cm, a page width of 21 cm (A4 paper), a top margin of 1 cm, and so on. We can define as many simple page masters as we want, each with a different name.
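To make the numbers concrete, here is a small sketch that works out how much room this page master leaves for content. It is plain arithmetic on the attribute values above, converted to millimetres; the class and method names are made up for illustration and are not part of FOP.

```java
// Sketch: usable area of the "simple" page master, in millimetres.
// PageGeometry and its methods are illustrative names, not FOP classes.
final class PageGeometry {

    /** Width left for content once the left and right page margins are removed. */
    static int contentWidthMm(int pageWidth, int marginLeft, int marginRight) {
        return pageWidth - marginLeft - marginRight;
    }

    /** Height left for content once the top and bottom page margins are removed. */
    static int contentHeightMm(int pageHeight, int marginTop, int marginBottom) {
        return pageHeight - marginTop - marginBottom;
    }

    public static void main(String[] args) {
        // page-width="21cm", margin-left="2.5cm", margin-right="2.5cm"
        int width = contentWidthMm(210, 25, 25);
        // page-height="29.7cm", margin-top="1cm", margin-bottom="2cm"
        int height = contentHeightMm(297, 10, 20);
        // region-body margin-top="3cm" keeps the body clear of region-before
        int bodyHeight = height - 30;
        System.out.println("content area: " + width + "mm x " + height + "mm");
        System.out.println("region-body height: " + bodyHeight + "mm");
    }
}
```

Changing a margin in the page master and rerunning the arithmetic is a quick way to predict how much the body region will grow or shrink before you regenerate the PDF.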

Now that we have defined what our pages will look like in terms of size and margins, we need to define the actual holders for our content. This is where we use the <fo:page-sequence> tag.

<fo:page-sequence master-reference="simple">

Notice that the page sequence's master-reference attribute points to the simple page master, called simple, that we defined earlier. This means that our content will be laid out on pages constrained by that page master's boundaries. The actual content goes in the <fo:flow> element, whose flow-name attribute (xsl-region-body) directs it to the body region of the page. Each <fo:block> element within the <fo:flow> starts a paragraph and defines the properties of that paragraph. So for our heading, "Krusty the Clown," we want an 18 pt. sans serif font, white text on a blue background, and center alignment. Similarly, for the next block, we want the font size to be 12 pt. and the text alignment to be justified.

<fo:flow flow-name="xsl-region-body">

<fo:block font-size="18pt" 
font-family="sans-serif" 
line-height="24pt"
space-after.optimum="15pt"
background-color="blue"
color="white"
text-align="center"
padding-top="3pt">
Krusty the Clown
</fo:block>

<fo:block font-size="12pt" 
font-family="sans-serif" 
line-height="15pt"
space-after.optimum="3pt"
text-align="justify">
This memo explains why Krusty the Clown is our best customer. We need to take 
good care of him from now onwards and make sure that there are always enough 
bananas for his pet monkey.
</fo:block>

</fo:flow>

Finally, close the two tags that are still open, </fo:page-sequence> and </fo:root>, and save the file as krusty.fo in the FOP root directory.
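A forgotten closing tag is the most likely mistake at this point, so it can be worth checking the file before handing it to FOP. The following is a minimal sketch that uses only the XML parser bundled with the JDK; it checks that the file is well-formed and that its outermost element is fo:root, nothing more. The class name FoRootCheck is made up for this example, and the default file name assumes you saved the file as krusty.fo in the current directory.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: well-formedness and root-element check for an FO file.
// This does not validate the document against the XSL-FO vocabulary.
final class FoRootCheck {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Parses the source; returns true if the outermost element is fo:root. */
    static boolean hasFoRoot(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to read namespace URIs
        // parse() throws a SAXException if the document is not well-formed
        Element root = factory.newDocumentBuilder().parse(source).getDocumentElement();
        return FO_NS.equals(root.getNamespaceURI()) && "root".equals(root.getLocalName());
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the file sits in the current directory unless a path is given.
        String path = args.length > 0 ? args[0] : "krusty.fo";
        InputSource source = new InputSource(new File(path).toURI().toString());
        System.out.println(hasFoRoot(source)
                ? "well-formed, outermost element is fo:root"
                : "well-formed, but outermost element is not fo:root");
    }
}
```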

It's time to see the FOP magic. In the FOP root directory, type the following command:

fop krusty.fo krusty.pdf

FOP will run and transform our krusty.fo file into a krusty.pdf document in the same folder. Open it and check that the final outcome is exactly the way we wanted it. Then play with the FO file and see how your changes affect the outcome. Start by changing the text (our content), and then try changing the style, the margins, the color, the font, and so on.

XSL + XML

By playing with the simple example, you will have noticed that it is hard to produce an FO document by hand. It is cumbersome to change, and by modifying the FO file directly we lose the benefit of keeping content separate from presentation. So normally you would instead use an XSLT stylesheet to transform your XML data into an XSL-FO file. You don't need to do this transformation explicitly or outside of FOP: give FOP the stylesheet and the XML file, and it does the transformation itself. Let's see an example of how to do this.

We'll abstract our data from the previous example into an XML file. So our XML data file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<data>
<name>
Krusty the Clown
</name>
<description>
This memo explains why Krusty the Clown is our best customer.
We need to take good care of him from now onwards and make sure
that there are always enough bananas for his pet monkey. 
</description>
</data>

Save this file in the root FOP directory.

We now need to produce a stylesheet that will be used to transform this data file into an FO file. To look at the end result, download the final XSL file (also in the FOP.zip example file).

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format">

As expected, the XSL document starts with the XML declaration, followed by the namespace declarations.

<xsl:template match="/">

If you have worked with XSLT before, you will notice that all we are doing is matching the nodes that we expect in our XML file and using them to produce another XML file (our FO file). The line above matches the document root (/) and outputs the content that follows it. This content, as outlined below, is basically just the definition of our simple layout master set from the FO file described above.

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="simple"
page-height="29.7cm" 
page-width="21cm"
margin-top="1cm" 
margin-bottom="2cm" 
margin-left="2.5cm" 
margin-right="2.5cm">
<fo:region-body margin-top="3cm"/>
<fo:region-before extent="3cm"/>
<fo:region-after extent="1.5cm"/>
</fo:simple-page-master>
</fo:layout-master-set>

<fo:page-sequence master-reference="simple">
<fo:flow flow-name="xsl-region-body">
<xsl:apply-templates select="data"/>
</fo:flow>
</fo:page-sequence>

</fo:root>
</xsl:template>

Replacing the data tags with the actual formatting information forms the next part of our XSL file.

<xsl:template match="data">
<fo:block>
<xsl:apply-templates select="name"/>
<xsl:apply-templates select="description"/>
</fo:block>
</xsl:template>

<xsl:template match="name"> 
<fo:block font-size="18pt" 
font-family="sans-serif" 
line-height="24pt"
space-after.optimum="15pt"
background-color="blue"
color="white"
text-align="center"
padding-top="3pt">
<xsl:value-of select="."/>
</fo:block> 
</xsl:template>

<xsl:template match="description">
<fo:block font-size="12pt" 
font-family="sans-serif" 
line-height="15pt"
space-after.optimum="3pt"
text-align="justify">
<xsl:value-of select="."/>
</fo:block>
</xsl:template>

</xsl:stylesheet>

As you can see, the template matches are replaced by the corresponding formatting information, in terms of FO tags. The stylesheet is complete with the closing stylesheet tag. Save this file as krusty.xsl in the root FOP directory.

To run this and see how our efforts compare with the original, type in the following command in the root FOP directory:

fop -xml krusty.xml -xsl krusty.xsl -pdf krusty.pdf

Here you specify the input XML and XSL files along with the output PDF file. When you open this PDF, you will see that the result is exactly the same as when we ran FOP with the FO file.
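FOP never shows you the temporary FO document it builds from the XML and XSL files. If you want to inspect it, for example to compare it with the hand-written krusty.fo, you can run the same transformation yourself. The sketch below uses the standard JAXP API that ships with the JDK rather than anything from FOP; the class name FoPreview is made up, and the default file names assume krusty.xml and krusty.xsl are in the current directory.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: apply a stylesheet to a data file and return the resulting FO as text.
final class FoPreview {

    /** Runs the XSLT transformation and returns the output document as a string. */
    static String transform(Source xml, Source xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
        StringWriter out = new StringWriter();
        transformer.transform(xml, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: both files are in the current directory unless paths are given.
        String xmlPath = args.length > 0 ? args[0] : "krusty.xml";
        String xslPath = args.length > 1 ? args[1] : "krusty.xsl";
        String fo = transform(new StreamSource(new File(xmlPath)),
                              new StreamSource(new File(xslPath)));
        System.out.println(fo);
    }
}
```

The printed document should contain the same fo:layout-master-set and fo:block elements we wrote by hand earlier, with the text pulled from the data file.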

The advantage of the second approach should be clear. Our input XML data can be different in different circumstances. Today we are preparing the report for Krusty the Clown, tomorrow it might be Bart the Kid. Our data file will change, but not the stylesheet.
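The same idea can be sketched in code: compile the stylesheet once and apply it to whatever data arrives. This example again uses only JAXP from the JDK, not FOP. The stylesheet here is a deliberately cut-down stand-in that produces just the heading block, and the class name ReusableStylesheet is invented for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: one compiled stylesheet, many data documents.
final class ReusableStylesheet {
    // Hypothetical, shortened stylesheet: heading block only, no page layout.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/data'>"
        + "<fo:block font-size='18pt'><xsl:value-of select='name'/></fo:block>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    private final Templates templates;

    /** Compiles the stylesheet once so it can be applied repeatedly. */
    ReusableStylesheet(String xsl) throws Exception {
        templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
    }

    /** Applies the compiled stylesheet to one data document. */
    String apply(String xml) throws Exception {
        StringWriter out = new StringWriter();
        templates.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        ReusableStylesheet sheet = new ReusableStylesheet(XSL);
        // Today's report and tomorrow's report share the same stylesheet.
        System.out.println(sheet.apply("<data><name>Krusty the Clown</name></data>"));
        System.out.println(sheet.apply("<data><name>Bart the Kid</name></data>"));
    }
}
```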

Conversion to AWT

Switching to a different output format takes very little work, because FOP handles the rendering internally. So to output to Java's Abstract Window Toolkit (AWT), all you need to do is:

fop -xml krusty.xml -xsl krusty.xsl -awt

That is, specify the output as AWT rather than PDF. FOP creates an AWT viewer for you, as shown in Figure 2.


Figure 2. AWT Viewer.

Embedding FOP

Whether as part of a Web or a desktop application, the steps to embed FOP into your own application are simple.

  1. Instantiate org.apache.fop.apps.Driver.

    Driver driver = new Driver();

  2. Set the type of rendering you want to do.

    driver.setRenderer(Driver.RENDER_PDF);

  3. Set a logger to log to.

    driver.setLogger(log);

  4. Set an input source.

    driver.setInputSource(new FileInputSource(file));

  5. Set an output stream to render to.

    driver.setOutputStream(new FileOutputStream(out));

  6. Finally, you are set to produce the output.

    driver.run();

The above process is valid if you are specifying an input source as an FO file. If you specify XSL and XML files instead, you would change steps 4 and 6 as follows:

  4. Set an Input Handler as an XSLTInputHandler.

    InputHandler inputHandler = new XSLTInputHandler(xmlfile, xslfile);

  6. Grab the parser out of this handler and render.

    driver.render(inputHandler.getParser(), inputHandler.getInputSource());
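The modified step 6 hands FOP a parser rather than a finished FO file, which suggests that the transformed document reaches the renderer as a stream of SAX events. The sketch below illustrates that general pattern with JDK classes only. It is not FOP code and makes no claim about FOP's internals: a small handler stands in for the consumer and simply counts the fo:block elements it sees. The class name FoEventCounter and the shortened stylesheet are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: receive the result of an XSLT transformation as SAX events
// instead of writing it to a file first. Stand-in consumer, not FOP.
final class FoEventCounter extends DefaultHandler {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";
    int blocks;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Called once per element as the transformer emits it
        if (FO_NS.equals(uri) && "block".equals(localName)) {
            blocks++;
        }
    }

    /** Transforms xml with xsl and counts the fo:block elements in the event stream. */
    static int countBlocks(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        FoEventCounter counter = new FoEventCounter();
        transformer.transform(new StreamSource(new StringReader(xml)), new SAXResult(counter));
        return counter.blocks;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, shortened stand-ins for krusty.xml and krusty.xsl.
        String xml = "<data><name>Krusty the Clown</name></data>";
        String xsl =
              "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
            + "<xsl:template match='/'>"
            + "<fo:root><xsl:apply-templates select='data'/></fo:root>"
            + "</xsl:template>"
            + "<xsl:template match='data'>"
            + "<fo:block><xsl:value-of select='name'/></fo:block>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println("fo:block elements seen: " + countBlocks(xml, xsl));
    }
}
```

The practical benefit of this style is that no temporary FO file has to be written to disk between the transformation and the rendering step.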

That's all there is to embedding FOP into an application. There is a complete example in the docs\examples\embedding directory that deals with embedding FOP in a servlet.

Conclusion

FOP is an open source tool for processing formatting objects. It renders these objects to different output formats on request. You can run FOP either standalone or as part of your own application.

Vikram Goyal is the author of Pro Java ME MMAPI.



Copyright © 2009 O'Reilly Media, Inc.