 Published on ONJava.com (http://www.onjava.com/)


Dynamic Creation of Reports with Apache Formatting Objects

by Kevin Hartig
12/11/2002

Business workflows require reports to be dynamically generated with current data, potentially from multiple sources. Report content, style and output format are typically user-defined and can be dependent on the requester of the report. Many excellent reporting tools are available today that provide report-generation capabilities but are either costly, use proprietary formats, or don't provide much flexibility to customize the content, style, or final output format. This article describes the architecture, design, and implementation of a reporting tool framework that uses XML standards and tools. The implementation demonstrates how reports are dynamically created using XML, XSL, XSLT, Java, and the Apache XML Formatting Objects Processor (FOP).

Requirements

Actors

Actors are entities that represent external users or systems. They can also represent subsystems within an architecture. The actors defined here are relevant to the reporting framework and are used in the defined use case.

Requester

The report requester is the client requesting the report. This actor is also typically the receiver of the final generated report. The Requester can be a browser, a subsystem, or another service (e.g., a scheduling service that makes regularly scheduled requests for reports).

Accompanying Files

All of the code associated with this article is available for free and should be included with the distribution. The reporting.zip archive file includes source code, compiled class files, libraries used, documentation, and this article.

Report Content Database

The report content database contains the information that will be used to compile the report content. There may be one or more data sources in the content database for any given report that is generated.

Report Creation Subsystem

The report creation subsystem is the subsystem that performs the function of generating the report. It reads data from various report content databases as necessary to acquire data needed to generate the report. It is also responsible for creating the report in a number of different styles and output formats.

Reporting System

The reporting system is the front end to the whole report-generation system. It manages communications, requests, and responses with other system clients. It delivers the completed report to a designated location. An authentication service is accessed by this subsystem to determine if the requester has access to the reporting system.

Authentication Service

The authentication service determines whether or not a requester has access rights to use the reporting system. Requests to use the reporting system must first be authenticated.

Authorization Service

The authorization service determines if the requester has access to content in the report being generated. The report creation subsystem uses the authorization service to determine if various content should be filtered from the report. Authorization is determined based on privileges defined for a requester.

Use Case

A simple use case is defined to describe a basic functional scenario for the creation of a report. It describes the basic flow of events from the perspective of a client requesting a report. The use case covers a generalized report-generation request that can be used as a base for future projects requiring report-generation capabilities. Specific report types, formats, content, security, etc. can be defined for individual projects using the flow of this use case as an example.

Client requests for reports of various types are made with the expectation that a report can be generated either immediately on request and delivered as soon as possible, or on request for access at some later time from a report repository or cache. The client specifies the type of report desired and the report format to be delivered (e.g., PDF, RTF, etc.). Reports can be requested from a thin client (usually a Web browser) or other clients including services, schedulers, batch jobs, etc. Report content may be assembled from one or more data sources.

All requests for reports to be generated are authenticated, validating that the requester has access to the system. Checks are made to authorize content to be included in the report, based on access rights defined for the requester.

Flow of Events

1. Requester selects:

2. Request is sent to report-generation system.

3. Reporting system receives request. Requests are authenticated based on requester for access to the system.

4. The type of request is determined and is forwarded on to the report-creation subsystem.

5. Report-creation subsystem creates the report in the desired format using the report content database (may be multiple sources) for the report content. Report content creation is based on:

6. The completed report is cached for future reference.

7. The reporting system delivers the report back to the requester, or notification of completion is sent, if desired.

Supplementary Requirements

Supplementary requirements address what become the systemic qualities of the system. They quantify the desired quality of service in an operational environment. For the purposes of this framework these requirements are lightweight; nonetheless, they are important to consider up front, rather than trying to jam them in late in development.

1. Scalability to add new data sources. Support to incorporate data from a variety of sources must be included. This allows adding new data sources as needed, without changing the architecture.

2. A security interface for authorizing access to data to be included in reports must be included.

3. The framework should be able to function on its own so it can be accessed in a J2EE environment (from servlets) or as a Web service.

Architecture: Tiers and Layers

The reporting system framework is described as a multi-tiered architecture. The architecture is categorized into tiers, layers, and systemic qualities. Tiers represent the logical or physical organization of components into an ordered flow of provider and consumer services from client to back-end resources. Layers represent the hardware and software stacks that support components providing services within a given tier. Systemic qualities represent the tools, techniques, and best practices that deliver the requisite quality of service across the tiers and layers of the architecture.

The client component consists of a browser that can initiate requests for reports using HTTP and can receive reports in either HTML or another format (such as PDF), which are then redirected to the appropriate viewer. Reports can also be generated via a batch job defined to run periodically or on a one-time basis. Batch jobs can be considered clients of the reporting framework. There is a test client and servlet supplied as part of the code that can be used to exercise the implemented framework.

Middle-tier components provide the core application framework structure. If the framework is to be accessed by servlets, then J2EE components are included in the middle tiers. If the framework is to be accessed as a Web service, then Web service components are included in the middle tiers. In addition to the reporting framework, Java libraries for XML, XSL, XSLT, JDBC support, and Formatting Objects Processing that are used to support the framework are included in the middle tier.

The Resource tier represents components used to populate reports with content. Data sources are typically a vendor-specific database implementation accessed through JDBC or stored procedures. Additionally, other third-party reporting tool libraries and components could be used to augment data access when creating report content.


Figure 1. Tiers and Layers.

Logical Architecture View

The high-level decomposition of the architecture can be represented in terms of its package hierarchy and layers. Decomposition is shown using packages and their relation to the various tiers in the architecture. Packages are later broken down into classes to describe the relationships and sequence flow in the design.


Figure 2. Top Level Packages

Requester

The Requester package represents the client components requesting reports.

Authentication

All clients requesting reports must be authenticated before requests are passed on to the reporting system. This package represents the authentication mechanism. Clients requesting reports should be checked for some kind of valid authentication.

Authorization

The reporting system architecture should be able to utilize different possible authorization mechanisms. It will likely include a central repository for security policy definitions that are accessed by the mechanism and a tool to administer the content. The framework provides a standard way for authorization security policies to be accessed by the reporting framework. Resource access privileges are defined for users accessing reports. Resources' definitions will typically correspond to a particular report type and data content components that make up the report.
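
As a rough illustration of such a standard access point, the sketch below models authorization as a single interface that report creation could call for each resource. The names AuthorizationService and InMemoryAuthorizationService, and the use of plain strings for requesters and resources, are assumptions made for this example; they are not taken from the code that accompanies this article.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical contract: the framework's real authorization interface is not shown in the article.
interface AuthorizationService {
    // Returns true if the requester holds a privilege for the named resource.
    boolean isAuthorized(String requester, String resource);
}

// Illustrative stand-in for a central policy repository, kept in memory.
class InMemoryAuthorizationService implements AuthorizationService {
    private final Map<String, Set<String>> privileges = new HashMap<>();

    // Records that the requester may access the resource (a report type or content section).
    void grant(String requester, String resource) {
        privileges.computeIfAbsent(requester, key -> new HashSet<>()).add(resource);
    }

    @Override
    public boolean isAuthorized(String requester, String resource) {
        Set<String> granted = privileges.get(requester);
        return granted != null && granted.contains(resource);
    }
}
```

Because callers depend only on the interface, a repository-backed mechanism could replace the in-memory one without changes to report creation.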

Report Creation

Report creation defines the operations needed to produce the report content and formatting. One or more data sources may be required to access data used as content in a report. XML parsers are applied to report templates, and handlers of the XML tags determine when data content is required. Data is retrieved from content databases and converted into valid XML representing the data to be included in the report. The authorization mechanism is called from this package to determine if the requester is authorized to have the data. This package also coordinates the function of generating the report. It manages starting the threads of execution that merge data content into report templates, apply styles to the content, and render the data into final formats.

Reporting System

The reporting system is the front end to the whole system. It manages communications, requests, and responses with clients. It delivers the completed report to a designated location. Reports are delivered to requesting clients waiting for the response to the report request, or the report can be streamed to a file in a designated location. Asynchronous requests usually require some kind of notification that report generation has been completed. The method of notification needs to be defined by the requester. We assume that a notification mechanism is in place for use by the report generation system (e.g., email, paging, JMS).
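
One way to picture the asynchronous case is a request that returns immediately and calls a notifier once generation ends. The sketch below uses the JDK's CompletableFuture for this; the class name AsyncReportRequestSketch, the method requestAsync, and the message texts are hypothetical and only stand in for whatever notification mechanism (email, paging, JMS) is actually configured.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of an asynchronous report request with completion notification.
class AsyncReportRequestSketch {

    // Starts report generation on a background thread and notifies when it ends.
    // The returned future completes only after the notifier has been called.
    static CompletableFuture<byte[]> requestAsync(Supplier<byte[]> reportGenerator,
                                                  Consumer<String> notifier) {
        return CompletableFuture.supplyAsync(reportGenerator)
                .whenComplete((report, error) -> notifier.accept(
                        error == null ? "Report generation completed"
                                      : "Report generation failed"));
    }
}
```

A synchronous client could simply call join() on the returned future, while a batch client could ignore it and rely on the notification alone.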

To access the reporting system, the following interfaces are defined:

Application Layer Structure Across Tiers


Figure 3. Application Layer

The application layer represents the highest level of logic in the system. Most of the reporting system framework resides in this layer. The structure of the framework is broken down across the tiers in the following way to depict responsibilities of system components.

Architecture Patterns

Destructive Design

During development of report generation solutions, various reporting tools and libraries may be selected for use to reduce development time and associated costs. Limiting the impact of changing or adding additional components, and supporting the different configurations that result, can be problematic in this context. Decomposing the system as much as possible into parts that can be removed as easily as they can be added or replaced alleviates some of the problem. Reducing the number of steps to remove or add parts reduces the number of potential errors and facilitates easier reconfiguration. This is the intent of the report-creation component, which defines a static interface whose underlying implementation can be handled by a variety of software product components.
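
A minimal sketch of this idea, assuming a hypothetical ReportRenderer interface and RendererRegistry class (neither name comes from the accompanying code): each output format is a part that can be added or removed in a single step, and callers see only the fixed interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical fixed interface; implementations may wrap any reporting tool or library.
interface ReportRenderer {
    byte[] render(String styledContent);
}

// Illustrative registry that lets renderers be added or removed without touching callers.
class RendererRegistry {
    private final Map<String, ReportRenderer> renderers = new ConcurrentHashMap<>();

    // Adding a part is one step.
    void register(String format, ReportRenderer renderer) {
        renderers.put(format, renderer);
    }

    // Removing a part is also one step.
    void unregister(String format) {
        renderers.remove(format);
    }

    // Delegates to the renderer registered for the format, or fails with a clear message.
    byte[] render(String format, String styledContent) {
        ReportRenderer renderer = renderers.get(format);
        if (renderer == null) {
            throw new IllegalArgumentException("No renderer registered for format: " + format);
        }
        return renderer.render(styledContent);
    }
}
```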

Limited Dependency

Dependency among parts is the primary determinant of system complexity. Complexity exponentially correlates to increased difficulty in system maintenance, evolution, verifiability, and robustness. Too much dependency defeats the benefits achieved by decomposition. Using well-defined interfaces passing a limited set of parameters limits subsystem dependencies.

Encapsulation

Encapsulation involves combining data and behavior behind an interface that exposes only abstracted behavior. Implementation of modules with cross-dependencies or duplicate responsibilities increases system complexity and hinders evolution. Encapsulation of the reporting support tools' capabilities within the report content subsystem reduces system complexity.

Façade

The reporting system acts as a façade, decoupling the underlying report generation mechanisms completely from the client. It encapsulates the other report generation subsystems and defines an explicit interface defining available capabilities.
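
The sketch below shows the shape such a façade might take. ReportRequest, ReportingSystemFacade, and requestReport are illustrative names chosen for this example, and authentication and report creation are reduced to functional interfaces so that the block stays self-contained; the accompanying code may be organized differently.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical request value object; the field names are illustrative only.
final class ReportRequest {
    final String requester;
    final String reportType;
    final String outputFormat;

    ReportRequest(String requester, String reportType, String outputFormat) {
        this.requester = requester;
        this.reportType = reportType;
        this.outputFormat = outputFormat;
    }
}

// Facade: the only type a client needs to know about.
class ReportingSystemFacade {
    private final Predicate<String> authenticator;                 // stands in for the authentication service
    private final Function<ReportRequest, byte[]> reportCreation;  // stands in for the report-creation subsystem

    ReportingSystemFacade(Predicate<String> authenticator,
                          Function<ReportRequest, byte[]> reportCreation) {
        this.authenticator = authenticator;
        this.reportCreation = reportCreation;
    }

    // Authenticates the requester, then delegates to report creation.
    byte[] requestReport(ReportRequest request) {
        if (!authenticator.test(request.requester)) {
            throw new SecurityException("Requester is not authenticated: " + request.requester);
        }
        return reportCreation.apply(request);
    }
}
```

Clients never reference the subsystems behind the façade, so those subsystems can change without affecting client code.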

Design Description

Request Sequence

Figure 4 shows the basic sequence of events that take place when a report is requested. It is a high-level representation using packages defined in the architecture definition.

Figure 4. Package Level Sequence Definition for Report Request.

The main responsibilities and interfaces for the packages used in the sequence are as follows:

Reporting System

Responsibilities

  1. Handle report requests
  2. Initiate report content creation
  3. Initiate report style application
  4. Initiate report output format rendering
  5. Return report to Requester
  6. Persist report to temporary or permanent storage
  7. Send notification of report completion to Requester

Interfaces

To access the reporting system, the following interfaces are defined:

Report Creation

Responsibilities

  1. Get the report template for the requested report type
  2. Authorize request for report and authorize each section of data content
  3. Create data producers for each section defined in the template
  4. Assemble report data content using data returned by data producers
  5. Using FOP libraries, apply style to the report content
  6. Using FOP libraries, convert report to final output format

Interfaces

Authorization

Responsibilities

Interfaces

Data Producer

Responsibilities

  1. Access data sources containing data to be included in a report.
  2. Convert raw data requests to well-formed XML syntax that is used in the report.
  3. Return XML back to report creation subsystem.
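
The responsibilities above can be sketched as a small interface plus one implementation. This is only an illustration: the article names a process method but does not show its signature, so the String return type here is an assumption, and InMemoryCustomerProducer reads from a list instead of a database to keep the example self-contained.

```java
import java.util.List;

// Hypothetical contract; the real process method's signature is not shown in the article.
interface DataProducer {
    // Returns well-formed XML for one section of report content.
    String process();
}

// Illustrative producer that converts rows of {name, phone} into customer elements.
class InMemoryCustomerProducer implements DataProducer {
    private final List<String[]> rows;

    InMemoryCustomerProducer(List<String[]> rows) {
        this.rows = rows;
    }

    @Override
    public String process() {
        StringBuilder xml = new StringBuilder();
        for (String[] row : rows) {
            xml.append("<customer>")
               .append("<customername>").append(escape(row[0])).append("</customername>")
               .append("<customerphone>").append(escape(row[1])).append("</customerphone>")
               .append("</customer>");
        }
        return xml.toString();
    }

    // Escapes the characters that would otherwise break well-formed element content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Escaping raw values before they are written is what keeps the output well-formed when the data contains characters such as & or <.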

Interfaces

ReportContentDB

Responsibilities

  1. Maintain the data used in reports.
  2. Provide access to the report data.

Interfaces

Separating Content, Style, and Output Format

The content of the report, the style applied to a report, and the rendered format are all distinct and separate operations. The same content can potentially have different styles applied to it and the styled report can be converted to a number of different formats (e.g., PDF or PostScript).
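
To make the separation concrete, the following sketch treats each operation as an independent function and composes them. ReportPipelineSketch and its build method are hypothetical names for this example; in the framework itself the stages are carried out by data producers, XSL stylesheets, and FOP renderers.

```java
import java.util.function.Function;

// Hypothetical sketch: content, style, and output format as three independent stages.
class ReportPipelineSketch {

    // Composes the stages so that any one of them can be swapped without touching the others.
    static byte[] build(String reportType,
                        Function<String, String> contentStage,   // report type -> XML content
                        Function<String, String> styleStage,     // XML content -> styled document
                        Function<String, byte[]> renderStage) {  // styled document -> final bytes
        return contentStage.andThen(styleStage).andThen(renderStage).apply(reportType);
    }
}
```

Passing a different styleStage or renderStage with the same contentStage yields the same data in another style or output format.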

The first step in creating a report is to access and merge the dynamic content. Data content for a report is defined using XML and a custom defined set of tags and attributes that are relevant to the specific report type. A report template is used to define the type of content that will populate the report and data producers provide the dynamic content.

The second step in report creation is to apply a style to the report content generated in the first step. One or more stylesheets may be defined for a report. The requester of the reporting system may choose which style to apply. Stylesheets use the XSL Formatting Objects (XSL-FO) vocabulary from the W3C XSL Recommendation. The XSL definitions are applied to the XML data using FOP classes. FOP includes a Driver class that starts the translation process, given an XML and XSL file.

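// XMLFile and XSLFile identify the report data and the stylesheet to apply.
InputHandler inputHandler = new XSLTInputHandler(XMLFile, XSLFile);
// The parser delivers the result of the XSL transformation.
XMLReader parser = inputHandler.getParser();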

The XSLTInputHandler reads in the data and style information, performs the transformation, and streams the new data to an XMLReader.
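
The transformation step on its own can be tried without FOP by using the JDK's standard javax.xml.transform API, as in the sketch below. This is a stand-in for illustration, not the XSLTInputHandler implementation; the StyleApplier class and the small SAMPLE_STYLESHEET are assumptions made up for this example, and the stylesheet is far simpler than a real report style.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper that applies an XSL stylesheet to XML report content.
class StyleApplier {

    // Illustrative stylesheet: one XSL-FO block per customer element.
    static final String SAMPLE_STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/addresses'>"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='page'><fo:region-body/></fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='page'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<xsl:for-each select='customer'>"
        + "<fo:block><xsl:value-of select='customername'/>: "
        + "<xsl:value-of select='customerphone'/></fo:block>"
        + "</xsl:for-each>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms the XML with the stylesheet and returns the resulting XSL-FO document.
    // Checked exceptions are wrapped only to keep the sketch short.
    static String applyStyle(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter result = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(result));
            return result.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not apply stylesheet", e);
        }
    }
}
```

The string returned by applyStyle is the formatting-objects document that a renderer would then turn into PDF or another output format.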

The data is then ready to be rendered by the FOP tools by defining the desired output format and the OutputStream to use. Using the Apache Formatting Object Processor makes these operations very simple.

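// outputFormat selects the renderer to use (for example, PDF).
driver.setRenderer(outputFormat);
// output is the OutputStream that receives the rendered report.
driver.setOutputStream(output);
// Format the transformed data and write the result to the output stream.
driver.render(parser, inputHandler.getInputSource());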

In the implementation that goes along with this article, the XML data file is generated and saved as a file. The file is useful for validating that the data is correct and for applying various styles during development. A production system would most likely stream the XML data content directly to the XSL transformation processing, eliminating the time it takes to write the data to a file.
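
One possible way to skip the intermediate file, sketched here with the JDK's SAXTransformerFactory rather than with the framework's own classes, is to send the content as SAX events straight into the transformation. StreamingStyleSketch and styleWithoutTempFile are hypothetical names, and the customer rows are passed in as an array purely for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch: feed report content to the XSL transformation without a temporary file.
class StreamingStyleSketch {

    // Emits an addresses document as SAX events directly into the transformer.
    // Each row is {customer name, customer phone}.
    static String styleWithoutTempFile(String xsl, String[][] customers) {
        try {
            // Assumes the default factory supports SAX, which the JDK's built-in one does.
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler =
                    factory.newTransformerHandler(new StreamSource(new StringReader(xsl)));
            StringWriter result = new StringWriter();
            handler.setResult(new StreamResult(result));

            handler.startDocument();
            handler.startElement("", "addresses", "addresses", new AttributesImpl());
            for (String[] customer : customers) {
                handler.startElement("", "customer", "customer", new AttributesImpl());
                textElement(handler, "customername", customer[0]);
                textElement(handler, "customerphone", customer[1]);
                handler.endElement("", "customer", "customer");
            }
            handler.endElement("", "addresses", "addresses");
            handler.endDocument();
            return result.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not stream content to the transformer", e);
        }
    }

    // Writes <name>value</name> as SAX events.
    private static void textElement(TransformerHandler handler, String name, String value)
            throws SAXException {
        handler.startElement("", name, name, new AttributesImpl());
        handler.characters(value.toCharArray(), 0, value.length());
        handler.endElement("", name, name);
    }
}
```

Writing the XML to a file remains convenient during development, since the file can be inspected and restyled repeatedly.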

FOP provides a number of different renderers as part of the distribution. Additional renderers are in the works or can be added to the libraries as explained on the xml.apache.org FOP Web site.

Accessing Data Content

Data producers access data from data sources and convert the data into well-formed XML representing the report data content. In the creation of a report, one or more data producers will access specific types of data from one or more data sources. For simplicity, it is best to make each data producer type correspond to only one data source.

Data producers are created by the report-creation subsystem while parsing a report template. Report templates define the type of data producer needed to fill in the data content. This framework is designed to read through a report template using a SAX parser and to look for a <reporting> tag. Each <reporting> tag has an attribute that defines the class name of the data producer to use. The sample code includes a simple report template called customerNumbers.xml. It defines one data producer class to be created.

<?xml version="1.0" encoding="UTF-8"?>
<addresses>
   <reporting producer="com.company.reporting.dataproducers.CustomerDataProducer">
   </reporting>
</addresses>

ReportCreation creates a TemplateHandler, which extends the SAX handler. It handles callbacks as the XML is read by an XML parser. When tags are parsed, the startElement method is called. If a <reporting> tag is found, a DataProducerFactory is created and the factory is asked to create a data producer of the type defined in the tag attribute. This newly created data producer is used to create the data content by calling the process method. The handler streams the resulting data content on to an output stream.
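
The handler logic just described can be approximated with the JDK's SAX classes, as in the simplified sketch below. It is not the TemplateHandler from the accompanying code: TemplateDataProducer, SampleCustomerProducer, and TemplateHandlerSketch are hypothetical names, the producer is instantiated by reflection instead of through a DataProducerFactory, and attributes and text on other template elements are dropped for brevity.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical producer contract; the real process method's signature is not shown in the article.
interface TemplateDataProducer {
    String process();
}

// Stand-in producer used for the demo instead of a database-backed one.
class SampleCustomerProducer implements TemplateDataProducer {
    @Override
    public String process() {
        return "<customer><customername>SuperCom</customername></customer>";
    }
}

// Copies template elements to the output and replaces each <reporting> tag with producer output.
class TemplateHandlerSketch extends DefaultHandler {
    private final Writer out;

    TemplateHandlerSketch(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        try {
            if ("reporting".equals(qName)) {
                // The producer attribute holds the class name of the data producer to create.
                String className = atts.getValue("producer");
                TemplateDataProducer producer = (TemplateDataProducer)
                        Class.forName(className).getDeclaredConstructor().newInstance();
                out.write(producer.process());
            } else {
                out.write("<" + qName + ">");
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("reporting".equals(qName)) {
            return; // the <reporting> tag itself is not copied to the output
        }
        try {
            out.write("</" + qName + ">");
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Parses the template and returns the merged content; exceptions are wrapped for brevity.
    static String merge(String templateXml) {
        try {
            StringWriter merged = new StringWriter();
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(templateXml)),
                    new TemplateHandlerSketch(merged));
            return merged.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not merge template", e);
        }
    }
}
```

Because the producer class is named in the template, a new kind of content can be added by writing a producer and referencing it from a template, without changing the handler.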


Figure 5. Classes used to create data content with DataProducers.

From the included example code and database, the resulting XML with the data content looks like the following:

<?xml version="1.0" encoding="UTF-8"?>
<addresses>
   <customer>
      <customername>SuperCom</customername>
      <customerphone>305-777-4632</customerphone>
   </customer>
   <customer>
      <customername>LivingstonEnterprises</customername>
      <customerphone>305-456-8888</customerphone>
   </customer>
   <customer>
      <customername>Oak Computers</customername>
      <customerphone>214-999-1234</customerphone>
   </customer>
   <customer>
      <customername>MicroApple</customername>
      <customerphone>555-275-9900</customerphone>
   </customer>
   <customer>
      <customername>HostProCom</customername>
      <customerphone>650-456-8876</customerphone>
   </customer>
</addresses>

This is only the data content for the report to be generated. Styles can now be independently applied to the data.

Notification

The notification implementation used for this example is JavaMail. The reporting system uses JavaMail to email messages about the completion status of a report if the asynchronous request interface is used. A properties file defines the required values for mail notification. A notification message is defined based on the completion status of report generation and sent to the specified user.

Putting it All Together

The code included with this article implements the design described. The code represents a working framework to be used for dynamic report generation. It can be extended and evolved as needed. All source code, libraries used, compiled class files, and a sample database are included. Executing the code can be done with a test client or from a servlet running in a Web server environment.

Test Client

There is a com.company.reporting.Tester class defined that can be run as a unit tester. This class takes the name of a properties file as an input argument; a reporting.properties file is included. The properties file defines the locations of the template file, style sheet, and output directories. It also contains the properties used for email notification. The tester is run from the source directory using the following command, entered on a single line (the ; classpath separator shown is for Windows; use : on Unix-like systems):

java -cp .;../lib/activation.jar;../lib/avalon-frameworkcvs20020315.jar;
           ../lib/fop.jar;../lib/mail.jar;../lib/pbclient35RE.jar;
           ../lib/xalan-2.3.1.jar;../lib/xercesImpl-2.0.1.jar 
           com.company.reporting.Tester reporting.properties 

Servlet

The example also includes a servlet that calls the reporting service. The ReportRequester.html file provides a form allowing the selection of multiple styles and output formats to be applied to a report. The report can be returned synchronously to the requesting client browser or stored as a file. The sample code class files and .jar files need to be deployed to the Web server's WEB-INF directory and configured according to Web server specifications.

Database

The example uses a sample PointBase 3.5 database for data. If you wish to use the sample database, the demo database server must be downloaded and installed. The pbclient35RE.jar archive includes all of the client driver code needed to run the sample. Alternatively, you could use another database and associated software and write your own data producers to connect and access data content.

Summary

With the help of some excellent, freely-available open source software libraries and a few sleepless nights of development, it was possible to develop a framework that supports creating reports dynamically from database content. This framework also provides a mechanism to apply different document styles and output formats. The basic requirements that defined what the framework is supposed to perform, an overall architecture, design details, and an implementation of the framework are provided. The result is a basis for a reporting solution using some of today's defined standards and available open source tools. The architecture is meant to be extensible so security features can be added, multiple data sources can be incorporated, and new output formats can be included. There is a bit of a learning curve to understand all of the technologies used, but the working implementation provided shortens the time to deploy a working system.

Kevin Hartig is a senior Java architect at Sun's Professional Services Java Center.



Copyright © 2009 O'Reilly Media, Inc.