ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)


XML Document Validation with an XML Schema

by Deepak Vohra
09/15/2004

An XML schema defines the structure of the elements and attributes in an XML document. A document can be called valid with respect to a schema only after it has been validated against that schema. This tutorial explains how to validate an XML document against an XML schema.

In this article, the Xerces2-j and JAXP parsers are used to validate an XML document with an XML schema. In Xerces2-j, schema validation is integrated with the SAXParser and DOMParser parsers. In JAXP, the DocumentBuilder classes are used to validate an XML document. XML schema validation is illustrated with an XML document containing a catalog. This article is structured into the following sections:

  1. Preliminary Setup
  2. Overview
  3. Validation of an XML Document with the Xerces2-j Parser
  4. Validation of an XML Document with the JAXP Parser

Preliminary Setup

To validate an XML document with the Xerces2-j parser, the Xerces2-j classes need to be in the classpath. The Xerces2-j parser may be obtained from the Xerces2-j page. Extract the Xerces-J-bin.2.5.0.zip (for Windows) or Xerces-J-bin.2.5.0.tar.gz (for Unix) file to the installation directory of your choice. Add <XERCES>/xerces-2_5_0/xercesImpl.jar and <XERCES>/xerces-2_5_0/xml-apis.jar to the classpath variable, where <XERCES> is the directory in which Xerces2-j is installed.

To validate an XML document with the JAXP parser, its DocumentBuilder classes need to be in the classpath. These are provided by the Java Web Services Developer Pack, which may be obtained from the JWSDP web site. Extract the Java Web Services Developer Pack 1.2 (jwsdp-1.2) application file to an installation directory. Add <JAXP>/jaxp/lib/jaxp-api.jar and <JAXP>/jaxp/lib/endorsed/xercesImpl.jar to the classpath variable, where <JAXP> is the directory in which you installed jwsdp-1.2.

Overview

In this tutorial, an example XML document named catalog.xml, consisting of an ONJava journal catalog, is used. The xmlns:xsi attribute, xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance", declares the XML namespace prefix xsi. The xsi:noNamespaceSchemaLocation attribute, xsi:noNamespaceSchemaLocation="file://c:/Schemas/catalog.xsd", gives the location of the schema for elements in the XML document that are not in a namespace. The example XML document is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<!--An ONJava Journal Catalog-->

<catalog 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
   xsi:noNamespaceSchemaLocation=
 "file://c:/Schemas/catalog.xsd" 
   title="OnJava.com" publisher="O'Reilly"> 
 <journal date="April 2004"> 
   <article>
    <title>Declarative Programming in Java</title>
    <author>Narayanan Jayaratchagan</author>
   </article>
 </journal>
 <journal date="January 2004">
   <article> 
    <title>Data Binding with XMLBeans</title>
    <author>Daniel Steinberg</author>
   </article>
 </journal>
</catalog>

The example XML document is validated with an example XML schema file, catalog.xsd. The elements in this schema document are in the XML schema namespace of http://www.w3.org/2001/XMLSchema. The catalog.xsd file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema 
xmlns:xs="http://www.w3.org/2001/XMLSchema"> 
  <xs:element name="catalog">
   <xs:complexType>
    <xs:sequence>
     <xs:element ref="journal" minOccurs="0" 
maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="title" type="xs:string"/>
    <xs:attribute name="publisher"  type="xs:string"/>
   </xs:complexType>
  </xs:element>
  <xs:element name="journal">
   <xs:complexType>
    <xs:sequence>
     <xs:element ref="article" minOccurs="0" 
        maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="date" type="xs:string"/>
   </xs:complexType>
  </xs:element>
  <xs:element name="article">
   <xs:complexType>
    <xs:sequence>
     <xs:element name="title" type="xs:string"/>
     <xs:element ref="author" minOccurs="0" 
        maxOccurs="unbounded"/>
    </xs:sequence>
   </xs:complexType>
  </xs:element>
  <xs:element name="author" type="xs:string"/>
</xs:schema>

In the following sections, we'll discuss validation of the example XML document, catalog.xml, with the example schema document, catalog.xsd.

Validation of an XML Document with the Xerces2-j Parser

Xerces2-j provides the DOMParser and the SAXParser for parsing an XML document. To use SAX parsing, import the SAXParser.

import org.apache.xerces.parsers.SAXParser;

A DefaultHandler is used as the ErrorHandler with the parser. An ErrorHandler registers the errors generated by the parser. Import the DefaultHandler class.

import org.xml.sax.helpers.DefaultHandler;

To validate with a SAXParser, create a SAXParser. The SAXParser class is a subclass of the XMLParser class.

SAXParser parser = new SAXParser();

Set the validation feature to true to report validation errors. If the validation feature is set to true, the XML document should specify an XML schema or a DTD.

parser.setFeature("http://xml.org/sax/features/validation",
                  true); 

Set the validation/schema feature to true to report validation errors against a schema.

parser.setFeature("http://apache.org/xml/features/validation/schema", 
                  true);

Set the validation/schema-full-checking feature to true to enable full schema, grammar-constraint checking.

parser.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
                  true); 
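The three feature settings above can be grouped into one helper that also copes with a parser that does not recognize a feature name. This is a minimal sketch under stated assumptions: the class and method names (SchemaFeatures, newReader, enableSchemaValidation, isEnabled) are hypothetical, and the reader is obtained through JAXP's SAXParserFactory rather than new SAXParser() so that the example runs without the Xerces2-j jars. It assumes the JDK's bundled parser is Xerces-derived and accepts the apache.org feature names.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class SchemaFeatures {
    // Feature names quoted from the steps above.
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String SCHEMA = "http://apache.org/xml/features/validation/schema";
    static final String SCHEMA_FULL =
        "http://apache.org/xml/features/validation/schema-full-checking";

    /** Assumption: a JAXP-supplied reader stands in for new SAXParser(). */
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // JAXP factories default to namespace-unaware parsing,
            // but schema validation needs namespace processing.
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("No SAX parser available", e);
        }
    }

    /** Turns on all three features; false means the parser rejected one. */
    static boolean enableSchemaValidation(XMLReader reader) {
        try {
            reader.setFeature(VALIDATION, true);
            reader.setFeature(SCHEMA, true);
            reader.setFeature(SCHEMA_FULL, true);
            return true;
        } catch (SAXException e) {
            // SAXNotRecognizedException or SAXNotSupportedException
            return false;
        }
    }

    /** Reads a feature back, treating an unknown feature as disabled. */
    static boolean isEnabled(XMLReader reader, String feature) {
        try {
            return reader.getFeature(feature);
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        System.out.println("accepted: " + enableSchemaValidation(reader));
        System.out.println("schema:   " + isEnabled(reader, SCHEMA));
    }
}
```

Returning a boolean instead of letting the exception escape is only one possible design; a program that cannot work without schema validation may prefer to let the exception propagate.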

Specify a validation schema for the parser with the schema/external-noNamespaceSchemaLocation or the schema/external-schemaLocation property. The schema/external-schemaLocation property specifies schemas that have a target namespace; its value may list several schemas as pairs of namespace and location. The schema/external-noNamespaceSchemaLocation property specifies a schema that does not have a target namespace. A parser is not required to locate a schema specified with either property. For our purposes, a schema without a namespace is used to validate the XML document.

parser.setProperty(
     "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
     SchemaUrl);

Create a class that extends the DefaultHandler class.

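// Nested class (a top-level class cannot be private). Requires imports of
// org.xml.sax.SAXException and org.xml.sax.SAXParseException.
private class Validator extends DefaultHandler {
  // Set to true when the parser reports an error or a fatal error.
  public boolean validationError = false;
  // The most recent exception reported by the parser, if any.
  public SAXParseException saxParseException = null;

  // Called for recoverable errors, including schema validation errors.
  public void error(SAXParseException exception)
      throws SAXException {
    validationError = true;
    saxParseException = exception;
  }

  // Called for non-recoverable errors such as malformed XML.
  public void fatalError(SAXParseException exception)
      throws SAXException {
    validationError = true;
    saxParseException = exception;
  }

  // Warnings are ignored.
  public void warning(SAXParseException exception)
      throws SAXException { }
}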

The DefaultHandler class implements the ErrorHandler interface, and is used to specify an ErrorHandler for the Xerces parser. Instantiating the above Validator class allows us to register it as an ErrorHandler with the parser.

Validator handler = new Validator();
parser.setErrorHandler(handler);

With the Validator registered as the ErrorHandler, parse the example XML document. Either of the parse methods, parse(java.lang.String systemId) or parse(org.xml.sax.InputSource inputSource), may be used for parsing an XML document.

parser.parse(XmlDocumentUrl);
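Put together, the SAX steps in this section might look like the following sketch. It is not the article's SchemaValidator.java: the class name SaxSchemaCheck, the cut-down schema, the temporary file, and the helper methods are all assumptions made for illustration, and the reader comes from JAXP's SAXParserFactory instead of org.apache.xerces.parsers.SAXParser so that the example runs on a plain JDK (assuming its bundled parser accepts the Xerces feature and property names).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxSchemaCheck {
    // Cut-down, hypothetical stand-in for catalog.xsd.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
      + "<xs:element name='journal' type='xs:string'"
      + " minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence><xs:attribute name='title' type='xs:string'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    /** Same idea as the article's Validator: remember the last error. */
    static class Validator extends DefaultHandler {
        SAXParseException saxParseException;
        @Override public void error(SAXParseException e) { saxParseException = e; }
        @Override public void fatalError(SAXParseException e) { saxParseException = e; }
    }

    /** Writes the schema to a temporary file and returns its file URL. */
    static String writeTempSchema() {
        try {
            Path file = Files.createTempFile("catalog", ".xsd");
            file.toFile().deleteOnExit();
            Files.write(file, SCHEMA.getBytes(StandardCharsets.UTF_8));
            return file.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns null if the document is valid, else the error message. */
    static String validate(String xml, String schemaUrl) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for schema validation
            XMLReader parser = factory.newSAXParser().getXMLReader();
            parser.setFeature("http://xml.org/sax/features/validation", true);
            parser.setFeature(
                "http://apache.org/xml/features/validation/schema", true);
            parser.setFeature(
                "http://apache.org/xml/features/validation/schema-full-checking",
                true);
            parser.setProperty(
                "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
                schemaUrl);
            Validator handler = new Validator();
            parser.setErrorHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
            return handler.saxParseException == null
                ? null : handler.saxParseException.getMessage();
        } catch (SAXParseException e) {
            return e.getMessage(); // parser stopped after a fatal error
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String schemaUrl = writeTempSchema();
        String good = "<catalog title='OnJava.com'><journal>April 2004</journal></catalog>";
        String bad = "<catalog><book/></catalog>";
        System.out.println("good: " + validate(good, schemaUrl));
        System.out.println("bad:  " + validate(bad, schemaUrl));
    }
}
```

The document is parsed from a string here only to keep the sketch self-contained; passing a system ID such as XmlDocumentUrl to parse works the same way.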

The errors generated by the parser get registered with the ErrorHandler and are retrieved from the ErrorHandler. The example program, SchemaValidator.java (see Resources below), is used to validate the example XML document, catalog.xml, with the example XML schema, catalog.xsd.
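Once parsing has finished, the exception stored by the handler carries the location of the problem as well as its message. The sketch below shows one way to turn it into a readable report; the class name ErrorReport and the method describe are hypothetical, and the sample exception is constructed by hand purely for demonstration.

```java
import org.xml.sax.SAXParseException;

class ErrorReport {
    /** Formats the exception a handler recorded; null means no error. */
    static String describe(SAXParseException e) {
        if (e == null) {
            return "Document is valid";
        }
        return "Line " + e.getLineNumber()
            + ", column " + e.getColumnNumber()
            + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        // Hand-made exception standing in for one reported by the parser.
        SAXParseException sample =
            new SAXParseException("Invalid content", null, "catalog.xml", 12, 5);
        System.out.println(describe(sample)); // Line 12, column 5: Invalid content
        System.out.println(describe(null));   // Document is valid
    }
}
```

With the Validator class shown earlier, the call would be describe(handler.saxParseException) after parser.parse returns.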

String variables such as SchemaUrl and XmlDocumentUrl are specified as file URLs. For example:

SchemaUrl: file://c:/schema/catalog.xsd
XmlDocumentUrl: file://c:/catalog/catalog.xml
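Rather than typing a file URL by hand, the URL can be derived from a local path. The following is a minimal sketch using java.nio.file, which postdates the original article; the class name FileUrls and the method toFileUrl are hypothetical. On typical platforms the result takes the three-slash form, for example file:///c:/schema/catalog.xsd on Windows.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class FileUrls {
    /** Converts a local path, relative or absolute, to a file URL string. */
    static String toFileUrl(String path) {
        Path absolute = Paths.get(path).toAbsolutePath().normalize();
        return absolute.toUri().toString();
    }

    public static void main(String[] args) {
        // The output depends on the current working directory.
        System.out.println(toFileUrl("catalog.xsd"));
    }
}
```

The returned string can be used as the value of SchemaUrl or XmlDocumentUrl.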

Validation of an XML Document with the JAXP Parser

Another way to validate an XML document is with a JAXP DocumentBuilder. To begin, import the DocumentBuilderFactory and DocumentBuilder classes. The DocumentBuilder class is used to obtain an org.w3c.dom.Document object from an XML document, while the DocumentBuilderFactory class is used to obtain a DocumentBuilder parser.

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;

To validate with a DocumentBuilder parser, set the javax.xml.parsers.DocumentBuilderFactory system property to the name of the Xerces factory implementation class:

System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
               "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");

Next, you need to create a DocumentBuilderFactory.

DocumentBuilderFactory factory =
    DocumentBuilderFactory.newInstance();

An instance of DocumentBuilderFactory is found by applying the following rules and taking the first one that succeeds:

  1. Use the javax.xml.parsers.DocumentBuilderFactory system property.
  2. Use the properties file lib/jaxp.properties in the JRE directory.
  3. Use the META-INF/services/javax.xml.parsers.DocumentBuilderFactory file with the Services API.
  4. Use the Platform default DocumentBuilderFactory instance.
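To see which implementation the lookup rules above actually selected, the factory can simply be asked for its class name. This is a small sketch; the class name FactoryLookup and its methods are hypothetical, and the name printed depends on the JDK and on what is on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;

class FactoryLookup {
    static final String KEY = "javax.xml.parsers.DocumentBuilderFactory";

    /** Value of the system property checked by rule 1, or null if unset. */
    static String requested() {
        return System.getProperty(KEY);
    }

    /** Class name of the factory that newInstance() resolved to. */
    static String implementationName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("requested: " + requested());
        System.out.println("selected:  " + implementationName());
    }
}
```

If the system property names a class that is not on the classpath, newInstance() fails with a FactoryConfigurationError instead of falling through to the next rule.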

To parse an XML document that uses namespaces, call setNamespaceAware(true) on the factory. By default, namespace awareness is set to false.

factory.setNamespaceAware(true);

Call setValidating(true) on the DocumentBuilderFactory to make the parser a validating parser. By default, validation is set to false.

factory.setValidating(true);
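The two defaults mentioned above can be checked directly with the factory's getter methods. A brief sketch, in which the class name FactoryFlags and the method newValidatingFactory are hypothetical:

```java
import javax.xml.parsers.DocumentBuilderFactory;

class FactoryFlags {
    /** Returns a factory with both settings from the text switched on. */
    static DocumentBuilderFactory newValidatingFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        return factory;
    }

    public static void main(String[] args) {
        DocumentBuilderFactory plain = DocumentBuilderFactory.newInstance();
        // A fresh factory is neither namespace-aware nor validating.
        System.out.println(plain.isNamespaceAware() + " " + plain.isValidating());

        DocumentBuilderFactory configured = newValidatingFactory();
        System.out.println(configured.isNamespaceAware() + " "
            + configured.isValidating());
    }
}
```

Both settings must be made before newDocumentBuilder() is called, because a DocumentBuilder takes its configuration from the factory at creation time.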

Set the schemaLanguage and schemaSource attributes of the DocumentBuilderFactory. The schemaLanguage attribute specifies the schema language for validation. The schemaSource attribute specifies the XML schema document to be used for validation.

factory.setAttribute(
    "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
    "http://www.w3.org/2001/XMLSchema");
factory.setAttribute(
    "http://java.sun.com/xml/jaxp/properties/schemaSource",
    SchemaUrl);

Create a DocumentBuilder parser.

DocumentBuilder builder = factory.newDocumentBuilder();

This returns a new DocumentBuilder, with the parameters configured in the DocumentBuilderFactory. Create and register an ErrorHandler with the parser.

Validator handler = new Validator();
builder.setErrorHandler(handler);

Validator is a class that extends the DefaultHandler class. The DefaultHandler class implements the ErrorHandler interface. The Validator class is listed in the previous section.

Parse the XML document with the DocumentBuilder parser. The available parse methods are parse(InputStream is), parse(File f), parse(InputSource is), parse(InputStream is, String systemId), and parse(String uri).

builder.parse(XmlDocumentUrl);
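The JAXP steps in this section can be combined as in the sketch below. It is not the article's JAXPValidator.java: the class name DomSchemaCheck, the cut-down schema, the temporary file, and the helper methods are assumptions made for illustration. The system property is not set here, so the platform's default factory is used; this assumes that factory supports the schemaLanguage and schemaSource attributes, as the JDK's bundled implementation does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DomSchemaCheck {
    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Cut-down, hypothetical stand-in for catalog.xsd.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
      + "<xs:element name='journal' type='xs:string'"
      + " minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence><xs:attribute name='title' type='xs:string'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    /** Same idea as the article's Validator: remember the last error. */
    static class Validator extends DefaultHandler {
        SAXParseException saxParseException;
        @Override public void error(SAXParseException e) { saxParseException = e; }
        @Override public void fatalError(SAXParseException e) { saxParseException = e; }
    }

    /** Writes the schema to a temporary file and returns its file URL. */
    static String writeTempSchema() {
        try {
            Path file = Files.createTempFile("catalog", ".xsd");
            file.toFile().deleteOnExit();
            Files.write(file, SCHEMA.getBytes(StandardCharsets.UTF_8));
            return file.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns null if the document is valid, else the error message. */
    static String validate(String xml, String schemaUrl) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            factory.setAttribute(SCHEMA_LANGUAGE,
                "http://www.w3.org/2001/XMLSchema");
            factory.setAttribute(SCHEMA_SOURCE, schemaUrl);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Validator handler = new Validator();
            builder.setErrorHandler(handler);
            builder.parse(new InputSource(new StringReader(xml)));
            return handler.saxParseException == null
                ? null : handler.saxParseException.getMessage();
        } catch (SAXParseException e) {
            return e.getMessage(); // parser stopped after a fatal error
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String schemaUrl = writeTempSchema();
        String good = "<catalog title='OnJava.com'><journal>April 2004</journal></catalog>";
        String bad = "<catalog><book/></catalog>";
        System.out.println("good: " + validate(good, schemaUrl));
        System.out.println("bad:  " + validate(bad, schemaUrl));
    }
}
```

The schemaLanguage attribute is set before schemaSource because a schema source without a schema language is rejected by the factory.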

Validator, an ErrorHandler of the type DefaultHandler, registers errors generated by the validation. The example program, JAXPValidator.java (see Resources below), is used to validate the example XML document, catalog.xml, with the example XML schema, catalog.xsd, using the JAXP parser.

Conclusion

An XML document can be said to conform to an XML schema only after it has been validated against that schema. This tutorial explained how to validate an example XML document against an example XML schema with the Xerces2-j parser and with the JAXP DocumentBuilder parser.

Resources

Deepak Vohra is a NuBean consultant and a web developer.



Copyright © 2009 O'Reilly Media, Inc.