ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)


Using DOM to Traverse XML

by Stephanie Fesler
02/08/2001

Introduction

XML is now used frequently to model business data in large-scale applications. A common Java application task is to parse XML to retrieve its data. The Document Object Model (DOM) defines a set of interfaces for navigating and manipulating the content and structure of XML and HTML documents.

Objective

After reading this article you will be able to create a representation of an XML document in your Java program and traverse that representation in two different ways. You will be able to traverse a horizontal representation of the XML document, and you will be able to traverse a tree, or hierarchical, representation of the XML document.

The Document Object Model (DOM) in Detail

The DOM defines interfaces that allow programmers to navigate XML and HTML documents and also to manipulate their content and structure. The DOM is a specification of interfaces; it's not an implementation. Vendors are left to come up with their own implementations of the DOM. Sun Microsystems has some DOM support in its Java API for XML Processing (JAXP). Other vendors that provide support are IBM, Oracle, and the Apache Software Foundation.
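To make the split between interface and implementation concrete, here is a minimal sketch. It assumes the parser bundled with a current JDK (reached through JAXP) rather than the Xerces download used later in this article, and the class name DomImplementationCheck is hypothetical. The program works only with the org.w3c.dom interfaces and asks the vendor's implementation whether it also supports the optional Traversal interfaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not from the original article.
class DomImplementationCheck {

  // Parses XML text with whatever DOM implementation JAXP selects.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse XML", e);
    }
  }

  // True if the vendor's Document class also implements the optional
  // DOM Level 2 Traversal factory interface.
  static boolean supportsTraversal(Document doc) {
    return doc instanceof DocumentTraversal;
  }

  public static void main(String[] args) {
    Document doc = parse("<bank><branchID>78963</branchID></bank>");
    // The concrete class name is vendor-specific and may differ between JDKs.
    System.out.println("Implementation class: " + doc.getClass().getName());
    System.out.println("Implements DocumentTraversal: " + supportsTraversal(doc));
    System.out.println("hasFeature(Traversal, 2.0): "
        + doc.getImplementation().hasFeature("Traversal", "2.0"));
  }
}
```

Because the code depends only on the interfaces, swapping in another vendor's parser should not require changes beyond configuration.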

The DOM has several levels. Level 0 was the first requirements document that defined functionality similar to that used in Netscape Navigator 3.0 and Internet Explorer 3.0. On October 1, 1998, the DOM Level 1 Recommendation was released, which provided functionality to navigate and manipulate the structure and content of XML and HTML documents. DOM Level 2 is a set of specifications that add to the functionality defined in DOM Level 1; they include the Traversal-Range and Events Recommendations mentioned in this article.

Getting Started

This article explores the traversal of DOM representations of XML documents from within Java applications. The Apache Software Foundation has implemented the optional interfaces of the DOM Level 2 Traversal-Range Recommendation in its Xerces project. You can download the Xerces JAR, which contains the files you'll need. Xerces also supports the optional interfaces defined in the DOM Level 2 Events Recommendation. Make sure to place the xerces.jar file in your system CLASSPATH so the Java compiler will be able to locate the appropriate files.

You also need a JDK to compile and run your Java programs. You can download a JDK from Sun Microsystems.

An Example XML Document

To learn to parse and traverse XML you'll need an example XML document. Listing 1 is the DTD for the example XML document. It's a simple representation of a bank. In this example a bank has clients, employees, and a branch identification number.


Listing 1: bank.dtd -- A DTD that defines the parts of a bank:

<!ELEMENT bank (client+, employee+, branchID)>
<!ELEMENT client (clientName, homeAddress, homePhone, account+)>
<!ELEMENT clientName (#PCDATA)>
<!ELEMENT homeAddress (#PCDATA)>
<!ELEMENT homePhone (#PCDATA)>
<!ELEMENT account (type, accountNumber)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT accountNumber (#PCDATA)>
<!ELEMENT employee (empID, empName, workAddress, workPhone, salary)>
<!ELEMENT empID (#PCDATA)>
<!ELEMENT empName (#PCDATA)>
<!ELEMENT workAddress (#PCDATA)>
<!ELEMENT workPhone (#PCDATA)>
<!ELEMENT salary (#PCDATA)>
<!ELEMENT branchID (#PCDATA)>

Listing 2, an instance of our DTD, represents a bank with two clients and two employees.

Listing 2: bank.xml -- A simple XML document representing a view of a bank.

<?xml version="1.0"?>
<!DOCTYPE bank SYSTEM "bank.dtd" >

<bank>
  <client>
    <clientName>Bill Clinton</clientName>
    <homeAddress>Nashua, NH</homeAddress>
    <homePhone>555/555-8975</homePhone>
    <account>
      <type>Checking</type>
      <accountNumber>111222333</accountNumber>
    </account>
    <account>
      <type>Savings</type>
      <accountNumber>777888999</accountNumber>
    </account>
  </client>

  <client>
    <clientName>Al Gore</clientName>
    <homeAddress>Washington, DC</homeAddress>
    <homePhone>555/555-4256</homePhone>
    <account>
      <type>Savings</type>
      <accountNumber>444777888</accountNumber>
    </account>
  </client>

  <employee>
    <empID>2105</empID>
    <empName>Ronald Reagan</empName>
    <workAddress>Nashua, NH</workAddress>
    <workPhone>555/555-1245</workPhone>
    <salary>60000</salary>
  </employee>

  <employee>
    <empID>77</empID>
    <empName>Jimmy Carter</empName>
    <workAddress>Denver, CO</workAddress>
    <workPhone>555/555-1235</workPhone>
    <salary>250000</salary>
  </employee>

  <branchID>78963</branchID>
</bank>
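Before looking at the traversal interfaces, it may help to see the data being read with plain DOM Core calls. The following is a minimal sketch, assuming the JDK's bundled JAXP parser instead of Xerces; the class name BankClientNames and the trimmed XML string are illustrative assumptions, not part of the listings above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not from the original article.
class BankClientNames {

  // Trimmed excerpt of Listing 2 without the DOCTYPE line, so no bank.dtd
  // file is needed (and the excerpt is not validated against the DTD).
  static final String BANK_XML =
      "<bank>"
      + "<client><clientName>Bill Clinton</clientName>"
      + "<homeAddress>Nashua, NH</homeAddress></client>"
      + "<client><clientName>Al Gore</clientName>"
      + "<homeAddress>Washington, DC</homeAddress></client>"
      + "<branchID>78963</branchID>"
      + "</bank>";

  // Returns the text of every clientName element, in document order.
  static List<String> clientNames(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList nodes = doc.getElementsByTagName("clientName");
      List<String> names = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        names.add(nodes.item(i).getTextContent());
      }
      return names;
    } catch (Exception e) {
      throw new IllegalStateException("could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    for (String name : clientNames(BANK_XML)) {
      System.out.println(name);
    }
  }
}
```

A lookup by tag name is enough when you know which element you want; the traversal interfaces described next are aimed at walking the whole document.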

Traversing the Elements of the XML Document

All of the following code uses the Apache Software Foundation's implementation of the DOM Level 2 Traversal-Range Recommendation, which is part of the Xerces project.

The Traversal-Range Recommendation defines two interfaces for traversing XML elements. The NodeIterator interface declares methods for traversing a flat representation of an XML document, and the TreeWalker interface declares methods that allow programmers to traverse an XML document as if it were a tree structure. A third interface defined in the Traversal-Range Recommendation, NodeFilter, determines which nodes are included in the logical view of the document. It filters the elements of the XML document.

The create methods for NodeIterator and TreeWalker have flags that allow the programmer to choose what elements to include in the logical view of the document. Any node that is invisible is skipped over as if it does not exist in the document.
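The effect of these flags can be sketched with a small counting program. This is only an illustration, assuming the standard org.w3c.dom.traversal interfaces as shipped with the JDK rather than the Xerces classes used in the listings; the class name WhatToShowSketch and the sample XML are hypothetical. Flags are combined with a bitwise OR, and any node type not selected is invisible to the iterator.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not from the original article.
class WhatToShowSketch {

  // Three elements (A, B, F), two text nodes, and one comment.
  static final String XML = "<A><B>text1</B><!-- note --><F>moreText</F></A>";

  // Counts the nodes that are visible for the given whatToShow flags.
  static int countVisible(String xml, int whatToShow) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      // No NodeFilter object is passed (null), so only the flags decide.
      NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
          doc.getDocumentElement(), whatToShow, null, true);
      int count = 0;
      while (it.nextNode() != null) {
        count++;
      }
      it.detach();
      return count;
    } catch (Exception e) {
      throw new IllegalStateException("could not traverse XML", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("elements: " + countVisible(XML, NodeFilter.SHOW_ELEMENT));
    System.out.println("text:     " + countVisible(XML, NodeFilter.SHOW_TEXT));
    System.out.println("both:     "
        + countVisible(XML, NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT));
  }
}
```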

Writing a Filter
Filters determine which elements are incorporated into the logical view of the document. The following is the IDL definition of the NodeFilter interface.

// Introduced in DOM Level 2:
interface NodeFilter {

  // Constants returned by acceptNode
  const short FILTER_ACCEPT  = 1;
  const short FILTER_REJECT  = 2;
  const short FILTER_SKIP    = 3;


  // Constants for whatToShow
  const unsigned long  SHOW_ALL                       = 0xFFFFFFFF;
  const unsigned long  SHOW_ELEMENT                   = 0x00000001;
  const unsigned long  SHOW_ATTRIBUTE                 = 0x00000002;
  const unsigned long  SHOW_TEXT                      = 0x00000004;
  const unsigned long  SHOW_CDATA_SECTION             = 0x00000008;
  const unsigned long  SHOW_ENTITY_REFERENCE          = 0x00000010;
  const unsigned long  SHOW_ENTITY                    = 0x00000020;
  const unsigned long  SHOW_PROCESSING_INSTRUCTION    = 0x00000040;
  const unsigned long  SHOW_COMMENT                   = 0x00000080;
  const unsigned long  SHOW_DOCUMENT                  = 0x00000100;
  const unsigned long  SHOW_DOCUMENT_TYPE             = 0x00000200;
  const unsigned long  SHOW_DOCUMENT_FRAGMENT         = 0x00000400;
  const unsigned long  SHOW_NOTATION                  = 0x00000800;

  short  acceptNode(in Node n);
};

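The NodeFilter interface contains one method, acceptNode(), that determines whether a node should be accepted or rejected. The acceptNode() method may return one of three values:

FILTER_ACCEPT

The node is accepted and appears in the logical view.

FILTER_REJECT

The node is rejected. A TreeWalker also rejects the node's children; a NodeIterator treats this value the same as FILTER_SKIP.

FILTER_SKIP

The node itself is skipped, but its children are still considered.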

The whatToShow constants defined in NodeFilter are passed to the create methods of the NodeIterator and the TreeWalker. We will see this shortly. But first let's write a filter. To do so you simply write a class that implements the NodeFilter interface and its acceptNode() method. Listing 3 is a simple filter that accepts all element nodes.

Listing 3: A filter that accepts all element nodes and skips others.


class AllElements implements NodeFilter
{
  public short acceptNode (Node n)
  {
    if (n.getNodeType() == Node.ELEMENT_NODE)
      return FILTER_ACCEPT;
    return FILTER_SKIP;
  }
}

The Node interface defines the following fields to help determine the type of node you're dealing with.


ATTRIBUTE_NODE

The node is an Attr.

CDATA_SECTION_NODE

The node is a CDATASection.

COMMENT_NODE

The node is a Comment.

DOCUMENT_FRAGMENT_NODE

The node is a DocumentFragment.

DOCUMENT_NODE

The node is a Document.

DOCUMENT_TYPE_NODE

The node is a DocumentType.

ELEMENT_NODE

The node is an Element.

ENTITY_NODE

The node is an Entity.

ENTITY_REFERENCE_NODE

The node is an EntityReference.

NOTATION_NODE

The node is a Notation.

PROCESSING_INSTRUCTION_NODE

The node is a ProcessingInstruction.

TEXT_NODE

The node is a Text node.

You can use a switch statement in your filter to determine what nodes to accept, skip, or reject. Listing 4 shows an example of a filter that uses a switch statement.

Listing 4: MyFilter.java -- A filter that uses a switch statement to determine what nodes to accept, skip, or reject.

class MyFilter implements NodeFilter
{
  public short acceptNode(Node n)
  {
    short s = n.getNodeType();
    switch (s) {
     case Node.ATTRIBUTE_NODE:
       return FILTER_REJECT;
     case Node.CDATA_SECTION_NODE:
       return FILTER_SKIP;
     case Node.COMMENT_NODE:
       return FILTER_ACCEPT;
     case Node.ELEMENT_NODE:
       return FILTER_ACCEPT;
     default:
       return FILTER_SKIP;
    }
  }
}
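The practical difference between skipping and rejecting shows up most clearly with a TreeWalker. The sketch below is an illustration under stated assumptions: it uses the standard org.w3c.dom.traversal interfaces from the JDK rather than the Xerces classes in the listings, the class name SkipVersusReject is hypothetical, and the filter is written as a lambda, which works because NodeFilter declares a single method. The same filter is applied twice to a small document, returning a different verdict for element C each time.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not from the original article.
class SkipVersusReject {

  static final String XML =
      "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C>"
      + "<F>moreText</F></A>";

  // Walks the document in order, applying the given verdict to element C
  // and accepting every other element.
  static List<String> walk(short verdictForC) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(XML)));
      NodeFilter filter = n ->
          n.getNodeName().equals("C") ? verdictForC : NodeFilter.FILTER_ACCEPT;
      TreeWalker tw = ((DocumentTraversal) doc).createTreeWalker(
          doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, filter, true);
      List<String> names = new ArrayList<>();
      // The walker starts positioned on the root node.
      names.add(tw.getCurrentNode().getNodeName());
      for (Node n = tw.nextNode(); n != null; n = tw.nextNode()) {
        names.add(n.getNodeName());
      }
      return names;
    } catch (Exception e) {
      throw new IllegalStateException("could not traverse XML", e);
    }
  }

  public static void main(String[] args) {
    // Skipping C hides C only; its children D and E are still visited.
    System.out.println("FILTER_SKIP:   " + walk(NodeFilter.FILTER_SKIP));
    // Rejecting C hides C together with its children.
    System.out.println("FILTER_REJECT: " + walk(NodeFilter.FILTER_REJECT));
  }
}
```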

The NodeIterator interface
The NodeIterator interface provides methods to traverse a flat, or horizontal, representation of an XML document. For example, take a very simple XML document:

<A>
  <B>text1</B>
  <C>
    <D>child of C</D>
    <E>another child of C</E>
  </C>
  <F>moreText</F>
</A>

A flat representation of this simple XML document is

A B C D E F

When you have a flattened XML representation, you traverse it by asking for the next node and the previous node.
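Stepping forward and backward through the flat view of the simple document can be sketched as follows. This is a minimal example, assuming the standard org.w3c.dom.traversal interfaces bundled with the JDK instead of the Xerces implementation classes; the class name FlatTraversalSketch and its helper methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not from the original article.
class FlatTraversalSketch {

  static final String XML =
      "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C>"
      + "<F>moreText</F></A>";

  // Creates an iterator over the element nodes of the whole document.
  static NodeIterator elementIterator(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return ((DocumentTraversal) doc).createNodeIterator(
          doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
    } catch (Exception e) {
      throw new IllegalStateException("could not parse XML", e);
    }
  }

  // Asks for the next node until the iterator runs off the end.
  static List<String> forward(NodeIterator it) {
    List<String> names = new ArrayList<>();
    for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
      names.add(n.getNodeName());
    }
    return names;
  }

  // Asks for the previous node until the iterator is back at the start.
  static List<String> backward(NodeIterator it) {
    List<String> names = new ArrayList<>();
    for (Node n = it.previousNode(); n != null; n = it.previousNode()) {
      names.add(n.getNodeName());
    }
    return names;
  }

  public static void main(String[] args) {
    NodeIterator it = elementIterator(XML);
    System.out.println(forward(it));   // forward pass: A through F
    System.out.println(backward(it));  // backward pass: F back to A
  }
}
```

Note that the iterator keeps its position between calls, so the backward pass starts where the forward pass ended.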

What is the horizontal version of the bank.xml document?

bank client clientName homeAddress homePhone account type accountNumber account type accountNumber client clientName homeAddress homePhone account type accountNumber employee empID empName workAddress workPhone salary employee empID empName workAddress workPhone salary branchID

To traverse an XML document you need to create an object that implements the Document interface. The Apache Software Foundation created an implementation class for the Document interface, DocumentImpl. In order to get an object of type DocumentImpl, you need to get a DOMParser and then parse the XML file you want to traverse. The DOMParser has a method, getDocument(), that you can use to get an object of type DocumentImpl. The following code creates the DocumentImpl object:

DOMParser parser = new DOMParser();
parser.parse("bank.xml");
DocumentImpl document = (DocumentImpl)parser.getDocument();

Now that you have an object of type DocumentImpl you can create the implementation class for the NodeIterator, NodeIteratorImpl. The DocumentImpl object has a createNodeIterator() method that has the following signature:

NodeIterator createNodeIterator(rootNode, whatToShow, filter, boolean);

The rootNode is the node at which you want to begin traversing. whatToShow is one or more of the NodeFilter SHOW_ constants, which select the node types you want to include in the logical view of the XML document. filter is the NodeFilter object that makes the final decision on each of those nodes. The last parameter is a boolean that determines whether entity reference nodes are expanded.
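The role of the root parameter can be sketched by starting an iterator partway down the document. This example is an assumption-laden illustration: it uses the createNodeIterator() method of the standard org.w3c.dom.traversal.DocumentTraversal interface from the JDK, not the Xerces DocumentImpl class, and the class name SubtreeIteratorSketch is hypothetical. The iterator never leaves the subtree below the node it was created with.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not from the original article.
class SubtreeIteratorSketch {

  static final String XML =
      "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C>"
      + "<F>moreText</F></A>";

  // Lists the elements at or below the first element named rootTagName.
  static List<String> elementsUnder(String xml, String rootTagName) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      List<String> names = new ArrayList<>();
      Node root = doc.getElementsByTagName(rootTagName).item(0);
      if (root == null) {
        return names; // no such element, so nothing to traverse
      }
      NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
          root,                     // where traversal begins
          NodeFilter.SHOW_ELEMENT,  // whatToShow: element nodes only
          null,                     // no additional filter object
          true);                    // expand entity references
      for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
        names.add(n.getNodeName());
      }
      return names;
    } catch (Exception e) {
      throw new IllegalStateException("could not traverse XML", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(elementsUnder(XML, "C")); // C and its children only
    System.out.println(elementsUnder(XML, "A")); // the whole document
  }
}
```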

You can cast the return from this create method to retrieve a NodeIteratorImpl object as the following code shows.

NodeIteratorImpl iterator =
    (NodeIteratorImpl) document.createNodeIterator(root,
        NodeFilter.SHOW_ELEMENT, (NodeFilter)allelements, true);

The NodeIteratorImpl class provides the following methods to traverse the XML:

Node nextNode() raises(DOMException);
Node previousNode() raises(DOMException);
void detach();

Listing 5 shows a Java program that horizontally traverses an XML document.

Listing 5: NodeIteratorClient.java - A Java application that horizontally traverses an XML document, returning the element nodes.

import org.w3c.dom.Node;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xerces.dom.traversal.NodeIteratorImpl;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xerces.domx.traversal.NodeFilter;

public class NodeIteratorClient
{
  public static void main(String args[])
  {
    if ((args == null) || (args.length < 1))
    {
      System.out.println("Usage: java NodeIteratorClient <XML_filename>");
      System.exit(1);
    }

    try
    {
      //create an object of the Document implementation class
      DOMParser parser = new DOMParser();
      parser.parse(args[0]);
      DocumentImpl document = (DocumentImpl)parser.getDocument();
      
      //get the root of the XML document
      Node root = document.getLastChild();

      //instantiate a filter
      AllElements allelements = new AllElements();
		
      //create an object of the NodeIterator implementation class
      NodeIteratorImpl iterator =
          (NodeIteratorImpl)document.createNodeIterator(root,
              NodeFilter.SHOW_ELEMENT, (NodeFilter)allelements, true);

      
      //print all elements of the XML document in document order
      printElements(iterator);

    }
    catch (Exception e)
    {
      System.out.println("error: " + e);
      e.printStackTrace();
      System.exit(1);
    }

  }

  //loops over the iterator and prints the name of each element
  public static void printElements(NodeIteratorImpl iter)
  {
    Node n;
    while ((n = iter.nextNode()) != null)
    {
      System.out.println(n.getNodeName());
    }
  }

}

//filters elements in the XML document; Returns all the Element nodes
class AllElements implements NodeFilter
{
  public short acceptNode (Node n)
  {
    if (n.getNodeType() == Node.ELEMENT_NODE)
      return FILTER_ACCEPT;
    return FILTER_SKIP;
  }
}

After running this program on the bank.xml file, the output lists each element name on its own line in document order, beginning with bank.

[Figure: Output results using bank.xml.]

The TreeWalker interface
You've now seen how to write filters and traverse XML documents horizontally. An XML document can also be represented as a tree structure, and the TreeWalker interface declares methods that allow you to traverse this tree structure. Take that simple XML document again:

<A>
  <B>text1</B>
  <C>
    <D>child of C</D>
    <E>another child of C</E>
  </C>
  <F>moreText</F>
</A>

The tree representation of this simple XML document is

A
  B
  C
    D
    E
  F

In the tree structure you have the concept of parent nodes and child nodes. Therefore, the TreeWalker interface provides the following methods to traverse the tree structure:

Node parentNode();
Node firstChild();
Node lastChild();
Node previousSibling();
Node nextSibling();
Node previousNode();
Node nextNode();
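How these methods move the walker's current position can be sketched on the simple document. The example assumes the standard org.w3c.dom.traversal.TreeWalker interface from the JDK rather than the Xerces TreeWalkerImpl class, and the class name TreeWalkerSteps is hypothetical. Each call either moves the walker and returns the new current node, or returns null and leaves the walker where it was.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not from the original article.
class TreeWalkerSteps {

  static final String XML =
      "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C>"
      + "<F>moreText</F></A>";

  // Returns the node name, or a marker when the move was not possible.
  static String name(Node n) {
    return n == null ? "(none)" : n.getNodeName();
  }

  // Performs a fixed series of moves and records where each one lands.
  static List<String> steps() {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(XML)));
      TreeWalker tw = ((DocumentTraversal) doc).createTreeWalker(
          doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
      List<String> visited = new ArrayList<>();
      visited.add(name(tw.getCurrentNode()));   // A, the root
      visited.add(name(tw.firstChild()));       // B, first child of A
      visited.add(name(tw.nextSibling()));      // C, sibling after B
      visited.add(name(tw.lastChild()));        // E, last child of C
      visited.add(name(tw.previousSibling()));  // D, sibling before E
      visited.add(name(tw.parentNode()));       // C, parent of D
      visited.add(name(tw.nextSibling()));      // F, sibling after C
      visited.add(name(tw.parentNode()));       // A, parent of F
      visited.add(name(tw.parentNode()));       // null: cannot go above the root
      return visited;
    } catch (Exception e) {
      throw new IllegalStateException("could not traverse XML", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(steps());
  }
}
```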

The create method for the TreeWalker is very similar to the create method for NodeIterator; the only differences are the name and the return value. The signature for TreeWalker's create method is

TreeWalker createTreeWalker(root, whatToShow, filter, boolean);

The return from this method can be cast into TreeWalkerImpl. You can then use the traversal methods to walk the XML document. Listing 6 shows a Java application using the TreeWalker implementation class to do just that.

Listing 6: TreeWalkerClient.java - A Java application that traverses the tree representation of the XML document.

import org.w3c.dom.Node;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xerces.dom.traversal.TreeWalkerImpl;
import org.apache.xerces.domx.traversal.NodeFilter;
import org.apache.xerces.dom.DocumentImpl;

public class TreeWalkerClient
{
  public static void main(String args[])
  {
    if ((args == null) || (args.length < 1))
    {
      System.out.println("Usage: java TreeWalkerClient <filename>");
      System.exit(1);
    }

    try
    {
      //create an object of the Document implementation class
      DOMParser parser = new DOMParser();
      parser.parse(args[0]);
      DocumentImpl document = (DocumentImpl)parser.getDocument();

      //get the root of the XML document
      Node root = document.getLastChild();

      //instantiate the filter object
      AllElements allelements = new AllElements();

      //create an object of the TreeWalker implementation class
      TreeWalkerImpl tw =
         (TreeWalkerImpl)document.createTreeWalker(root,
            NodeFilter.SHOW_ELEMENT, (NodeFilter)allelements, true);

      //print the elements of the TreeWalker implementation class
      printElements(tw);


    } 
    catch (Exception e)
    {
      System.out.println("error: " + e);
      e.printStackTrace();
      System.exit(1);
    }

  }


  //traverses the tree structure representation
  public static void printElements(TreeWalkerImpl tw)
  {
    Node n = tw.getCurrentNode();
    System.out.println(n.getNodeName());
    for (Node child=tw.firstChild();child!=null;child=tw.nextSibling())
    {
      printElements(tw);
    }

    tw.setCurrentNode(n);
  }

}


//filters the elements of the XML document
class AllElements implements NodeFilter
{
  public short acceptNode (Node n)
  {
    if (n.getNodeType() == Node.ELEMENT_NODE)
      return FILTER_ACCEPT;
    return FILTER_SKIP;
  }
}

After running this program with the bank.xml file, the output lists each element name on its own line in document order.

[Figure: Results of running on bank.xml.]

Notice the output is the same as that of the NodeIteratorClient Java application. The only difference between these programs is how the XML document was traversed.
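Where the two approaches do differ is in what the traversal knows about structure. As a minimal sketch, the recursive walk can carry a depth value and indent each name accordingly, something the flat view has no notion of. The example assumes the standard org.w3c.dom.traversal.TreeWalker interface from the JDK rather than the Xerces TreeWalkerImpl class; the class name IndentedTreePrinter is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not from the original article.
class IndentedTreePrinter {

  static final String XML =
      "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C>"
      + "<F>moreText</F></A>";

  // Renders the element tree as text, two spaces of indent per level.
  static String render(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      TreeWalker tw = ((DocumentTraversal) doc).createTreeWalker(
          doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
      StringBuilder out = new StringBuilder();
      render(tw, 0, out);
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException("could not traverse XML", e);
    }
  }

  // Writes the current node, then recurses into each visible child.
  static void render(TreeWalker tw, int depth, StringBuilder out) {
    Node current = tw.getCurrentNode();
    for (int i = 0; i < depth; i++) {
      out.append("  ");
    }
    out.append(current.getNodeName()).append('\n');
    for (Node child = tw.firstChild(); child != null; child = tw.nextSibling()) {
      render(tw, depth + 1, out);
    }
    // Restore the walker's position so the caller can continue with siblings.
    tw.setCurrentNode(current);
  }

  public static void main(String[] args) {
    System.out.print(render(XML));
  }
}
```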

Summary

The Document Object Model (DOM) can be used within Java applications to traverse XML documents. The DOM only specifies the interfaces that allow navigation and manipulation of XML documents; it's left to vendors to supply implementations that do the work. This article focused on the Apache Software Foundation's implementation of the DOM Level 2 Traversal-Range Recommendation.

XML documents can be represented in one of two fashions: as a flat structure or as a tree structure. The NodeIterator interface declares the methods that allow programmers to traverse a flat representation of an XML document. The TreeWalker interface declares methods that allow programmers to traverse a tree representation of an XML document. The NodeFilter interface allows programmers to create filters that select which items from the XML document are included in the logical view of the application.

Stephanie Fesler is a BEA Systems expert on implementing various Java 2EE APIs.



Copyright © 2009 O'Reilly Media, Inc.