ONJava.com -- The Independent Source for Enterprise Java

Using DOM to Traverse XML

Traversing the Elements of the XML Document

All of the following code uses the Apache Software Foundation's implementation of the DOM Level 2 Traversal-Range Recommendation. This implementation is part of the Xerces project, which implements the following:

  • DOM Level 1 Core Recommendation,
  • DOM Level 2 Core Recommendation,
  • DOM Level 2 Events Recommendation, and
  • DOM Level 2 Traversal-Range Recommendation.

The Traversal-Range Recommendation defines two interfaces for traversing XML documents. The NodeIterator interface declares methods for traversing a flat representation of an XML document, and the TreeWalker interface declares methods for traversing the document as a tree structure. A third interface defined in the Traversal-Range Recommendation, NodeFilter, determines which nodes are included in the logical view of the document.

The create methods for NodeIterator and TreeWalker take flags that let the programmer choose which nodes to include in the logical view of the document. Any node excluded from that view is invisible to the traversal and is skipped as if it did not exist in the document.
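As a rough illustration of how these flags shape the logical view, the sketch below iterates over the same small document twice, once showing only elements and once showing elements and comments. It assumes the JDK's built-in JAXP parser and the standard org.w3c.dom.traversal interfaces rather than the Xerces implementation classes used in the rest of this article; the class name WhatToShowSketch and the helper visibleNodes() are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

// Hypothetical demo class; assumes the JDK's default DOM implementation,
// whose Document objects also implement DocumentTraversal.
public class WhatToShowSketch {

  // Returns the names of all nodes that are visible under the given whatToShow mask.
  static List<String> visibleNodes(String xml, int whatToShow) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    DocumentTraversal traversal = (DocumentTraversal) doc;

    // No NodeFilter object is passed (null), so only the mask decides visibility.
    NodeIterator it = traversal.createNodeIterator(
        doc.getDocumentElement(), whatToShow, null, true);

    List<String> names = new ArrayList<>();
    for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
      names.add(n.getNodeName());
    }
    return names;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<A><!--note--><B>text1</B></A>";

    // Only element nodes are part of the logical view.
    System.out.println(visibleNodes(xml, NodeFilter.SHOW_ELEMENT));
    // prints [A, B]

    // Masks can be combined with bitwise OR; the comment node becomes visible too.
    System.out.println(visibleNodes(xml, NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT));
    // prints [A, #comment, B]
  }
}
```

In both runs the text node inside B is invisible, because SHOW_TEXT is not part of the mask.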

Writing a Filter
Filters determine which nodes are incorporated into the logical view of the document. The following is the IDL definition of the NodeFilter interface.

// Introduced in DOM Level 2:
interface NodeFilter {

  // Constants returned by acceptNode
  const short FILTER_ACCEPT  = 1;
  const short FILTER_REJECT  = 2;
  const short FILTER_SKIP    = 3;


  // Constants for whatToShow
  const unsigned long  SHOW_ALL                       = 0xFFFFFFFF;
  const unsigned long  SHOW_ELEMENT                   = 0x00000001;
  const unsigned long  SHOW_ATTRIBUTE                 = 0x00000002;
  const unsigned long  SHOW_TEXT                      = 0x00000004;
  const unsigned long  SHOW_CDATA_SECTION             = 0x00000008;
  const unsigned long  SHOW_ENTITY_REFERENCE          = 0x00000010;
  const unsigned long  SHOW_ENTITY                    = 0x00000020;
  const unsigned long  SHOW_PROCESSING_INSTRUCTION    = 0x00000040;
  const unsigned long  SHOW_COMMENT                   = 0x00000080;
  const unsigned long  SHOW_DOCUMENT                  = 0x00000100;
  const unsigned long  SHOW_DOCUMENT_TYPE             = 0x00000200;
  const unsigned long  SHOW_DOCUMENT_FRAGMENT         = 0x00000400;
  const unsigned long  SHOW_NOTATION                  = 0x00000800;

  short  acceptNode(in Node n);
};

The NodeFilter interface contains one method, acceptNode(), that determines if a node should be accepted or rejected. The acceptNode() method may return one of three values:

  • FILTER_ACCEPT -- The current node is included in the logical view of the document.
  • FILTER_SKIP -- The current node is not accepted, but the children of the current node are considered for acceptance.
  • FILTER_REJECT -- The current node is not accepted and the children of the current node are not considered for inclusion.
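The difference between FILTER_SKIP and FILTER_REJECT is easiest to see with a TreeWalker. The following is a minimal sketch, assuming the JDK's built-in parser and the standard org.w3c.dom.traversal interfaces; the class SkipVersusRejectSketch and its walk() helper are hypothetical names introduced only for this illustration. The filter returns a caller-supplied verdict for the element C and accepts everything else.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

// Hypothetical demo class; assumes the JDK's default DOM implementation.
public class SkipVersusRejectSketch {

  static final String XML =
      "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C><F>moreText</F></A>";

  // Walks the elements below the root, returning verdictForC whenever the filter sees <C>.
  static List<String> walk(String xml, short verdictForC) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

    NodeFilter filter = n ->
        "C".equals(n.getNodeName()) ? verdictForC : NodeFilter.FILTER_ACCEPT;

    TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
        doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, filter, true);

    // The walker starts positioned on the root (A), so nextNode() begins with B.
    List<String> names = new ArrayList<>();
    for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
      names.add(n.getNodeName());
    }
    return names;
  }

  public static void main(String[] args) throws Exception {
    // C itself is hidden, but its children D and E are still visited.
    System.out.println(walk(XML, NodeFilter.FILTER_SKIP));
    // prints [B, D, E, F]

    // C and its whole subtree are hidden.
    System.out.println(walk(XML, NodeFilter.FILTER_REJECT));
    // prints [B, F]
  }
}
```

Note that the DOM Level 2 specification has a NodeIterator treat FILTER_REJECT as a synonym for FILTER_SKIP, so the distinction only matters when the filter is used with a TreeWalker.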

The whatToShow constants defined in NodeFilter are passed to the create methods of the NodeIterator and the TreeWalker, as we will see shortly. But first, let's write a filter. To do so, you simply write a class that implements the NodeFilter interface and its acceptNode() method. Listing 3 is a simple filter that accepts all element nodes.

Listing 3: A filter that accepts all element nodes and skips others.


class AllElements implements NodeFilter
{
  public short acceptNode (Node n)
  {
    if (n.getNodeType() == Node.ELEMENT_NODE)
      return FILTER_ACCEPT;
    return FILTER_SKIP;
  }
}

The Node interface defines the following constants to help determine the type of node you're dealing with.

  • ATTRIBUTE_NODE -- The node is an Attr.
  • CDATA_SECTION_NODE -- The node is a CDATASection.
  • COMMENT_NODE -- The node is a Comment.
  • DOCUMENT_FRAGMENT_NODE -- The node is a DocumentFragment.
  • DOCUMENT_NODE -- The node is a Document.
  • DOCUMENT_TYPE_NODE -- The node is a DocumentType.
  • ELEMENT_NODE -- The node is an Element.
  • ENTITY_NODE -- The node is an Entity.
  • ENTITY_REFERENCE_NODE -- The node is an EntityReference.
  • NOTATION_NODE -- The node is a Notation.
  • PROCESSING_INSTRUCTION_NODE -- The node is a ProcessingInstruction.
  • TEXT_NODE -- The node is a Text node.

You can use a switch statement in your filter to determine what nodes to accept, skip, or reject. Listing 4 shows an example of a filter that uses a switch statement.

Listing 4: MyFilter.java -- A filter that uses a switch statement to determine what nodes to accept, skip, or reject.

class MyFilter implements NodeFilter
{
  public short acceptNode(Node n)
  {
    short s = n.getNodeType();
    switch (s) {
     case Node.ATTRIBUTE_NODE:
       return FILTER_REJECT;
     case Node.CDATA_SECTION_NODE:
       return FILTER_SKIP;
     case Node.COMMENT_NODE:
       return FILTER_ACCEPT;
     case Node.ELEMENT_NODE:
       return FILTER_ACCEPT;
     default:
       return FILTER_SKIP;
    }
  }
}

The NodeIterator interface
The NodeIterator interface provides methods to traverse a flat, or horizontal, representation of an XML document. For example, take a very simple XML document:

<A>
  <B>text1</B>
  <C>
    <D>child of C</D>
    <E>another child of C</E>
  </C>
  <F>moreText</F>
</A>

A flat representation of this simple XML document is:

A B C D E F

When you have a flattened XML representation, you traverse it by asking for the next node and the previous node.
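The following sketch walks the simple document above forward and then backward. It is only an illustration under stated assumptions: it uses the JDK's built-in parser and the standard org.w3c.dom.traversal.NodeIterator interface instead of the Xerces classes introduced below, and the class FlatTraversalSketch with its helpers is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

// Hypothetical demo class; assumes the JDK's default DOM implementation.
public class FlatTraversalSketch {

  static final String XML =
      "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C><F>moreText</F></A>";

  // Builds an iterator over the element nodes of the given document text.
  static NodeIterator elementIterator(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    return ((DocumentTraversal) doc).createNodeIterator(
        doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
  }

  // Calls nextNode() until the end of the flat list is reached.
  static List<String> forward(NodeIterator it) {
    List<String> names = new ArrayList<>();
    for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
      names.add(n.getNodeName());
    }
    return names;
  }

  // Calls previousNode() until the start of the flat list is reached.
  static List<String> backward(NodeIterator it) {
    List<String> names = new ArrayList<>();
    for (Node n = it.previousNode(); n != null; n = it.previousNode()) {
      names.add(n.getNodeName());
    }
    return names;
  }

  public static void main(String[] args) throws Exception {
    NodeIterator it = elementIterator(XML);
    System.out.println(forward(it));   // prints [A, B, C, D, E, F]
    System.out.println(backward(it));  // prints [F, E, D, C, B, A]
  }
}
```

The iterator's position sits between nodes rather than on a node, which is why the first previousNode() call after reaching the end returns F again.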

What is the horizontal version of the bank.xml document?

bank client clientName homeAddress homePhone account type accountNumber account type accountNumber client clientName homeAddress homePhone account type accountNumber employee empID empName workAddress workPhone salary employee empID workAddress workPhone salary branchID

To traverse an XML document, you need an object that implements the Document interface. The Apache Software Foundation provides an implementation class for the Document interface, DocumentImpl. To get an object of type DocumentImpl, create a DOMParser and parse the XML file you want to traverse. The DOMParser has a method, getDocument(), whose return value you can cast to DocumentImpl. The following code creates the DocumentImpl object:

DOMParser parser = new DOMParser();
parser.parse("bank.xml");
DocumentImpl document = (DocumentImpl)parser.getDocument();
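For readers without the Xerces classes on the classpath, a comparable Document can usually be obtained through the JDK's JAXP API. This is a swapped-in technique, not the DOMParser approach used in this article, and the sketch below is a hedged illustration: the class JaxpParseSketch and its methods are hypothetical, and whether the returned Document supports traversal depends on the parser implementation, so the sketch checks for it explicitly.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses XML with JAXP instead of Xerces' DOMParser.
public class JaxpParseSketch {

  // Parses the given source into a DOM Document using the default JAXP parser.
  static Document parse(InputSource source) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    return builder.parse(source);
  }

  // True if the Document can create NodeIterator and TreeWalker objects.
  static boolean supportsTraversal(Document doc) {
    return doc instanceof DocumentTraversal;
  }

  public static void main(String[] args) throws Exception {
    // Use the file named on the command line, or a tiny inline document otherwise.
    InputSource source = args.length > 0
        ? new InputSource(new File(args[0]).toURI().toString())
        : new InputSource(new StringReader("<bank><branchID>1</branchID></bank>"));

    Document document = parse(source);
    System.out.println("root element: " + document.getDocumentElement().getNodeName());
    System.out.println("traversal supported: " + supportsTraversal(document));
  }
}
```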

Now that you have an object of type DocumentImpl, you can create an instance of the NodeIterator implementation class, NodeIteratorImpl. The DocumentImpl object has a createNodeIterator() method that has the following signature:

NodeIterator createNodeIterator(rootNode, whatToShow, filter, boolean);

The rootNode is the node at which you want to begin traversing. whatToShow is one or more of the SHOW_ constants of the NodeFilter interface; it selects the node types you want to include in the logical view of the XML document. filter is the NodeFilter object to apply, or null if no filter is needed. The last parameter is a boolean that determines whether entity reference nodes are expanded.

You can cast the value returned by this create method to NodeIteratorImpl, as the following code shows.

NodeIteratorImpl iterator = 
	(NodeIteratorImpl) document.createNodeIterator(root,
		NodeFilter.SHOW_ELEMENT, (NodeFilter)allelements, true);

The NodeIteratorImpl class provides the following methods to traverse the XML:

Node nextNode() raises(DOMException);
Node previousNode() raises(DOMException);
void detach();
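Of these three methods, detach() is the least obvious: it releases the iterator, after which the DOM Level 2 specification has calls to nextNode() or previousNode() raise a DOMException with the code INVALID_STATE_ERR. The sketch below illustrates this under the assumption that the JDK's built-in DOM implementation is used through the standard NodeIterator interface; the class DetachSketch and its helper are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

// Hypothetical demo class; assumes the JDK's default DOM implementation.
public class DetachSketch {

  // Returns true if using the iterator after detach() raises INVALID_STATE_ERR.
  static boolean failsAfterDetach(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
        doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);

    it.nextNode();  // normal use: returns the root element
    it.detach();    // release the iterator; it must not be used afterwards

    try {
      it.nextNode();
      return false;  // the implementation did not complain
    } catch (DOMException e) {
      return e.code == DOMException.INVALID_STATE_ERR;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("fails after detach: " + failsAfterDetach("<A><B>text1</B></A>"));
  }
}
```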

Listing 5 shows a Java program that horizontally traverses an XML document.

Listing 5: NodeIteratorClient.java - A Java application that horizontally traverses an XML document, returning the element nodes.

import org.w3c.dom.Node;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xerces.dom.traversal.NodeIteratorImpl;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xerces.domx.traversal.NodeFilter;

public class NodeIteratorClient
{
  public static void main(String args[])
  {
    if ((args == null) || (args.length < 1))
    {
      System.out.println("Usage: java NodeIteratorClient <XML_filename>");
      System.exit(0);
    }

    try
    {
      //create an object of the Document implementation class
      DOMParser parser = new DOMParser();
      parser.parse(args[0]);
      DocumentImpl document = (DocumentImpl)parser.getDocument();
      
      //get the root element of the XML document; getLastChild() could
      //return a trailing comment or processing instruction instead
      Node root = document.getDocumentElement();

      //instantiate a filter
      AllElements allelements = new AllElements();
		
      //create an object of the NodeIterator implementation class
      NodeIteratorImpl iterator =
		(NodeIteratorImpl)document.createNodeIterator(root,
            NodeFilter.SHOW_ELEMENT, (NodeFilter)allelements, true);

      
      //iterate over and print all elements of the XML document
      printElements(iterator);

    }
    catch (Exception e)
    {
      System.out.println("error: " + e);
      e.printStackTrace();
      System.exit(0);
    }

  }

  //prints the name of every element the iterator returns, in document order
  public static void printElements(NodeIteratorImpl iter)
  {
    Node n;
    while ((n = iter.nextNode()) != null)
    {
      System.out.println(n.getNodeName());
    }
  }

}

//filters elements in the XML document; Returns all the Element nodes
class AllElements implements NodeFilter
{
  public short acceptNode (Node n)
  {
    if (n.getNodeType() == Node.ELEMENT_NODE)
      return FILTER_ACCEPT;
    return FILTER_SKIP;
  }
}

Running this program on the bank.xml file prints the name of each element, one per line, in the same order as the horizontal version of bank.xml shown earlier.

Output results using bank.xml.
