 Published on O'Reilly (http://www.oreilly.com/)
 http://xml.oreilly.com/news/xmlnut_0201.html


What's New in the DOM Level 2 Core?

by W. Scott Means
02/11/2001

As of November 13, 2000, DOM Level 2 became an official W3C (World Wide Web Consortium) recommendation. Besides remedying a few known problems with DOM Level 1, its biggest enhancement is support for the Namespaces in XML recommendation. This article serves as a quick, general introduction to the DOM (Document Object Model) as well as a guide to the new Level 2 features for those who are already familiar with DOM Level 1. The latest version of the DOM Core recommendation can be found on the W3C Web site.

Introduction to the DOM

If you're already familiar with the DOM, you might want to skip ahead to the Namespace Support section. The DOM defines a set of standard object interfaces that an XML parser can use to expose the contents of a document to a client application. These interfaces provide access to all of the information from the original document, organized in a hierarchical tree structure. The base interface for navigating this tree structure is the Node interface. Every specific document structure is represented in the DOM by one of the following specialized interfaces: Document, DocumentFragment, DocumentType, EntityReference, Element, Attr, ProcessingInstruction, Comment, Text, CDATASection, Entity, and Notation.

These specialized interfaces all inherit the basic attributes and methods provided by the Node interface. They also provide specialized access to the unique information associated with each specific XML document item (e.g., the ProcessingInstruction interface has target and data attributes). The resulting specialized nodes are stored in a "list of lists" structure that has parent-child and sibling-to-sibling links. For instance, the following document:


<?xml version="1.0"?>
  <?my_app data="my data"?>
  <coordinate>
    <latitude>38.9</latitude>
    <longitude>-77.03</longitude>
  </coordinate>

would yield the following tree of DOM nodes in memory:

[Figure: tree of DOM nodes for the document above]

This structure can be traversed using the parent, child, and sibling links available through the Node interface.
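As a rough illustration of that traversal, the sketch below parses the sample document and walks it using only getFirstChild() and getNextSibling(). It assumes the JAXP parser bundled with the JDK, which the article does not mention; the class name TreeWalk and its methods are hypothetical. The document is written on one line so that no whitespace-only Text nodes clutter the output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeWalk {
    // The article's sample document, without indentation whitespace.
    public static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<?my_app data=\"my data\"?>"
        + "<coordinate><latitude>38.9</latitude>"
        + "<longitude>-77.03</longitude></coordinate>";

    // Parse a string into a DOM tree (JAXP is an assumption; any DOM parser would do).
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Render the subtree under root, one node per line, indented by depth.
    public static String dump(Node root) {
        StringBuilder out = new StringBuilder();
        walk(root, 0, out);
        return out.toString();
    }

    // Depth-first walk that relies only on the child and sibling links of Node.
    private static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        if (node.getNodeValue() != null) {
            out.append(" = ").append(node.getNodeValue());
        }
        out.append('\n');
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(dump(parse(XML)));
    }
}
```

The processing instruction shows up as a sibling of the coordinate element, directly under the Document node, while the XML declaration does not appear as a node at all.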

Namespace Support

The most significant new features in DOM Level 2 support the Namespaces in XML recommendation, which was approved on January 14, 1999. For an excellent introduction to namespaces, see O'Reilly editor Simon St.Laurent's Introduction to Namespaces presentation. Essentially, namespaces allow multiple XML vocabularies to be mixed in a single document. For instance, you might embed a list of books, each with a <title> element, in an XHTML page, which defines a <title> element of its own. Without namespaces, it would be impossible to tell the two kinds of <title> element apart within the document. With namespaces, each element name is associated with a namespace URI (Uniform Resource Identifier) using a prefix:

<prefix:localpart>

Related Reading

XML in a Nutshell
A Desktop Quick Reference
By Elliotte Rusty Harold, W. Scott Means


The prefix is a shorthand that takes the place of the URI in the tag name. This saves a lot of typing and is necessary in any case, because most URIs are not valid XML names. Before a prefix can be used, it must be declared using the special xmlns attribute prefix. For example:

<tagname xmlns:svg="http://www.w3.org/2000/svg">

After the xmlns:svg attribute is parsed, the svg: prefix can be used on the names of elements and attributes within that element and its descendants. DOM Level 1 didn't include any namespace support, so it was up to the application using the DOM to figure out the significance of the special, prefixed tag names. DOM Level 2 supports namespaces by providing new namespace-aware versions of Level 1 methods:

Interface      Basic Version            Namespace-Aware Version
Document       createElement()          createElementNS()
Document       createAttribute()        createAttributeNS()
Document       getElementsByTagName()   getElementsByTagNameNS()
NamedNodeMap   getNamedItem()           getNamedItemNS()
NamedNodeMap   setNamedItem()           setNamedItemNS()
NamedNodeMap   removeNamedItem()        removeNamedItemNS()
Element        getAttribute()           getAttributeNS()
Element        setAttribute()           setAttributeNS()
Element        removeAttribute()        removeAttributeNS()
Element        getAttributeNode()       getAttributeNodeNS()
Element        setAttributeNode()       setAttributeNodeNS()
Element        getElementsByTagName()   getElementsByTagNameNS()
Element        hasAttribute()           hasAttributeNS()

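Each of the new namespace-aware methods extends the basic version by adding a namespaceURI parameter. Methods that create or set a node take the namespace URI together with a qualified name (a name that may include a namespace prefix), while methods that look up or remove a node take the namespace URI and the local part of the name. Note that the Level 2 recommendation does not require the DOM to insert the matching xmlns attributes itself; some implementations and serializers add them as needed, but an application that depends on them should be prepared to add them explicitly.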
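To make the namespaceURI parameter concrete, here is a minimal sketch that creates a prefixed element and a prefixed attribute and then reads the parts back. It assumes the JAXP implementation bundled with the JDK; the class name NamespaceCreate, the helper buildLink(), and the choice of an SVG link element are illustrative only and do not come from the article.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceCreate {
    public static final String SVG_NS = "http://www.w3.org/2000/svg";
    public static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Builds a document whose root is <svg:a xlink:href="..."> (hypothetical helper).
    public static Element buildLink(String href) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        // Creation methods take the namespace URI plus a qualified name.
        Element link = doc.createElementNS(SVG_NS, "svg:a");
        link.setAttributeNS(XLINK_NS, "xlink:href", href);
        doc.appendChild(link);
        return link;
    }

    public static void main(String[] args) {
        Element link = buildLink("http://www.w3.org/");
        System.out.println("namespace URI: " + link.getNamespaceURI());
        System.out.println("prefix:        " + link.getPrefix());
        System.out.println("local name:    " + link.getLocalName());
        System.out.println("tag name:      " + link.getTagName());
        // Lookup methods take the namespace URI plus the local name only.
        System.out.println("href:          " + link.getAttributeNS(XLINK_NS, "href"));
    }
}
```

The node keeps the URI, prefix, and local name as separate properties, so later lookups never depend on which prefix the document happened to use.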


Searching

All of the methods that locate attributes and elements also have namespace-aware versions, which enable an application to find only those elements and attributes that belong to the XML application in question. Take the earlier example, where an XHTML document included a list of books that have <title> elements. Using the namespace-aware getElementsByTagNameNS() method of the Document interface, it is possible to find the XHTML <title> element by including the namespace URI:

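// getElementsByTagNameNS() returns a NodeList, not a single Element
NodeList titles = document.
   getElementsByTagNameNS("http://www.w3.org/1999/xhtml", "title");
Element elXHTMLTitle = (Element) titles.item(0);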

The same process can be used to find attributes that belong to a particular application.
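The sketch below puts both kinds of search side by side on a small mixed document. It assumes the JAXP parser bundled with the JDK, which has to be switched to namespace-aware mode explicitly. The books namespace URI, the bk:id attribute, and the class and method names are hypothetical and exist only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceSearch {
    public static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    // Hypothetical namespace for the book list vocabulary.
    public static final String BOOKS_NS = "http://example.com/books";

    public static final String XML =
        "<html xmlns=\"" + XHTML_NS + "\" xmlns:bk=\"" + BOOKS_NS + "\">"
        + "<head><title>Reading List</title></head>"
        + "<body><bk:book bk:id=\"b1\">"
        + "<bk:title>XML in a Nutshell</bk:title>"
        + "</bk:book></body></html>";

    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // JAXP parsers ignore namespaces unless asked not to.
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Text of every <title> element that belongs to the given namespace.
    public static List<String> titles(Document doc, String namespaceUri) {
        NodeList matches = doc.getElementsByTagNameNS(namespaceUri, "title");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            result.add(matches.item(i).getFirstChild().getNodeValue());
        }
        return result;
    }

    // Value of the bk:id attribute on the first book, found by URI and local name.
    public static String bookId(Document doc) {
        Element book = (Element) doc.getElementsByTagNameNS(BOOKS_NS, "book").item(0);
        return book.getAttributeNS(BOOKS_NS, "id");
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("XHTML titles: " + titles(doc, XHTML_NS));
        System.out.println("Book titles:  " + titles(doc, BOOKS_NS));
        System.out.println("Book id:      " + bookId(doc));
    }
}
```

Because the searches match on the namespace URI, they keep working if the document author picks a different prefix or makes the books vocabulary the default namespace.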

Making Changes

One of the major limitations of the DOM Level 1 Core was the inability to programmatically create new, empty Document instances. The only way to get a Document object was to parse an existing document, which made certain types of applications awkward to implement. In Level 2, the DOMImplementation interface gains two new methods: createDocument() and createDocumentType(). These can be used to create a new document, containing only its root element, and an associated document type node that can be manipulated through the Document interface. Unfortunately, there is still no programmatic support for modifying the DTD (Document Type Definition) through the DocumentType interface, so implementing a full-featured document editor using only DOM calls is not possible.
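As a sketch of how the two methods fit together, the example below builds the coordinate document from the introduction without parsing anything. It assumes the DOMImplementation is obtained through JAXP, which is only one of several ways to get one; the system identifier coordinate.dtd and the class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;

class NewDocument {
    // Builds <coordinate><latitude>..</latitude><longitude>..</longitude></coordinate>.
    public static Document createCoordinate(String latitude, String longitude) {
        DOMImplementation impl;
        try {
            impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }

        // Document type node with no public ID and a hypothetical system ID.
        DocumentType doctype =
            impl.createDocumentType("coordinate", null, "coordinate.dtd");

        // A null namespace URI means the root element is in no namespace.
        Document doc = impl.createDocument(null, "coordinate", doctype);

        Element root = doc.getDocumentElement();
        root.appendChild(textElement(doc, "latitude", latitude));
        root.appendChild(textElement(doc, "longitude", longitude));
        return doc;
    }

    // Creates <name>text</name> owned by the given document.
    private static Element textElement(Document doc, String name, String text) {
        Element element = doc.createElement(name);
        element.appendChild(doc.createTextNode(text));
        return element;
    }

    public static void main(String[] args) {
        Document doc = createCoordinate("38.9", "-77.03");
        System.out.println("doctype: " + doc.getDoctype().getName());
        System.out.println("root:    " + doc.getDocumentElement().getTagName());
    }
}
```

The DocumentType has to be passed to createDocument() at creation time, since the DocumentType node offers no methods for editing the declarations it stands for.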

Working with Multiple Documents

One of the fundamental concepts of the DOM API is that no part of a document tree (a single Node-derived object instance) can exist outside of the document that created it. Although it is possible to have multiple documents open at the same time, it is illegal to remove a node from one tree and insert it directly into another document; the attempt raises a WRONG_DOCUMENT_ERR exception. Level 2 provides the new (and somewhat inaccurately named) importNode() method of the Document interface. Rather than moving the node, it copies the designated portion of the source document and makes the copy available for use within the target document, leaving the source document unchanged.
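The difference between inserting a foreign node directly and importing it can be sketched as follows. The example assumes the JAXP parser bundled with the JDK, and the class name, method names, and two tiny sample documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ImportDemo {
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Tries to append a node owned by source straight into target.
    // Returns true if the DOM rejects it with WRONG_DOCUMENT_ERR.
    public static boolean directInsertRejected(Document source, Document target) {
        Node foreign = source.getDocumentElement().getFirstChild();
        try {
            target.getDocumentElement().appendChild(foreign);
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.WRONG_DOCUMENT_ERR;
        }
    }

    // Copies the first child of source's root into target and returns the copy.
    public static Node copyInto(Document source, Document target) {
        Node original = source.getDocumentElement().getFirstChild();
        // true requests a deep copy: the node and all of its descendants.
        Node copy = target.importNode(original, true);
        return target.getDocumentElement().appendChild(copy);
    }

    public static void main(String[] args) {
        Document source = parse("<a><item>one</item></a>");
        Document target = parse("<b/>");
        System.out.println("direct insert rejected: "
            + directInsertRejected(source, target));
        Node copy = copyInto(source, target);
        System.out.println("copy owned by target:   "
            + (copy.getOwnerDocument() == target));
        System.out.println("source still has item:  "
            + (source.getDocumentElement().getFirstChild() != null));
    }
}
```

An application that really wants to move the node can follow the import with a removeChild() call on the original's parent.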

Conclusion

Besides namespace support and enhanced support for editing documents, Level 2 made several changes to fix obvious problems with Level 1 (such as the inability to determine which element a particular Attr node belongs to). The DOM still has a long way to go, and as the popularity of XML increases, the demand for more features will only grow. But given its popularity and the large number of free implementations available for use in new applications, the DOM will be an important part of every XML programmer's toolkit for years to come.


W. Scott Means has been a professional software developer since 1988, when he joined Microsoft Corporation at the age of 17. He was one of the original developers of OS/2 1.1 and Windows NT, and did some of the early work on the Microsoft Network for the Advanced Technology and Business Development group. Most recently, he is serving as the CEO of Enterprise Web Machines, a new Internet infrastructure venture based in Columbia, South Carolina. Scott can be reached at smeans@enterprisewebmachines.com.

Copyright © 2007 O'Reilly Media, Inc.