As of November 13, 2000, DOM Level 2 became an official W3C (World Wide Web Consortium) recommendation. Besides remedying a few known problems with DOM Level 1, the biggest enhancement was the addition of support for the Namespaces in XML recommendation. This article serves as a quick, general introduction to the DOM (Document Object Model) as well as a guide to the new Level 2 features for those who are already familiar with DOM Level 1. The latest version of the DOM Core recommendation can be found on the W3C Web site.
If you're already familiar with the DOM, you might want to skip
ahead to the Namespace Support section.
The DOM defines a set of standard object interfaces that an XML parser
can use to expose the contents of a document to a client application.
These interfaces provide access to all of the information from the
original document, organized in a hierarchical tree structure. The
base interface for navigating this tree structure is the Node
interface. Every specific document structure is represented in the
DOM by one of the following specialized interfaces:
Document, Attr, Element, Text, Comment, CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction
These specialized interfaces all inherit the basic attributes and
methods provided by the Node interface. They also provide
specialized access to the unique information associated with each
specific XML document item (e.g., the ProcessingInstruction
interface has target and data attributes). The
resulting specialized nodes are stored in a "list of lists"
structure that has parent-child and sibling-to-sibling links. For
instance, the following document:
<?xml version="1.0"?>
<?my_app data="my data"?>
<coordinate>
<latitude>38.9</latitude>
<longitude>-77.03</longitude>
</coordinate>
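would yield the following tree of DOM nodes in memory (whitespace-only Text nodes omitted):
Document
  ProcessingInstruction (my_app)
  Element (coordinate)
    Element (latitude)
      Text (38.9)
    Element (longitude)
      Text (-77.03)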
This structure can be traversed using the parent, child, and sibling
links available through the Node interface.
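As a rough illustration of that traversal, the following sketch parses the sample document with the JAXP parser bundled with the JDK and walks it using only getFirstChild() and getNextSibling(). The class and method names (DomTreeWalk, describeSample, walk, typeName) are invented for this example, and skipping whitespace-only Text nodes is a simplification chosen here, not something the DOM requires.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomTreeWalk {
    // The sample document from the text above.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<?my_app data=\"my data\"?>\n"
        + "<coordinate>\n"
        + "<latitude>38.9</latitude>\n"
        + "<longitude>-77.03</longitude>\n"
        + "</coordinate>\n";

    // Parses the sample and returns one indented line per DOM node.
    static String describeSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            StringBuilder out = new StringBuilder();
            walk(doc, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample", e);
        }
    }

    // Depth-first walk using only the first-child and next-sibling links.
    static void walk(Node node, int depth, StringBuilder out) {
        boolean isText = node.getNodeType() == Node.TEXT_NODE;
        // Skip the whitespace-only Text nodes created by the line breaks.
        if (isText && node.getNodeValue().trim().isEmpty()) {
            return;
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(typeName(node)).append(' ')
           .append(isText ? node.getNodeValue() : node.getNodeName())
           .append('\n');
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    // Maps the numeric node type to the name of its specialized interface.
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: return "Document";
            case Node.ELEMENT_NODE: return "Element";
            case Node.TEXT_NODE: return "Text";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            default: return "Node";
        }
    }

    public static void main(String[] args) {
        System.out.print(describeSample());
    }
}
```

The same loop works from any starting node, since every specialized interface inherits these links from Node.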
The most significant new features introduced in DOM Level 2 add
support for the Namespaces in XML recommendation, which was approved
on January 14, 1999. For an excellent introduction to namespaces, see
O'Reilly editor Simon St.Laurent's
Introduction to Namespaces presentation. Essentially,
namespaces are used to allow multiple XML vocabularies to be mixed in
a single document. For instance, you might embed a list of books,
each with a <title> element, in an XHTML page, which
defines a <title> element of its own. Without
namespaces, it would be impossible to differentiate between the two
types of <title> elements within the document. With
namespaces, each element tag is associated with a namespace
URI (Uniform Resource Identifier) using a prefix:
<prefix:localpart>
The prefix is a sort of shorthand that takes the place of the URI
in the tag name. This saves a lot of typing, and it is also necessary because
most URIs contain characters that are not valid in XML names. Before a prefix can be used, it is declared
using the special xmlns attribute prefix. For
example:
<tagname xmlns:svg="http://www.w3.org/2000/svg">
After the xmlns:svg attribute is parsed, the svg:
prefix can be used on that element, its descendants, and their attributes. DOM Level 1 didn't
include any namespace support, so it was up to the application
using DOM to figure out the significance of the special, prefixed tag
names. DOM Level 2 supports namespaces by providing new namespace-aware
versions of the basic methods:
| Interface | Basic Version | Namespace-Aware Version |
| Document | createElement() | createElementNS() |
| Document | createAttribute() | createAttributeNS() |
| Document | getElementsByTagName() | getElementsByTagNameNS() |
| NamedNodeMap | getNamedItem() | getNamedItemNS() |
| NamedNodeMap | setNamedItem() | setNamedItemNS() |
| NamedNodeMap | removeNamedItem() | removeNamedItemNS() |
| Element | getAttribute() | getAttributeNS() |
| Element | setAttribute() | setAttributeNS() |
| Element | removeAttribute() | removeAttributeNS() |
| Element | getAttributeNode() | getAttributeNodeNS() |
| Element | setAttributeNode() | setAttributeNodeNS() |
| Element | getElementsByTagName() | getElementsByTagNameNS() |
| Element | hasAttribute() | hasAttributeNS() |
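Most of the new namespace methods expand the basic version of
the method by adding a namespaceURI parameter. Instead of
simply passing in a tag name, the methods that create elements and attributes
or set attribute values take a qualified name (a name that may include a
namespace prefix) and the associated URI, while the lookup and removal
methods take the URI and the local name. Note that DOM Level 2 does not
enforce the mapping between prefixes and URIs or add xmlns attributes on
its own, so the application may have to declare each namespace it uses
when the document is serialized.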
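A minimal sketch of the creation methods, assuming the DOM implementation that ships with the JDK: it builds one element in the SVG namespace from the earlier example and gives it an attribute in the XLink namespace, then reads the pieces back. The class name NamespaceCreate, the helper buildLink, and the choice of an svg:a element with an xlink:href attribute are illustrative assumptions, not taken from the article.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceCreate {
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Builds <svg:a xlink:href="..."> as the root of a fresh document.
    static Element buildLink() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        // Qualified name "svg:a" = prefix "svg" + local part "a".
        Element link = doc.createElementNS(SVG_NS, "svg:a");
        // Attributes get their own namespace URI and qualified name.
        link.setAttributeNS(XLINK_NS, "xlink:href", "http://www.w3.org/");
        doc.appendChild(link);
        return link;
    }

    public static void main(String[] args) {
        Element link = buildLink();
        System.out.println("tag name:      " + link.getTagName());
        System.out.println("prefix:        " + link.getPrefix());
        System.out.println("local name:    " + link.getLocalName());
        System.out.println("namespace URI: " + link.getNamespaceURI());
        // Lookups use the namespace URI plus the local name, not the prefix.
        System.out.println("href:          " + link.getAttributeNS(XLINK_NS, "href"));
    }
}
```

Because lookups are keyed on the URI and local name, code written this way keeps working if a document author picks a different prefix for the same namespace.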
All of the methods that deal with locating attributes and elements
also have namespace-aware versions, which enable an application to
find only those tags and attributes that match the XML application in
question. Take the earlier example, where an XHTML document
included a list of books that have <title> elements.
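Using the namespace-aware getElementsByTagNameNS() method of
the Document interface (the Element interface provides one as well), it is possible to find the proper
<title> tag by including the namespace URI:
NodeList xhtmlTitles = document.
    getElementsByTagNameNS("http://www.w3.org/1999/xhtml", "title");
// item(0) returns null when nothing matches
Element elXHTMLTitle = (Element) xhtmlTitles.item(0);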
The same process can be used to find attributes that belong to a particular application.
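The lookup described above can be sketched end to end as follows. The page content, the class name NamespaceLookup, its helper methods, and the http://example.com/books namespace URI for the book vocabulary are all made up for illustration. One detail worth noting, assuming the JAXP parser in the JDK: the factory has to be switched to namespace-aware mode, otherwise the namespace-aware lookups find nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceLookup {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    // Hypothetical namespace URI for the book vocabulary.
    static final String BOOKS_NS = "http://example.com/books";

    // An XHTML page that embeds a book entry with its own <title>.
    static final String PAGE =
          "<html xmlns=\"" + XHTML_NS + "\">"
        + "<head><title>My Reading List</title></head>"
        + "<body><book xmlns=\"" + BOOKS_NS + "\">"
        + "<title>XML in a Nutshell</title></book></body>"
        + "</html>";

    static Document parsePage() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // JAXP factories are not namespace-aware unless asked to be.
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the page", e);
        }
    }

    // Text of the first <title> in the given namespace, or null if none.
    static String firstTitle(String namespaceUri) {
        NodeList titles = parsePage().getElementsByTagNameNS(namespaceUri, "title");
        if (titles.getLength() == 0) {
            return null;
        }
        return titles.item(0).getFirstChild().getNodeValue();
    }

    // Number of <title> elements matched when namespaces are ignored.
    static int plainTitleCount() {
        return parsePage().getElementsByTagName("title").getLength();
    }

    public static void main(String[] args) {
        System.out.println("plain matches: " + plainTitleCount());
        System.out.println("XHTML title:   " + firstTitle(XHTML_NS));
        System.out.println("book title:    " + firstTitle(BOOKS_NS));
    }
}
```

The plain getElementsByTagName() call cannot tell the two vocabularies apart, while each namespace-aware call returns only the title that belongs to the requested namespace.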
One of the major limitations of the DOM Level 1 Core was the
inability to programmatically create new, empty Document
instances. The only way to get an instance of a Document
object was to parse an existing document, making it awkward to
implement certain types of applications. The DOMImplementation
interface has two new methods: createDocument() and
createDocumentType(). These can be used to create a
new, empty document and associated DTD (Document Type Definition) that can be
manipulated through
the Document interface. Unfortunately, there is still no
programmatic support for modifying the DTD through the
DocumentType interface, so implementing a full-featured
document editor using only DOM calls is not possible.
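As a sketch of how these two methods fit together, the example below builds the coordinate document from earlier without parsing anything. It assumes the DOMImplementation obtained through JAXP in the JDK; the class name NewDocument, the helper textElement, and the system identifier coordinate.dtd are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;

class NewDocument {
    // Creates an empty document with a doctype, then fills it in.
    static Document build() {
        DOMImplementation impl;
        try {
            impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        // No public identifier; "coordinate.dtd" is a made-up system identifier.
        DocumentType doctype = impl.createDocumentType("coordinate", null, "coordinate.dtd");
        // A null namespace URI means the root element is in no namespace.
        Document doc = impl.createDocument(null, "coordinate", doctype);

        Element root = doc.getDocumentElement();
        root.appendChild(textElement(doc, "latitude", "38.9"));
        root.appendChild(textElement(doc, "longitude", "-77.03"));
        return doc;
    }

    // Creates <name>text</name> owned by the given document.
    static Element textElement(Document doc, String name, String text) {
        Element element = doc.createElement(name);
        element.appendChild(doc.createTextNode(text));
        return element;
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println("doctype: " + doc.getDoctype().getName());
        System.out.println("root:    " + doc.getDocumentElement().getTagName());
    }
}
```

The DocumentType is supplied when the document is created, which fits the article's point that the DTD cannot be edited afterward through the DOM.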
One of the fundamental concepts of the DOM API is that no part of
a document tree (a single Node-derived object instance) can
exist outside of its parent document. Although it is possible to have
multiple documents open at the same time, it is illegal to
programmatically remove a node from one tree and insert it into
another open document. Level 2 provides the new (and somewhat
inaccurately named) importNode() method of the
Document interface. Instead of importing, what it actually
does is copy the designated portion of the source document and make
it available for use within the target document.
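The copy semantics can be seen in a small sketch, again assuming the JDK's DOM implementation. The class ImportNodeDemo, its helpers, and the element names are invented for the example. Passing true as the second argument of importNode() requests a deep copy that includes the node's descendants.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ImportNodeDemo {
    // Creates an empty document with a single root element.
    static Document newDocument(String rootName) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation()
                    .createDocument(null, rootName, null);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    // Copies 'original' (and its subtree) into 'target' and attaches the copy
    // under the target's root element. The original node is left untouched.
    static Node copyInto(Document target, Node original) {
        Node copy = target.importNode(original, true);
        target.getDocumentElement().appendChild(copy);
        return copy;
    }

    public static void main(String[] args) {
        Document source = newDocument("source");
        Document target = newDocument("target");

        Element item = source.createElement("item");
        item.appendChild(source.createTextNode("hello"));
        source.getDocumentElement().appendChild(item);

        Node copy = copyInto(target, item);
        System.out.println("copy is a new node:   " + (copy != item));
        System.out.println("copy owned by target: " + (copy.getOwnerDocument() == target));
        System.out.println("original still owned by source: "
                + (item.getOwnerDocument() == source));
    }
}
```

Since the source keeps its node, an application that wants a true move has to remove the original from the source document itself.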
Besides namespace support and enhanced support for editing
documents, several changes were made to fix obvious
problems with the Level 1 version (such as the inability to
determine which element a particular Attr node belongs to, which Level 2 exposes through the ownerElement attribute).
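For the Attr fix mentioned above, the Java binding exposes the Level 2 ownerElement attribute as getOwnerElement(). A brief sketch, in which the class name OwnerElementDemo and the units attribute are hypothetical:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class OwnerElementDemo {
    // Returns the tag name of the element that owns the "units" attribute.
    static String ownerTagName() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element coordinate = doc.createElement("coordinate");
        coordinate.setAttribute("units", "degrees");
        doc.appendChild(coordinate);

        // Starting from the Attr node alone, navigate back to its element.
        Attr units = coordinate.getAttributeNode("units");
        return units.getOwnerElement().getTagName();
    }

    public static void main(String[] args) {
        System.out.println("owner element: " + ownerTagName());
    }
}
```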
The DOM still has a long way to go, and as the popularity of XML
increases, the demand for more features will only accelerate. But
based on its popularity, and the large number of free implementations
that are available to incorporate in new applications, DOM will be
an important part of every XML programmer's toolkit for years to
come.
W. Scott Means has been a professional software developer since 1988, when he joined Microsoft Corporation at the age of 17. He was one of the original developers of OS/2 1.1 and Windows NT, and did some of the early work on the Microsoft Network for the Advanced Technology and Business Development group. Most recently, he is serving as the CEO of Enterprise Web Machines, a new Internet infrastructure venture based in Columbia, South Carolina. Scott can be reached at smeans@enterprisewebmachines.com.
Copyright © 2007 O'Reilly Media, Inc.