

Processing XML with Xerces and the DOM
Pages: 1, 2, 3, 4

Character Conversion

That was fine as a first try at using Xerces, but step1 still leaves room for improvement. (The code is certainly demo quality, but that's another issue.) The primary headache was the memory management: XMLString::transcode()'s dynamic memory allocation leaves a lot of cleanup work for the developer. I'd be surprised if the sample code doesn't risk a memory leak somewhere.

Two helper classes demonstrated in step2 use C++'s Resource Acquisition Is Initialization (RAII) idiom to ease the pain: they take ownership of loose pointers allocated by transcode(), and release() those strings in their destructors. Because C++ guarantees that an automatic object's destructor runs when the object goes out of scope, without developer intervention, the idiom suits such fire-and-forget tactics.

The first helper class is StringManager. Its convert() method calls transcode() behind the scenes to convert between char* and XMLCh* strings:

StringManager sm ;
const XMLCh* someTag = sm.convert( "someTag" ) ;

When sm goes out of scope, its destructor calls XMLString::release() on all of the strings it created in convert(). StringManager is convenient when a block of code requires several loose string conversions.
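To make the ownership pattern concrete, here is a minimal sketch of how a class like StringManager could be built. It is not the article's actual step2 code. So that it compiles without Xerces, it uses hypothetical stand-ins: WideChar for XMLCh, fakeTranscode() for XMLString::transcode(), and fakeRelease() for XMLString::release(). The class name StringManagerSketch and its ownedCount() method are also invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for XMLCh so the sketch builds without Xerces.
using WideChar = char16_t;

// Stand-in for XMLString::transcode(): returns a newly allocated,
// caller-owned buffer. Assumes plain ASCII input.
inline WideChar* fakeTranscode(const char* src) {
    assert(src != nullptr);
    const std::size_t len = std::strlen(src);
    WideChar* out = new WideChar[len + 1];
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = static_cast<WideChar>(static_cast<unsigned char>(src[i]));
    }
    out[len] = 0;
    return out;
}

// Stand-in for XMLString::release(): frees the buffer and nulls the pointer.
inline void fakeRelease(WideChar** buf) {
    delete[] *buf;
    *buf = nullptr;
}

// Owns every string it converts and releases them all in its destructor.
class StringManagerSketch {
public:
    StringManagerSketch() = default;

    // Copying would lead to double release, so it is disabled.
    StringManagerSketch(const StringManagerSketch&) = delete;
    StringManagerSketch& operator=(const StringManagerSketch&) = delete;

    ~StringManagerSketch() {
        for (WideChar*& s : owned_) {
            fakeRelease(&s);
        }
    }

    // Converts src and keeps ownership of the result.
    const WideChar* convert(const char* src) {
        WideChar* converted = fakeTranscode(src);
        try {
            owned_.push_back(converted);
        } catch (...) {
            fakeRelease(&converted);  // do not leak if the vector cannot grow
            throw;
        }
        return converted;
    }

    // Number of strings currently owned (illustration only).
    std::size_t ownedCount() const { return owned_.size(); }

private:
    std::vector<WideChar*> owned_;
};
```

The returned pointers stay valid only as long as the manager object is alive, so in this sketch the manager should outlive every call that uses the converted strings.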

The second helper class is DualString. It takes ownership of a single string, and lets code address the same logical character sequence as either C or Xerces character types:

// constructor is overloaded to accept const XMLCh* as well
DualString TAG_CONFIG( "config" ) ;
someXercesFunction( TAG_CONFIG.asXMLString() ) ;
someCFunction( TAG_CONFIG.asCString() ) ;

Giving credit where it's due, I didn't create DualString. I based it on the StrX class used in some of the Xerces sample code.
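A class along these lines might look like the following sketch. It is an illustration under stated assumptions, not the article's DualString or the Xerces StrX class: WideChar stands in for XMLCh, the hypothetical convertTo() helper stands in for XMLString::transcode(), and plain delete[] stands in for XMLString::release(). The conversion handles ASCII only. The sketch also includes a stream operator so the object can be written to a std::ostream.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>

// Hypothetical stand-in for XMLCh so the sketch builds without Xerces.
using WideChar = char16_t;

// Length of a null-terminated string of either character type.
template <typename CharT>
std::size_t lengthOf(const CharT* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Stand-in for transcode(): copies src into a new buffer of type To.
// Assumes ASCII content, so the per-character cast loses nothing.
template <typename To, typename From>
To* convertTo(const From* src) {
    assert(src != nullptr);
    const std::size_t len = lengthOf(src);
    To* out = new To[len + 1];
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = static_cast<To>(src[i]);
    }
    out[len] = 0;
    return out;
}

// Holds one logical string in both representations and frees both on exit.
class DualStringSketch {
public:
    explicit DualStringSketch(const char* narrow)
        : narrow_(convertTo<char>(narrow)), wide_(nullptr) {
        try { wide_ = convertTo<WideChar>(narrow); }
        catch (...) { delete[] narrow_; throw; }
    }

    explicit DualStringSketch(const WideChar* wide)
        : narrow_(convertTo<char>(wide)), wide_(nullptr) {
        try { wide_ = convertTo<WideChar>(wide); }
        catch (...) { delete[] narrow_; throw; }
    }

    // Copying would free the buffers twice, so it is disabled.
    DualStringSketch(const DualStringSketch&) = delete;
    DualStringSketch& operator=(const DualStringSketch&) = delete;

    ~DualStringSketch() {
        delete[] narrow_;
        delete[] wide_;
    }

    const char* asCString() const { return narrow_; }
    const WideChar* asXMLString() const { return wide_; }

private:
    char* narrow_;
    WideChar* wide_;
};

// Lets a DualStringSketch be written straight to an output stream.
inline std::ostream& operator<<(std::ostream& os, const DualStringSketch& s) {
    return os << s.asCString();
}
```

Because the stream operator takes a const reference, it also accepts a temporary, which is what makes one-line uses such as streaming a freshly constructed object possible.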

You can pass DualString directly to an output stream, so it's nice to use for one-offs such as printing the message from a Xerces exception:

}catch( xercesc::XMLException& e ){

  std::cerr << "XML toolkit teardown error: "
        << DualString( e.getMessage() )
        << std::endl ;
}

Of course, be sure not to allocate DualString or StringManager on the heap with new. A heap-allocated object's destructor runs only when you explicitly delete it, which would defeat the automatic nature of the RAII technique.
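The difference between the two lifetimes can be shown with a small, self-contained sketch. CountingOwner is a hypothetical stand-in for a helper such as DualString or StringManager. Instead of owning strings, it bumps a counter so the moment of cleanup is observable.

```cpp
#include <cassert>

// Hypothetical stand-in for a resource-owning helper: the counter goes up
// on construction and down on destruction.
class CountingOwner {
public:
    explicit CountingOwner(int& live) : live_(live) { ++live_; }
    ~CountingOwner() { --live_; }

    CountingOwner(const CountingOwner&) = delete;
    CountingOwner& operator=(const CountingOwner&) = delete;

private:
    int& live_;
};

// Automatic object: the destructor runs at the closing brace.
inline int liveAfterStackScope() {
    int live = 0;
    {
        CountingOwner owner(live);
        assert(live == 1);
    }
    return live;  // already cleaned up
}

// Heap object: leaving the scope does nothing, cleanup waits for delete.
inline int liveAfterHeapScope() {
    int live = 0;
    CountingOwner* owner = nullptr;
    {
        owner = new CountingOwner(live);
    }
    const int stillLive = live;  // resource is still held here
    delete owner;                // manual cleanup, easy to forget
    return stillLive;
}
```

In this sketch liveAfterStackScope() reports zero live resources, while liveAfterHeapScope() reports one still outstanding at the point where the scope has already ended.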

You've probably noticed that step2 is a little more compact than step1--it replaces the awkward transcode()-temporary-release() dance with DualString and StringManager--but the real benefit is the lack of explicit memory management. These helper classes let you focus on using Xerces in your app, rather than pointer wrangling.

Some XMLString functions take a custom MemoryManager object that handles allocation. As an alternative to StringManager and DualString, you could create a memory manager and pass it to all XMLString calls. Either way works; I just find the simple helper classes more convenient.

Another Helper Class: Finding Elements

step1 is also a little rough in its search for elements: it walks the tree and chooses a path of action based on the name of the current element. This code is already messy, and in a complex document it would be much worse.

step3 moves element-finding logic into a separate helper class called ElementFinder. Instead of walking the tree itself, calling code asks ElementFinder for the element it needs. For example, a single ElementFinder call fetches the top-level <config> element.

This version of ElementFinder uses the same tree-walking logic as before, but the move to a separate class is more than code shuffling. Hiding the search behind a class makes it easy to switch later from ElementFinder's brute-force method to XPath. (The Xerces-C++ FAQ suggests using Xalan or Pathan for true XPath searches.)
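The encapsulation idea can be sketched without Xerces. Everything in the block below is hypothetical: ToyElement stands in for a DOM element, and ElementFinderSketch with its getConfigElement() method is an invented illustration, not the step3 class or its real method names. The point is only that callers see a named accessor while the search strategy stays private.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a DOM element: a name plus child elements.
struct ToyElement {
    std::string name;
    std::vector<ToyElement> children;
};

// Callers ask for elements by role. How they are located is private, so
// the brute-force walk could later be swapped for an XPath query.
class ElementFinderSketch {
public:
    // The finder only refers to the tree, so root must outlive it.
    explicit ElementFinderSketch(const ToyElement& root) : root_(root) {}

    // Returns the first element named "config", or nullptr if none exists.
    const ToyElement* getConfigElement() const {
        return findByName(root_, "config");
    }

private:
    // Depth-first search of the tree, starting with the node itself.
    static const ToyElement* findByName(const ToyElement& node,
                                        const std::string& name) {
        assert(!name.empty());
        if (node.name == name) {
            return &node;
        }
        for (const ToyElement& child : node.children) {
            if (const ToyElement* hit = findByName(child, name)) {
                return hit;
            }
        }
        return nullptr;
    }

    const ToyElement& root_;
};
```

Replacing findByName() with a different lookup would leave every caller of getConfigElement() untouched, which is the benefit the separate class is meant to buy.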

Don't be fooled--the Xerces API docs include classes for limited XPath features (DOMXPathEvaluator and DOMXPathExpression), but they're just placeholders. These classes' methods throw exceptions when called.
