ONLamp.com


Exploring E4X with Ruby

XML Writing Made Easy

I really liked E4X's model of putting XML inline into the code. I couldn't push Ruby that far, because Ruby already treats the less-than and greater-than symbols as operators, so a bare <account> tag would not parse. I settled on a heredoc:

doc += xml <<XMLEND
<account id="bar">
   <transaction amount="100" />
   <transaction amount="200" />
</account>
XMLEND

This will add a chunk of XML to the document using the plus operator. The xml call looks like new syntax, but it is really just a method:

def xml( xmldata ) NodeWrapper.new( REXML::Document.new( xmldata ).root ); end

It creates a new REXML document and then wraps its root node in a NodeWrapper to make it easy to access. To support the plus operator, I had to add some methods to NodeWrapper:

class NodeWrapper
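   # Dispatch unknown method calls:
   #   obj._name          -> read attribute "name"
   #   obj._name = value  -> write attribute "name"
   #   obj.name           -> XPath query for child elements named "name"
   def method_missing( name, *args )
       name = name.to_s
       if ( name =~ /^_/ )
           name.gsub!( /^_/, "" )
           if ( name =~ /=$/ )
               name.gsub!( /=$/, "" )
               _write_attribute( name, args[0] )
           else
               _read_attribute( name )
           end
       else
           xpath( name )
       end
   end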

   def initialize( node )
       @node = node
   end

   def to_s() @node.to_s; end

   def to_i() @node.to_s.to_i; end

   def _add( nodes )
       @node << nodes._get_node
       self
   end

   alias :<< :_add

   alias :+ :_add

   def _get_node() @node; end

   def xpath( name )
       children = NodeListWrapper.new()
       REXML::XPath.each( @node, name ) { |elem|
           children.push( NodeWrapper.new( elem ) )
       }
       children
   end

private

   def _read_attribute( name )
       @node.attributes[ name ].to_s
   end

   def _write_attribute( name, value )
       @node.attributes[ name ] = value
   end

end

I restructured method_missing to make its logic clearer. I also aliased << and + to the _add method, which uses REXML's << method on the wrapped element to graft a node, along with all of its children, from one tree into another.
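As a standalone illustration (not the article's own code), the following uses plain REXML to show what Element#<< does when one tree is grafted onto another, which is the operation _add relies on. The sample XML is made up for this sketch.

```ruby
require 'rexml/document'

# Made-up sample data: a main document and a separately parsed fragment.
main     = REXML::Document.new( '<accounts><account id="a"/></accounts>' )
fragment = REXML::Document.new( '<account id="bar"><transaction amount="100"/></account>' )

# Element#<< appends the fragment's root element, children and all,
# as the last child of <accounts>.
main.root << fragment.root

puts main.root.elements.size                  # prints 2
puts main.root.elements[2].attributes['id']   # prints bar (REXML indexes from 1)
```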

Now, to test the upgraded NodeWrapper class, I will add a new account into the tree after reading the file:

out = {}
doc = readxml( 'test_data.xml' )
doc += xml <<XMLEND
<account id="bar">
   <transaction amount="100" />
   <transaction amount="200" />
</account>
XMLEND

doc.account.each { |account|
   amount = 0
   account.transaction.each { |item| amount += item._amount.to_i }
   out[ account._id ] = amount
}

p out

The xml method creates the new tree, and the plus operator adds it into the document. I also added a new readxml function, which takes a path name and returns a NodeWrapper around the root element of the parsed file.
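The readxml definition isn't listed here. One plausible version, assuming it simply reads the file and wraps the root the same way xml does, is sketched below. The tiny NodeWrapper stand-in is hypothetical and exists only so the sketch runs on its own.

```ruby
require 'rexml/document'
require 'tempfile'

# Hypothetical stand-in for the article's NodeWrapper, trimmed to what
# this sketch needs.
class NodeWrapper
  def initialize( node ) @node = node; end
  def _get_node() @node; end
end

# Sketch of readxml: read the file, parse it, and wrap the root element.
def readxml( path )
  NodeWrapper.new( REXML::Document.new( File.read( path ) ).root )
end

# Usage with a throwaway file; the file contents are made up.
Tempfile.create( [ 'test_data', '.xml' ] ) do |f|
  f.write( '<accounts><account id="a"/></accounts>' )
  f.flush
  puts readxml( f.path )._get_node.name   # prints accounts
end
```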

One last step is to integrate XPath to make things even easier.

Adding XPath Support

Our new NodeWrapper has an xpath method that returns a NodeListWrapper filled with NodeWrapper objects, one per matching node:

   def xpath( name )
       children = NodeListWrapper.new()
       REXML::XPath.each( @node, name ) { |elem|
           children.push( NodeWrapper.new( elem ) )
       }
       children
   end

XPath lets you select a set of nodes in an XML document much as you specify files in an operating system: just as a path locates a file in a file system, an XPath expression locates nodes within an XML document. Every element and attribute in a tree can be addressed by at least one XPath expression that selects it alone.
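To make the file-path analogy concrete, here is a small sketch using REXML::XPath directly on made-up sample data. It addresses a single attribute two different ways: by absolute path with positional predicates (numbered from 1), and by an id predicate.

```ruby
require 'rexml/document'

# Made-up sample data in the same shape as the article's accounts.
doc = REXML::Document.new( <<XMLEND )
<accounts>
  <account id="a">
    <transaction amount="10" />
    <transaction amount="20" />
  </account>
  <account id="b">
    <transaction amount="30" />
  </account>
</accounts>
XMLEND

# Like a file path: each step names a child, and [n] picks the nth match.
attr = REXML::XPath.first( doc, "/accounts/account[1]/transaction[2]/@amount" )
puts attr.value   # prints 20

# A different expression can select the very same attribute.
same = REXML::XPath.first( doc, "//account[@id='a']/transaction[2]/@amount" )
puts same.value   # prints 20
```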

XPath also supports wildcards, predicates such as [@id='a'], and the // operator, which searches all descendants; a query returns the whole set of matching nodes. For example, this code:

total = 0
doc.xpath( "account[@id='a']//@amount" ).each { |amount| total += amount.to_i }
print "#{total}\n"

prints the total for only the account whose id is a. This code:

total = 0
doc.xpath( "//@amount" ).each { |amount| total += amount.to_i }
print "#{total}\n"

prints the sum of every amount attribute in the document, regardless of account.

This just barely scratches the surface of the power of XPath. It's important to have easy access to XPath features in any XML API.
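A couple of further XPath 1.0 features, sketched here with REXML on made-up data: the * wildcard, which matches any element name at one step, and a predicate that compares an attribute numerically. Both expressions could equally be passed to the wrapper's xpath method.

```ruby
require 'rexml/document'

# Made-up sample data shaped like the article's accounts.
doc = REXML::Document.new( <<XMLEND )
<accounts>
  <account id="a"><transaction amount="100" /><transaction amount="250" /></account>
  <account id="b"><transaction amount="300" /></account>
</accounts>
XMLEND

# * matches every child element of <accounts>, whatever its name.
puts REXML::XPath.match( doc, "/accounts/*" ).size   # prints 2

# The predicate converts @amount to a number and keeps values over 150,
# returned in document order.
big = REXML::XPath.match( doc, "//transaction[@amount > 150]" )
puts big.map { |t| t.attributes['amount'] }.join( ',' )   # prints 250,300
```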

Caveats

This article was an experiment in creating an E4X-style API by using the power of the Ruby language. It doesn't cover the entire standard, but it does provide some perspective both on the value of E4X and on the flexibility of scripting languages. Perl and Python both provide an equivalent of the method_missing hook shown here (AUTOLOAD in Perl, __getattr__ in Python), so it's possible to do something similar in either of those languages.

With statically typed languages, such as C++ or Java, you will run into the problem that the node and attribute names are not known at compile time. One alternative is to use a code generator to build classes from an XML schema definition that provide dot-notation syntax for read and write access. Unfortunately, the generated code will be specific to one particular XML schema. For run-time flexibility, you will need to fall back on a generic API such as the DOM for reading and writing, or SAX for streaming reads.
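For contrast with the dot-notation wrapper, here is roughly what the generic, DOM-style approach looks like. It is sketched in Ruby with REXML's own element API rather than in Java or C++, and it computes the same per-account totals as the earlier test code on made-up data. Element and attribute names are plain strings, checked only at run time.

```ruby
require 'rexml/document'

# Made-up sample data.
doc = REXML::Document.new( '<accounts>' \
  '<account id="a"><transaction amount="100"/><transaction amount="200"/></account>' \
  '<account id="b"><transaction amount="50"/></account>' \
  '</accounts>' )

totals = {}
# Names are passed as strings instead of being written as method calls.
doc.root.each_element( 'account' ) do |account|
  sum = 0
  account.each_element( 'transaction' ) { |t| sum += t.attributes['amount'].to_i }
  totals[ account.attributes['id'] ] = sum
end

p totals   # totals["a"] is 300 and totals["b"] is 50
```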

Finally, there is very little published information about E4X so far. This article relies on what I could glean from the hour-long presentation I attended. If I have made some mistakes in the E4X syntax, I apologize. I'm pretty sure I nailed the highlights, even if the specifics may vary somewhat.

Conclusion

I've written plenty of articles recently using Java as the language. In comparison, writing the code for this article was a blast. One of the great things about Ruby is that it makes writing code really fun because it works the way we think.

Writing code and working with computers should be fun. I hope this article gives some reasons to try and simplify XML access to make it fun for everyone. If that helps Ruby out a little bit in the process, so much the better.

Jack Herrington is an engineer, author, and presenter who lives and works in the Bay Area. His mission is to expose his fellow engineers to new technologies. That covers a broad spectrum: from demonstrating programs that write other programs in the book Code Generation in Action, to providing techniques for building customer-centered web sites in PHP Hacks, all the way to writing a how-to on audio blogging called Podcasting Hacks.

