ONLamp.com

Extending Ruby with C

GenX::Writer#begin_element

It has taken an awful lot of trouble to put the begin_document and end_document methods in place. Now users can start and end documents--but in order for this to be of much use, they'll want to put some content in the document. Because this is XML, and XML documents all start with a root element, that means the next methods to implement are GenX::Writer#begin_element and GenX::Writer#end_element. The obvious place to start is GenX::Writer#begin_element.

GenX::Writer#begin_element is a thin wrapper around the genxStartElementLiteral function. It's really similar to the methods already shown. Here's the implementation:

static VALUE
writer_begin_element (int argc, VALUE *argv, VALUE self)
{
  genxWriter w;
  VALUE xmlns, name;

  /* Work out which arguments were passed. */
  switch (argc)
    {
      case 1:
        /* Name only; no namespace. */
        xmlns = Qnil;
        name = argv[0];
        break;

      case 2:
        /* Namespace first, then name. */
        xmlns = argv[0];
        Check_Type (xmlns, T_STRING);
        name = argv[1];
        break;

      default:
        rb_raise (rb_eRuntimeError, "invalid arguments");
    }

  Check_Type (name, T_STRING);

  /* Pull the underlying genxWriter out of the Ruby object. */
  Data_Get_Struct (self, struct genxWriter_rec, w);

  GENX4R_ERR (genxStartElementLiteral
               (w,
                NIL_P (xmlns) ? NULL : (constUtf8) RSTRING (xmlns)->ptr,
                (constUtf8) RSTRING (name)->ptr), w);

  return Qnil;
}

A few things here haven't appeared before. First of all, this method takes a variable number of arguments, and most of the code in the function goes to figuring out how many arguments it received and setting things up accordingly. To support this at the C level, Ruby calls the underlying C function with three arguments: an integer that holds the number of arguments passed, a pointer to an array of VALUEs that contains the arguments, and a VALUE that holds the invoking object.

If it receives one argument, it uses that as the name of the element.

If it receives two arguments, then the first is the namespace and the second is the name. Given an xmlns argument, the code verifies that it is a String using the Check_Type macro with the T_STRING constant, which raises a TypeError otherwise. Any other number of arguments raises an exception.

In both valid cases, the same Check_Type check is applied to the element's name. Then, as usual, the code pulls the genxWriter out of self and finally calls the underlying genxStartElementLiteral function, passing in the namespace if one was provided and NULL otherwise. When passing the namespace and element name, note that the code uses the RSTRING macro to cast the VALUE to the underlying string data structure before accessing the C-string pointer via the ptr field in that structure.
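For comparison, the same argument handling can be sketched in plain Ruby with a splat parameter. This is only an illustration of the dispatch logic: split_element_args is a hypothetical helper, not part of GenX4r, and it raises ArgumentError for a bad argument count where the C code above raises a RuntimeError.

```ruby
# Hypothetical helper mirroring the argc/argv handling in
# writer_begin_element. Returns [xmlns, name]; xmlns is nil when no
# namespace was given.
def split_element_args(*args)
  case args.length
  when 1
    xmlns = nil
    name = args[0]
  when 2
    xmlns, name = args
    raise TypeError, 'namespace must be a String' unless xmlns.is_a?(String)
  else
    raise ArgumentError, 'invalid arguments'
  end

  raise TypeError, 'name must be a String' unless name.is_a?(String)
  [xmlns, name]
end

p split_element_args('foo')                           # prints [nil, "foo"]
p split_element_args('http://example.com/ns', 'foo')  # prints ["http://example.com/ns", "foo"]
```

The splat collects the arguments into an array, which plays the same role as the argc/argv pair on the C side.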

Once again, Init_genx4r needs more code to hook up this method:

rb_define_method (rb_cGenXWriter,
                  "begin_element",
                  writer_begin_element,
                  -1);

Notice the -1 in place of the argument count; it tells Ruby to call this method via the count/array/object style of argument passing.
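The same convention is visible from Ruby itself: Method#arity reports -1 for a method that accepts any number of arguments. A small sketch, in which any_args and one_arg are hypothetical names used only for illustration:

```ruby
# A method that accepts any number of arguments reports an arity of -1.
def any_args(*args)
  args.length
end

# A method with exactly one required argument reports an arity of 1.
def one_arg(text)
  text
end

puts method(:any_args).arity  # prints -1
puts method(:one_arg).arity   # prints 1
puts method(:puts).arity      # prints -1 (a built-in variable-argument method)
```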

Now that there's a way to start an element, there must be a way to end it. That's the purpose of the GenX::Writer#end_element method.

GenX::Writer#end_element

As you might have guessed, GenX::Writer#end_element is very similar to GenX::Writer#end_document. Here's the implementation:

static VALUE
writer_end_element (VALUE self)
{
  genxWriter w;

  Data_Get_Struct (self, struct genxWriter_rec, w);

  GENX4R_ERR (genxEndElement (w), w);

  return Qnil;
}

All it does is pull out the writer and call genxEndElement on it. GenX does the rest. As usual, it takes one call in Init_genx4r to hook up the method:

rb_define_method (rb_cGenXWriter,
                  "end_element",
                  writer_end_element,
                  0);

Now GenX4r can actually produce XML. Jump into irb and try it out.

$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0> w = GenX::Writer.new
=> #<GenX::Writer:0x321f0c>
irb(main):003:0> s = ''
=> ""
irb(main):004:0> w.begin_document(s)
=> nil
irb(main):005:0> w.begin_element("foo")
=> nil
irb(main):006:0> w.end_element
=> nil
irb(main):007:0> w.end_document
=> nil
irb(main):008:0> s
=> "<foo></foo>"
irb(main):009:0>

There you have it! The extension actually produced some XML output! Of course, most XML needs some textual content within at least some of the tags. Making that work means implementing GenX::Writer#text, a wrapper around genxAddText.

GenX::Writer#text

After everything implemented so far, GenX::Writer#text doesn't have anything all that new to it. Take a look:

static VALUE
writer_text (VALUE self, VALUE text)
{
  genxWriter w;

  Check_Type (text, T_STRING);

  Data_Get_Struct (self, struct genxWriter_rec, w);

  GENX4R_ERR (genxAddText (w, (constUtf8) RSTRING (text)->ptr), w);

  return Qnil;
}

There are the usual hoops to access the genxWriter and then a call to pass the text through to genxAddText. Here's code to hook up the method in Init_genx4r.

rb_define_method (rb_cGenXWriter,
                  "text",
                  writer_text,
                  1);

There you have it, a functionally complete wrapper. Try it out in irb to prove it.

$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0> w = GenX::Writer.new
=> #<GenX::Writer:0x321f0c>
irb(main):003:0> s = ''
=> ""
irb(main):004:0> w.begin_document(s)
=> nil
irb(main):005:0> w.begin_element("foo")
=> nil
irb(main):006:0> w.text("bar")
=> nil
irb(main):007:0> w.end_element
=> nil
irb(main):008:0> w.end_document
=> nil
irb(main):009:0> s
=> "<foo>bar</foo>"
irb(main):010:0>

With the combination of elements and text, you can now start using GenX4r for some nontrivial tasks. Before that, I'd like to write some tests to verify that everything works now and that it will continue to work as I make changes in the future.

Unit Testing

In Ruby, the accepted way to write unit tests is to use the Test::Unit framework. This is a standard unit test framework, written along the lines of the popular JUnit package. To use it, subclass the Test::Unit::TestCase class and implement your tests as methods that are named test_something (where the something part changes for each test). Inside the tests, use the assert method to indicate what conditions need to be true for the tests to pass. Here's a simple test case to start:

require 'test/unit'
require 'genx4r'

class BasicsTest < Test::Unit::TestCase
  def test_element
    w = GenX::Writer.new
    s = ''

    w.begin_document(s)
    w.begin_element('foo')
    w.text('bar')
    w.end_element
    w.end_document

    assert s == '<foo>bar</foo>'
  end
end

Run the tests by running that file. You should receive output similar to the following:

$ ruby test.rb
Loaded suite test
Started
.
Finished in 0.005774 seconds.

1 tests, 1 assertions, 0 failures, 0 errors

The line with the single dot on it is where you see the output for the tests. Each passing test prints a . whereas failing tests print an F. To add more tests, fill in more test methods. They will run automatically when you run the file.

Making Things a Bit More Ruby-esque

All right, now there's a working module and a test suite to make sure it keeps on working. I'm all set to release this new toy to the unsuspecting masses out there on the Internet, right? Not quite. Although the API works, it's not ideal. You have to remember to call the GenX::Writer#end_element and GenX::Writer#end_document methods at exactly the right times; otherwise you'll either mess up the output (if elements nest incorrectly) or possibly even raise an exception, because the underlying GenX functions were called out of order. Remember that GenX is big on enforcing correctness, so if you screw up, it will tell you about it.

It would be really nice to arrange for the module to call these end methods at the appropriate times. Fortunately, Ruby has a way to do that: blocks.

A block in Ruby is a chunk of code passed to a method alongside its regular arguments. The method can then use the yield keyword to invoke the block whenever it wants. The syntax looks like this:

def takes_a_block(&block)
  puts "before yield"
  yield
  puts "after yield"
end

takes_a_block do
  puts "in the block"
end

Running this code produces the following output:

$ ruby blocks-example.rb
before yield
in the block
after yield

Note that Ruby allows braces as block delimiters instead of do and end, in which case the method call could have looked like takes_a_block { puts "in the block" }. Both ways are valid. Which one you use is mostly just a question of style.
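A method can also check whether it received a block at all, and it can pass values to the block through yield. The following sketch shows both; greet is a hypothetical method used only for illustration.

```ruby
# Hypothetical method: yields a greeting to its block, if one was given.
def greet(name)
  return 'no block given' unless block_given?

  # The value of the block becomes the value of the yield expression.
  yield "Hello, #{name}"
end

shouted = greet('Ruby') { |msg| msg.upcase }
puts shouted        # prints HELLO, RUBY
puts greet('Ruby')  # prints no block given
```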

Using blocks to mark the beginning and end of each element in the generated XML solves the API problem neatly. Here's how to implement this with a new GenX::Writer#element method defined at the C level.

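static VALUE
writer_element (int argc, VALUE *argv, VALUE self)
{
  /* Check for the block first, so that a call without one fails
     before any output has been written. */
  if (! rb_block_given_p ())
    rb_raise (rb_eRuntimeError, "must be called with a block");

  /* Open the element, using the same arguments as begin_element. */
  writer_begin_element (argc, argv, self);

  /* Run the block; it generates the element's content. */
  rb_yield (Qnil);

  /* Close the element once the block returns. */
  writer_end_element (self);

  return Qnil;
}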

This new method calls begin_element, invokes the block it received, and then calls end_element; if it didn't receive a block, it raises an exception instead. In order to nest elements or put text inside them, the passed-in block needs to contain the code to create that content. There are two new C-level functions here: rb_block_given_p, the predicate that asks "Was I given a block?", and rb_yield, which invokes the block. Because there's nothing else to pass to the block, the code passes Qnil.
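For readers more at home in Ruby than in C, here is roughly what the same method would look like at the Ruby level. SketchWriter is a hypothetical stand-in for GenX::Writer so that the example runs without the extension; it does no escaping or validation. This sketch checks for the block before it opens the element.

```ruby
# Hypothetical stand-in for GenX::Writer. It only appends tags to a
# string; it performs no escaping or validation.
class SketchWriter
  def initialize(out)
    @out = out
    @open = []
  end

  def begin_element(name)
    @open.push(name)
    @out << "<#{name}>"
    nil
  end

  def end_element
    @out << "</#{@open.pop}>"
    nil
  end

  def text(str)
    @out << str
    nil
  end

  # Ruby counterpart of writer_element: block_given? plays the role of
  # rb_block_given_p, and yield plays the role of rb_yield.
  def element(name)
    raise 'must be called with a block' unless block_given?

    begin_element(name)
    yield
    end_element
  end
end

s = String.new
w = SketchWriter.new(s)
w.element('foo') { w.element('bar') { w.text('baz') } }
puts s  # prints <foo><bar>baz</bar></foo>
```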

As usual, the code to hook up this new method in Init_genx4r looks like:

rb_define_method (rb_cGenXWriter,
                  "element",
                  writer_element,
                  -1);

With that in place, using the new method like this:

w = GenX::Writer.new

s = ''

w.begin_document(s)

w.element('foo') do
  w.element('bar') do
    w.text('baz')
  end
end

w.end_document

puts s

produces this output:

<foo><bar>baz</bar></foo>

Isn't that a much nicer API? Instead of having to remember to handle the nesting of elements manually, users can encode it directly into their program, making it much more difficult to do incorrectly. Note that the same technique can easily apply to the GenX::Writer#begin_document and GenX::Writer#end_document methods.
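As a rough sketch of that idea, a block-based document method could follow the same begin/yield/end pattern. The document method and the DocSketchWriter class below are hypothetical; the class is a pure-Ruby stand-in so that the example runs without GenX4r, and it is not the extension's actual implementation.

```ruby
# Hypothetical stand-in for GenX::Writer so the sketch runs without the
# extension. It appends tags to a string; no escaping, no validation.
class DocSketchWriter
  def begin_document(out)
    @out = out
    @open = []
    nil
  end

  def end_document
    @out = nil
    nil
  end

  def begin_element(name)
    @open.push(name)
    @out << "<#{name}>"
    nil
  end

  def end_element
    @out << "</#{@open.pop}>"
    nil
  end

  def text(str)
    @out << str
    nil
  end

  # Block form of begin_document/end_document.
  def document(out)
    raise 'must be called with a block' unless block_given?

    begin_document(out)
    yield
    end_document
  end

  # Block form of begin_element/end_element.
  def element(name)
    raise 'must be called with a block' unless block_given?

    begin_element(name)
    yield
    end_element
  end
end

s = String.new
w = DocSketchWriter.new
w.document(s) do
  w.element('foo') { w.text('bar') }
end
puts s  # prints <foo>bar</foo>
```

Neither wrapper here uses ensure, so an exception raised inside a block leaves the document unfinished; whether to close it anyway is a design choice this sketch leaves open.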

Some Conclusions

This whole idea started off as something of an experiment. Is it really as easy as I thought it would be to wrap a C library in Ruby? I don't know about you, but I think it was a success. In fewer than 300 lines of C, I've provided users with access to a useful subset of the GenX library's functionality. If I were going to implement this code directly in Ruby, it would be longer and most likely buggier, simply because the C version has been debugged already and our hypothetical Ruby version has not.

That said, GenX4r is still somewhat incomplete. The begin_document and end_document methods still need block-based wrapper methods, and for efficiency I also want to provide users the ability to predeclare namespaces and elements, to avoid having to validate them each time they're used. Plus, I'm a reasonably new Ruby hacker, so it's not out of the realm of possibility that there are bugs in the wrapper. Even so, I think this is a reasonable proof of concept. The additional work I've done on GenX4r that isn't documented here indicates to me that it is a success. On the strength of this experience, I have no trouble recommending Ruby as a convenient scripting language to wrap around libraries written in C.

For the record, the current version of GenX4r, which includes all this hypothetical functionality in one form or another, constitutes only 584 lines of C. If you're interested in using it or helping me develop it further, please grab the latest version from the GenX4r home page.

Garrett Rooney is a software developer at FactSet Research Systems, where he works on real-time market data.

