ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


Extending Ruby with C

by Garrett Rooney
11/18/2004

Ruby, if you've never heard of it, is an object-oriented scripting language, similar in many ways to Perl and Python. It originates from Japan and is young, as far as programming languages go. There are many really good reasons you might want to use the Ruby language; I'm not going to go into all of them here, but the one at the core of this article is the ease with which you can write Ruby extensions in C.

I'm a big fan of the so-called agile programming languages. I think they have a huge advantage over more traditional languages like C and C++. They also have some drawbacks, among the largest being that there's an awful lot of existing code written in C and C++. It's hard to sell people on moving to something new if they have to leave all their old toys behind.

The standard response to this sort of argument is that you can easily write an extension that bridges the gap between your old C code and your new code in Perl, Python, or whatever agile language is hot this week. Unfortunately, I've generally found that the APIs for bridging the gap between Perl and C are either cryptic (XS) or fragile (Inline::C). While Python is better in some ways, I still find its C API rather difficult to read. Tools such as SWIG can help alleviate this problem, but you still need to write a bunch of glue code to bridge the gap between the high-level agile languages and the low-level C code.

When I first looked at doing the same kind of thing for Ruby, a whole new world opened up. The APIs are simple, to the point where I was up and running in minutes rather than hours. All you need to know to start is in the README.EXT file in the top level of the Ruby source tree. If you need help with something that isn't documented there, you can't ask for a clearer example than the Ruby source code itself. In short, I was just aching for a test case, some C code I could wrap up in a Ruby extension to prove how simple it is to make something that's easy to use. For my test case, I chose the GenX library.

GenX

GenX is a simple C library for generating correct, canonical XML. It verifies that its data is valid UTF-8 and the structure of the XML document being generated is valid, and it forces you to use canonical XML--that may not mean much to you right now, but it can be significant if you need to compare two XML documents to determine their equivalence. Tim Bray wrote GenX and hosts it at GenxStatus. GenX originally attracted me because it provides a way to avoid problems related to invalid XML, which I've encountered in my own work. In addition to its usefulness, it's also perfect for an example of how to embed a C library in Ruby because it's very small, self-contained, and has a well-defined API we can wrap up in Ruby without too much trouble.

Justification

At this point it's worth asking whether a Ruby extension is really the best way to make this kind of functionality available.

Using an extension means that users need to install a binary distribution precompiled for their versions of Ruby and their operating systems, or to build it themselves, which means they need access to a C compiler. Additionally, extending Ruby via C has its own set of dangers. If you screw something up in a standard Ruby module, pretty much the worst you can do is cause an exception to be thrown. It's possible for users to recover from this if they're paranoid enough about catching exceptions and structure their code correctly. In a C-based extension, an error can corrupt memory or cause a segmentation fault or any number of other problems from which recovery is difficult, and all of which have the chance to crash the underlying Ruby interpreter.

That said, in this particular case I think providing direct access to the underlying GenX library via a C extension is the way to go. The GenX library is available right now, it works, and it does its job in a very efficient manner. There's no reason to duplicate functionality unnecessarily. Even if I did rewrite this in pure Ruby, all I am likely to accomplish is slowing things down. Plus, GenX is exceptionally self-contained; while using the library does require that users either use a precompiled extension or possess a C compiler, it at least doesn't bring in any other third-party requirements. Finally, the GenX API is quite straightforward. It's reasonable to assume that we'll be able to implement this extension without undue risk of crashing our Ruby interpreter due to bugs in our code.

Some Basic Functionality

The first step in writing a Ruby extension is to create something that compiles and runs. That means writing an extconf.rb file that tells Ruby how to compile and link your extension, and then writing the bare-bones C file that makes up the extension. With these two steps completed, you'll have a Ruby module that you can require and a new class you can instantiate, albeit not a very useful one because it won't actually have any methods.

The extconf.rb file is a short Ruby program that makes use of the mkmf module to build a simple makefile, which you use to build your extension. There's a fair amount of specialized functionality you can put in your extconf.rb file, but for our purposes the bare minimum will do. Here's the entirety of my extconf.rb file:

require 'mkmf'

dir_config("genx4r")

create_makefile("genx4r")

This tells Ruby to use all of the .c files in the current working directory to build an extension named genx4r, and that it should write out a makefile to compile and link it. If you copy all the .c and .h files from the GenX tarball into the current directory, you can run ruby extconf.rb && make and then have a Ruby extension sitting there just waiting for you to require it in your script. Here's the process:

$ ruby extconf.rb 
creating Makefile
$ make
gcc -fno-common   -g -Os -pipe -no-cpp-precomp -fno-common -DHAVE_INTTYPES_H
-pipe -pipe  -I. -I/usr/lib/ruby/1.6/powerpc-darwin7.0 -I.    -c -o charProps.o
charProps.c
gcc -fno-common   -g -Os -pipe -no-cpp-precomp -fno-common -DHAVE_INTTYPES_H
-pipe -pipe  -I. -I/usr/lib/ruby/1.6/powerpc-darwin7.0 -I.    -c -o genx.o
genx.c
cc -fno-common   -g -Os -pipe -no-cpp-precomp -fno-common -DHAVE_INTTYPES_H
-pipe -pipe  -dynamic -bundle -undefined suppress -flat_namespace
-L/usr/lib/ruby/1.6/powerpc-darwin7.0 -L/usr/lib  -o genx4r.bundle charProps.o
genx.o   -ldl -lobjc 
$ ls
Makefile        charProps.o     genx.c          genx.o
charProps.c     extconf.rb      genx.h          genx4r.bundle*
$ irb
irb(main):001:0> require 'genx4r'
LoadError: Failed to lookup Init function ./genx4r.bundle
        from (irb):1:in `require'
        from (irb):1
irb(main):002:0>

OK, so that sort of works.... There's a Ruby extension, but trying to require it from inside Ruby only produces an error. That's because none of the .c files defined an Init function. When Ruby tries to load an extension, the first thing it does is look for a function named Init_extname, where extname is the name of the extension. Because that function doesn't exist, Ruby obviously can't find it and throws a LoadError exception.
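As a rough illustration of that naming rule (a sketch only, not Ruby's actual loader code), the name of the function Ruby looks for can be derived from the extension's file name. The helper init_function_name is hypothetical and exists only for this example.

```ruby
# Hypothetical helper: mirrors the Init_<extname> naming convention
# described above. It is not part of Ruby or of genx4r.
def init_function_name(path)
  # Strip the directory and whatever platform-specific suffix
  # (.bundle, .so, ...) the compiled extension carries.
  "Init_" + File.basename(path, ".*")
end

puts init_function_name("./genx4r.bundle")
puts init_function_name("genx4r.so")
```

Both calls yield the same name, which is why the C file has to define exactly one function called Init_genx4r regardless of platform.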

The next step is to implement Init_genx4r to allow the extension to load successfully. The bare minimum necessary is simply an empty function named Init_genx4r that takes no arguments and returns nothing. I like that. Here are the current contents of the genx4r.c file:

#include "ruby.h"

void
Init_genx4r()
{
  /* nothing here yet */
}

Rerun extconf.rb and make. When you try to load the genx4r module with require, you should have better results:

$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0>

Creating a Class

The extension loads, but it still doesn't actually do anything. It needs definitions for the classes that make up the interface to the GenX library. For now, I'll define one top-level Ruby module named GenX and a single class, Writer, that lives in it. That class is simply a thin wrapper around the C-level genxWriter type. Here's the next iteration of genx4r.c:

#include "ruby.h"

#include "genx.h"

static VALUE rb_mGenX;
static VALUE rb_cGenXWriter;

static void
writer_mark (genxWriter w)
{}

static void
writer_free (genxWriter w)
{
  genxDispose (w);
}

static VALUE
writer_allocate (VALUE klass)
{
  genxWriter writer = genxNew (NULL, NULL, NULL);

  return Data_Wrap_Struct (klass, writer_mark, writer_free, writer);
}

void
Init_genx4r ()
{
  rb_mGenX = rb_define_module ("GenX");

  rb_cGenXWriter = rb_define_class_under (rb_mGenX, "Writer", rb_cObject);

  /* NOTE: this only works in ruby 1.8.x.  for ruby 1.6.x you instead define
   *       a 'new' method, which does much the same thing as this. */
  rb_define_alloc_func (rb_cGenXWriter, writer_allocate);
}

That's a lot of new code. First comes the #include of genx.h, because it needs to use functions defined by GenX. The two VALUE variables represent the module and the class. Each object in Ruby (and remember, everything in Ruby is an object) has a VALUE; think of it as a reference to the object. The beginning of Init_genx4r initializes these variables by calling rb_define_module to create the GenX module and rb_define_class_under to define the Writer class.

Next, Ruby needs to know how to allocate the guts of the GenX::Writer object. That's where allocate comes in, using rb_define_alloc_func to associate the writer_allocate function with the allocate method. writer_allocate creates a genxWriter object with genxNew and turns it into a Ruby object via the Data_Wrap_Struct macro. Data_Wrap_Struct simply takes a VALUE representing the class of the new object (passed as an argument to writer_allocate), two function pointers used for Ruby's mark-and-sweep garbage collection, and a pointer to the underlying C-level data structure--in this case the genxWriter itself--and returns a new VALUE that refers to the object, which is simply a thin wrapper around the C-level pointer. Finally, the code has the mark function for the object, writer_mark, which actually does nothing, and the destructor, writer_free, which calls genxDispose to clean up the genxWriter allocated in writer_allocate. When wrapping a more complicated structure that includes references to other Ruby-level objects, the mark function must call rb_gc_mark on each of them to tell Ruby that they are still in use; otherwise the garbage collector may reclaim them while the wrapper still refers to them.
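To make the role of a mark function more concrete, here is a toy mark-and-sweep collector written in plain Ruby. It is only a sketch of the general technique, not how Ruby's collector is implemented; the ToyHeap class and the symbol names in it are invented for this illustration. Each object's list of references plays the part of what a mark function would report through rb_gc_mark.

```ruby
# Toy mark-and-sweep heap. Purely illustrative; not Ruby's real GC.
class ToyHeap
  def initialize
    @objects = {}             # object id => ids it reports when marked
  end

  def add(id, refs = [])
    @objects[id] = refs
  end

  def live
    @objects.keys
  end

  # Mark everything reachable from the roots, then sweep the rest.
  def collect(roots)
    marked = {}
    pending = roots.dup
    until pending.empty?
      id = pending.pop
      next if marked[id] || !@objects.key?(id)
      marked[id] = true
      pending.concat(@objects[id])   # what the "mark function" reports
    end
    swept = @objects.keys - marked.keys
    swept.each { |id| @objects.delete(id) }
    swept
  end
end

heap = ToyHeap.new
heap.add(:wrapper, [:target])  # wrapper's mark step reports :target
heap.add(:target)
heap.add(:orphan)
p heap.collect([:wrapper])     # only :orphan is unreachable
p heap.live
```

If :wrapper reported no references at all, :target would be swept as well, even though the wrapper still depends on it. That is the situation a correct mark function prevents.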

One thing to note about genx4r.c is that the only function accessible outside that file is Init_genx4r. Everything else is static, which means that it won't leak out into the global namespace and cause linkage errors if some other part of the program happens to use the same function or variable name.

It's time to take a quick jaunt through irb to confirm that it's possible to create an instance of the new class:

$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0> w = GenX::Writer.new
=> #<GenX::Writer:0x321f84>
irb(main):003:0>

Sure enough, it's an instance of our new GenX::Writer class. Adding a few methods will make it actually useful!
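The split between allocation and initialization that rb_define_alloc_func relies on is also visible from plain Ruby: new behaves roughly like allocate followed by initialize. The Greeter class below is a made-up example used only to show this; it has nothing to do with genx4r.

```ruby
# Made-up class for illustration only.
class Greeter
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

raw = Greeter.allocate            # storage exists, initialize has not run
p raw.name                        # nil

manual = Greeter.allocate
manual.send(:initialize, "Ruby")  # initialize is private, hence send
p manual.name

p Greeter.new("Ruby").name        # new does both steps for you
```

For a wrapped C structure, the allocation step is where the underlying pointer gets created, which is why the extension only needs to supply writer_allocate to make GenX::Writer.new work.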

Adding Methods

Now it's time to add some actual functionality to the new class, which means deciding exactly how it should work. The basic usage pattern for GenX itself is simple: create a document, then create a variety of elements, possibly mixed with character data, and finally close the document. In the simplest terms, this boils down to five methods: GenX::Writer#begin_document, GenX::Writer#end_document, GenX::Writer#begin_element, GenX::Writer#end_element, and GenX::Writer#text. Because you can't do anything without a document, I'll start with begin_document and end_document.

GenX::Writer#begin_document

The underlying GenX library has two functions for starting a new document. The first, genxStartDocFile, takes a FILE * argument to which to write the contents of the document. I want something a bit more generic than that, because there's no certainty that the user wants to send the output to a file. It might be nice to be able to write the XML to a network socket, or a string, or any number of other things. Fortunately GenX also provides the genxStartDocSender function, which requires a genxSender structure, containing pointers to send, sendBounded, and flush functions. The extension needs to implement those functions. Here is writer_send, the send function:

static genxStatus
writer_send (void *baton, constUtf8 s)
{
  VALUE file = (VALUE) baton;
  VALUE ary;

  if (! rb_respond_to (file, rb_intern ("<<")))
    rb_raise (rb_eRuntimeError, "target must respond to '<<'");

  ary = rb_ary_new2 (2);

  rb_ary_store (ary, 0, file);
  rb_ary_store (ary, 1, rb_str_new2 (s));

  if (rb_rescue (call_write, ary, handle_exception, Qnil) == Qfalse)
    return GENX_IO_ERROR;
  else
    return GENX_SUCCESS;
}

writer_send takes two arguments. First is a void * that points to the "user data" associated with the genxWriter. In this case, the pointer holds a VALUE that refers to the object into which to send data. This could be a File object, a String, or anything else; what's important is that it responds to the << method, which the code uses to write the data from GenX. The second argument is a constUtf8, which is GenX's typedef for a pointer to the start of a null-terminated UTF-8 string containing the data to write out.

The code first checks whether the object it received actually responds to the << method. If it doesn't, then it can't do much else, so it raises an exception via rb_raise. If it passes that check, it creates a Ruby array to hold the object into which to write the data and a Ruby string to hold the data to write. The array exists to make it easier to pass them to call_write, a helper function that does the writing by way of the rb_rescue function. Here's the implementation of call_write:

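static VALUE
call_write (VALUE ary)
{
  rb_funcall (rb_ary_entry (ary, 0),
              rb_intern ("<<"),
              1,
              rb_ary_entry (ary, 1));

  /* Always hand a non-false value back to rb_rescue on success. */
  return Qtrue;
}

All this does is use rb_funcall to call the << method on the file object, extracted from the array using rb_ary_entry. The << method takes a single argument (thus the 1): the string built up in writer_send, also extracted from the array. The function then returns Qtrue so that rb_rescue has a well-defined, non-false value to hand back on success.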

If all this code needs to do is call a function, why not do it directly in writer_send, thus avoiding the rb_rescue gymnastics? The answer is exception handling: if the << method throws an exception, rb_rescue catches it and calls the handle_exception function passed to it, then passes the return value of handle_exception back to its own caller. In this case, handle_exception returns Qfalse so that writer_send knows to return GENX_IO_ERROR to its caller. Here's the (trivial) implementation of handle_exception:

static VALUE
handle_exception (VALUE unused)
{
  return Qfalse;
}
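For readers more comfortable in Ruby than in its C API, the combination of writer_send, call_write, and handle_exception behaves roughly like the following Ruby method. This is a sketch under assumptions: send_chunk and BrokenSink are hypothetical names, and the symbols :success and :io_error stand in for GENX_SUCCESS and GENX_IO_ERROR.

```ruby
# Hypothetical Ruby-level equivalent of the three C functions above.
def send_chunk(target, data)
  raise "target must respond to '<<'" unless target.respond_to?(:<<)

  begin
    target << data      # what call_write does through rb_funcall
    :success            # stands in for GENX_SUCCESS
  rescue StandardError  # rb_rescue also traps StandardError
    :io_error           # stands in for GENX_IO_ERROR
  end
end

# A made-up target whose << always fails, to exercise the rescue path.
class BrokenSink
  def <<(_data)
    raise IOError, "cannot write"
  end
end

buffer = String.new
p send_chunk(buffer, "<foo>")          # :success
p buffer
p send_chunk([], "<bar>")              # arrays respond to << as well
p send_chunk(BrokenSink.new, "<baz>")  # :io_error
```

The point of the duck-typing check is that a String, an Array, a File, or a socket can all serve as the output target without the extension knowing which one it has.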

All right, that's the implementation of send. The next function is sendBounded. This is pretty much the same thing; it just needs to create the string it passes in to call_write based on a start pointer and an end pointer instead of a null-terminated string. The implementation looks like this:

static genxStatus
writer_send_bounded (void *baton, constUtf8 start, constUtf8 end)
{
  VALUE file = (VALUE) baton;
  VALUE ary;

  if (! rb_respond_to (file, rb_intern ("<<")))
    rb_raise (rb_eRuntimeError, "target must respond to '<<'");

  ary = rb_ary_new2 (2);

  rb_ary_store (ary, 0, file);
  rb_ary_store (ary, 1, rb_str_new (start, end - start));

  if (rb_rescue (call_write, ary, handle_exception, Qnil) == Qfalse)
    return GENX_IO_ERROR;
  else
    return GENX_SUCCESS;
}

As you can see, the only difference here is in calling rb_str_new instead of rb_str_new2. This takes a pointer and a length, calculated based on the given start and end pointers.
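The pointer arithmetic is easy to picture with offsets into a Ruby string. In this sketch, bounded_chunk is a hypothetical helper and the offsets play the role of the start and end pointers; the end offset is exclusive, just as the end pointer is in the C code.

```ruby
# Hypothetical helper: copy the bytes in [start_offset, end_offset).
def bounded_chunk(buffer, start_offset, end_offset)
  length = end_offset - start_offset   # same as "end - start" in C
  buffer.byteslice(start_offset, length)
end

buffer = "<foo>bar</foo>"
p bounded_chunk(buffer, 5, 8)   # "bar"
p bounded_chunk(buffer, 0, 5)   # "<foo>"
```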

Finally, this code needs a flush function. Here's writer_flush and its helper function call_flush:

static VALUE
call_flush (VALUE file)
{
  rb_funcall (file, rb_intern ("flush"), 0);

  return Qtrue;
}

static genxStatus
writer_flush (void *baton)
{
  VALUE file = (VALUE) baton;

  /* if we can't flush, just let it go... */
  if (! rb_respond_to (file, rb_intern ("flush")))
    return GENX_SUCCESS;

  if (rb_rescue (call_flush, file, handle_exception, Qnil) == Qfalse)
    return GENX_IO_ERROR;
  else
    return GENX_SUCCESS;
}

This is rather similar to the send and sendBounded functions, so let's consider only the differences. First of all, if the object to which to write the data doesn't respond to the flush method, the code returns success, assuming that it's holding the data in memory or something else where flush is not applicable. The only other difference is that instead of the << method, it calls the flush method on the object.
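In Ruby terms, the flush logic amounts to something like the sketch below. The names flush_target and BrokenFlusher are hypothetical, and the symbols again stand in for the GenX status codes.

```ruby
require 'stringio'

# Hypothetical Ruby-level equivalent of writer_flush + call_flush.
def flush_target(target)
  # Targets without a flush method (a String, say) are simply accepted.
  return :success unless target.respond_to?(:flush)

  begin
    target.flush
    :success
  rescue StandardError
    :io_error
  end
end

# Made-up target whose flush always fails.
class BrokenFlusher
  def flush
    raise IOError, "cannot flush"
  end
end

p flush_target(String.new)          # no flush method, treated as success
p flush_target(StringIO.new)        # flush exists and succeeds
p flush_target(BrokenFlusher.new)   # :io_error
```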

With all three helper functions in place, it's time to write the writer_begin_document function itself. That'll finally make it possible to start a new document with the GenX::Writer object.

static genxSender writer_sender = { writer_send,
                                    writer_send_bounded,
                                    writer_flush };

static VALUE
writer_begin_document (VALUE self, VALUE file)
{
  genxWriter w;

  Data_Get_Struct (self, struct genxWriter_rec, w);

  if (! rb_respond_to (file, rb_intern ("<<")))
    rb_raise (rb_eRuntimeError, "target must respond to '<<'");

  genxSetUserData(w, (void *) file);

  GENX4R_ERR (genxStartDocSender (w, &writer_sender), w);

  return Qnil;
}

The first thing to do is to unwrap the contents of the self object. Recall the earlier use of Data_Wrap_Struct to turn a struct genxWriter_rec into our object. This code uses its counterpart, Data_Get_Struct, to pull it back out again. For paranoia's sake, it confirms that the file it received responds to the << method, because it will call that method later on. Next, calling genxSetUserData stores file as the writer's "user data," so that when the time comes to call the sender functions, they can call the appropriate methods on it. Finally, the code calls genxStartDocSender, passing it the writer and a pointer to the genxSender so it knows which functions it should call later on. The object is now ready to do some real work.

There is something a little strange about the call to genxStartDocSender, though: it is wrapped up in the GENX4R_ERR macro. This is a helper to let you avoid writing all sorts of boilerplate error-handling code in a lot of different places. Here's the definition:

#define GENX4R_ERR(expr, w)                                         \
  do {                                                              \
    genxStatus genx4r__status = (expr);                             \
    if (genx4r__status)                                             \
      rb_raise (rb_cGenXException, "%s", genxLastErrorMessage (w)); \
  } while (0)

All it does is declare a genxStatus to hold the return value of the expression it's calling. If that status is nonzero (meaning anything other than GENX_SUCCESS), it raises an exception via rb_raise. The exception is of type rb_cGenXException and holds the result of genxLastErrorMessage, making it possible to present a reasonable error to the caller. A do { } while (0) block (with no trailing semicolon) wraps the entire expression, so that the macro call followed by a semicolon behaves just like a regular C statement.

This is the first appearance of rb_cGenXException. Where did it come from? Just like the rb_cGenXWriter VALUE used to hold the reference to the GenX::Writer class, rb_cGenXException is a VALUE holding a reference to the GenX::Exception class. Its definition is in Init_genx4r:

rb_cGenXException = rb_define_class_under (rb_mGenX,
                                           "Exception",
                                           rb_eStandardError);

This creates a new class within the rb_mGenX module named Exception; the class inherits from the rb_eStandardError class, better known in Rubyland as StandardError.
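Seen from Ruby, the exception class and the GENX4R_ERR macro together amount to roughly the following. This is a pure-Ruby stand-in under assumptions: the module is deliberately named GenXSketch so it cannot be mistaken for the real extension, and check is a hypothetical method playing the part of the macro.

```ruby
# Pure-Ruby stand-in; not the real genx4r module.
module GenXSketch
  # Same shape as the class created by rb_define_class_under.
  class Exception < StandardError; end

  # Plays the part of GENX4R_ERR: zero means success, anything else raises.
  def self.check(status, message)
    raise Exception, message unless status.zero?
    nil
  end
end

GenXSketch.check(0, "never raised")

begin
  GenXSketch.check(1, "illegal call sequence")
rescue => e                       # a bare rescue catches StandardError
  puts "#{e.class}: #{e.message}"
end
```

Inheriting from StandardError matters here: it is what lets callers catch the extension's errors with an ordinary rescue clause.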

The final step here is to add a call to rb_define_method in the init function in order to hook up writer_begin_document to the GenX::Writer class. It looks like this:

rb_define_method (rb_cGenXWriter,
                  "begin_document",
                  writer_begin_document,
                  1);

Calling the begin_document method on an instance of the GenX::Writer class causes Ruby to call writer_begin_document. The 1 indicates that begin_document takes a single VALUE as its argument.

GenX::Writer#end_document

After starting a document with GenX::Writer#begin_document, there must be some way to end it, by way of the corresponding GenX::Writer#end_document method. This is, fortunately, much simpler than begin_document. All it needs to do is call genxEndDocument. The actual implementation looks like this:

static VALUE
writer_end_document (VALUE self)
{
  genxWriter w;

  Data_Get_Struct (self, struct genxWriter_rec, w);

  GENX4R_ERR (genxEndDocument (w), w);

  return Qnil;
}

As you can see, this is basically boilerplate stuff. It pulls the writer out of our object using Data_Get_Struct, calls genxEndDocument on it (wrapped up in the GENX4R_ERR macro to handle the error checking), and returns Qnil, which in Ruby terms returns nil to the caller.

Then it hooks up the method via the appropriate rb_define_method call in Init_genx4r:

rb_define_method (rb_cGenXWriter,
                  "end_document",
                  writer_end_document,
                  0);

This is exactly like hooking up the begin_document method, except that it takes no arguments instead of one.

GenX::Writer#begin_element

It has taken an awful lot of trouble to put the begin_document and end_document methods in place. Now users can start and end documents--but in order for this to be of much use, they'll want to put some content in the document. Because this is XML, and XML documents all start with a root element, that means the next methods to implement are GenX::Writer#begin_element and GenX::Writer#end_element. The obvious place to start is GenX::Writer#begin_element.

GenX::Writer#begin_element is a thin wrapper around the genxStartElementLiteral function. It's really similar to the methods already shown. Here's the implementation:

static VALUE
writer_begin_element (int argc, VALUE *argv, VALUE self)
{
  genxWriter w;
  VALUE xmlns, name;

  switch (argc)
    {
      case 1:
        xmlns = 0;
        name = argv[0];
        break;

      case 2:
        xmlns = argv[0];
        Check_Type (xmlns, T_STRING);
        name = argv[1];
        break;

      default:
        rb_raise (rb_eRuntimeError, "invalid arguments");
    }

  Check_Type (name, T_STRING);

  Data_Get_Struct (self, struct genxWriter_rec, w);

  GENX4R_ERR (genxStartElementLiteral
               (w,
                xmlns ? (constUtf8) RSTRING (xmlns)->ptr : NULL,
                (constUtf8) RSTRING (name)->ptr), w);

  return Qnil;
}

A few things here haven't appeared before. First of all, this method takes a variable number of arguments. Most of the code in this function goes to figuring out how many arguments it received and setting things up as appropriate. The way Ruby lets you do this at the C level is that the underlying C function takes as arguments an integer that holds the number of arguments passed, a pointer to an array of VALUEs that contains each of the arguments, and a VALUE that holds the invoking object.

If it receives one argument, it uses that as the name of the element.

If it receives two arguments, then the first is the namespace and the second is the name. Given an xmlns argument, the code verifies that it is a String using the Check_Type macro with the T_STRING constant. The same check occurs for the element's name. Then, as usual, it pulls the genxWriter out of self and finally calls the underlying genxStartElementLiteral function, passing in the namespace if provided and valid, and a NULL otherwise. When passing the namespace and element name, note that the code uses the RSTRING macro to cast the VALUE to the underlying string data structure before accessing the C-string pointer via the ptr field in that structure.
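The argument handling may be easier to follow in Ruby. The sketch below mirrors the switch statement and the two Check_Type calls; parse_element_args is a hypothetical helper, and the error messages are made up rather than the ones Ruby's Check_Type produces.

```ruby
# Hypothetical Ruby mirror of the argument handling in writer_begin_element.
def parse_element_args(*args)
  case args.length
  when 1
    xmlns = nil                    # no namespace given
    name = args[0]
  when 2
    xmlns, name = args
    raise TypeError, "namespace must be a String" unless xmlns.is_a?(String)
  else
    raise RuntimeError, "invalid arguments"
  end

  raise TypeError, "name must be a String" unless name.is_a?(String)
  [xmlns, name]
end

p parse_element_args("foo")
p parse_element_args("http://example.com/ns", "foo")
```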

Once again, Init_genx4r needs more code to hook up this method:

rb_define_method (rb_cGenXWriter,
                  "begin_element",
                  writer_begin_element,
                  -1);

Notice the -1 that tells Ruby to call this method via the count/array/object style of argument passing.
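The numbers passed to rb_define_method line up with what Ruby reports through Method#arity, which makes the convention easy to check from Ruby itself. ArityDemo is a made-up class for this illustration.

```ruby
# Made-up class showing how arity values correspond to argument lists.
class ArityDemo
  def no_args; end
  def one_arg(value); end
  def any_args(*args); end
end

p ArityDemo.instance_method(:no_args).arity    # 0, like end_document
p ArityDemo.instance_method(:one_arg).arity    # 1, like begin_document
p ArityDemo.instance_method(:any_args).arity   # -1, like begin_element

# Built-in methods implemented in C with a variable argument list
# report -1 as well.
p String.instance_method(:squeeze).arity       # -1
```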

Now that there's a way to start an element, there must be a way to end it. That's the purpose of the GenX::Writer#end_element method.

GenX::Writer#end_element

As you might have guessed, GenX::Writer#end_element is very similar to GenX::Writer#end_document. Here's the implementation:

static VALUE
writer_end_element (VALUE self)
{
  genxWriter w;

  Data_Get_Struct (self, struct genxWriter_rec, w);

  GENX4R_ERR (genxEndElement (w), w);

  return Qnil;
}

All it does is pull out the writer and call genxEndElement on it. GenX does the rest. As usual, it takes one call in Init_genx4r to hook up the method:

rb_define_method (rb_cGenXWriter,
                  "end_element",
                  writer_end_element,
                  0);

Now GenX4r can actually produce XML. Jump into irb and try it out.

$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0> w = GenX::Writer.new
=> #<GenX::Writer:0x321f0c>
irb(main):003:0> s = ''
=> ""
irb(main):004:0> w.begin_document(s)
=> nil
irb(main):005:0> w.begin_element("foo")
=> nil
irb(main):006:0> w.end_element       
=> nil
irb(main):007:0> w.end_document
=> nil
irb(main):008:0> s
=> "<foo></foo>"
irb(main):009:0>

There you have it! The extension actually produced some XML output! Of course, most XML needs some textual content within at least some of the tags. Making that work means implementing GenX::Writer#text, a wrapper around genxAddText.

GenX::Writer#text

After everything implemented so far, GenX::Writer#text doesn't have anything all that new to it. Take a look:

static VALUE
writer_text (VALUE self, VALUE text)
{
  genxWriter w;

  Check_Type (text, T_STRING);

  Data_Get_Struct (self, struct genxWriter_rec, w);

  GENX4R_ERR (genxAddText (w, (constUtf8) RSTRING (text)->ptr), w);

  return Qnil;
}

There are the usual hoops to access the genxWriter and then a call to pass the text through to genxAddText. Here's code to hook up the method in Init_genx4r.

rb_define_method (rb_cGenXWriter,
                  "text",
                  writer_text,
                  1);

There you have it, a functionally complete wrapper. Try it out in irb to prove it.

$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0> w = GenX::Writer.new
=> #<GenX::Writer:0x321f0c>
irb(main):003:0> s = ''
=> ""
irb(main):004:0> w.begin_document(s)
=> nil
irb(main):005:0> w.begin_element("foo")
=> nil
irb(main):006:0> w.text("bar")
=> nil
irb(main):007:0> w.end_element       
=> nil
irb(main):008:0> w.end_document
=> nil
irb(main):009:0> s
=> "<foo>bar</foo>"
irb(main):010:0>

With the combination of elements and text, you can now start using GenX4r for some nontrivial tasks. Before that, I'd like to write some tests to verify that everything works now and that it will continue to work as I make changes in the future.

Unit Testing

In Ruby, the accepted way to write unit tests is to use the Test::Unit framework. This is a standard unit test framework, written along the lines of the popular JUnit package. To use it, subclass the Test::Unit::TestCase class and implement your tests as methods that are named test_something (where the something part changes for each test). Inside the tests, use the assert method to indicate what conditions need to be true for the tests to pass. Here's a simple test case to start:

require 'test/unit'
require 'genx4r'

class BasicsTest < Test::Unit::TestCase
  def test_element
    w = GenX::Writer.new
    s = ''

    w.begin_document(s)
    w.begin_element('foo')
    w.text('bar')
    w.end_element
    w.end_document

    assert s == '<foo>bar</foo>'
  end
end

Run the tests by running that file. You should receive output similar to the following:

$ ruby test.rb
Loaded suite test
Started
.
Finished in 0.005774 seconds.

1 tests, 1 assertions, 0 failures, 0 errors

The line with the single dot on it is where you see the output for the tests. Each passing test prints a . whereas failing tests print an F. To add more tests, fill in more test methods. They will run automatically when you run the file.

Making Things a Bit More Ruby-esque

All right, now there's a working module and a test suite to make sure it keeps on working. I'm all set to release this new toy to the unsuspecting masses out there on the Internet, right? Not quite. Although the API works, it's not ideal. You have to remember to call the GenX::Writer#end_element and GenX::Writer#end_document methods at exactly the right times; otherwise you'll either mess up the output (if elements nest incorrectly) or even possibly throw an exception because you call underlying GenX functions out of order. Remember that GenX is big on enforcing correctness, so if you screw up, it will tell you about it.

It would be really nice to arrange for the module to call these end methods at the appropriate times. Fortunately, Ruby has a way to do that: blocks.

A block in Ruby is a chunk of code passed to a method along with its arguments. The method can then use the yield keyword to invoke the block whenever it wants. The syntax looks like this:

def takes_a_block(&block)
  puts "before yield"
  yield
  puts "after yield"
end

takes_a_block do
  puts "in the block"
end

Running this code produces the following output:

$ ruby blocks-example.rb
before yield
in the block
after yield

Note that Ruby allows braces as block delimiters instead of do and end, in which case the method call could have looked like takes_a_block { puts "in the block" }. Both ways are valid. Which one you use is mostly just a question of style.

Using blocks to indicate the beginning and end of an element in the XML to generate solves the API problem perfectly. Here's how to implement this with a new GenX::Writer#element method defined at the C level.

static VALUE
writer_element (int argc, VALUE *argv, VALUE self)
{
  /* Check for the block first so that no element is left open
   * if the caller forgot to supply one. */
  if (! rb_block_given_p ())
    rb_raise (rb_eRuntimeError, "must be called with a block");

  writer_begin_element (argc, argv, self);

  rb_yield (Qnil);

  writer_end_element (self);

  return Qnil;
}

All of this is merely a new method that first checks that it received a block (throwing an exception if it didn't), then calls the begin_element method, invokes the block, and finally calls end_element. In order to nest elements or put text inside them, the passed-in block needs to contain the code to create that content. There are two new C-level functions here: rb_block_given_p, the predicate that asks "Was I given a block?" and rb_yield, which invokes the block. Because there's nothing else to pass to the block, the code passes Qnil.

As usual, the code to hook up this new method in Init_genx4r looks like:

rb_define_method (rb_cGenXWriter,
                  "element",
                  writer_element,
                  -1);

With that in place, you can use the new method like this:

w = GenX::Writer.new

s = ''

w.begin_document(s)

w.element('foo') do
  w.element('bar') do
    w.text('baz')
  end
end

w.end_document

puts s

This produces the following output:

<foo><bar>baz</bar></foo>

Isn't that a much nicer API? Instead of having to remember to handle the nesting of elements manually, users can encode it directly into their program, making it much more difficult to do incorrectly. Note that the same technique can easily apply to the GenX::Writer#begin_document and GenX::Writer#end_document methods.
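To show what a block-based wrapper for documents could look like, here is a pure-Ruby sketch. It does not use GenX at all: SketchWriter is a hypothetical stand-in that only imitates the method names from this article, does no escaping or validation, and exists so the example can run without the extension. The document method is the part being illustrated.

```ruby
# Hypothetical pure-Ruby stand-in for GenX::Writer; no escaping, no checks.
class SketchWriter
  def begin_document(target)
    @target = target
    @open = []
  end

  def end_document
    raise "unclosed elements: #{@open.join(', ')}" unless @open.empty?
    @target = nil
  end

  def begin_element(name)
    @target << "<#{name}>"
    @open.push(name)
  end

  def end_element
    @target << "</#{@open.pop}>"
  end

  def text(string)
    @target << string
  end

  # Same shape as the element method described above.
  def element(name)
    raise "must be called with a block" unless block_given?
    begin_element(name)
    yield
    end_element
    nil
  end

  # The analogous wrapper for begin_document/end_document.
  def document(target)
    raise "must be called with a block" unless block_given?
    begin_document(target)
    yield
    end_document
    nil
  end
end

out = String.new
w = SketchWriter.new
w.document(out) do
  w.element('foo') do
    w.element('bar') { w.text('baz') }
  end
end
puts out
```

With both wrappers in place there is no end call left for the user to forget; the nesting of the blocks is the nesting of the XML.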

Some Conclusions

This whole idea started off as something of an experiment. Is it really as easy as I thought it would be to wrap a C library in Ruby? I don't know about you, but I think it was a success. In fewer than 300 lines of C, I've provided users with access to a useful subset of the GenX library's functionality. If I were going to implement this code directly in Ruby, it would be longer and most likely buggier, simply because the C version has been debugged already and our hypothetical Ruby version has not.

That said, GenX4r is still somewhat incomplete. The begin_document and end_document methods still need block-based wrapper methods, and for efficiency I also want to provide users the ability to predeclare namespaces and elements, to avoid having to validate them each time they're used. Plus, I'm a reasonably new Ruby hacker, so it's not out of the realm of possibility that there are bugs in the wrapper. Even so, I think this is a reasonable proof of concept. The additional work I've done on GenX4r that isn't documented here indicates to me that it is a success. On the strength of this experience, I have no trouble recommending Ruby as a convenient scripting language to wrap around libraries written in C.

For the record, the current version of GenX4r, which includes all this hypothetical functionality in one form or another, constitutes only 584 lines of C. If you're interested in using it or helping me develop it further, please grab the latest version from the GenX4r home page.

Garrett Rooney is a software developer at FactSet Research Systems, where he works on real-time market data.



Copyright © 2009 O'Reilly Media, Inc.