Ruby, if you've never heard of it, is an object-oriented scripting language, similar in many ways to Perl and Python. It originates from Japan and is young, as far as programming languages go. There are many really good reasons you might want to use the Ruby language; I'm not going to go into all of them here, but the one at the core of this article is the ease with which you can write Ruby extensions in C.
I'm a big fan of the so-called agile programming languages. I think they have a huge advantage over more traditional languages like C and C++. They also have some drawbacks, among the largest being that there's an awful lot of existing code written in C and C++. It's hard to sell people on moving to something new if they have to leave all their old toys behind.
The standard response to this sort of argument is that you can easily write an extension that bridges the gap between your old C code and your new code in Perl, Python, or whatever agile language is hot this week. Unfortunately, I've generally found that the APIs for bridging the gap between Perl and C are either cryptic (XS) or fragile (Inline::C). While Python is better in some ways, I still find its C API rather difficult to read. Tools such as SWIG can help alleviate this problem, but you still need to write a bunch of glue code to bridge the gap between the high-level agile languages and the low-level C code.
When I first looked at doing the same kind of thing for Ruby, a whole new
world opened up. The APIs are simple, to the point where I was up and running
in minutes rather than hours. All you need to know to start is in the
README.EXT file in the top level of the Ruby source tree. If you
need help with something that isn't documented there, you can't ask for a
clearer example than the Ruby source code itself. In short, I was just aching
for a test case, some C code I could wrap up in a Ruby extension to prove how
simple it is to make something that's easy to use. For my test case, I chose the
GenX library.
GenX is a simple C library for generating correct, canonical XML. It verifies that its data is valid UTF-8 and that the structure of the XML document being generated is well formed, and it forces you to use canonical XML. That may not mean much to you right now, but it can be significant if you need to compare two XML documents to determine their equivalence. Tim Bray wrote GenX and hosts it at GenxStatus. GenX originally attracted me because it provides a way to avoid problems related to malformed XML, which I've encountered in my own work. In addition to its usefulness, it's also perfect for an example of how to embed a C library in Ruby because it's very small, self-contained, and has a well-defined API we can wrap up in Ruby without too much trouble.
At this point it's worth asking whether a Ruby extension is really the best way to make this kind of functionality available.
Using an extension means that users need to install a binary distribution precompiled for their versions of Ruby and their operating systems, or to build it themselves, which means they need access to a C compiler. Additionally, extending Ruby via C has its own set of dangers. If you screw something up in a standard Ruby module, pretty much the worst you can do is cause an exception to be thrown. It's possible for users to recover from this if they're paranoid enough about catching exceptions and structure their code correctly. In a C-based extension, an error can corrupt memory or cause a segmentation fault or any number of other problems from which recovery is difficult, and all of which have the chance to crash the underlying Ruby interpreter.
That said, in this particular case I think providing direct access to the underlying GenX library via a C extension is the way to go. The GenX library is available right now, it works, and it does its job in a very efficient manner. There's no reason to duplicate functionality unnecessarily. Even if I did rewrite this in pure Ruby, all I would be likely to accomplish is slowing things down. Plus, GenX is exceptionally self-contained; while using the library does require that users either use a precompiled extension or possess a C compiler, it at least doesn't bring in any other third-party requirements. Finally, the GenX API is quite straightforward. It's reasonable to assume that we'll be able to implement this extension without undue risk of crashing our Ruby interpreter due to bugs in our code.
The first step in writing a Ruby extension is to create something that
compiles and runs. That means writing an extconf.rb file that tells
Ruby how to compile and link your extension, and then writing the bare-bones C
file that makes up the extension. With these two steps completed, you'll have a
Ruby module that you can require and a new class you can
instantiate, albeit not a very useful one because it won't actually have any
methods.
The extconf.rb file is a short Ruby program that
makes use of the mkmf module to build a simple makefile,
which you use to build your extension. There's a fair amount of
specialized functionality you can put in your extconf.rb
file, but for our purposes the bare minimum will do. Here's the
entirety of my extconf.rb file:
require 'mkmf'
dir_config("genx4r")
create_makefile("genx4r")
This tells Ruby to use all of the .c files in the current working
directory to build an extension named genx4r, and that it should write
out a makefile to compile and link it. If you copy all the .c and
.h files from the GenX tarball into the current directory, you can run
ruby extconf.rb && make and then have a Ruby extension
sitting there just waiting for you to require it in your script.
Here's the process:
$ ruby extconf.rb
creating Makefile
$ make
gcc -fno-common -g -Os -pipe -no-cpp-precomp -fno-common -DHAVE_INTTYPES_H
-pipe -pipe -I. -I/usr/lib/ruby/1.6/powerpc-darwin7.0 -I. -c -o charProps.o
charProps.c
gcc -fno-common -g -Os -pipe -no-cpp-precomp -fno-common -DHAVE_INTTYPES_H
-pipe -pipe -I. -I/usr/lib/ruby/1.6/powerpc-darwin7.0 -I. -c -o genx.o
genx.c
cc -fno-common -g -Os -pipe -no-cpp-precomp -fno-common -DHAVE_INTTYPES_H
-pipe -pipe -dynamic -bundle -undefined suppress -flat_namespace
-L/usr/lib/ruby/1.6/powerpc-darwin7.0 -L/usr/lib -o genx4r.bundle charProps.o
genx.o -ldl -lobjc
$ ls
Makefile charProps.o genx.c genx.o
charProps.c extconf.rb genx.h genx4r.bundle*
$ irb
irb(main):001:0> require 'genx4r'
LoadError: Failed to lookup Init function ./genx4r.bundle
from (irb):1:in `require'
from (irb):1
irb(main):002:0>
OK, so that sort of works.... There's a Ruby extension, but trying to
require it from inside Ruby only produces an error. That's because
none of the .c files defined an Init function. When Ruby
tries to load an extension, the first thing it does is look for a function named
Init_extname, where extname is the name of the
extension. Because that function doesn't exist, Ruby obviously can't find it
and throws a LoadError exception.
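To make the naming rule concrete, here is a small pure-Ruby sketch of how the expected function name follows from the extension's filename. The helper init_function_name is hypothetical and exists only for illustration; it mirrors the rule described above and is not the loader's actual code.

```ruby
# Hypothetical helper: mirrors the Init_extname naming rule described in
# the text. It is not part of Ruby and not how the loader is implemented.
def init_function_name(path)
  # Drop the directory and whatever extension the platform uses
  # (.bundle, .so, .dll, ...), then prepend "Init_".
  "Init_" + File.basename(path, ".*")
end

puts init_function_name("./genx4r.bundle")
puts init_function_name("/usr/lib/ruby/site_ruby/genx4r.so")
```

Both calls yield the same name, which is why the C file must define a function called exactly Init_genx4r regardless of platform.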
The next step is to implement Init_genx4r to allow the
extension to load successfully. The bare minimum necessary is simply an empty
function named Init_genx4r that takes no arguments and returns
nothing. I like that. Here are the current contents of the genx4r.c
file:
#include "ruby.h"
void
Init_genx4r()
{
  /* nothing here yet */
}
Rerun extconf.rb and make. When you try to load
the genx4r module with require, you should have
better results:
$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0>
The extension loads, but it still doesn't actually do anything. It needs
definitions for the classes that make up the interface to the GenX library.
For now, I'll define one top-level Ruby module named GenX and a
single class, Writer, that lives in it. That class is simply a
thin wrapper around the C-level genxWriter type. Here's the next
iteration of genx4r.c:
#include "ruby.h"
#include "genx.h"
static VALUE rb_mGenX;
static VALUE rb_cGenXWriter;
static void
writer_mark (genxWriter w)
{}
static void
writer_free (genxWriter w)
{
  genxDispose (w);
}
static VALUE
writer_allocate (VALUE klass)
{
  genxWriter writer = genxNew (NULL, NULL, NULL);
  return Data_Wrap_Struct (klass, writer_mark, writer_free, writer);
}
void
Init_genx4r ()
{
  rb_mGenX = rb_define_module ("GenX");
  rb_cGenXWriter = rb_define_class_under (rb_mGenX, "Writer", rb_cObject);
  /* NOTE: this only works in ruby 1.8.x. for ruby 1.6.x you instead define
   * a 'new' method, which does much the same thing as this. */
  rb_define_alloc_func (rb_cGenXWriter, writer_allocate);
}
That's a lot of new code. First comes the #include of
genx.h, because it needs to use functions defined by GenX. The two
VALUE variables represent the module and the class. Each object in
Ruby (and remember, everything in Ruby is an object) has a
VALUE; think of it as a reference to the object. The beginning of
Init_genx4r initializes these variables by calling
rb_define_module to create the GenX module and
rb_define_class_under to define the Writer class.
Next, Ruby needs to know how to allocate the guts of the
GenX::Writer object. That's where allocate comes in,
using rb_define_alloc_func to associate the
writer_allocate function with the allocate method.
writer_allocate creates a genxWriter object with
genxNew and turns it into a Ruby object via the
Data_Wrap_Struct macro. Data_Wrap_Struct simply
takes a VALUE representing the class of the new object (passed as
an argument to writer_allocate), two function pointers used for
Ruby's mark-and-sweep garbage collection, and a pointer to the underlying C-level data structure--in this case the genxWriter itself--and
returns a new VALUE that refers to the object, which is simply a
thin wrapper around the C-level pointer. Finally, the code has the mark
function for the object, writer_mark, which actually does nothing,
and the destructor, writer_free, which calls
genxDispose to clean up the genxWriter allocated in
writer_allocate. When wrapping a more complicated structure that
includes references to other Ruby-level objects, the mark function must call
rb_gc_mark on each of them to tell Ruby that they are still in use and must not be garbage collected.
One thing to note about genx4r.c is that the only function
accessible outside that file is Init_genx4r. Everything else is
static, which means that it won't leak out into the global
namespace and cause linkage errors if some other part of the program happens to
use the same function or variable name.
It's time to take a quick jaunt through irb to confirm that
it's possible to create an instance of the new class:
$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0> w = GenX::Writer.new
=> #<GenX::Writer:0x321f84>
irb(main):003:0>
Sure enough, it's an instance of our new GenX::Writer class.
Adding a few methods will make it actually useful!
Now it's time to add some actual functionality to the new class, which means
deciding exactly how it should work. The basic usage pattern for GenX itself is
simple: create a document, then create a variety of elements, possibly mixed
with character data, and finally close the document. In the simplest terms, this
boils down to five methods: GenX::Writer#begin_document,
GenX::Writer#end_document,
GenX::Writer#begin_element, GenX::Writer#end_element,
and GenX::Writer#text. Because you can't do anything without a
document, I'll start with begin_document and
end_document.
The underlying GenX library has two functions for starting a new document.
The first, genxStartDocFile, takes a FILE * argument
to which to write the contents of the document. I want something a bit more
generic than that, because there's no certainty that the user wants to send the
output to a file. It might be nice to be able to write the XML to a network
socket, or a string, or any number of other things. Fortunately GenX also
provides the genxStartDocSender function, which requires a
genxSender structure, containing pointers to send,
sendBounded, and flush functions. The extension
needs to implement those functions. Here is writer_send, the
send function:
static genxStatus
writer_send (void *baton, constUtf8 s)
{
  VALUE file = (VALUE) baton;
  VALUE ary;
  if (! rb_respond_to (file, rb_intern ("<<")))
    rb_raise (rb_eRuntimeError, "target must respond to '<<'");
  ary = rb_ary_new2 (2);
  rb_ary_store (ary, 0, file);
  rb_ary_store (ary, 1, rb_str_new2 (s));
  if (rb_rescue (call_write, ary, handle_exception, Qnil) == Qfalse)
    return GENX_IO_ERROR;
  else
    return GENX_SUCCESS;
}
writer_send takes two arguments. First is a void *
that points to the "user data" associated with the genxWriter. In
this case, the pointer holds a VALUE that refers to the object
into which to send data. This could be a File object, a
String, or anything else; what's important is that it
responds to the << method, which the code uses to write the
data from GenX. The second argument is a constUtf8,
which is GenX's typedef for a pointer to the start of a
null-terminated UTF-8 string containing the data to write out.
The code first checks whether the object it received actually responds to the
<< method. If it doesn't, then it can't do much else, so it
raises an exception via rb_raise. If it passes that check, it
creates a Ruby array to hold the object into which to write the data and a Ruby
string to hold the data to write. The array exists to make it easier to pass
them to call_write, a helper function that does the writing by way of
the rb_rescue function. Here's the implementation of
call_write:
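static VALUE
call_write (VALUE ary)
{
  rb_funcall (rb_ary_entry (ary, 0),
              rb_intern ("<<"),
              1,
              rb_ary_entry (ary, 1));
  /* Return a true value so the caller can tell success apart from the
   * Qfalse that handle_exception returns. */
  return Qtrue;
}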
All this does is use rb_funcall to call the
<< method on the file object, extracted from the array using
rb_ary_entry. The << method takes a single
argument (thus the 1): the string built up in
writer_send, also extracted from the array.
If all this code needs to do is call a function, why not do it directly in
writer_send, thus avoiding the rb_rescue gymnastics?
The answer is exception handling. If the << method raises an exception,
rb_rescue calls the handle_exception function passed
to it and hands that function's return value back to its own
caller. In this case,
handle_exception returns Qfalse so that
writer_send knows to return GENX_IO_ERROR to its
caller. Here's the (trivial) implementation of
handle_exception:
static VALUE
handle_exception (VALUE unused)
{
  return Qfalse;
}
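For readers more at home in Ruby than in its C API, the control flow of writer_send, call_write, and handle_exception can be approximated in a few lines of pure Ruby. This is only a sketch of the logic, not the extension's code: the method name send_to and the symbols :success and :io_error are hypothetical stand-ins for writer_send, GENX_SUCCESS, and GENX_IO_ERROR.

```ruby
# Pure-Ruby approximation of the C-level send logic. The name send_to and
# the status symbols are illustrative assumptions, not part of GenX4r.
def send_to(target, data)
  # Same duck-typing check the C code performs with rb_respond_to.
  raise "target must respond to '<<'" unless target.respond_to?(:<<)

  begin
    target << data          # what call_write does via rb_funcall
    :success
  rescue StandardError      # rb_rescue also handles StandardError
    :io_error               # what handle_exception signals with Qfalse
  end
end

buffer = ''.dup
puts send_to(buffer, "<foo>")
puts buffer
puts send_to(''.freeze, "<foo>")   # appending to a frozen string raises
```

The point of the detour through rb_rescue is the same as the begin/rescue here: a failure while writing turns into a status code that GenX understands instead of an exception unwinding through the library's C frames.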
All right, that's the implementation of send. The next function
is sendBounded. This is pretty much the same thing; it just needs
to create the string it passes in to call_write based on a start
pointer and an end pointer instead of a null-terminated string. The
implementation looks like this:
static genxStatus
writer_send_bounded (void *baton, constUtf8 start, constUtf8 end)
{
  VALUE file = (VALUE) baton;
  VALUE ary;
  if (! rb_respond_to (file, rb_intern ("<<")))
    rb_raise (rb_eRuntimeError, "target must respond to '<<'");
  ary = rb_ary_new2 (2);
  rb_ary_store (ary, 0, file);
  rb_ary_store (ary, 1, rb_str_new (start, end - start));
  if (rb_rescue (call_write, ary, handle_exception, Qnil) == Qfalse)
    return GENX_IO_ERROR;
  else
    return GENX_SUCCESS;
}
As you can see, the only difference here is in calling
rb_str_new instead of rb_str_new2. This takes a
pointer and a length, calculated based on the given start and end pointers.
Finally, this code needs a flush function. Here's
writer_flush and its helper function call_flush:
static VALUE
call_flush (VALUE file)
{
  rb_funcall (file, rb_intern ("flush"), 0);
  return Qtrue;
}
static genxStatus
writer_flush (void *baton)
{
  VALUE file = (VALUE) baton;
  /* if we can't flush, just let it go... */
  if (! rb_respond_to (file, rb_intern ("flush")))
    return GENX_SUCCESS;
  if (rb_rescue (call_flush, file, handle_exception, Qnil) == Qfalse)
    return GENX_IO_ERROR;
  else
    return GENX_SUCCESS;
}
This is rather similar to the send and sendBounded
functions, so let's consider only the differences. First of all, if the object
to which to write the data doesn't respond to the flush method,
the code returns success, assuming that it's holding the data in memory or
something else where flush is not applicable. The only other difference is that
instead of the << method, it calls the flush
method on the object.
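The "flush only if the target supports it" rule is easy to see in a pure-Ruby approximation. As before, this is a sketch of the logic rather than the extension itself; flush_target, BrokenSink, and the status symbols are hypothetical names introduced for the example.

```ruby
require 'stringio'

# Pure-Ruby approximation of writer_flush. The method name and the
# :success / :io_error symbols are illustrative assumptions.
def flush_target(target)
  # Targets such as String have no flush method; treat that as success.
  return :success unless target.respond_to?(:flush)

  begin
    target.flush
    :success
  rescue StandardError
    :io_error
  end
end

# Hypothetical sink whose flush always fails, to exercise the error path.
class BrokenSink
  def flush
    raise IOError, "simulated flush failure"
  end
end

puts flush_target(''.dup)          # String: nothing to flush
puts flush_target(StringIO.new)    # IO-like object: flush succeeds
puts flush_target(BrokenSink.new)  # flush raises, reported as an I/O error
```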
With all three helper functions in place, it's time to write the
writer_begin_document function itself. That'll finally make it
possible to start a new document with the GenX::Writer object.
static genxSender writer_sender = { writer_send,
                                    writer_send_bounded,
                                    writer_flush };
static VALUE
writer_begin_document (VALUE self, VALUE file)
{
  genxWriter w;
  Data_Get_Struct (self, struct genxWriter_rec, w);
  if (! rb_respond_to (file, rb_intern ("<<")))
    rb_raise (rb_eRuntimeError, "target must respond to '<<'");
  genxSetUserData(w, (void *) file);
  GENX4R_ERR (genxStartDocSender (w, &writer_sender), w);
  return Qnil;
}
The first thing to do is to unwrap the contents of the self
object. Recall the earlier use of Data_Wrap_Struct to turn a
struct genxWriter_rec into our object. This code uses its
counterpart, Data_Get_Struct, to pull it back out again. For
paranoia's sake, it confirms that the file it received responds to
the << method, because it will call that method later on.
Next, calling genxSetUserData stores file as
the "user data," so that when the time comes to call the sender functions, they
can call the appropriate methods on it. Finally, the code calls
genxStartDocSender, passing it the writer and a pointer to the
genxSender so it knows which functions it should call later on.
The object is now ready to do some real work.
There is something a little strange about the call to
genxStartDocSender, though: it is wrapped up in the
GENX4R_ERR macro. This is a helper to let you avoid writing all sorts of
boilerplate error-handling code in a lot of different places. Here's the
definition:
#define GENX4R_ERR(expr, w) \
  do { \
    genxStatus genx4r__status = (expr); \
    if (genx4r__status) \
      rb_raise (rb_cGenXException, "%s", genxLastErrorMessage (w)); \
  } while (0)
All it does is declare a genxStatus to hold the return value of
the expression it's calling. If that status is nonzero (meaning anything other
than GENX_SUCCESS), it raises an exception via
rb_raise. The exception is of type
rb_cGenXException and holds the result of
genxLastErrorMessage, making it possible to present a reasonable
error to the caller. A do { } while (0) loop (with no trailing
semicolon) wraps the entire body, so the macro expands to a single statement
and you can use it just like a regular C statement, even inside an unbraced if/else.
This is the first appearance of rb_cGenXException. Where did
it come from? Just like the rb_cGenXWriter VALUE
used to hold the reference to the GenX::Writer class,
rb_cGenXException is a VALUE holding a reference to
the GenX::Exception class. Its definition is in
Init_genx4r:
rb_cGenXException = rb_define_class_under (rb_mGenX,
"Exception",
rb_eStandardError);
This creates a new class within the rb_mGenX module named
Exception; the class inherits from the rb_eStandardError class, better known in Rubyland as
StandardError.
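One practical consequence of inheriting from StandardError is that a bare rescue clause in user code catches the new exception. The following pure-Ruby sketch illustrates that; it defines a stand-in GenX::Exception by hand (an assumption made so the example runs without the compiled extension), and the risky method is hypothetical.

```ruby
# Stand-in for the class the extension defines in C with
# rb_define_class_under; defined here only so the sketch is runnable.
module GenX
  class Exception < StandardError; end
end

# Hypothetical method that fails the way a GenX call might.
def risky
  raise GenX::Exception, "simulated GenX error"
end

caught =
  begin
    risky
    nil
  rescue => e   # a bare rescue handles StandardError and its subclasses
    e
  end

puts caught.class
puts caught.message
```

Had the class inherited directly from Ruby's top-level Exception instead, the bare rescue above would not have caught it.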
The final step here is to add a call to rb_define_method in the
init function in order to hook up writer_begin_document to the
GenX::Writer class. It looks like this:
rb_define_method (rb_cGenXWriter,
"begin_document",
writer_begin_document,
1);
Calling the begin_document method on an instance of the
GenX::Writer class causes Ruby to call
writer_begin_document. The 1 indicates that
begin_document takes a single VALUE as its
argument.
After starting a document with GenX::Writer#begin_document,
there must be some way to end it, by way of the corresponding
GenX::Writer#end_document method. This is, fortunately, much
simpler than begin_document. All it needs to do is call
genxEndDocument. The actual implementation looks like this:
static VALUE
writer_end_document (VALUE self)
{
  genxWriter w;
  Data_Get_Struct (self, struct genxWriter_rec, w);
  GENX4R_ERR (genxEndDocument (w), w);
  return Qnil;
}
As you can see, this is basically boilerplate stuff. It pulls the writer out of
our object using Data_Get_Struct, calls genxEndDocument
on it (wrapped up in the GENX4R_ERR macro to handle the error
checking), and returns Qnil, which in Ruby terms returns
nil to the caller.
Then it hooks up the method via the appropriate
rb_define_method call in Init_genx4r:
rb_define_method (rb_cGenXWriter,
"end_document",
writer_end_document,
0);
This is exactly like hooking up the begin_document method,
except that it takes no arguments instead of one.
It has taken an awful lot of trouble to put the begin_document
and end_document methods in place. Now users can start and end
documents--but in order for this to be of much use, they'll want to put some
content in the document. Because this is XML, and XML documents all start with a
root element, that means the next methods to implement are
GenX::Writer#begin_element and
GenX::Writer#end_element. The obvious place to start is
GenX::Writer#begin_element.
GenX::Writer#begin_element is a thin wrapper around the
genxStartElementLiteral function. It's really similar to the
methods already shown. Here's the implementation:
static VALUE
writer_begin_element (int argc, VALUE *argv, VALUE self)
{
  genxWriter w;
  VALUE xmlns, name;
  switch (argc)
    {
    case 1:
      xmlns = 0;
      name = argv[0];
      break;
    case 2:
      xmlns = argv[0];
      Check_Type (xmlns, T_STRING);
      name = argv[1];
      break;
    default:
      rb_raise (rb_eRuntimeError, "invalid arguments");
    }
  Check_Type (name, T_STRING);
  Data_Get_Struct (self, struct genxWriter_rec, w);
  GENX4R_ERR (genxStartElementLiteral
                (w,
                 xmlns ? (constUtf8) RSTRING (xmlns)->ptr : NULL,
                 (constUtf8) RSTRING (name)->ptr), w);
  return Qnil;
}
A few things here haven't appeared before. First of all, this
method takes a variable number of arguments. Most of the code in this function
goes to figuring out how many arguments it received and setting things up as
appropriate. The way Ruby lets you do this at the C level is that the
underlying C function takes as arguments an integer that holds the number of
arguments passed, a pointer to an array of VALUEs that contains
each of the arguments, and a VALUE that holds the invoking
object.
If it receives one argument, it uses that as the name of the element.
If it receives two arguments, then the first is the namespace and the second
is the name. Given an xmlns argument, the code verifies that it
is a String using the Check_Type macro with the
T_STRING constant. The same check occurs for the element's
name. Then, as usual, it pulls the genxWriter out of
self and finally calls the underlying
genxStartElementLiteral function, passing in the namespace if
provided and valid, and a NULL otherwise. When passing the
namespace and element name, note that the code uses the RSTRING
macro to cast the VALUE to the underlying string data structure
before accessing the C-string pointer via the ptr field in that
structure.
Once again, Init_genx4r needs more code to hook up this
method:
rb_define_method (rb_cGenXWriter,
"begin_element",
writer_begin_element,
-1);
Notice the -1 that tells Ruby to call this method via the
count/array/object style of argument passing.
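If the count/array style feels unfamiliar, it may help to see the same argument handling written as a Ruby method with a splat parameter, which is roughly what an arity of -1 corresponds to on the Ruby side. This is a sketch only; element_args is a hypothetical helper that returns the namespace and name instead of calling into GenX.

```ruby
# Hypothetical helper mirroring the argument handling in
# writer_begin_element; it only sorts out the arguments.
def element_args(*args)
  case args.length
  when 1
    xmlns = nil
    name = args[0]
  when 2
    xmlns, name = args
    # Equivalent of Check_Type (xmlns, T_STRING)
    raise TypeError, "namespace must be a String" unless xmlns.is_a?(String)
  else
    raise "invalid arguments"
  end
  # Equivalent of Check_Type (name, T_STRING)
  raise TypeError, "name must be a String" unless name.is_a?(String)
  [xmlns, name]
end

p element_args("foo")
p element_args("urn:example", "foo")
```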
Now that there's a way to start an element, there must be a way to end it.
That's the purpose of the GenX::Writer#end_element method.
As you might have guessed, GenX::Writer#end_element is very
similar to GenX::Writer#end_document. Here's the
implementation:
static VALUE
writer_end_element (VALUE self)
{
  genxWriter w;
  Data_Get_Struct (self, struct genxWriter_rec, w);
  GENX4R_ERR (genxEndElement (w), w);
  return Qnil;
}
All it does is pull out the writer and call genxEndElement on
it. GenX does the rest. As usual, it takes one call in Init_genx4r
to hook up the method:
rb_define_method (rb_cGenXWriter,
"end_element",
writer_end_element,
0);
Now GenX4r can actually produce XML. Jump into
irb and try it out.
$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0> w = GenX::Writer.new
=> #<GenX::Writer:0x321f0c>
irb(main):003:0> s = ''
=> ""
irb(main):004:0> w.begin_document(s)
=> nil
irb(main):005:0> w.begin_element("foo")
=> nil
irb(main):006:0> w.end_element
=> nil
irb(main):007:0> w.end_document
=> nil
irb(main):008:0> s
=> "<foo></foo>"
irb(main):009:0>
There you have it! The extension actually produced some XML output! Of
course, most XML needs some textual content within at least some of the tags.
Making that work means implementing GenX::Writer#text, a wrapper
around genxAddText.
After everything implemented so far, GenX::Writer#text doesn't
have anything all that new to it. Take a look:
static VALUE
writer_text (VALUE self, VALUE text)
{
  genxWriter w;
  Check_Type (text, T_STRING);
  Data_Get_Struct (self, struct genxWriter_rec, w);
  GENX4R_ERR (genxAddText (w, (constUtf8) RSTRING (text)->ptr), w);
  return Qnil;
}
There are the usual hoops to access the genxWriter and then a
call to pass the text through to genxAddText. Here's code to hook
up the method in Init_genx4r.
rb_define_method (rb_cGenXWriter,
"text",
writer_text,
1);
There you have it, a functionally complete wrapper. Try it out in
irb to prove it.
$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0> w = GenX::Writer.new
=> #<GenX::Writer:0x321f0c>
irb(main):003:0> s = ''
=> ""
irb(main):004:0> w.begin_document(s)
=> nil
irb(main):005:0> w.begin_element("foo")
=> nil
irb(main):006:0> w.text("bar")
=> nil
irb(main):007:0> w.end_element
=> nil
irb(main):008:0> w.end_document
=> nil
irb(main):009:0> s
=> "<foo>bar</foo>"
irb(main):010:0>
With the combination of elements and text, you can now start using
GenX4r for some nontrivial tasks. Before that, I'd like to write
some tests to verify that everything works now and that it will continue to
work as I make changes in the future.
In Ruby, the accepted way to write unit tests is to use the
Test::Unit framework. This is a standard unit test framework,
written along the lines of the popular JUnit package. To use it, subclass the
Test::Unit::TestCase class and implement your tests as methods
that are named test_something (where the
something part changes for each test). Inside the tests, use the
assert method to indicate what conditions need to be true for the
tests to pass. Here's a simple test case to start:
require 'test/unit'
require 'genx4r'
class BasicsTest < Test::Unit::TestCase
  def test_element
    w = GenX::Writer.new
    s = ''
    w.begin_document(s)
    w.begin_element('foo')
    w.text('bar')
    w.end_element
    w.end_document
    assert s == '<foo>bar</foo>'
  end
end
Run the tests by running that file. You should receive output similar to the following:
$ ruby test.rb
Loaded suite test
Started
.
Finished in 0.005774 seconds.
1 tests, 1 assertions, 0 failures, 0 errors
The line with the single dot on it is where you see the output for the
tests. Each passing test prints a . whereas failing tests print an
F. To add more tests, fill in more test methods. They will run
automatically when you run the file.
All right, now there's a working module and a test suite to make sure it
keeps on working. I'm all set to release this new toy to the unsuspecting
masses out there on the Internet, right? Not quite. Although the API works, it's
not ideal. You have to remember to call the
GenX::Writer#end_element and
GenX::Writer#end_document methods at exactly the right times;
otherwise you'll either mess up the output (if elements nest
incorrectly) or even possibly throw an exception because you call underlying
GenX functions out of order. Remember that GenX is
big on enforcing correctness, so if you screw up, it will tell you about it.
It would be really nice to arrange for the module to call these end methods at the appropriate times. Fortunately, Ruby has a way to do that: blocks.
A block in Ruby is a chunk of code passed to a method as one of its
arguments. The method can then use the yield keyword to invoke the
block whenever it wants. The syntax looks like this:
def takes_a_block(&block)
  puts "before yield"
  yield
  puts "after yield"
end
takes_a_block do
  puts "in the block"
end
Running this code produces the following output:
$ ruby blocks-example.rb
before yield
in the block
after yield
Note that Ruby allows braces as block delimiters instead of do
and end, in which case the method call could have looked like
takes_a_block { puts "in the block" }. Both ways are valid.
Which one you use is mostly just a question of style.
Using blocks to indicate the beginning and end of an element in the XML to
generate solves the API problem perfectly. Here's how to implement this with
a new GenX::Writer#element method defined at the C level.
static VALUE
writer_element (int argc, VALUE *argv, VALUE self)
{
  /* Check for the block first so that a missing block doesn't leave an
   * element open. */
  if (! rb_block_given_p ())
    rb_raise (rb_eRuntimeError, "must be called with a block");
  writer_begin_element (argc, argv, self);
  rb_yield (Qnil);
  writer_end_element (self);
  return Qnil;
}
All of this is merely a new method that calls begin_element,
invokes the block it received, and then calls end_element; if it didn't
receive a block, it raises an exception instead. In order to nest elements or put
text inside them, the passed-in block needs to contain the code to create that
content. There are two new C-level functions here:
rb_block_given_p, the predicate that asks "Was I given a block?", and
rb_yield, which invokes the block. Because there's nothing else
to pass to the block, the code passes Qnil.
As usual, the code to hook up this new method in Init_genx4r looks like:
rb_define_method (rb_cGenXWriter,
"element",
writer_element,
-1);
With that in place, using the new method like this:
w = GenX::Writer.new
s = ''
w.begin_document(s)
w.element('foo') do
  w.element('bar') do
    w.text('baz')
  end
end
w.end_document
puts s
produces this output:
<foo><bar>baz</bar></foo>
Isn't that a much nicer API? Instead of having to remember to handle the
nesting of elements manually, users can encode it directly into their program,
making it much more difficult to do incorrectly. Note that the same technique
can easily apply to the GenX::Writer#begin_document and
GenX::Writer#end_document methods.
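To show what a block-based document method might look like, here is a pure-Ruby sketch of the pattern. Everything in it is an assumption made for illustration: SketchWriter is a hypothetical stand-in for GenX::Writer (it does no escaping or validation, so it is not a substitute for GenX), and the method name document is my own choice rather than part of the extension's API.

```ruby
# Hypothetical pure-Ruby stand-in for GenX::Writer, just enough to
# demonstrate block-based cover methods. No escaping or validation.
class SketchWriter
  def begin_document(target)
    @target = target
    @open = []
  end

  def end_document
    raise "unclosed elements" unless @open.empty?
    @target = nil
  end

  def begin_element(name)
    @open.push(name)
    @target << "<#{name}>"
  end

  def end_element
    @target << "</#{@open.pop}>"
  end

  def text(str)
    @target << str
  end

  # Cover method: the block's extent marks where the element ends.
  def element(name)
    raise "must be called with a block" unless block_given?
    begin_element(name)
    yield
    end_element
    nil
  end

  # The same technique applied to the document as a whole.
  def document(target)
    raise "must be called with a block" unless block_given?
    begin_document(target)
    yield
    end_document
    nil
  end
end

out = ''.dup
w = SketchWriter.new
w.document(out) do
  w.element('foo') { w.text('bar') }
end
puts out   # prints <foo>bar</foo>
```

With both cover methods in place, the nesting of the Ruby blocks is the nesting of the XML, and there is no end call left for the caller to forget.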
This whole idea started off as something of an experiment. Is it really as easy as I thought it would be to wrap a C library in Ruby? I don't know about you, but I think it was a success. In fewer than 300 lines of C, I've provided users with access to a useful subset of the GenX library's functionality. If I were going to implement this code directly in Ruby, it would be longer and most likely buggier, simply because the C version has been debugged already and our hypothetical Ruby version has not.
That said, GenX4r is still somewhat incomplete. The
begin_document and end_document methods still need
block-based cover method wrappers, and for efficiency I also want to provide
users the ability to predeclare namespaces and elements, to avoid having
to validate them each time they're used. Plus, I'm a reasonably new Ruby
hacker, so it's not out of the realm of possibility that there are bugs in the
wrapper. Even so, I think this is a reasonable proof of concept. The
additional work I've done on GenX4r that isn't documented here indicates to me
that it is a success. On the strength of this experience, I have no trouble
recommending Ruby as a convenient scripting language to wrap around
libraries written in C.
For the record, the current version of GenX4r, which includes all this hypothetical functionality in one form or another, constitutes only 584 lines of C. If you're interested in using it or helping me develop it further, please grab the latest version from the GenX4r home page.
Garrett Rooney is a software developer at FactSet Research Systems, where he works on real-time market data.
Return to ONLamp.com.
Copyright © 2007 O'Reilly Media, Inc.