oreilly.comSafari Books Online.Conferences.


Extending Ruby with C

by Garrett Rooney

Ruby, if you've never heard of it, is an object-oriented scripting language, similar in many ways to Perl and Python. It originates from Japan and is young, as far as programming languages go. There are many really good reasons you might want to use the Ruby language; I'm not going to go into all of them here, but the one at the core of this article is the ease with which you can write Ruby extensions in C.

I'm a big fan of the so-called agile programming languages. I think they have a huge advantage over more traditional languages like C and C++. They also have some drawbacks, among the largest being that there's an awful lot of existing code written in C and C++. It's hard to sell people on moving to something new if they have to leave all their old toys behind.

The standard response to these sort of arguments is that you can easily write an extension that bridges the gap between your old C code and your new Perl or Python, or whatever agile language is hot this week, code. Unfortunately, I've generally found that the APIs for bridging the gap between Perl and C are either cryptic (XS) or fragile (Inline::C). While Python is better in some ways, I still find its C API rather difficult to read. Tools such as SWIG can help alleviate this problem, but you still need to write a bunch of glue code to bridge the gap between the high-level agile languages and the low-level C code.

When I first looked at doing the same kind of thing for Ruby, a whole new world opened up. The APIs are simple, to the point where I was up and running in minutes rather than hours. All you need to know to start is in the README.EXT file in the top level of the Ruby source tree. If you need help with something that isn't documented there, you can't ask for a clearer example than the Ruby source code itself. In short, I was just aching for a test case, some C code I could wrap up in a Ruby extension to prove how simple it is to make something that's easy to use. For my test case, I chose the GenX library.


Related Reading

Programming Ruby
The Pragmatic Programmer's Guide, Second Edition
By Dave Thomas

GenX is a simple C library for generating correct, canonical XML. It verifies that its data is valid UTF-8 and the structure of the XML document being generated is valid, and it forces you to use canonical XML--that may not mean much to you right now, but it can be significant if you need to compare two XML documents to determine their equivalence. Tim Bray wrote GenX and hosts it at GenxStatus. GenX originally attracted me because it provides a way to avoid problems related to invalid XML, which I've encountered in my own work. In addition to its usefulness, it's also perfect for an example of how to embed a C library in Ruby because it's very small, self contained, and has a well-defined API we can wrap up in Ruby without too much trouble.


At this point it's worth asking whether a Ruby extension is really the best way to make this kind of functionality available.

Using an extension means that users need to install a binary distribution precompiled for their versions of Ruby and their operating systems, or to build it themselves, which means they need access to a C compiler. Additionally, extending Ruby via C has its own set of dangers. If you screw something up in a standard Ruby module, pretty much the worst you can do is cause an exception to be thrown. It's possible for users to recover from this if they're paranoid enough about catching exceptions and structure their code correctly. In a C-based extension, an error can corrupt memory or cause a segmentation fault or any number of other problems from which recovery is difficult, and all of which have the chance to crash the underlying Ruby interpreter.

That said, in this particular case I think providing direct access to the underlying GenX library via a C extension is the way to go. The GenX library is available right now, it works, and it does its job in a very efficient manner. There's no reason to duplicate functionality unnecessarily. Even if I did rewrite this in pure Ruby, all I am likely to accomplish is slowing things down. Plus, GenX is exceptionally self contained; while using the library does require that users either use a precompiled extension or possess a C compiler, it at least doesn't bring in any other third-party requirements. Finally, the GenX API is quite straightforward. It's reasonable to assume that we'll be able to implement this extension without undue risk of crashing our Ruby interpreter due to bugs in our code.

Some Basic Functionality

The first step in writing a Ruby extension is to create something that compiles and runs. That means writing an extconf.rb file that tells Ruby how to compile and link your extension, and then writing the bare-bones C file that makes up the extension. With these two steps completed, you'll have a Ruby module that you can require and a new class you can instantiate, albeit not a very useful one because it won't actually have any methods.

The extconf.rb file is a short Ruby program that makes use of the mkmf module to build a simple makefile, which you use to build your extension. There's a fair amount of specialized functionality you can put in your extconf.rb file, but for our purposes the bare minimum will do. Here's the entirety of my extconf.rb file:

require 'mkmf'



This tells Ruby to use all of the .c files in the current working directory to build a extension named genx4r, and that it should write out a makefile to compile and link it. If you copy all the .c and .h files from the GenX tarball into the current directory, you can run ruby extconf.rb && make and then have a Ruby extension sitting there just waiting for you to require it in our script. Here's the process:

$ ruby extconf.rb 
creating Makefile
$ make
gcc -fno-common   -g -Os -pipe -no-cpp-precomp -fno-common -DHAVE_INTTYPES_H
-pipe -pipe  -I. -I/usr/lib/ruby/1.6/powerpc-darwin7.0 -I.    -c -o charProps.o
gcc -fno-common   -g -Os -pipe -no-cpp-precomp -fno-common -DHAVE_INTTYPES_H
-pipe -pipe  -I. -I/usr/lib/ruby/1.6/powerpc-darwin7.0 -I.    -c -o genx.o
cc -fno-common   -g -Os -pipe -no-cpp-precomp -fno-common -DHAVE_INTTYPES_H
-pipe -pipe  -dynamic -bundle -undefined suppress -flat_namespace
-L/usr/lib/ruby/1.6/powerpc-darwin7.0 -L/usr/lib  -o genx4r.bundle charProps.o
genx.o   -ldl -lobjc 
$ ls
Makefile        charProps.o     genx.c          genx.o
charProps.c     extconf.rb      genx.h          genx4r.bundle*
$ irb
irb(main):001:0> require 'genx4r'
LoadError: Failed to lookup Init function ./genx4r.bundle
        from (irb):1:in `require'
        from (irb):1

OK, so that sort of works.... There's a Ruby extension, but trying to require it from inside Ruby only produces an error. That's because none of the .c files defined an Init function. When Ruby tries to load an extension, the first thing it does is look for a function named Init_extname, where extname is the name of the extension. Because that function doesn't exist, Ruby obviously can't find it and throws a LoadError exception.

The next step is to implement Init_genx4r to allow the extension to load successfully. The bare minimum necessary is simply an empty function named Init_genx4r that takes no arguments and returns nothing. I like that. Here are the current contents of the genx4r.c file:

#include "ruby.h"

  /* nothing here yet */

Rerun extconf.rb and make. When you try to load the genx4r module with require, you should have better results:

$ irb
irb(main):001:0> require 'genx4r'
=> true

Creating a Class

The extension loads, but it still doesn't actually do anything. It needs definitions for the classes that make up the interface to the GenX library. For now, I'll define one top-level Ruby module named GenX and a single class, Writer, that lives in it. That class is simply a thin wrapper around the C-level genxWriter type. Here's the next iteration of genx4r.c:

#include "ruby.h"

#include "genx.h"

static VALUE rb_mGenX;
static VALUE rb_cGenXWriter;

static void
writer_mark (genxWriter w)

static void
writer_free (genxWriter w)
  genxDispose (w);

static VALUE
writer_allocate (VALUE klass)
  genxWriter writer = genxNew (NULL, NULL, NULL);

  return Data_Wrap_Struct (klass, writer_mark, writer_free, writer);

Init_genx4r ()
  rb_mGenX = rb_define_module ("GenX");

  rb_cGenXWriter = rb_define_class_under (rb_mGenX, "Writer", rb_cObject);

  /* NOTE: this only works in ruby 1.8.x.  for ruby 1.6.x you instead define
   *       a 'new' method, which does much the same thing as this. */
  rb_define_alloc_func (rb_cGenXWriter, writer_allocate);

That's a lot of new code. First comes the #include of genx.h, because it needs to use functions defined by GenX. The two VALUE variables represent the module and the class. Each object in Ruby (and remember, everything in Ruby is an object) has a VALUE; think of it as a reference to the object. The beginning of Init_genx4r initializes these variables by calling rb_define_module to create the GenX module and rb_define_class_under to define the Writer class.

Next, Ruby needs to know how to allocate the guts of the GenX::Writer object. That's where allocate comes in, using rb_define_alloc_func to associate the writer_allocate function with the allocate method. writer_allocate creates a genxWriter object with genxNew and turns it into a Ruby object via the Data_Wrap_Struct macro. Data_Wrap_Struct simply takes a VALUE representing the class of the new object (passed as an argument to writer_allocate), two function pointers used for Ruby's mark-and-sweep garbage collection, and a pointer to the underlying C-level data structure--in this case the genxWriter itself--and returns a new VALUE that refers to the object, which is simply a thin wrapper around the C-level pointer. Finally, the code has the mark function for the object, writer_mark, which actually does nothing, and the destructor, writer_free, which calls genxDispose to clean up the genxWriter allocated in writer_allocate. When wrapping a more complicated structure that includes references to other Ruby-level objects, the mark function must call rb_gc_mark on each of them to tell Ruby when nothing references them any longer and they are ready for garbage collection.

One thing to note about genx4r.c is that the only function accessible outside that file is Init_genx4r. Everything else is static, which means that it won't leak out into the global namespace and cause linkage errors if some other part of the program happens to use the same function or variable name.

It's time to take a quick jaunt through irb to confirm that it's possible to create an instance of the new class:

$ irb
irb(main):001:0> require 'genx4r'
=> true
irb(main):002:0> w =
=> #<GenX::Writer:0x321f84>

Sure enough, it's an instance of our new GenX::Writer class. Adding a few methods will make it actually useful!

Pages: 1, 2, 3

Next Pagearrow

Sponsored by: