ONLamp.com
oreilly.comSafari Books Online.Conferences.

advertisement


Using Design by Contract in C

by Charlie Mills
10/28/2004

Last spring I began a coding project that for various reasons I needed to write mostly in C. I wanted to do as much up-front work as possible to minimize bugs and debugging time later. At about the same time, I picked up The Pragmatic Programmer. This book introduces many important techniques for minimizing bugs and improving the quality of your code. One technique that I found particularly exciting was Design by Contract.

Related Reading

Secure Programming Cookbook for C and C++
Recipes for Cryptography, Authentication, Input Validation & More
By John Viega, Matt Messier

Design by Contract

Chances are you've heard of Design by Contract. If you're coding in Eiffel you're probably already using it. If you haven't heard of Design by Contract, then you're like I was six months ago. To quote Bertrand Meyer, the key concept of Design by Contract is "viewing the relationship between a class and its clients as a formal agreement, expressing each party's right and obligations" [3]. Because this discussion will segue into C coding and C doesn't have classes, we'll talk about functions instead.

Preconditions and Postconditions

To use a function (written by you or someone else), you need to understand the range of arguments allowed and the effects of calling the function. This is where contracts come into play. Most languages that support Design by Contract provide two types of statements to express the obligations of the function caller and the callee: preconditions and postconditions. The caller must meet all the preconditions of the function being called, and the callee or the function must meet its own postconditions. The failure of either party to live up to the terms of the contract is a bug in the software [3].

The quintessential examples of preconditions and postconditions are those of the square root function for positive real numbers. The precondition of the square root function is that the argument must be a real number greater than or equal to zero. The postcondition is that the return value is a real number such that its square must equal the argument (ignoring round-off issues).

/* square root function for positive real numbers */
double sqrt(double x)
{
  /* ... */
}

When someone calls sqrt() with a value of x that is less than zero, sqrt() cannot meet its postcondition. Again, this contractual breach indicates a bug in the software. The best course of action usually is to abort [3].

Invariants

The third type of statement Design by Contract places at our disposal is the invariant. Invariants describe conditions which must always hold for an object (a structure or type in C). For example, once initialized, any instance of the dynamic array type DArray_T, shown below, should have the properties length >= 0 and capacity >= length. These properties are invariants.

darray.c:

struct DArray_T {
  void  **ptr;
  long    length;
  long    capacity;
};

darray.h:

typedef struct DArray_T *DArray_T;

In an object-oriented setting, calling a method on an object causes the object's invariants to be checked. Checking invariants happens before checking preconditions and after postconditions. So whenever someone calls the function

void *DArray_get(DArray_T ary, long index);

the following happens:

  1. The invariants of DArray_T are checked for ary.
  2. The preconditions of DArray_get() are checked.
  3. DArray_get() executes.
  4. The postconditions of DArray_get() are checked.
  5. The invariants of DArray_T are checked for ary.

You're probably wondering how I went from discussing methods to the C function DArray_get(). As you know, C doesn't truly have objects or methods on objects. Instead it has structures, types, and functions. However, invariants apply just as well in C as they do in an object-oriented language. If a function has write access to a structure or type, then DBC should check the invariants of that structure or type before and after the function executes. The function DArray_get() has write access to all its parameters; therefore, the invariants of all its parameters should be checked. Obviously, long won't have any invariants, but DArray_T will.

The const qualifier can specify types as read only, but C's opaque types are a more powerful mechanism. Opaque types are pointers that client code cannot dereference. DArray_T as defined above is an opaque type. Defining struct DArray_T in darray.c and not darray.h hides its fields from clients. Therefore, only the functions defined in darray.c have access to the fields of struct DArray_T. Only functions in darray.c need to check the invariants of DArray_T, eliminating the need to check the invariants of DArray_T (or any opaque type) in client code. (However, client code still needs to check the invariants of nonopaque types.)

Why Not Assert?

C doesn't provide any Design by Contract language features, but it does provide assert(). In the sqrt() example, an assert statement works fine for checking the precondition

double sqrt(double x)
{
  assert(x >= 0);
  /* ... */
}

Why not just stick with assert() and forget all this stuff about Design by Contract? Well, assert() does not quite live up to Design by Contract, for the following reasons:

  • Side effects--Contract checking should not produce any side effects. The software should behave the same with or without contract checking in place. Unfortunately, adding assert() or any other type of condition requires more coding, and more coding introduces the possibility of more errors. A statement such as

    assert(size >= 0 && size = MAX_BUFFER_SIZE);

    is valid C code, but obviously it does not produce the desired effect.

  • Lack of robustness--It takes significant extra coding to check postconditions and invariants using assert().
  • Lack of expressivity--The facilities for expressing contracts should be rich enough to express a specification [3]. In order to be expressive, assert statements must use C control structures.
  • Mixing application code and contracts--Contracts' specifications should go in the function definition, but not as a part of the function definition. Assert statements tend to intermix with application code, mixing the specification and the implementation.
  • Redundancy--It is very common to have a structure or type for which certain conditions must always hold. Ensuring these invariants requires checking them throughout your code; using assert() for this creates lots of duplicate code.

Because assert() can appear in arbitrary locations, assert statements serve better as sanity checks in confusing parts of your code or for checking loop invariants. However, if you have a section of code littered with asserts, it is usually a good sign that you need to do some refactoring [2].

Design by Contract Meets C

Not satisfied with assert() and excited about Design by Contract, I set out to create my own Design by Contract implementation for C. After looking at some of the solutions available for Java [1] I decided to use a subset of the Object Constraint Language to express contracts [4]. Using Ruby and Racc, I created Design by Contract for C, a code generator that turns contracts embedded in C comments into C code to check the contracts.

The front end to this code generator is the dbcparse.rb Ruby script. Running ./dbcparse.rb -h will print instructions on how to use the script. For starters, ./dbcparse.rb darray.c will print to standard out darray.c with all contracts expanded into C code.

Contracts

Below is the contract for and implementation of DArray_get():

darray.c:

/**
  pre: index >= 0
  pre: index < ary->length
  post: return == ary->ptr[index]
 */
void *DArray_get(DArray_T ary, long index)
{
  return ary->ptr[index];
}

Opening the comment with /** specifies a contract. Because the DArray_get() function directly follows the contract, the contract applies to the DArray_get() function (also called the context of the contract). Inside the comment, we have pre: and post: tags, which specify a precondition and postcondition, respectively. The return keyword is simply the value that DArray_get() returns.

This function is pretty simple; its preconditions and postconditions describe it nicely. Notice that the postcondition relies on the preconditions being true and that the function should not execute if the preconditions fail.

When defining a contract, each condition must be a Boolean expression; assignment is not allowed. The normal C comparison operators (<, >, <=, >=, ==, and !=) all work. Instead of the &&, ||, and ! operators, we have and, or, and not, respectively. C control structures (if, else, for, do, while, ?:) do not appear; instead there is an implies operator and two iterating operators, forall and exists. The iterating operators check a condition over a range; a forall statement is true if its condition holds true over the entire range, and an exists statement is true if its condition is true at least once in the range. The syntax is as follows:

forall ( declaration in range | expression ) exists ( declaration in range | expression )

What do I mean by range? Many languages support a range idiom; I borrowed this syntax from Perl. (Other languages, such as Ruby, also use this syntax.) Ranges come in two forms: inclusive and exclusive. An inclusive range is start..end, and an exclusive range is start...end. All ranges include the start, but only inclusive ranges include the end.

Here is an example of the contract for a function that creates a shallow copy of a DArray_T:

darray.c:

/**
  post: return != NULL implies return->length == ary->length and
    forall (long i in 0...ary->length | return->ptr[i] == ary->ptr[i])
 */
DArray_T DArray_copy(DArray_T ary)
{
  /* ... */
}

In words, this contract says that if the function returns a non-null DArray_T, then the returned DArray_T will have the same length and elements as ary.

You're probably wondering how this crazy syntax works. Let me explain. First, the expression return != NULL is checked. If it is false, the condition is true and we are done. If not, we must check what return != NULL implies. According to the condition, it implies:

return->length == ary->length and
  forall (long i in 0...ary->length | return->ptr[i] == ary->ptr[i])

The expression return->length == ary->length is self-explanatory. The forall statement is more complex. Remember that the syntax of the forall statement is

forall ( declaration in range | expression )

long i is the declaration, 0...ary->length is the range being iterated over, and return->ptr[i] == ary->ptr[i] is the expression to check at each iteration.

I mentioned before that all instances of DArray_T should have the properties length >= 0 and capacity >= length. How do we know that the DArray_T being passed to DArray_get() has these properties? By specifying them as invariants.

darray.c:

/**
  context DArray_T
  inv: self != NULL
  inv: self->length >= 0
  inv: self->capacity >= self->length
  inv: (self->capacity == 0 and self->ptr == NULL)
    or (self->capacity > 0 and self->ptr != NULL)
 */
struct DArray_T {
  void  **ptr;
  long    length;
  long    capacity;
};

Now, calling every function in darray.c with a parameter of type DArray_T (the context) will check these invariants before and after the function executes. Note that defining invariants in a module (.c file) makes the invariants local to that module (as in darray.c), and defining invariants in a header file makes them global, causing the invariants to be checked for all applicable functions.

Syntax Review

Contract specifications occur in comments beginning with /**. If there is no context in the contract, the context is the next C statement following the contract. Below is the syntax for expressing contracts; statements in angle brackets are optional:

  • contract
    • <context> condition(s)
  • context
    • context declaration
  • condition
    • pre <message>: expression
    • post <message>: expression
    • inv <message>: expression
  • expression
    • valid C identifier, macro, or function call
    • ( expression )
    • forall ( declaration in range | expression )
    • exists ( declaration in range | expression )
    • not expression
    • expression <, >, <=, >=, ==, != expression
    • expression in range
    • expression and expression
    • expression or expression
    • expression implies expression
  • range
    • start..end
    • start...end

The comparison operators (<, >, <=, >=, ==, and !=) have the same precedence as in C. The logical operators have the following precedence: and, or, implies.

Breach of Contract

Finding a contract breach calls dbc_error(). This function has the following prototype:

void dbc_error(const char *message);

message is simply a string with information about the condition that failed. dbc_error() should log this information--or print it to standard error--and not return. It is very important that dbc_error() not return. In my projects I usually define dbc_error() as:

#ifdef HAVE_RUBY_H
  #include <ruby.h>
  #define dbc_error(msg) rb_raise(rb_eFatal, msg)
#else
  #include <stdio.h>
  #include <stdlib.h>
  #define dbc_error(msg) do { (void)fputs(msg, stderr); abort(); } while (0)
#endif

It is generally a good idea to stick to the fail-hard principle [3] and call abort(). (In Ruby, raising a fatal error is similar to calling abort(), but it also gives you a Ruby stack trace.)

Wrap-Up

DBC for C has come a long way since I decided it would be a worthy hack last spring. The project that originally brought about the need for DBC for C has benefited greatly. The benefits include:

  • Improved design--It is difficult to write contracts for code with poorly defined responsibilities. Using Design by Contract encourages code with well-defined responsibilities, which leads to better code and more modular design [3].
  • Reduced debugging time--Well-defined contracts coupled with unit testing help find bugs early, and the vast majority of the time they find them at the source. (Remember, any breach of contract is a bug in the software.)
  • Up-to-date documentation--Contracts make great documentation. It doesn't stop there, though. Because of testing contracts at runtime, out-of-date documentation is a bug in the software.

Hopefully you're now interested in using contracts in your C code. Here are some features of DBC for C that I will explore in the future:

  • Integrating into your build process.
  • Enabling different levels of contract checking.
  • Unit testing. (Unfortunately I don't have room to discuss this here, but contracts are just fancy documentation unless you unit-test.)
  • Generating Doxygen documentation from contracts and adding contracts to your existing Doxygen documentation.
  • Accessing the value of a variable before function execution in postconditions using the @pre syntax.

In preparation for this article I spent some long nights preparing DBC for C for public consumption; hopefully you're ready to give it a try.

References

Resources

  • DBC for C: generates Design by Contract runtime checks and documentation from tags embedded in C comments.
  • Doxygen: documentation system for C, C++, IDL, Java, Objective-C, and other languages.
  • Racc (Ruby yACC): an LALR parser generator for Ruby.
  • Ruby: the powerful and dynamic object-oriented language that "makes programming fun again."
  • Ruby Installer: a one-click Ruby installer for Windows. A Mac OS X installer is under development.

Charlie Mills develops data warehousing applications using C and Ruby.


Return to ONLamp.com



Sponsored by: