Published on ONLamp.com (http://www.onlamp.com/)


Developing RESTful Web Services in Perl

by Andrew Sterling Hanenkamp
02/19/2008

If you are a web developer and aren't familiar with the term "REST," you should know that you regularly work within this software architecture. REST is a term that can describe the interactions that occur between client and server on the World Wide Web. However, as with many terms, its original definition has taken on further meaning as different parties have used it. In this article, I assume REST to be more specifically applied to describe communication between a client and server as part of a specific API.

I am presenting a guide to you here on how to get started developing your own RESTful API. This guide has been split into two parts. In this part, I provide a very brief description of REST as it applies to this guide. I will then describe how to build your very own RESTful server using a CGI script (which should be easily extrapolated into FastCGI or mod_perl or another web framework). In the next article, we'll go over how to access this server from a RESTful client written using libwww-perl. That article will end with a couple of useful extensions that should be considered, and I will share some other important resources.

What is REST?

In the most generic terms, REST (Representational State Transfer) is a software architecture originally published by Roy Fielding in his dissertation. More specifically, this term has been used to define web service APIs for the management of resources that may be created, read, updated, and deleted (CRUD) over HTTP. This is the focus of this article. RESTful APIs are used by many big names. Off the top of my head, I know that Amazon (Amazon Web Services), Intuit (Quickbase), and Facebook all provide RESTful interfaces to their applications. There are many others.

We will spend a lot of time discussing resources in this article. A resource in a RESTful web service is just some unit of data useful to your site. This is probably a record in your SQL database, but it could be an account on an LDAP server, a segment of an XML data file, or just about any other unit of data you want to share with others. The sample server, for example, will be reading from and writing to files on the disk.

RESTful Principles

Before I talk about some principles, I should state the disclaimer that I am coming at REST from a completely practical background. REST is a software architectural style and, as such, it has a lot of papers and theory and purists attached to it. This guide is about getting stuff done in Perl. If I fail to present a pure message on REST itself, I will only excuse myself by saying that I never claimed to do so. See the resources in the RESTful Resources section if you want more information on REST as an architecture.

With that out of the way, let's talk about some principles. You won't get far into the literature on REST without running into the "REST Triangle" (see Figure 1 below). Essentially, there are three key concepts to REST: nouns, verbs, and content types. I will discuss each briefly here.

Figure 1. The REST Triangle

Nouns: Know Your URIs

A noun is an identifier for a resource. This is generally a URL (a link you can GET the resource from) when we talk about REST over HTTP. It might also be a URN (a name that identifies the resource and can be used via HTTP or something else) or another kind of URI (Uniform Resource Identifier; URLs and URNs are both kinds of URI). You probably want at least one noun that uniquely identifies each resource, but you might also provide nouns that are not unique. For example, I might have an interface identifying the same record with the following nouns:

Each of these might be URLs in my interface. The first might be the unique ID of my account on the system. The next example uses my last name, which is fairly uncommon in the United States, but not unique there and certainly not unique worldwide. The last is an example of a noun that is unique according to an external authority. How you identify your nouns is something to consider carefully.

For a more detailed treatment of nouns and URLs, you may want to read more about URL Construction.

Verbs: Know Your CRUD

CRUD is an acronym referring to the common changes made to data: Create, Read, Update, and Delete. This set of operations generally encompasses everything that can be done to a piece of data. When we talk about these operations within the context of REST, we will use specific HTTP request methods to implement each. In REST nomenclature, these are called the verbs of the architecture.

GET
A GET request is used to perform a read operation. This will be used to return the content of your resource.
POST
A POST request is used to perform a create operation. A POST will create the resource on the server and assign a noun to it.
PUT
A PUT request is used to perform an update operation. A PUT operation performs the opposite of GET: it updates the resource on the server when the client pushes content to the server.
DELETE
A DELETE request is used to perform a delete operation (whoa, deep). That's it.

In theory, you don't really need all of these in every RESTful web service. If you don't allow modification of resources, you can just use GET. A simple REST interface might provide only GET and PUT or GET and POST, depending on your needs. These are not the only available verbs either.
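To make the verb-to-CRUD mapping concrete, here is a minimal sketch of a dispatch table keyed on the HTTP request method. The names %handler_for and dispatch() are hypothetical and are not taken from the sample server. The handlers only return a label where a real service would read or modify a resource, and the sketch assumes Perl 5.10 or later for the // operator.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical handlers: each one stands in for a real CRUD operation.
my %handler_for = (
    GET    => sub { return 'read' },
    POST   => sub { return 'create' },
    PUT    => sub { return 'update' },
    DELETE => sub { return 'delete' },
);

# Pick a handler for the request method, or report 405 for verbs
# this (hypothetical) service does not support.
sub dispatch {
    my ($method) = @_;
    my $handler = $handler_for{ uc( $method // '' ) };
    return ( 405, 'Method Not Allowed' ) unless $handler;
    return ( 200, $handler->() );
}

# In a CGI script the method arrives in the environment.
my ( $status, $operation ) = dispatch( $ENV{REQUEST_METHOD} // 'GET' );
print "$status $operation\n";
```

A service that only allows reads would simply leave the other three entries out of the table, and any other verb would then get a "405 Method Not Allowed" response.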

For more information, consider reading about HTTP Methods.

Content Types: Know Your MIME

The final piece of the triangle is the content type of your resources. The content types provide the format for the data that will take part in your RESTful discussion. You will specify these with the "Content-Type" header in the requests (client-side) and responses (server-side). When traveling on the information superhighway of the World Wide Web, you are pretty constrained to using some variant of HTML as the main document type. In a web service, however, you can use whatever suits your application.

The format you want to use will depend explicitly upon the needs of your application. If you are exchanging organized data, like the sample server and client included with this article, you will probably want a data interchange format like XML, YAML, JSON, or CSV. If your application deals with documents, you will probably want to use a document format related to that, such as HTML, DocBook, SGML, ODF, PDF, PostScript, etc. Your application might manipulate photos (JPG, PNG, BMP) or calendar information (iCal) or categorized links (OPML) or whatever else. You can use microformats or whatever you happen to like.

If you want to be really cool, you can even permit the data to be described in multiple formats. For example, you might allow updates to your data to come as XML, YAML, and JSON by examining the "Content-Type" header sent in the request and treating the given data accordingly. You can allow the client to request that data back in a custom format by examining the "Accept" header and choosing a format based on the client's preference. Ultimately, if your data can be requested and posted in formats that are convenient to your clients, you will probably have happier clients.
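The "Accept" negotiation described above might be sketched as follows. This is a simplified illustration and not code from the sample server: the function choose_format() and the list of supported types are assumptions, and the parser only understands media ranges with an optional q parameter. It assumes Perl 5.10 or later.

```perl
use 5.010;
use strict;
use warnings;

# Formats this hypothetical server can produce, in order of server preference.
my @SUPPORTED = ( 'text/yaml', 'application/json', 'application/xml' );

# Return the supported type the client prefers, or undef if none is acceptable.
sub choose_format {
    my ($accept) = @_;

    # No Accept header means the client will take anything.
    return $SUPPORTED[0] unless defined $accept && $accept =~ /\S/;

    # Record the quality value (default 1) for each media range listed.
    my %q_for;
    for my $range ( split /,/, $accept ) {
        $range =~ s/\A\s+|\s+\z//g;
        my ( $type, @params ) = split /\s*;\s*/, $range;
        next unless defined $type && length $type;
        my $q = 1;
        for my $param (@params) {
            $q = $1 if $param =~ /\Aq=(\d+(?:\.\d+)?)\z/i;
        }
        $q_for{ lc $type } = $q;
    }

    # Score each supported type: exact match, then type/*, then */*.
    my ( $best, $best_q ) = ( undef, 0 );
    for my $type (@SUPPORTED) {
        my ($major) = split m{/}, $type;
        my $q = $q_for{$type} // $q_for{"$major/*"} // $q_for{'*/*'} // 0;
        ( $best, $best_q ) = ( $type, $q ) if $q > $best_q;
    }
    return $best;
}
```

An undefined result would typically be reported to the client as "406 Not Acceptable". The same idea works in the other direction for updates, by inspecting the request's "Content-Type" header before parsing the body.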

See Which Content Type for more discussion considering content types.

RESTful Server

I have written a very simple RESTful web service using a CGI script. Now that we've gotten the theory out of the way, I'm going to walk through how this server works to help explain the concepts in practical terms. This server manages the books in my library. Information about each book is stored as a YAML file in a certain folder. I've avoided using a database to store the information in this guide because I don't want to worry about serializing and unserializing the data. I want to focus on the REST protocol itself as much as possible.

I have based the interface of this REST server upon the work being undertaken on the Jifty web application framework. I believe they have had some good ideas with respect to RESTful implementation. If you are familiar with Jifty, some of the code and policy decisions I've made will look familiar.

Sample Server URLs

Before diving into the implementation, I want to make a note of the URLs I have chosen for it. These URLs are meant to be easy to comprehend in pieces and extensible. These are very similar to the URLs chosen for the Jifty REST plugin.

First, I've chosen to make all the REST interface URLs start with "/=". This may seem a little odd at first glance. However, it provides a very simple way to set your REST URLs apart from the rest of your site. I think it's a nice idiom for "IM IN YR API!"

Second, I've made the next component of the API "/model". This borrows from MVC the word "Model." The reason I do this is because one might extend this REST API to include additional features like "/action/" to execute remote procedures or "/search/" to execute a search for data, etc. Those aren't necessarily RESTful, but certainly useful.

Third, I've made the next component of the API "/book" to specify the name of the kind of data we're working with. Again, a future extension might foresee enhancements that add additional models for storage. I might store author biographies in "/author" or information about friends I've loaned books to under "/loan".

These are policy decisions that you should think about ahead of time to allow your API to be flexible with future enhancements without breaking things already made if you can avoid it. These were policies chosen by the Jifty developers for these and similar reasons.
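As an illustration of how this URL scheme breaks into pieces, the following sketch splits a request path into its model name and optional ID. The function parse_rest_path() and the hash it returns are hypothetical and are not part of the sample server. In a CGI script the path would normally come from $ENV{PATH_INFO}.

```perl
use strict;
use warnings;

# Split a REST path into its parts. Returns a hash reference, or nothing
# when the path does not belong to the "/=/model/..." part of the API.
#
#   /=/model/<model>           -> the collection (the target of a POST)
#   /=/model/<model>/id        -> the list of IDs
#   /=/model/<model>/id/<ID>   -> a single resource
sub parse_rest_path {
    my ($path) = @_;
    return unless defined $path;

    my ( $model, $id_part, $id ) =
        $path =~ m{\A/=/model/(\w+)(/id(?:/([\d-]+))?)?\z}
        or return;

    my $kind = defined $id      ? 'item'
             : defined $id_part ? 'list'
             :                    'collection';

    return { model => $model, kind => $kind, id => $id };
}
```

Because the model name is captured rather than hard-coded, adding "/author" or "/loan" later would not change the parser, which is the kind of flexibility the policy decisions above are meant to buy.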

GET to Document

I have chosen to provide some documentation within the API itself. If you install the library.cgi script into your local cgi-bin directory and go to the top-level URL in your browser, http://localhost/cgi-bin/library.cgi/= (or something like that depending on where you installed it), you will get an HTML response documenting how the interface works.

Self-documenting services are, in my opinion, a good idea. If I were building this server for production use and my project timeline allowed time for it, I would want to add further documentation to the various error messages that occur to further document how to use the interface. By doing so, you can make recommendations to the developer (or the end user who got to the wrong place) regarding how to fix the problem.

GET to List

The first real aspect of the API we'll cover is the one that is most fundamental at the get go: listing. This isn't really an aspect of CRUD we discussed above, but if you don't know what resources are available, it might be difficult to fetch them or update them.

The URL for accessing this list of resources is "/=/model/book/id" and it lists all the IDs for the book model.

The code in the sample server is pretty simple. It looks for all the available resources, which are stored as YAML files on the disk. It then outputs an HTML file containing links to the resources found:


print $q->header('text/html');

# Find all the files available
my @items;
for my $filename (glob get_local_path('*')) {
    my ($id) = $filename =~ m{([\d-]+)\.yaml$};
    next unless defined $id;

    push @items, $q->li(
        $q->a({ href => absolute_url('/=/model/book/id/'.$id) }, $id),
    );
}

# List the items
print $q->ul( @items );

The ID of the book (generally the ISBN as we'll see later) is in the filename, as well as stored within the file for reference. The Perl code above outputs an unordered list in HTML of links to the books in my library.
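The listing code relies on a get_local_path() helper that is not shown in this article. A hypothetical version, together with the reverse mapping from filename back to ID, might look like the sketch below. The storage directory is an assumption, and id_from_filename() is an invented name that wraps the same regular expression used in the loop above.

```perl
use strict;
use warnings;
use File::Spec;

# Assumed storage directory; the article does not say where the files live.
my $DATA_DIR = File::Spec->catdir( File::Spec->tmpdir, 'library-books' );

# Build the on-disk path for an ID (or for a glob pattern such as '*').
sub get_local_path {
    my ($id) = @_;
    return File::Spec->catfile( $DATA_DIR, "$id.yaml" );
}

# Recover the ID from a filename; returns undef for files that do not
# end in <digits-and-dashes>.yaml.
sub id_from_filename {
    my ($filename) = @_;
    my ($id) = $filename =~ m{([\d-]+)\.yaml$};
    return $id;
}
```

With helpers shaped like these, get_local_path('*') yields a pattern that glob can expand, and every file it finds can be turned back into the ID that appears in the link.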

You can try this one out in your browser directly. The URL will be something like this:

http://localhost/cgi-bin/library.cgi/=/model/book/id

If you have an empty resource library (i.e., you just installed it and haven't used the client to add any books), the page will be empty. If you have one or more books stored, you will see bullets with linked IDs.

If you click on one of the links, you will access the book's YAML description.

GET to Read

By visiting a URL like "/=/model/book/id/<ID>" on the server, where ID is a string of numbers and dashes, you will fetch the YAML file describing the book with the given ID. If no book can be found with that ID you will be handed a "404 Not Found" error.
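Since the ID ends up in a filename, it is worth checking that it really is only numbers and dashes before using it. A minimal sketch of such a check follows. The function name untaint_id() is hypothetical. Capturing the value through a regular expression is the standard way to untaint data when Perl runs in taint mode (-T).

```perl
use strict;
use warnings;

# Return the ID if it consists only of digits and dashes, otherwise undef.
# The captured value is untainted under -T because it comes from a regex match.
sub untaint_id {
    my ($raw) = @_;
    return unless defined $raw;
    my ($id) = $raw =~ m{\A([\d-]+)\z};
    return $id;
}
```

Anchoring the pattern with \A and \z rejects values such as "../../etc/passwd" or an ID followed by a newline, so nothing but a plain ID can reach the filesystem.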

Here's the important code from the sample server. In this snippet, the $id variable is already set to an untainted ID value pulled out of the URL.


# Look up the resource file
my $filename = get_local_path($id);
if (-f $filename) {

    # Open and slurp up the file and output the resource
    open my $bookfh, '<', $filename
        or barf 500, "I Am Broke", "Cannot open $filename: $!";

    print $q->header('text/yaml');
    print do { local $/; <$bookfh> };
}

# No such resource exists
else {
    barf 404, "Where is What?", "Book for $id does not exist.";
}

First, we look up the filename on the local disk. The code checks to see if the file exists and returns a 404 if not. Otherwise, we slurp the file and send back a "text/yaml" response containing the YAML data. Since I'm storing the data on the disk in the format I'm using to communicate, I can just send the file directly. Had this been a database record lookup or something similar, I might have needed to serialize the data into YAML format using YAML::Dump().
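The barf() function used in these snippets is not shown in the article. One plausible shape for it, offered only as an assumption, is a function that prints a CGI error response and stops the script. The sketch below builds the response as a string first so it is easy to inspect; error_response() and escape_html() are invented names, and the real server may well build its output with CGI.pm instead.

```perl
use strict;
use warnings;

# Escape the characters that matter inside HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a complete CGI error response: status line, headers, HTML body.
sub error_response {
    my ( $status, $title, $message ) = @_;
    my ( $safe_title, $safe_message ) = map { escape_html($_) } $title, $message;
    return "Status: $status $title\r\n"
         . "Content-Type: text/html\r\n"
         . "\r\n"
         . "<h1>$safe_title</h1>\n<p>$safe_message</p>\n";
}

# Hypothetical stand-in for barf(): send the error and end the request.
sub barf {
    print error_response(@_);
    exit;
}
```

Putting the status code, a short title, and a longer message into one call keeps every error path in the server down to a single line.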

If you have set up the CGI script, you can attempt to fetch one of these records directly after they have been added. For example, you might enter this URL into your browser:

http://localhost/cgi-bin/library.cgi/=/model/book/id/0-936083-11-5

You will be asked to save or open the downloaded file.

POST to Create

Now that we've learned to fetch records (which we haven't learned how to create yet), let's learn how to put new books into the system. This is implemented using POST. In this implementation, the server will always assign an ID to a newly created book record, but that ID will be based upon the ISBN recorded in the file (if the "isbn" field is present). If you attempt to create a book resource with an ISBN that has already been submitted, you will receive an error.

Here's the code that will be run if you POST a request to "/=/model/book". I've broken it up a bit to make the explanation a little easier.


# Check to make sure the input book is sane
my $book = check_book( $q->param('POSTDATA') );

# If we have an ISBN (some books don't!), then die if we already have
# it because we don't permit POST for updates!
if ($book->{isbn} and -f get_local_path($book->{isbn})) {
    barf 500, 'Not Gonna Do It',
        'A POST may not be used to update an existing book.';
}

# Our data is sane!

The first thing we do is validate the data. The check_book() function performs several checks on the request. You can look at the code yourself to see the details, but I will mention that this function will cause a "415 Unsupported Media Type," "400 Bad Request," or "500 Internal Server Error" response, depending on the kind of problem encountered. I have tried to pick the most appropriate status code whenever possible. I highly recommend becoming familiar with the Status Code Definitions of the HTTP/1.1 protocol to make sure you are sending the best code possible for a given response. When in doubt, you can return a "500 Internal Server Error" to the client, which is the most general server error status you can return.
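To illustrate how different problems map to different status codes, here is a small sketch of the kind of decision check_book() has to make. It is not the article's check_book(): the function problem_status() is hypothetical, it does not parse the YAML, and the accepted media types ("text/yaml" and "text/x-yaml") are assumptions.

```perl
use strict;
use warnings;

# Return the HTTP status describing what is wrong with a submission,
# or nothing at all if it looks acceptable so far.
sub problem_status {
    my ( $content_type, $body ) = @_;

    # Wrong or missing media type: we would not know how to parse the body.
    return 415
        unless defined $content_type
            && $content_type =~ m{\Atext/(?:x-)?yaml(?:\s*;|\z)}i;

    # Right media type, but there is nothing usable in the body.
    return 400 unless defined $body && $body =~ /\S/;

    return;
}
```

A failure while actually parsing the YAML would most likely also be a 400, since the fault lies with the data the client sent, while 500 is better kept for problems on the server side such as a disk that cannot be written.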


# Figure out an ID, this is either the ISBN or a generated ID
my $id = $book->{isbn} ? $book->{isbn} : next_id;

# Store the ID for reference within the record
$book->{id} = $id;

# Save the resource
eval { YAML::DumpFile(get_local_path($id), $book) };
barf 500, 'I Am Broke', $@ if $@;

Now that we know that the data is sane, we will set up the ID of the book and save the book as a YAML file. If the book has an ISBN, that is used as the ID. If the book does not have an ISBN listed, then a new ID is generated with the next_id() function. The server then uses YAML::DumpFile() to save the book to the disk in YAML format at the appropriate local path.
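The next_id() function is not shown here, so the following is only a guess at one simple way to generate IDs. It assumes generated IDs are plain integers and that ISBN-based IDs always contain dashes, so the two never collide. The name next_id_from() is hypothetical, and the function takes the list of existing IDs as an argument rather than reading the disk.

```perl
use strict;
use warnings;

# Given the IDs already in use, return the next free integer ID.
# IDs containing dashes (assumed to be ISBNs) are ignored.
sub next_id_from {
    my @existing = @_;
    my $max = 0;
    for my $id (@existing) {
        next unless $id =~ /\A\d+\z/;
        $max = $id if $id > $max;
    }
    return $max + 1;
}
```

A scheme like this is not safe when two requests create books at the same moment, because both could compute the same ID. A production server would need a lock or an atomic counter.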


# Note the success to the end-user
my $resource_url = absolute_url('/=/model/book/id/'.$id);
print $q->header(
    -status   => 201,
    -type     => 'text/html',
    -location => $resource_url,
);
print $q->h1("Created $book->{title}");
print $q->ul(
    $q->li(
        $q->a({ href => $resource_url }, $resource_url)
    )
);

Finally, we end by returning a "201 Created" response to the client. You should try to return either a 201 or 202 response to the client when creating a resource. These are generally superior to 200 or 204 on create. With a 201, you may also set the "Location" header (as I have here) in the response. This must point to the URL of the new item. If you cannot create the item immediately, you may want to consider a "202 Accepted" status. Check the HTTP/1.1 specification for more information.

By the way, don't respond with a 200 status and include a "Location" header under Apache. Apache will translate that 200 into a 302, which is probably helpful for the typical forgetful web programmer, but is not what you want in a RESTful interface. A "Location" header is not appropriate in a 200, so make sure you return a 201 when including a "Location" header for creates.
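For reference, this is roughly what a "201 Created" response looks like when a CGI script writes it by hand. The sketch is an illustration rather than what CGI.pm emits byte for byte, and created_response() is a hypothetical name. It assumes the URL contains no characters that need HTML escaping, which holds for IDs made of digits and dashes.

```perl
use strict;
use warnings;

# Build a raw CGI response announcing a newly created resource.
sub created_response {
    my ($resource_url) = @_;
    return "Status: 201 Created\r\n"
         . "Location: $resource_url\r\n"
         . "Content-Type: text/html\r\n"
         . "\r\n"
         . qq{<a href="$resource_url">$resource_url</a>\n};
}

print created_response(
    'http://localhost/cgi-bin/library.cgi/=/model/book/id/0-936083-11-5'
);
```

The "Status" header is what tells the web server to send a 201 instead of its default, and the "Location" header gives the client the noun for the new resource without making it parse the body.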

PUT to Update

In a RESTful interface, PUT is the exact opposite of GET. GET fetches the content from the server. PUT pushes the content to the server. In the sample server snippets below, the variable named $id is already set to the last part of the URL, which looks like "/=/model/book/id/<ID>". Here is how the sample server starts handling this PUT.


# Check to make sure the input book is sane
my $book = check_book( $q->param('PUTDATA') );

# Make sure the book already exists or barf
my $resource_path = get_local_path($id);
unless (-f $resource_path) {
    barf 500, 'Not Gonna Do It',
        'Cannot use PUTs for creating a new resource.';
}

This looks similar to POST. I start by using check_book() again to make sure the data is sane, and then perform an extra test to make sure this is an update: the resource must already exist. This REST server does not permit PUT to create records. Your own REST API could allow creates with PUT, but only when the client is able to choose a proper URL for the new resource and has specified it in the PUT.


# Make sure the ID is set
$book->{id} = $id;

# Save the resource
eval { YAML::DumpFile($resource_path, $book) };
barf 500, 'I Am Broke', $@ if $@;

This should look similar to the POST, but it's a little simpler since the ID is known from the URL (and won't change). This is one place where this interface is a little wonky. You could actually change the ISBN here so that it differs from the ID, which is probably bad. Meh. I'm just a lazy author, so I leave fixing this as an exercise to the diligent reader.
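One possible way to tackle that exercise is to refuse an update whose ISBN no longer matches the ID in the URL. The sketch below shows only the check itself. The function isbn_matches_id() is a hypothetical name, and treating a book without an ISBN as acceptable is a design assumption.

```perl
use strict;
use warnings;

# True if the submitted book is consistent with the ID from the URL.
# Books without an ISBN are accepted, since their ID was generated.
sub isbn_matches_id {
    my ( $book, $id ) = @_;
    return 1 unless defined $book->{isbn} && length $book->{isbn};
    return $book->{isbn} eq $id ? 1 : 0;
}
```

A server using this check could answer a mismatch with "409 Conflict" and leave the stored record alone. Note that it would also reject adding an ISBN to a book that was created without one, so a real fix might instead rename the file and change the resource's URL.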


# Note the success to the end-user
print $q->header('text/html');
print $q->h1("Updated $book->{title}");

Finally, I return the response back to the client to let them know it succeeded.

DELETE to Delete

The last verb we consider in the sample server is the DELETE. We are using the same URL that we already used in GET and PUT: "/=/model/book/id/<ID>". In the code snippet here, $id is already initialized to <ID>.


# Make sure the book actually exists
my $resource_path = get_local_path($id);
unless (-f $resource_path) {
    barf 404, 'Where is What?',
        'Nothing here to delete.';
}

# Baleted!
unlink $resource_path
    or barf 500, 'I Am Broke', "Cannot delete $resource_path: $!";

# Tell me about it.
print $q->header('text/html');
print $q->h1("Deleted $id");

This is pretty simple, but let's walk through it. First, we find where this ID should be found on the local disk. We check first to see if there is even a file there to delete. If not, return a 404 error. If it does exist, delete it. Finally, return a nice 200 response and content letting the client know about the deletion.

That concludes the discussion of the features of the sample REST server. Next, we can discuss some significant extension possibilities.

Stay Tuned for the RESTful Client

We've now explained most of the key components of the sample REST server. Hopefully I have given you some principles and practical tools in Perl to explain the most important concepts when developing your own RESTful web service.

In the next article, I will show how to access all these features with a custom-made client tool written with libwww-perl. We'll also consider some extensions that can be made to the client and server to improve the abilities of the service, and I will leave you with some links that have been helpful in my own work with REST web services.

Until then, cheers.

Resources

Andrew Sterling Hanenkamp is a proud Kansan and spends most of his time hacking Perl, his web site, avoiding yard work, and with his wife and son.



Copyright © 2009 O'Reilly Media, Inc.