

Perl and XML

Skimpy Forum: An Application of Perl and XML

by Erik T. Ray, coauthor of Perl and XML

Every day, thousands, if not millions, of people view and post messages on Web forums. These forums are public, like billboards, but organized like email into threads, and sorted. My friends and I have been using them for years to keep in touch.

Over time, forums have become quite ornate, with polished designs like those you find on Slashdot and other sites. But I told my friends that at its heart, a forum is really very simple. In fact, I could probably write one over the weekend. "OK," they said, "let's see you do that."

Oops. I had just committed myself to another project -- and I have more than enough work to do already. With my respectability at stake, I started to work on Skimpy Forum, a Perl- and XML-based CGI application. Here are the things I wanted Skimpy Forum to do:

  • Skimpy Forum would display a list of all threads, with titles for each. This would be the default page.
  • On this page would also be a form for starting new threads.
  • A user would be able to select a thread from this list, which would bring up a new page showing all posts in order of submission.
  • A post would include the name of the author, the time it was submitted, and, of course, its text content.
  • A user would quote a post by clicking on a link next to it. This would bring up a separate page with a form that included the quoted text in the textbox.

What's missing is user authentication. Arguably, this is very important for a forum, because you want to be able to keep out anyone who abuses the rules. However, this being a project between me and a couple of friends, I didn't think it was strictly necessary. I can always add a cookie-based user authentication scheme later. For now, I'll trust that the user is who he says he is.

Now on to the Common Gateway Interface (CGI) design. I used a parameter called action to tell the program which action to perform, whether to display a list of threads, or add a post, or whatever. The actions supported are:

(none)    display list of all threads
start     create a new thread
show      show a thread
post      post a new message to a thread
quote     quote a previous post
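
As a rough illustration of the table above, here is a minimal sketch of how the value of the action parameter might be mapped to code. The handler bodies and the names %handler_for and dispatch are hypothetical placeholders, not the actual Skimpy Forum code, and falling back to the thread list for an unrecognized action is my own assumption. In a real CGI script the value would come from the action parameter of the request.

```perl
use strict;
use warnings;

# Hypothetical handlers: each one only returns a label, so the
# dispatch logic can be seen without any real forum code.
my %handler_for = (
    ''    => sub { 'list threads' },   # no action: default page
    start => sub { 'start thread' },
    show  => sub { 'show thread' },
    post  => sub { 'add post' },
    quote => sub { 'quote post' },
);

# Pick the handler for an action value. A missing or unknown
# action falls back to the default page (an assumption).
sub dispatch {
    my ($action) = @_;
    $action = '' unless defined $action;
    my $handler = $handler_for{$action} || $handler_for{''};
    return $handler->();
}

print dispatch('show'), "\n";
```

A hash of code references keeps the list of supported actions in one place, so adding an action means adding one entry rather than another branch in an if/elsif chain.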

Other parameters supply action-specific information. For example, the following CGI query string would add a new post to thread number 4 and attribute it to "Bubba:"


So that's how the forum would look from the outside. Now to design the innards. The heart of the program is its data structure. For storage, I decided on XML rather than a database because it's simpler and faster to set up. True, databases are more scalable and offer faster performance, but they are also more complex and would take too much time to get going. Besides, I have a soft spot for XML and thought it would be fun for this project. One cool thing about XML is that you can view its guts in any text editor, whereas a database has a proprietary interface.

I devised an XML markup language to hold the threads and post information. These were the elements I came up with:

forum           root element
thread          contain all the posts in a thread
thread/title    title of a thread
post            contain data for a posted message
post/from       name of the contributor
post/date       when the post was submitted
post/content    contents of the message

Example 1 shows a data file with one thread containing one post.

Example 1. A sample data file.

<?xml version="1.0"?>
<forum>
  <thread num="1">
    <title>Excitin' Stuff</title>
    <post id="1">
      <from>Fat Albert</from>
      <date>Sun Jun  2 20:49:49 2002</date>
      <content>hey hey hey</content>
    </post>
  </thread>
</forum>

Note the addition of id attributes for posts and num attributes for threads. These serve as unique identifiers that I use to select particular threads and posts. Why not use the id attribute in both threads and posts? In XML, it's traditional to use id as a unique identifier across all elements, regardless of type. That means having a thread with id="1" and a post with id="1" is forbidden. I want to keep the two separate, each with its own counting scheme, so I used different attributes.
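
To make the two counting schemes concrete, here is a minimal sketch that works out the next free thread number and the next free post id from an in-memory copy of the data. The shape of @threads and the function names next_thread_num and next_post_id are assumptions made for illustration, not the structures the program actually uses. I also assume post ids are counted across the whole forum rather than restarting in each thread.

```perl
use strict;
use warnings;
use List::Util qw(max);

# In-memory stand-in for the parsed data file (assumed shape).
my @threads = (
    {
        num   => 1,
        title => "Excitin' Stuff",
        posts => [
            { id => 1, from => 'Fat Albert', content => 'hey hey hey' },
        ],
    },
);

# Next thread number: one more than the highest num in use,
# or 1 when there are no threads yet.
sub next_thread_num {
    my ($threads) = @_;
    return 1 + max(0, map { $_->{num} } @$threads);
}

# Next post id: counted across every thread (an assumption),
# independently of the thread numbers.
sub next_post_id {
    my ($threads) = @_;
    my @ids = map { $_->{id} } map { @{ $_->{posts} } } @$threads;
    return 1 + max(0, @ids);
}

printf "next thread num: %d, next post id: %d\n",
    next_thread_num(\@threads), next_post_id(\@threads);
```

Because the two counters never look at each other's values, a thread and a post can share the number 1 without any conflict, which is the reason for using two different attribute names.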

I decided not to make a Document Type Definition (DTD), a formal description of the language that allows you to do high-level testing of grammar called "validation." Validation wouldn't be necessary because I trusted the source of the file: namely, the program itself. Once it was debugged, I could trust that it wouldn't mess up the structure.
