ONLamp.com
oreilly.comSafari Books Online.Conferences.

advertisement


Managing Many-to-Many Relationships with PL/pgSQL

by David Wheeler
06/29/2006

A common pattern when managing the relationship between object-oriented applications and databases is the many-to-many relationship. Object-relational mappers usually manage these relationships as collections of objects, wherein one class has an accessor that returns a collection of related objects. For example, imagine that you're creating (yet another) blogging application. You want to associate your blog entries with tags. Tags can be used over and over again for different blog entries, and each blog entry can, of course, have more than one tag. In this scenario, the blog entry class might have a method that returns a collection of tag objects. (Whether the tag class has a method that returns a collection of blog entry objects is an implementation-specific issue that I'll just leave up to you.)

As an added twist, you might want an ordered relationship. That is to say, for a given blog entry, the order of the tags might be meaningful. Perhaps the first tag is the most important, and the last tag is the least important. The collection of tags is then an ordered set. To manage this relationship in a database, you typically create a join table to manage the relationship:

-- Create a table for blog entries.
CREATE TABLE entry (
  id      SERIAL PRIMARY KEY,
  title   TEXT,
  content TEXT
);

-- Create a table for tags.
CREATE TABLE tag (
  id   SERIAL PRIMARY KEY,
  name TEXT
);

-- Create a join table to collect tags for entries.
CREATE TABLE entry_coll_tag (
  entry_id  INTEGER REFERENCES entry(id)
                    ON UPDATE CASCADE
                    ON DELETE CASCADE,
  tag_id    INTEGER REFERENCES tag(id)
                    ON UPDATE CASCADE
                    ON DELETE CASCADE,
  tag_order SMALLINT,
  PRIMARY KEY (entry_id, tag_id)
);

-- Keep things orderly.
CREATE UNIQUE INDEX idx_entry_coll_tag_order
ON entry_coll_tag (entry_id, tag_order);

So far so good. Blog entries live in the entry table and tags in the tag table, and the entry_coll_tag table manages the one-to-many relationship. Note that in the entry_coll_tag table, the combination of entry_id and tag_id forms the primary key, as each entry can only be associated with a given tag only once, and vice versa. There is a unique index on entry_id and tag_order, so that a given blog entry has its tag associations ordered, but each order number can appear only once per blog entry. (Otherwise they wouldn't really be ordered, would they?)

To select all the tags in the collection for a given blog entry, execute a query like:

SELECT tag.id, tag.name
FROM   tag, entry_coll_tag
WHERE  tag.id = entry_coll_tag.tag_id
       AND entry_coll_tag.entry_id = ?
ORDER BY entry_coll_tag.tag_order;

To insert the data for a new blog entry, you might write in the entry class something like this pseudo-code (error handling elided for clarity):

# Use prepared statements.
insert = dbh.prepare('INSERT INTO entry (title, content) VALUES (?, ?)');
sel_id = dbh.prepare("SELECT CURRVAL('entry_id_seq')");

ins_coll = dbh.prepare('
    INSERT INTO entry_coll_tag (entry_id, tag_id, tag_order)
    VALUES (?, ?, ?)
');

# Do everything inside a transaction.
dbh.begin;

# Insert the new entry.
insert.execute(entry.title, entry.content);
sel_id.execute;
entry.id = sel_id.fetch;

# Associate the tags with the entry.
i = 0;
foreach tag in (tag_array) {
    ins_coll.execute(entry.id, tag.id, ++i);
}

# Make it so!
dbh.commit;

To reorder the associated tags for an existing blog entry, just execute the foreach loop with the new order. If you want to add new tags to an existing collection, it gets a bit more complicated:

# Transactions make us safer!
dbh.begin;

# Get the current highest value of tag_order.
max_ord = dbh.selectcol(
    'SELECT COALESCE(MAX(tag_order), 0) FROM entry_coll_tag WHERE entry_id = ?',
    entry.id
);

# Add the new tags to the entry.
i = 0;
foreach tag in (new_tag_array) {
    ins_coll.execute(entry.id, tag.id, max_ord + ++i);
}

dbh.commit;

Indeed, this is the approach taken by most object-relational mapping applications to manage ordered many-to-many relationships. However, there's a major problem with this approach: there is a race condition. What happens if two different processes are updating the collection of tags on the same blog entry at the same time? They might both grab the same value for the previous maximum value of the tag_order column, but one commit would fail. While it's good and proper to maintain the integrity of the data in this way, it's not a great idea to expose customers to an unexpected error screen indicating a constraint violation.

What's more, this code is slow. It makes a lot of calls to the database here. What if there were 100 tags associated with a blog entry? That's 100 calls for each insert into the join table. It also lengthens the duration of the transaction, increasing the likelihood of an error due to the race condition.

Wouldn't it be a lot cleaner if there were a way to tell the database to manage the collection associations? What if there were functions to call that eliminated the race condition and updated the collection relationships in a single database call?

Fortunately for PostgreSQL users, there is. You can move the collection relationship maintenance code out of the application layer and into database functions. In "Writing PostgreSQL Functions with PL/pgSQL," I introduced PL/pgSQL with some impractical examples. Now I want to get practical by using PL/pgSQL to solve this object-relational mapping problem.

Pages: 1, 2, 3

Next Pagearrow





Sponsored by: