ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)
 See this if you're having trouble printing code examples


Building a Mailing List

by Glenn Fleishman
07/01/2004

After programming for about seven hours over a weekend in January, I turned to my wife and said, "I just saved us $10,000 this year." My wife is used to such pronouncements. My income as a freelance writer and entrepreneur is lumpy and comes in strange forms. Without batting an eyelash, she said, "What is it this time?"

Those seven hours of programming, supplemented over the last four months with five other hours, allowed me to develop a rich mailing list system that replaced an outsourced service charging $850 per month. Needless to say, we celebrated: $10,000 in savings doesn't come every day.

I started my first mailing list in 1991, when the Internet was largely an academic province, and I had to rely on the information technology department of the university where I worked to handle the LISTSERV software. The list didn't last long, but it grew to over 400 subscribers worldwide talking about how to solve early desktop publishing problems.

In the intervening 13 years, I've run many lists — almost all of them moderated — and the same issues of list administration that I faced in my first list have recurred each time.

These three categories of user email consumed from two to five hours of my time in the mid-1990s for a list that had 7,000 subscribers on it and much less time for more recent lists managed via Yahoo Groups and other services that automated most bounce processing.

Add to these old-time problems the newer and increasing issues of spam, both incoming and outgoing. Incoming spam can end up on your list if you don't have a mechanism to prevent it; outgoing messages classified as spam can halt your ability to send any mail at all from your mail server.

Last year, I decided to start a subscription, announcement-only list for the posts I and a colleague make to my commercial web log, Wi-Fi Networking News, which runs on Movable Type. I checked the free and paid list services and finally settled on a company that resold Lyris. They would allow me to send out ad-free individual messages and digests and have most of the control and reliability that I needed.

The drawback was that previously mentioned $850 per month. With the ramp-up in subscribers I predicted and an advertising deal already in place that will eventually extend to the mailing list, I suspected this cost would be minor compared to the revenue per subscriber.

Making minor mistakes with a mailing list's behavior can result in your mail server becoming blacklisted, and I thought that an arm's length approach would reduce the chance of my servers running afoul of others. I also didn't want to deal with the Angry Subscriber, Bouncy Bouncy, or the Joker, and in examining the offering I signed up for I felt confident that they would absorb most of the time I'd wasted in running previous lists.

Ultimately, even with the excellent support and services they offered, my commercial host didn't have the flexibility I wanted. For instance, I wanted to have a different template for the digest version of the list and the individual email version, which was impossible. I also wanted to have the default subscription be in digest mode, which my experience shows the majority of users want. Lyris' current version is incapable of offering both double opt-in and a digest default option, making that intractable and irritating users who couldn't follow the precise email-oriented instructions to switch from individual message to digest.

I looked at the code I'd written to extract Movable Type posts and send them to the Lyris-based server and realized that with a little work, I could combine my existing knowledge of sendmail, Apache, and Perl with the MySQL experience I'd gained since last summer when I first started fully understanding how to use it.

My main concern was, of course, Angry Bouncy Jokers, and whether they would drag me down. It turns out, I didn't need to worry about this at all.

Conceptualizing the Subscription System

I've run moderated and announcement lists for so long that I know what friction a good subscription process can remove:

The subscription interface I built conforms to these guidelines and has resulted in a single, manual unsubscribe request across 1,800 subscriptions and 200 unsubscribes — and that was due to a bug that failed to set an unsubscribe flag properly. No flames, no manual intervention, no complaints.

Planning Distribution

Related Reading

MySQL Cookbook
By Paul DuBois

The goal of this process is to make it possible for people to choose to receive individual posts made to the web log as email messages. They can choose to receive each post as it's made as a separate email message or, more commonly, as a digest containing all posts sent each night.

Because I use Movable Type to manage my system, I could have used their API to develop a plugin that would activate on new posts to trigger distribution. I'm dubious about having to maintain that plugin, though, so I chose a very low-tech high-tech solution: a crontab entry.

Every five minutes, a cron job queries the MySQL database to find posts made in the previous few minutes. If it finds a new post, it queries the list database for individual message subscribers and sends them each a unique email with a tracking code. (On my MySQL box, that query requires well under 0.001 seconds to run.) Each night before midnight, a cron job queries for all posts sent out in the previous day, generates a header containing the list of posts titles, concentrates, and distributes.

The database schema has three tables. One table contains the email subscriber information, with flags and fields for initial subscription, confirmation, suspension, and unsubscribe status, among others. A second table contains an entry for each Movable Type post sent out and the timestamp for when it was distributed. A third table records bounces, indexing them by the unique table ID for the subscriber table. Here are my schemas:

CREATE TABLE `bounces` (
  `bounce_id`    int(11)  NOT NULL auto_increment,
  `bouncer`      int(11)  NOT NULL default '0',
  `bounce_count` int(11)  NOT NULL default '0',
  `bouncetime`   datetime NOT NULL default '0000-00-00 00:00:00',

  UNIQUE KEY `bounce_id` (`bounce_id`),
  UNIQUE KEY `bouncer`   (`bouncer`),

  KEY `bouncer_2`    (`bouncer`),
  KEY `bouncetime`   (`bouncetime`),
  KEY `bounce_count` (`bounce_count`)
) TYPE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `posts` (
  `post_id`   int(11)       NOT NULL auto_increment,
  `mt_id`     int(11)                default NULL,
  `post_time` datetime               default NULL,
  `sent`      enum('y','n') NOT NULL default 'n',

  UNIQUE KEY `post_id` (`post_id`),

  KEY `mt_id`     (`mt_id`),
  KEY `post_time` (`post_time`)
) TYPE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `wifisub` (
  `wifi_id`        int(11)               NOT NULL auto_increment,
  `sub_name`       varchar(254)          NOT NULL default '',
  `sub_email`      varchar(254)          NOT NULL default '',
  `subdate`        datetime              NOT NULL default '0000-00-00 00:00:00',
  `confirm_status` enum('y','n','w','u') NOT NULL default 'n',
  `confirm_date`   datetime              NOT NULL default '0000-00-00 00:00:00',
  `confirm_code`   varchar(15)           NOT NULL default '',
  `sub_type`       enum('d','e')         NOT NULL default 'd',
  `suspend`        enum('y','n')         NOT NULL default 'n',
  `unsub`          enum('y','n')         NOT NULL default 'n',
  `unsubdate`      datetime              NOT NULL default '0000-00-00 00:00:00',

  UNIQUE KEY `wifi_id` (`wifi_id`),

  KEY `wifi_name`    (`sub_name`),
  KEY `wifi_email`   (`sub_email`),
  KEY `confirm_code` (`confirm_code`),
  KEY `confirm_date` (`confirm_date`),
  KEY `unsubdate`    (`unsubdate`),
  KEY `subdate`      (`subdate`)
) TYPE=MyISAM DEFAULT CHARSET=latin1;

When the code creates a new subscriber, it assigns her a unique, random code. This code isn't so much as a password as a tracking element. Each email sent to the user contains the code as part of the return address and error address. Because the code is used identically each time, the user can add this address to a white list or mark it as a good sender in a challenge/response system.

The reason for embedding the code is to take advantage of sendmail's ability to ignore everything in an address starting at a plus sign. An address such as wifibounce+KJFLKDFDFK@wifinetnews.com looks like wifibounce@wifinetnews.com to sendmail. Using the aliases and virtuserstable files, I redirect incoming email to the subscribe, bounce, and unsubscribe addresses to a program that parses the code and chooses the appropriate behavior to perform.

An important side note: Most sendmail installations are coupled with smrsh, which limits which programs sendmail can execute through pipe from aliases or .forward. The program you want to use needs a symbolic link in /usr/adm/sm.bin/ or a similar path. Check the documentation with your distribution of sendmail. Often, sendmail is looking for a non-existent directory hard-coded into the program. You can change this via sendmail.cf or by recompiling with flags set to any other directory you wish.

Incoming mail to one of several addresses ends up in a program that parses the contents and then hands it off via curl to a Perl CGI script. Why use curl? The handoff scripts handle the optional Web part of the subscribe and unsubscribe operation and accessing them via the web interface eliminates a step and doesn't require using SOAP or XML-RPC libraries.

The sendmail-passthrough script has to handle a bit of parsing for bounced messages, which was and continues to be the hardest part of list software. I'll offer more insight into this a little later in a section devoted to bounces.

The four CGI scripts use a Perl DBI interface to talk to the MySQL database and update bounce counts, set suspend flags, and to turn subscription status from pending to confirmed or subscribed to unsubscribed. I store all of the email sent to these addresses in full — except for occasional spam purges — in case of a later complaint or lawsuit so that I can produce precise email with headers to show the opt-in or confirmation.

List Interaction

Users can sign up for the list through any page on WiFiNetNews.com using a small form in which they need to provide as little information as their email addresses. The form is preset to digest mode, but users can click the radio button to switch to individual messages. They can also choose to enter their first and/or last names. The current software stores this information but does not use it.

A user can also subscribe by sending email to either wifi-sub@wifinetnews.com (digest) or wifi-sub-msg@wifinetnews.com (individual posts). That information isn't as widely disseminated, but it does appear on a page on the site that explains in more detail how to subscribe.

Both the web page form and the wifi-sub addresses trigger the same script, which confirms that the address isn't already in the database, then inserts a new record with a number of defaults, including setting a flag to indicate that the subscription is pending.

The script now sends out the second opt-in request via email using the Perl Mail::Sendmail module. If that request bounces, a suspend flag is set for that email. The system retains the record to forestall the potential of someone making a second attempt. (Someone has, of course, tried to subscribe bgates@microsoft.com.) A message will stay pending forever if no one ever responds to it.

The body of the message contains a message that alerts the recipient either to respond to the email with any or no contents or to follow a web link. Either action triggers a confirmation script that updates the pending flag to confirmed. The next time an appropriate action comes along — a new post for individual messages or a new digest for daily subscribers — that subscriber receives the message.

Every message sent to the list has a lengthy post-amble where CODEHERE is the unique code assigned to a user on subscription:

--
You are subscribed as foo@example.com
You can unsubscribe at any time by sending email to
<mailto:wifiunsub+CODEHERE@wifinetnews.com>
or visiting <http://wifinetnews.com/cgi-bin/listunsub?confirm=CODEHERE>
To change from individual messages to the daily digest, just
send email to <mailto:wifi-sub@wifinetnews.com>
You can reach Wi-Fi Networking News at <news@wifinetnews.com>
Our postal address: 115 N. Any St., Suite 1000, Seattle WA 98999

It's long, but it's also prevented the Angry Subscriber responses described early in this article. All unsubscribers have managed to figure out how to use this multi-tiered approach. Both the unsubscribe methods use the same script: the script sets the unsubscribe flag and stores a date stamp for that action. The address remains in the system as a historical record and for statistics tracking.

I've included the postal address (obscured here) in each message to demonstrate a good faith effort to comply with the U.S. CAN-SPAM legislation that requires a postal address in advertising messages in certain circumstances. Even though I'm sending out editorial that meets the highest tests in the law, I still include it just to keep readers happy. I'm not worried about someone stopping by or sending nasty-grams via snail mail.

Processing Bounces

If you examine the stream of replies that seemingly identical mail servers send back when they bounce mail, you laugh and cry. Because the Internet is an anarcho-syndicalist union in which we each agree to have our own rules in our little domains as long as when we meet, we don't interfere, mailers produce a host of bizarre, funny, and misleading responses. There's some standardization, but the exceptions are quite broad.

If you look at the code in wifibounce, you'll see my best analysis and attempts to deal with bounces. This seems to cover now about 99.5 percent of all bounces to the list. I have to examine the bounce raw logs once or twice a month for exceptions.

Not processing bounces can be one of the biggest failures of a mailing list system. Among other problems, repeated mailings to non-existent users can cause some administrators or systems to mark you as a spammer and blacklist you. It also keeps a higher bandwidth and system load running when you mail out messages — and it's not neat. For larger lists with tens of thousands of subscribers, you could generate 100,000 bounce messages a day without proper trimming.

My assumption is that it's better to suspend an account aggressively rather than to send out unwanted mail indefinitely. For certain kinds of bounces that include complete failures (user unknown or blacklisted, for example), the script suspends a user's account immediately. There's a separate suspend flag so that we can occasionally restore suspended accounts to test whether they're working again.

Accounts that generate transient failures aren't suspended until the system processes five in a row within a day. Unfortunately, my list typically sends out several messages a day, so a transient problem can result in an unnecessary suspension. That's a longer-term issue to solve.

Occasionally, I reset all of the suspend flags from no to yes and let the system re-calibrate against which accounts were transiently dead and which were permanently. One improvement in my system would handle suspension more gracefully, using a cron job to restore only soft bounced accounts every few days.

The Results

Since setting up this system in January, I've saved several thousand dollars and, with very few additional inputs, I've developed something I'm secure enough about that I'm ready to share it under a Creative Commons license. I don't have any interest in making a buck off this system, but would hope that Movable Type users will develop the MT hooks better than I can while perfecting some of the more obvious bounce situations. I'm not an efficient coder, and I know that a motivated Perl hacker could trim the scripts down enormously. It could also broaden to work with Postfix, Qmail, and other MTAs, or to use PHP on the backend instead of Perl.

The list statistics speak for themselves. With several thousand daily unique visitors to the web site, the list currently has nearly 1,400 subscribers, a good ratio of visitors to list members. There are just under 1,100 subscribe to the digest and just under 300 to individual messages.

Another cron job produces statistics every morning for me, and I've tracked how well the list has performed on a variety of measures. Fewer than one percent of subscribers unsubscribe each week, while the list grows by three to five percent each week with a few growth spurts when the news site has additional traffic or publicity. Usually, less than one percent of the list is suspended in an average week due to hard bounces or for exceeding the soft bounce count.

As noted earlier, I've had to remove a single person manually due to a bug, since fixed, and she wasn't angry. I never have to add subscribers: they all figure it out for themselves. I've received no complaints from ISPs, spam services, or other engines of protection. I may be restricting the list's growth a tad, but with no complaints it's hard for me to have cavils, either.

Some of you will surely point out that I could have turned to Mailman, Majordomo, or other commercial or free mailing packages and tuned them to my needs. But given the success of converting a little bit of starter code in under 10 hours using LAMP and sendmail into an almost perfectly operating customized list, it's hard to feel too bad about writing my own code and now sharing it.

Example Code for this article is available for download.

Glenn Fleishman is a freelance technology journalist contributing regularly to The New York Times, The Seattle Times, Macworld magazine, and InfoWorld. He maintains a wireless weblog at wifinetnews.com.


Return to ONLamp.com.

Copyright © 2009 O'Reilly Media, Inc.