What's New in SpamAssassin 3.0
by Alan Schwartz, author of SpamAssassin
Many system administrators rely on SpamAssassin as part of their spam filtering strategy. Its combination of static rules for recognizing spam and its ability to adaptively learn the characteristics of spam and spammers make it appealing in many environments.
During the process of writing my book, SpamAssassin, I began to use the beta and release candidate versions of the soon-to-be-available SpamAssassin 3.0.0 (SA 3), so I could be sure that the book would cover any differences. This major release is anticipated to occur in the next month or two. What new features can administrators expect? We'll take a look at a few in this article.
Naturally, SpamAssassin 3.0.0 includes many new static rules, and changes the definitions and scores of several old ones to reflect the changing nature of spam. For example, many rules focused on pharmaceutical spam are now included--drugs seem to have caught up with mortgages and pornography in the distribution of spam.
The SpamAssassin Perl API has been extensively rewritten; software that invokes SA from Perl, such as proxies or mail server filters, will require recoding. The two popular filter applications discussed in the SpamAssassin book, MIMEDefang and amavisd-new, have already been updated to support SA 3. Mailers that integrate with SpamAssassin by invoking the spamc program or communicating directly with the spamd daemon don't need to be rewritten.
More significantly, SA 3 now supports plugins--new modules of code that extend SA's capabilities. SA 3 is distributed with four working plugins:
RelayCountry: This plugin records, in a message header, the country codes of the relays through which a message has passed. For those who wish to filter on the basis of country of message origin, this plugin provides the necessary information.
Hashcash: Hashcash is a spam reduction system in which message senders are required to perform processor-intensive computations and include the results with the message. Recipient mail servers can inexpensively verify the computations. Because spammers cannot cheaply generate hashcash stamps in bulk, a message with a valid hashcash stamp is unlikely to be spam. This plugin adds support for hashcash checking in SpamAssassin; messages with valid stamps receive lower spam scores from SA.
URIDNSBL: Spammers can try to disguise the origin of their spam, but they can't disguise the URLs of the web sites they're promoting. The URIDNSBL plugin checks URLs in message bodies against online blacklists of spammer-operated web sites, such as the SBL, operated by Spamhaus. Advertising a spammer's web site? Pay the price with a higher spam score.
SPF: Sender Policy Framework, or SPF, is an emerging standard for domain owners to publish lists of servers that are permitted to originate email for the domain. AOL has adopted SPF and has also encouraged its partners to do so (see postmaster.aol.com/spf). The SPF plugin checks the originating address against published SPF records, if any, and penalizes messages that originate from servers that are not listed as valid when an SPF record has been defined. (And because SPF is controversial among mail administrators, it's easy to avoid it in SA 3 by simply not loading the SPF plugin.)
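To make the Hashcash entry in the plugin list above more concrete, here is a minimal Python sketch of why stamps are expensive to mint but cheap to verify. It is not the plugin's code. It assumes the version 1 stamp layout (version, bits, date, resource, extension, random salt, counter, separated by colons) hashed with SHA-1, and the function names are my own. A real checker would also reject expired dates and stamps it has already seen; both are left out here.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # 8 - bit_length() is the number of leading zero bits in this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify_stamp(stamp: str, recipient: str) -> bool:
    """Cheap for the recipient: a single SHA-1 hash per stamp."""
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1":
        return False
    try:
        claimed_bits = int(fields[1])
    except ValueError:
        return False
    if fields[3] != recipient:
        return False  # stamp was minted for somebody else
    digest = hashlib.sha1(stamp.encode("utf-8")).digest()
    return leading_zero_bits(digest) >= claimed_bits


def mint_stamp(recipient: str, bits: int, date: str, salt: str) -> str:
    """Expensive for the sender: about 2**bits hashes on average."""
    prefix = f"1:{bits}:{date}:{recipient}::{salt}:"
    for counter in count():
        stamp = prefix + format(counter, "x")
        digest = hashlib.sha1(stamp.encode("utf-8")).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp
```

Calling mint_stamp("alice@example.com", 12, "040801", "c2FsdA") takes a few thousand hashes, while verify_stamp on the result takes one. Each extra bit doubles the sender's expected work and leaves the recipient's cost unchanged.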
SQL and LDAP Support
With earlier versions of SpamAssassin, maintaining per-user configuration was difficult in virtual hosting environments when users did not have shell accounts on the mail server. SA 3 greatly eases this difficulty by allowing per-user preferences, Bayesian data, and auto-whitelists to be stored in an SQL database, rather than in files in users' home directories. Example SQL tables are included for both MySQL and PostgreSQL servers. Alternatively, per-user preferences can be stored in an LDAP database, which is a boon to sites that have LDAP-driven mail setups. Available preferences are more extensive, as well; for example, new directives allow email from or to given addresses to opt out of the Bayesian classifier. A pharmacist whose mail is on a host that uses a site-wide Bayesian database can now avoid having their legitimate email classified as spam because SA has learned from other users' mail that "sildenafil" is a spam token.
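The idea of database-backed preferences can be sketched in a few lines of Python. This is an illustration only, using an in-memory SQLite database rather than MySQL or PostgreSQL. The three-column userpref table, the "@GLOBAL" row for site-wide defaults, and the preference names are assumptions chosen for the example; consult the SQL files shipped with SA 3 for the real schema.

```python
import sqlite3

GLOBAL_USER = "@GLOBAL"  # assumed name of the site-wide defaults row


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory table holding one preference per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userpref ("
        " username TEXT NOT NULL,"
        " preference TEXT NOT NULL,"
        " value TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO userpref VALUES (?, ?, ?)",
        [
            (GLOBAL_USER, "required_score", "5.0"),
            (GLOBAL_USER, "rewrite_header", "Subject [SPAM]"),
            ("pharmacist@example.com", "required_score", "7.5"),
            ("pharmacist@example.com", "bayes_ignore_to",
             "pharmacist@example.com"),
        ],
    )
    return conn


def load_prefs(conn: sqlite3.Connection, username: str) -> dict:
    """Merge site-wide defaults with one user's overrides."""
    rows = conn.execute(
        "SELECT username, preference, value FROM userpref"
        " WHERE username IN (?, ?)"
        # Global rows sort first, so the user's rows overwrite them below.
        " ORDER BY CASE WHEN username = ? THEN 0 ELSE 1 END",
        (GLOBAL_USER, username, GLOBAL_USER),
    )
    prefs = {}
    for _user, preference, value in rows:
        prefs[preference] = value
    return prefs
```

With this data, load_prefs(conn, "pharmacist@example.com") returns the pharmacist's own threshold and Bayes opt-out alongside the site-wide header rewrite, and no shell account or home directory is involved. The sketch keeps one value per preference; directives that may legitimately appear several times would need a list instead.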
SpamAssassin has always distinguished trusted networks from untrusted networks, and does not perform DNS-based blacklist testing on relays on trusted networks. SA 3 introduces the new concept of internal networks, which are not only trusted, but assumed to be under your direct control. By separating trusted and internal networks, SA 3 can do a better job at detecting spam originating directly from dialup hosts but still exempt trusted sites from blacklists.
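The distinction between internal, trusted, and untrusted relays can be sketched in Python as follows. The address ranges are documentation placeholders for a hypothetical site, the function names are mine, and the logic is my reading of the behavior described above, not SpamAssassin's implementation.

```python
from ipaddress import ip_address, ip_network

# Hypothetical site layout: our own relays, plus a partner we trust.
INTERNAL_NETWORKS = [ip_network("192.0.2.0/24")]
TRUSTED_NETWORKS = INTERNAL_NETWORKS + [ip_network("198.51.100.0/24")]


def classify_relay(address: str) -> str:
    """Label a relay as internal, trusted, or untrusted."""
    ip = ip_address(address)
    if any(ip in net for net in INTERNAL_NETWORKS):
        return "internal"
    if any(ip in net for net in TRUSTED_NETWORKS):
        return "trusted"
    return "untrusted"


def should_check_blacklists(address: str) -> bool:
    """Relays on trusted (including internal) networks skip DNS blacklists."""
    return classify_relay(address) == "untrusted"


def first_external_relay(relays):
    """Given relay addresses, newest hop first, find the host that handed
    the message to our internal network. Returns None if every hop is ours."""
    for address in relays:
        if classify_relay(address) != "internal":
            return address
    return None
```

For a relay chain of ["192.0.2.10", "203.0.113.5", "198.51.100.7"], first_external_relay returns "203.0.113.5". That is the host that connected directly to the internal network, and so the one worth testing against dialup lists, while the trusted partner further back is still exempt from blacklist lookups.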
SpamAssassin 3.0.0 is now an Apache Software Foundation project, and is released under the Apache Software License instead of the GNU General Public License or Perl Artistic License. As these licenses all fit the open source and free software definitions, most end users won't notice the difference. If you plan to redistribute SA 3 as part of a larger system, however, you should be aware that the Free Software Foundation does not consider the Apache Software License to be GPL-compatible, so you may run into difficulty if you wish to combine SA 3 with GPL code.
The Upgrade Experience
Although the upgrade process is fairly straightforward, you can expect a few gotchas with a new major release. Some command-line options have been deprecated or renamed. The extremely common rewrite_subject configuration directive has been replaced by rewrite_header (which can rewrite more than just the Subject header). Integration with Mail::Audit is no longer supported, and older Bayes database files may require special care to update to the new version.
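As an illustration of the directive change, the following Python sketch rewrites old-style configuration lines into the new form. It rests on assumptions not spelled out above: that the old configuration paired "rewrite_subject 1" with a separate subject_tag line, that the old default tag was *****SPAM*****, and that the new directive takes a header name followed by the tag text. Check the SA 3 documentation before relying on any of them; the function name is my own.

```python
def migrate_subject_rewrite(lines, default_tag="*****SPAM*****"):
    """Replace rewrite_subject/subject_tag lines with one rewrite_header line.

    All other lines (including comments and blanks) are kept in order.
    """
    enabled = False
    tag = default_tag  # assumed old default, used when no subject_tag is set
    kept = []
    for line in lines:
        parts = line.split(None, 1)
        directive = parts[0] if parts else ""
        argument = parts[1].strip() if len(parts) > 1 else ""
        if directive == "rewrite_subject":
            enabled = argument == "1"
        elif directive == "subject_tag":
            tag = argument
        else:
            kept.append(line)
    if enabled:
        # New-style directive: header name first, then the tag text.
        kept.append(f"rewrite_header Subject {tag}")
    return kept
```

Given the lines "rewrite_subject 1" and "subject_tag [SPAM]", the function emits the single line "rewrite_header Subject [SPAM]". If rewriting was switched off in the old file, both old lines are dropped and nothing is added.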
I've always been happy with the performance of SpamAssassin for my own mail (I run Postfix and invoke SA through amavisd-new). With Bayesian filtering and auto-whitelisting, I see very little spam, and have almost no false positives. Since I've upgraded to the second release candidate of SA 3, I've seen even more improvement, particularly thanks to the URIDNSBL plugin. Mail administrators should watch for the official release of SpamAssassin 3.0.0 and strongly consider upgrading to the new version, or adopting it as a valuable spam-fighting tool in their toolbox.
Alan Schwartz is an associate professor of clinical decision-making in the Department of Medical Education at the University of Illinois at Chicago.
In July 2004, O'Reilly Media, Inc., released SpamAssassin.
Sample Chapter 2, "SpamAssassin Basics," is available free online.