ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


SpamShield: A Perl-Based Spam Filter for sendmail

by Glenn Graham
09/12/2002

Numerous articles have focused on ways to recognize and block spam email. As system administrators work to build sophisticated roadblocks, spammers continue to find ways to knock them down. This article will focus on one viable solution, SpamShield version 1.40 -- a Perl-based spam filter for sendmail. We'll cover how it works and how to install and configure it on your server.

The science of spam (if you can call it that) has taught us one thing: spam leaves a definite "calling card" in the system logs. This calling card is generally repetitive enough that the process of tracking spam may be automated. Based on this theory, a brilliant programmer by the name of Kai Schlichting wrote a Perl-based program called SpamShield.

The Challenge

United States federal and state laws now prohibit the transmission of spam, but these are only laws, and where laws exist, so do criminals intent on breaking them. We must also remember that U.S. laws only apply in the U.S., not in other countries. The Internet is a global playing field, and laws concerning its use are created and enforced on a country-by-country basis. For example, in China, India, and Romania, spamming is legal.

Spam email usually originates from profit-based organizations that purchase "spam lists" from "list sellers" located in various corners of the Internet. Legitimate and illegitimate companies make use of them. I've received email offers promising everything from credit repair to instant university diplomas (I don't require credit repair and I already have my diploma -- thanks). Some of the more legitimate companies include credit card firms, car manufacturers, and drug companies.

All too often, you may see a message at the bottom of the spam that says something like this:

"If you would rather not receive these messages, please click here. It will take up to 48 hours for your request to take effect. All third party products and services promoted on this Site are offered exclusively by third party advertisers. XXX Company makes no representations or warranties with respect to these offers and all claims for injury and damages related to such offers are the sole responsibility of the advertiser."

Talk about integrity.

How Do They Find You?

There are endless ways to seek out email addresses on the Internet. In most cases, list suppliers generate their lists by using computers that scan Web sites and databases for strings containing the "@" symbol. Next, each address is verified using SMTP verification software, then compiled into a portable database.

In other cases, requesting that you be removed from a list (as in the example above) verifies that your address exists; hence, you're added to more lists.

The Open Relay Server

The most common method used to send spam is through an "open" relay host. An open relay is simply an SMTP server that allows any domain to connect on port 25, and relay through to another domain. The engineers at sendmail.org have worked for several years to find ways to reject relaying, using filtering methods such as the access database.

Newer versions of sendmail look up the sender's domain in DNS before allowing mail to pass. If the domain doesn't exist, sendmail will typically reject the message. This prevents spam from sources that use nonexistent domains in their return address.

How SpamShield Works

The basic principle of SpamShield is fairly straightforward. First it gathers a "chunk" of log information and builds that into a volume of its own. Next, based on a predetermined threshold value, the software decides if the volume contains more than the allowable amount of email originating from any single source (such as "spamdomain.com"). Once a source exceeds the threshold, SpamShield simply blocks that source from further access.
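
As a rough illustration of that counting step (not SpamShield's own code), the shell sketch below tallies mail per relay IP over the tail of a log and prints the sources above a threshold. It assumes sendmail log lines that contain " from=" and a field such as relay=host [192.0.2.10]; the function name count_senders is hypothetical.

```shell
#!/bin/sh
# Sketch only: count_senders is a hypothetical helper, not part of SpamShield.
# Assumes each received message is logged on a line containing " from=" and a
# "relay=name [a.b.c.d]" field, as sendmail commonly writes them.
#
# usage: count_senders LOGFILE LASTLINES THRESHOLD
# prints "COUNT IP" for every relay IP seen more than THRESHOLD times
count_senders() {
    tail -n "$2" "$1" |
        grep ' from=' |
        sed -n 's/.*relay=[^[]*\[\([0-9.]*\)].*/\1/p' |
        sort |
        uniq -c |
        awk -v limit="$3" '$1 > limit { print $1, $2 }'
}
```

For example, count_senders /var/log/maillog 1500 200 looks at the last 1500 lines and reports any IP with more than 200 messages. A threshold of 0 lists every sender, which can help when choosing a sensible limit for your own traffic.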

Through experience, I learned that setting the threshold value is the most important part of making SpamShield run efficiently.

Here is a description of how SpamShield works, excerpted from Kai's readme document:

"SpamShield looks at the last <n> lines of the sendmail logfile (maillog), and builds a list of mail volume received from various hosts (by IP) in the period covered by that log fragment. If any particular machine sends more mail than the configured global threshold, the assumption is that spam is received. The IP address is then dropped to a "dead host" (an unused IP address within your netblock). The defaults for the log file fragment and the allowable number of mails per host are for a small system with only a few thousand mails per day. You might want to adjust those limits to avoid false positives. (see set $spamthreshold ). The general assumption is that spam abuse typically means that up to several hundred emails PER MINUTE are received from a single source: this is a tremendous 'signal to noise' ratio, given that even very large systems, such as AOL's mail servers, don't deliver more than a few hundred mails to a small/medium-sized system per day. For this reason, there are configuration options to ignore 'spam-like' traffic from high-traffic hosts that are deemed secure and non-relaying (AOL's servers don't relay, for example)."

Installing the Software

Installation of Kai's software is simple. SpamShield is a Perl script, so you'll need Perl 5, available from Perl.com.

First, download the tarball from Kai's site to your src directory, then untar it. The uncompressed directory structure will look like this:

spamcontrol/ 
spamcontrol/blocked 
spamcontrol/INSTALL-spamshield 
spamcontrol/spamshield.pl 
spamcontrol/dontblock 
spamcontrol/blockignore 

Next, move the spamcontrol directory somewhere more convenient, such as /usr/local/spamcontrol:

 Command: mv ./spamcontrol /usr/local/spamcontrol  

The Perl script, spamshield.pl, should be mode 700 and owned by root:wheel:

Command: chmod 700 ./spamshield.pl ; chown root:wheel spamshield.pl 

Please review ./INSTALL-spamshield, located in the spamcontrol directory, for a detailed installation overview.

Configuring SpamShield

Note that this configuration example is BSD-dependent, in that we use /var/log/maillog for all MAILER-DAEMON messages. Other Unix variants use /var/log/messages. This option is configurable within the syslog.conf file on most systems. For more help with syslog, see Michael Lucas's ONLamp article on syslog configuration.

Now edit the Perl script, spamshield.pl, using your favorite editor. I suggest you use a "long line" editor, such as vi. Follow these steps:

Here is a copy of my customized script:

##################################################### 
# User-defined parts below # 
##################################################### 
$log = "/var/log/maillog";  
# sendmail log location 
$lastlines=1500;  
# how many lines at the end of the log should we look at 
$spamthreshold=200;  
# this is how many mails can be seen from a single IP 
# in the last $lastlines lines in the logfile before 
# considering it spam. Adjust this to accommodate 
# busy systems and events like coming up after a 
# long downtime (when a lot of mail will be delivered 
# from various hosts or from the secondary MX) 
$dontblock="/usr/local/spamcontrol/dontblock";  
# list of IP hosts that 
# are never to be blocked 
$blockactive="/usr/local/spamcontrol/blocked";  
# these hosts are currently 
# blocked by SpamShield 
# for sysadmin review 
$blockignore="/usr/local/spamcontrol/blockignore";  
# be silent about these ones 
$securetmp="/usr/local/spamcontrol";  
# enter directory name that cannot be 
# used by anyone except the uid under 
# which this program is run 
$blackhole="209.204.146.22";  
# this **MUST** be an unused IP number on the 
# local network, or error messages and chaos 
# might ensue. undefine to not add a route, 
# this should only be used on machines with 
# known stable routing engines. 
# who will receive alerts ? undefine to stop mail alerts 
$maintainer="glenn\@networkinformation.com"; 
# define locations of programs below, systems vary 
$SENDMAIL="/usr/sbin/sendmail"; 
$TAIL="/usr/bin/tail"; 
$AWK="/usr/bin/awk"; 
$GREP="/usr/bin/grep"; 
$SORT="/usr/bin/sort"; 
$CAT="/bin/cat"; 
$DATE="/bin/date"; 
$ROUTE="/sbin/route"; 
# $WINNUKE="/usr/local/spamcontrol/winnuke";  
# define if retaliatory action desired -  
# WARNING, use WINNUKE at your own risk! 
##################################################### 
# End of user-defined parts # 
#####################################################
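
The $dontblock and $blackhole settings work together: a source that crosses the threshold is routed to the dead host unless it is listed as trusted. The shell sketch below only prints the commands it would run. The function name plan_blocks is hypothetical, the one-IP-per-line format of the trusted-host file is an assumption, and the BSD-style "route add -host" syntax may differ from what SpamShield actually issues.

```shell
#!/bin/sh
# Sketch only: plan_blocks is a hypothetical helper, not SpamShield's code.
# Assumes the trusted-host file holds one IP address per line.
#
# usage: plan_blocks TRUSTED_FILE DEAD_HOST < list_of_ips
# prints one "route add" command per IP that is not in TRUSTED_FILE
plan_blocks() {
    while read -r ip; do
        [ -n "$ip" ] || continue    # skip blank lines
        # -x matches the whole line and -F disables regex matching, so
        # 10.0.0.1 in the file does not exempt 10.0.0.10
        if grep -qxF -- "$ip" "$1" 2>/dev/null; then
            continue
        fi
        printf 'route add -host %s %s\n' "$ip" "$2"
    done
}
```

Feeding it addresses on standard input, for example printf '192.0.2.10\n' | plan_blocks /usr/local/spamcontrol/dontblock 209.204.146.22, shows what would be blocked without touching the routing table.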

Testing SpamShield For the First Time

Run ./spamshield.pl by hand as root. Note any errors you encounter (usually the result of incorrectly defined variables), then correct them. Ensure that your variable paths are correct!
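
One way to check the program paths before the first run is a small shell function like the one below. This is a minimal sketch in plain POSIX shell; the name check_paths is hypothetical and not part of SpamShield, and the paths in the usage comment are simply the ones from the configuration shown earlier.

```shell
#!/bin/sh
# Sketch only: check_paths is a hypothetical helper, not part of SpamShield.
#
# usage: check_paths PATH...
# prints every path that is missing or not executable;
# returns 0 if all paths are fine, 1 otherwise
check_paths() {
    missing=0
    for p in "$@"; do
        if [ ! -x "$p" ]; then
            echo "not executable: $p"
            missing=1
        fi
    done
    return "$missing"
}

# Example call, using the program locations defined in spamshield.pl:
#   check_paths /usr/sbin/sendmail /usr/bin/tail /usr/bin/awk /usr/bin/grep \
#               /usr/bin/sort /bin/cat /bin/date /sbin/route
```

Any path it reports should be corrected in the user-defined section of the script, since these locations vary between systems.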

After running ./spamshield.pl for the first time, you should have the following files under the directory /usr/local/spamcontrol:

In order to correct any difficult errors, try increasing the DEBUG value.

Running from the Crontab

For optimal performance, run the program automatically every three minutes from cron, and set your system crontab to look something like this:

*/3 * * * * /usr/local/spamcontrol/spamshield.pl 

On some Unix systems, you need to redirect the output of cron jobs to /dev/null to avoid receiving email to root each time the script runs. I typically add the following to the end of each cron line to send file descriptors 2 (standard error) and 1 (standard output) to /dev/null. (These entries use the format of a per-user crontab; a line in /etc/crontab also needs a user name, such as root, before the command.)

*/3 * * * * /usr/local/spamcontrol/spamshield.pl 2>/dev/null 1>/dev/null 

To Sum Up

SpamShield has taken a sensible approach to filtering spam.

Despite an array of products that claim to block spam mail, I have yet to find one that is 100 percent perfect. Most filters work to a degree, while others add yet another layer of inconvenience to the end user.

Simply put, SpamShield does what it was designed to do. As new versions evolve, I have confidence that this product will become ever more popular.

Read More About SpamShield

Log on to www.spamshield.org/ to read Kai's latest rants -- a little on Spam, a little on the rest of the world. And coming soon, version 2.0.

Glenn Graham has been working with telecommunications since 1977.



Copyright © 2009 O'Reilly Media, Inc.