SpamShield: A Perl-Based Spam Filter for sendmail
by Glenn Graham
Numerous articles have focused on ways to recognize and block spam email. As system administrators work to build sophisticated roadblocks, spammers continue to find ways to knock them down. This article will focus on one viable solution, SpamShield version 1.40 -- a Perl-based spam filter for sendmail. We'll cover how it works and how to install and configure it on your server.
The science of spam (if you can call it that) has taught us one thing: spam leaves a definite "calling card" in the system logs. This calling card is generally repetitive enough that the process of tracking spam may be automated. Based on this theory, a brilliant programmer by the name of Kai Schlichting wrote a Perl-based program called SpamShield.
United States federal and state laws now prohibit the transmission of spam, but these are only laws, and where laws exist, so do criminals intent on breaking them. We must also remember that U.S. laws only apply in the U.S., not in other countries. The Internet is a global playing field, and laws concerning its use are created and enforced on a country-by-country basis. For example, in China, India, and Romania, spamming is legal.
Spam email usually originates from profit-based organizations that purchase "spam lists" from "list sellers" located in various corners of the Internet. Legitimate and illegitimate companies make use of them. I've received email offers promising everything from credit repair to instant university diplomas (I don't require credit repair and I already have my diploma -- thanks). Some of the more legitimate companies include credit card firms, car manufacturers, and drug companies.
All too often, you may see a message at the bottom of the spam that says something like this:
"If you would rather not receive these messages, please click here. It will take up to 48 hours for your request to take effect. All third party products and services promoted on this Site are offered exclusively by third party advertisers. XXX Company makes no representations or warranties with respect to these offers and all claims for injury and damages related to such offers are the sole responsibility of the advertiser."
Talk about integrity.
How Do They Find You?
There are endless ways to seek out email addresses on the Internet. In most cases, list suppliers generate their lists by using computers that scan Web sites and databases, filtering for the "@" symbol. Next, each address is verified using SMTP verification software, then compiled into a portable database.
In other cases, requesting that you be removed from a list (as in the example above) verifies that your address exists; hence, you're added to more lists.
The Open Relay Server
The most common method used to send spam is through an "open" relay host. An open relay is simply an SMTP server that allows any domain to connect on port 25, and relay through to another domain. The engineers at sendmail.org have worked for several years to find ways to reject relaying, using filtering methods such as the access database.
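To make the difference between an open relay and a restricted one concrete, here is a minimal sketch of a relay decision. It is not how sendmail implements the check; the function name `may_relay`, the `LOCAL_DOMAINS` set, and the `TRUSTED_NETWORKS` list are hypothetical names chosen for illustration. An open relay would effectively answer "yes" to every request, while a restricted server relays only for its own clients or accepts mail addressed to its own domains.

```python
import ipaddress

# Hypothetical site policy: domains we accept mail for, networks we relay for.
LOCAL_DOMAINS = {"example.org"}
TRUSTED_NETWORKS = ["10.0.0.0/8"]

def may_relay(client_ip, recipient_domain,
              local_domains=LOCAL_DOMAINS, trusted_networks=TRUSTED_NETWORKS):
    """Return True if mail from client_ip to recipient_domain should be accepted."""
    # Mail addressed to one of our own domains is final delivery, not relaying.
    if recipient_domain.lower() in local_domains:
        return True
    # Otherwise the message would be relayed onward; allow that only for
    # clients connecting from a trusted network.
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in trusted_networks)

print(may_relay("203.0.113.9", "victim.example.net"))  # outside host, outside domain
print(may_relay("10.1.2.3", "victim.example.net"))     # our own client sending out
print(may_relay("203.0.113.9", "example.org"))         # inbound mail for us
```

The first call is the case an open relay gets wrong: an unknown host asking the server to pass mail along to a third-party domain.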
Newer versions of sendmail do a reverse domain lookup before allowing mail to pass. If the incoming domain doesn't exist, sendmail will typically reject the message. This prevents spam from sources that use nonexistent domains in their
How SpamShield Works
The basic principle of SpamShield is fairly straightforward. First it gathers a "chunk" of log information and builds that into a volume of its own. Next, based on a predetermined threshold value, the software decides if the volume contains more than the allowable amount of email originating from any single source (such as "spamdomain.com"). Once the amount of mail from a source exceeds the threshold, SpamShield simply blocks that source from further access.
Through experience, I learned that setting the threshold value is the most important part of making SpamShield run efficiently.
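The threshold idea can be sketched in a few lines. This is not SpamShield's own code (which is written in Perl); it is a minimal illustration that assumes each source is identified by the bracketed IP address in the `relay=` field of a sendmail "from=" log line. The function name `find_heavy_senders`, its parameters, and the sample log lines are all made up for the example.

```python
import re
from collections import Counter, deque

# Captures the bracketed IPv4 address in the relay= field of a "from=" log line.
RELAY_IP = re.compile(r"from=.*relay=.*?\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def find_heavy_senders(log_lines, tail_lines=1000, threshold=50, ignore=()):
    """Return {ip: count} for sources exceeding threshold in the last tail_lines lines."""
    recent = deque(log_lines, maxlen=tail_lines)  # keep only the log fragment
    counts = Counter()
    for line in recent:
        match = RELAY_IP.search(line)
        # Skip hosts explicitly marked as trusted high-volume senders.
        if match and match.group(1) not in ignore:
            counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n > threshold}

# Illustrative log fragment: six messages from one host, two from another.
sample = (
    ["Jun 12 10:00:00 mail sendmail[100]: h5C1: from=<x@spamdomain.com>, "
     "size=900, nrcpts=1, relay=mx.spamdomain.com [192.0.2.10]"] * 6
    + ["Jun 12 10:00:05 mail sendmail[101]: h5C2: from=<bob@example.org>, "
       "size=1200, nrcpts=1, relay=mail.example.org [198.51.100.7]"] * 2
)

print(find_heavy_senders(sample, tail_lines=100, threshold=5))
# {'192.0.2.10': 6}
```

Lowering `threshold` or shrinking `tail_lines` changes which hosts are flagged, which is why tuning these values for your own mail volume matters so much in practice.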
Here is a description of how SpamShield works, excerpted from Kai's readme document:
"SpamShield looks at the last <n> lines of the sendmail logfile (maillog), and builds a list of mail volume received from various hosts (by IP) in the period covered by that log fragment. If any particular machine sends more mail than the configured global threshold, the assumption is that spam is received. The IP address is then dropped to a "dead host" (an unused IP address within your netblock). The defaults for the log file fragment and the allowable number of mails per host are for a small system with only a few thousand mails per day. You might want to adjust those limits to avoid false positives. (see set $spamthreshold ). The general assumption is that spam abuse typically means that up to several hundred emails PER MINUTE are received from a single source: this is a tremendous 'signal to noise' ratio, given that even very large systems, such as AOL's mail servers, don't deliver more than a few hundred mails to a small/medium-sized system per day. For this reason, there are configuration options to ignore 'spam-like' traffic from high-traffic hosts that are deemed secure and non-relaying (AOL's servers don't relay, for example)."
Installing the Software
Installation of Kai's software is simple. SpamShield is a Perl script, so you'll need Perl 5, available from Perl.com.
First, download the tarball from Kai's site to your src directory, then untar it. The uncompressed directory structure will look like this:
spamcontrol/
spamcontrol/blocked
spamcontrol/INSTALL-spamshield
spamcontrol/spamshield.pl
spamcontrol/dontblock
spamcontrol/blockignore
Next, move the spamcontrol directory somewhere more convenient, such as /usr/local/spamcontrol:
Command: mv ./spamcontrol /usr/local/spamcontrol
The Perl script, spamshield.pl, should be mode 700, and owned by root:
Command: chmod 700 ./spamshield.pl ; chown root:wheel spamshield.pl
Please review ./INSTALL-spamshield, located at the top level of the spamcontrol directory, for a detailed installation overview.