Mail-Filtering Techniques

Survey of Mail Filtering Techniques

Various filtering techniques have been invented to cope with spam floods. In turn, spammers have invented techniques to work around those filters. Here are the most widely used anti-spam techniques:



Local Blacklists

This is probably the first technique ever used. The system administrator maintains a list of spammers' IP addresses.

On the pro side, it's useful against open relays and new spammers that don't yet know how to distribute their attacks.

On the con side, blacklists are hard to maintain. Spammers are a fast-moving target. They typically don't reuse the same IP twice in a row when sending spam to a domain.
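As a rough illustration of how little machinery a local blacklist needs, here is a minimal Python sketch. The network entries and the `is_blacklisted` helper are hypothetical; a real list would be maintained by hand or by a script, and the check would run inside the mail server.

```python
import ipaddress

# Hypothetical entries; a real list would be maintained by the administrator.
BLACKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]

def is_blacklisted(client_ip, blacklist=BLACKLIST):
    """Return True if the connecting IP falls inside any blacklisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in blacklist)
```

The maintenance problem described above shows up directly: every new spammer address means another entry in `BLACKLIST`.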

Distributed Blacklists

The next step in the spam war was distributed blacklists. Sites share a common blacklist, which is usually published through DNS. Many sites reject messages from an IP address once it appears on a distributed blacklist.

On the pro side, this is simple to use.

There are a few cons, however:

  • You're trusting someone else's idea of what is spam.

  • Blacklists can be poisoned with wrong information: spammers can use viruses to send spam through an ISP's SMTP server so that the server lands on a blacklist, even though you really want to accept mail from it.

  • Distributed blacklists are susceptible to Distributed Denial of Service (DDoS) attacks.
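DNS-based blacklists are conventionally queried by reversing the octets of the client's IPv4 address and appending the list's zone; if the resulting name resolves, the address is listed. The sketch below follows that convention. The zone `dnsbl.example.org` used in the usage note is a placeholder, not a real list.

```python
import ipaddress
import socket

def dnsbl_query_name(client_ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets followed by the list's zone."""
    addr = ipaddress.IPv4Address(client_ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(client_ip, zone):
    """Return True if the name resolves, which by convention means the IP is listed.

    Any resolver failure (including a timeout) is treated as "not listed" here.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(client_ip, zone))
        return True
    except socket.gaierror:
        return False
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` yields `99.2.0.192.dnsbl.example.org`. Because every check is a DNS query against someone else's servers, the trust and availability concerns listed above apply to each lookup.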

Whitelists

Some heavily spammed sites refuse anything except what comes from friendly IP addresses.

On the pro side, this removes 100 percent of the spam from other IP addresses.

On the con side, it cuts you off from potentially legitimate people not on the whitelist.

Content Filtering and Bayesian Filtering

While all the previous techniques rely on lists of IP addresses, content filtering tries to identify spam by analyzing the message itself. Spam usually contains commercial content and forged sender information, which makes classification possible. Bayesian filtering uses feedback from users about what is and isn't spam, and tries to score words as spammish or non-spammish. It then scores messages based on how spammish they look.
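The word-scoring idea can be sketched in a few lines of Python. This is a deliberately naive version, not the algorithm of any particular filter: it assumes whitespace tokenization, equal prior odds for spam and non-spam, and add-one smoothing. The `WordScorer` class and its method names are made up for illustration.

```python
import math
from collections import Counter

class WordScorer:
    def __init__(self):
        self.spam_words = Counter()   # word -> number of spam messages containing it
        self.ham_words = Counter()    # word -> number of non-spam messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        """Record user feedback for one message."""
        words = set(text.lower().split())
        if is_spam:
            self.spam_words.update(words)
            self.spam_msgs += 1
        else:
            self.ham_words.update(words)
            self.ham_msgs += 1

    def word_spamminess(self, word):
        """Smoothed estimate of P(spam | word), strictly between 0 and 1."""
        p_word_spam = (self.spam_words[word] + 1) / (self.spam_msgs + 2)
        p_word_ham = (self.ham_words[word] + 1) / (self.ham_msgs + 2)
        return p_word_spam / (p_word_spam + p_word_ham)

    def message_score(self, text):
        """Combine per-word scores in log space; a result near 1 looks spammish."""
        log_odds = 0.0
        for word in set(text.lower().split()):
            p = self.word_spamminess(word)
            log_odds += math.log(p) - math.log(1 - p)
        log_odds = max(-50.0, min(50.0, log_odds))   # keep exp() in a safe range
        return 1 / (1 + math.exp(-log_odds))
```

A word the scorer has never seen gets a score of 0.5 and so has no influence on the result; each word that appears mostly in legitimate mail pulls the total toward 0.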

The pro argument is compelling. This technique is very promising, because it can adapt to individual views of what is spam. Moreover, because of the learning approach, the filters can evolve if the nature of the spam changes.

On the con side, spammers quickly learned to work around Bayesian filtering by inserting "positive" words into their messages and by disguising bad words. Fighting on this front would ultimately require the filter to do semantic analysis of messages, something that is not really practical yet.

PGP/PKI

This is an extreme measure -- "don't accept any mail unless a trusted PGP key has signed it or the signer's key is in a trusted PKI repository."

It has a big pro, in that it removes 100 percent of the anonymous spam. (Unless spammers invade PGP keyrings or steal keys, it's very handy.)

On the con side, you can only receive messages from people that sign their emails.

Per-Recipient Addresses

This scheme uses dozens of email addresses, a different one for each person or entity with whom you exchange mail. When you start receiving spam on an address, you drop it.

On the pro side, if you receive spam on one address, you have a pretty good idea who sold your address.

There are two cons:

  • There's quite a bit of overhead on new legitimate senders to send you a message.

  • It's a pain to manage when exchanging messages with many people.
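To make the bookkeeping concrete, here is one possible way to automate the scheme in Python. Everything in it is an assumption for illustration: the `user+tag@domain` address format, the HMAC-derived tag, the `SECRET` value, and the `AddressBook` class are not part of the technique as described above.

```python
import hashlib
import hmac

SECRET = b"change-me"                    # hypothetical secret known only to the mailbox owner
USER, DOMAIN = "alice", "example.org"    # hypothetical mailbox

def address_for(correspondent):
    """Derive a distinct address for one correspondent."""
    tag = hmac.new(SECRET, correspondent.lower().encode(), hashlib.sha256).hexdigest()[:8]
    return f"{USER}+{tag}@{DOMAIN}"

class AddressBook:
    def __init__(self):
        self.issued = {}       # address -> correspondent it was handed to
        self.dropped = set()   # addresses that started receiving spam

    def issue(self, correspondent):
        address = address_for(correspondent)
        self.issued[address] = correspondent
        return address

    def drop(self, address):
        self.dropped.add(address)

    def accept(self, recipient_address):
        """Accept mail only for addresses that were issued and not dropped."""
        return recipient_address in self.issued and recipient_address not in self.dropped

    def who_leaked(self, address):
        """Return the correspondent an address was given to, if any."""
        return self.issued.get(address)
```

When spam arrives on an address, `who_leaked` names the correspondent it was issued to, and `drop` retires it.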

Sender Acknowledgment

Every time you receive an email, a robot handles it. The robot queues the message and sends a challenge to the sender. The challenge is usually just a message with a cookie that asks the sender to reply to confirm that she actually sent the message. When the robot receives the acknowledgment, it delivers the original mail to you.
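The robot's core logic amounts to a queue keyed by the cookie. The following Python sketch leaves out the actual sending and receiving of mail; the `ChallengeQueue` class and its method names are hypothetical.

```python
import secrets

class ChallengeQueue:
    def __init__(self):
        self.pending = {}   # cookie -> (sender, queued message)

    def receive(self, sender, message):
        """Queue the message and return the cookie to include in the challenge."""
        cookie = secrets.token_hex(16)
        self.pending[cookie] = (sender, message)
        return cookie

    def acknowledge(self, sender, cookie):
        """Release the queued message if the cookie matches; otherwise return None."""
        entry = self.pending.get(cookie)
        if entry is None or entry[0] != sender:
            return None
        del self.pending[cookie]
        return entry[1]
```

A message stays in `pending` until its sender replies with the right cookie; this sketch never expires unanswered entries, which a real robot would need to do.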

The pros seem good. Fans claim that this removes 100 percent of spam. And it does tend to hide the fallout when people forge your address on spams and viruses.

The cons are many, however:

  • The sender has to send an extra acknowledgment message ("Yes, I really did send you a message!"), which is a bit rude to demand.

  • When you receive viruses with forged sender addresses, someone who has nothing to do with the message will receive an acknowledgment request.

  • Messages with the wrong sender address will spawn undeliverable junk acknowledgment requests.

  • This causes extra delay on legitimate mail delivery.

  • You cannot use this technique on messages sent from other robots, such as errors from mail servers.

Real-Time Sender Address Checking

When a message arrives, the mail server tries to validate the sender address before accepting it. To do so, it attempts to send a message to the sender address. If the mail server for the sender address's domain responds with an invalid-address error, the receiving server can reject the original message.
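One way to perform such a check, assumed here rather than taken from any particular mail server, is to open an SMTP session with the sender domain's mail server, go as far as the RCPT command, and look at the reply code without actually transmitting a message. The sketch uses Python's standard `smtplib`; `mx_host` is assumed to have been found already through an MX lookup, and `verifier.example.org` is a placeholder HELO name.

```python
import smtplib

def classify_reply(code):
    """Map the SMTP reply code for RCPT onto a verdict."""
    if 200 <= code < 300:
        return "accept"    # address looks deliverable (or the server accepts anything)
    if 500 <= code < 600:
        return "reject"    # permanent error, such as an unknown address
    return "defer"         # temporary error or unexpected reply: cannot tell yet

def probe_sender(mx_host, sender_address, helo_name="verifier.example.org"):
    """Ask the sender's mail server whether it would take mail for the sender address."""
    with smtplib.SMTP(mx_host, 25, timeout=30) as smtp:
        smtp.helo(helo_name)
        smtp.mail("")                          # empty return path, as used for bounces
        code, _ = smtp.rcpt(sender_address)    # the session ends here, before any DATA
        return classify_reply(code)
```

A 2xx reply is treated only as "not known to be bad", since a server may accept an address at this stage and still reject the mail afterwards.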

The pro of this approach is that it can remove or reduce forgery.

The con is that some servers will accept a message even if the address is invalid, rejecting the message later.

Greylists

The idea behind greylists is that spammers rarely try to resend a message after they receive a temporary failure error. When the mail server receives a message for the first time, it refuses it with a temporary error and remembers the delivery attempt by recipient email address, source email address, and source IP. The next time the sending server attempts to deliver the message, the destination server accepts it. If the message was spam, the sender will probably never try to resend it.

A server that does greylisting may also refuse a message until some time has elapsed since the first attempt. This forces spammers to stay at the same IP address for a while before the receiver will accept their junk mail.
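The decision logic described in the two paragraphs above fits in a small class. This is a minimal in-memory sketch: the `Greylist` class, the 300-second default delay, and the injectable `clock` are assumptions chosen for illustration, and a real implementation would also expire old entries and persist its table.

```python
import time

class Greylist:
    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay   # seconds that must pass before a retry is accepted
        self.clock = clock           # injectable so the logic can be tested
        self.first_seen = {}         # (source IP, sender, recipient) -> time of first attempt

    def check(self, source_ip, sender, recipient):
        """Return 'tempfail' or 'accept' for one delivery attempt."""
        key = (source_ip, sender.lower(), recipient.lower())
        now = self.clock()
        if key not in self.first_seen:
            self.first_seen[key] = now   # remember the first attempt and refuse it
            return "tempfail"
        if now - self.first_seen[key] >= self.min_delay:
            return "accept"              # same triplet retried after the delay
        return "tempfail"                # retried too soon
```

A legitimate mail server retries on its normal schedule and gets through once `min_delay` has passed. A spammer who moves to a new IP address starts over with a fresh triplet.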

There are two pros:

  • As of today, it removes 99 percent of the spam with no false positives.

  • Spammers trying to slip junk past greylisting servers will have to keep the same address for some time, thus improving the efficiency of blacklists.

The con is that it introduces some delay on legitimate mail delivery.

Sender Policy Framework (SPF)

SPF is not a spam filtering technique. It is an anti-forgery technique. Using SPF, a domain can publish the list of machines that can send email on behalf of the domain. The list can be closed (hosts not listed by SPF records may not send legitimate mail from the domain), or open (hosts not listed by SPF records may send legitimate mail from the domain).
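The difference between a closed and an open list comes down to how the published record ends. The Python sketch below evaluates a heavily simplified subset of SPF: it understands only the `ip4`, `ip6`, and `all` mechanisms and assumes the record text has already been fetched from DNS. The `check_spf` function is hypothetical and is not a complete SPF implementation.

```python
import ipaddress

def check_spf(record, client_ip):
    """Evaluate a simplified SPF record against the connecting IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"                      # no usable SPF record
    addr = ipaddress.ip_address(client_ip)
    qualifiers = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    for term in terms[1:]:
        qualifier = "+"                    # default qualifier when none is written
        if term[0] in qualifiers:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            if addr in ipaddress.ip_network(value, strict=False):
                return qualifiers[qualifier]
        elif name == "all":
            return qualifiers[qualifier]   # matches every host not matched earlier
        # Other mechanisms (a, mx, include, ...) need DNS lookups and are skipped here.
    return "neutral"
```

With the record `v=spf1 ip4:192.0.2.0/24 -all`, a host outside the listed network gets `fail`, which corresponds to a closed list. Ending the record with `?all` instead gives `neutral`, which corresponds to an open one.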

SPF will never stop the forgery of domains that don't implement SPF. It can, however, be used to reduce the side effects of other filtering techniques. For example, you can skip greylisting, and the delay it causes, for SPF-compliant senders.

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.

