Linux DevCenter    
 Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Defending Your Site Against Spam, Part 2

by Dru Nelson
07/24/2003

This article is the second and final installment describing my efforts to defend my systems from spam. The first article explains some necessary concepts and terminology. This article will dig more into the details of an actual implementation with my mail system. One thing to note is that I used qmail for my mail system (hence the title), but the information in here could apply to just about any email server in production today.

RBL Providers Today

In the previous article, I covered the history and the protocol used by network level spam defenses but not the existing landscape of RBL Providers that supply blocklists. There are quite a few of them out there, and I needed to select one for my system. At first, I polled a few friends to see what they were doing. Most had tried a blocklist at some point. I found that some people mix various blocklists and usually don't trust them enough for their corporate machines. Some switch providers periodically since the quality isn't stable. One friend had to get new blocks of IP addresses for his service since he was in the same network block as a spammer. This form of collateral damage caused him to be very negative about the whole subject.

The biggest problem with a network level defense has been the providers. Although they all conform to the protocol, they vary wildly on what goes into their blocklist database. This is a direct result of how they came to be. Almost all of them are grassroots efforts by individuals or small groups, each with different opinions and policies about how they operate. Some groups are very private and do not disclose much about their members or their policies. Some of them have very sensitive trigger fingers and others do not. Some groups are very aggressive about adding to their lists while others are extremely lax.

Without a good policy and consistent operation, I was leery of letting anybody control my mail server. It is easy to understand why it's important to be cautious about this. No one wants to disrupt their incoming mail. If an RBL provider consistently blocks the wrong hosts, it might take a while to fix. I might have to build my own special list of exceptions to their blocklist. This seemed like the kind of work I didn't want to add to my already long list of duties.

Towards the end of my research, I had one conversation that turned out to be really interesting. I used to work with a guy named Mark Fletcher at eGroups. He had been working on the spam problem for a little while and we'd come to the same conclusions. However, he had decided that he had a new twist and he started a service called Trustic to provide it.

Trustic

Fletcher decided to build a trust network for email servers on the Internet. Trust networks are becoming a popular technique for getting good results out of systems that could potentially include untrustworthy individuals (such as spammers). The best and most impressive examples of trust networks today are eBay, Google, and Advogato.

Like other trust systems, Trustic takes recommendations from registered users about IP addresses of other systems. Each user has a level of trust. Users build their trust by making accurate recommendations over time. In order for a host to become untrusted, the cumulative trust level of the recommendations has to be above a certain threshold. For example, if two well-trusted users mark a server as untrustworthy, that server would become untrusted. If a bunch of new users tried to mark a server as untrusted, it would take a large number of them. If someone abuses their trust rating, their trust level is reduced and becomes harder to raise. There are more descriptions on Trustic's web site, but that is the gist of the technique.
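The threshold idea can be illustrated with a small sketch. This is not Trustic's actual algorithm: the trust values, the threshold of 10.0, and the names `Recommendation` and `is_untrusted` are all assumptions made up for the example. It only shows how a few well-trusted users can outweigh many new ones.

```python
from dataclasses import dataclass

# Assumed value for illustration only; Trustic's real threshold is not given here.
UNTRUSTED_THRESHOLD = 10.0


@dataclass(frozen=True)
class Recommendation:
    user: str      # who marked the server as untrustworthy
    trust: float   # that user's current trust level (hypothetical scale)


def is_untrusted(recommendations, threshold=UNTRUSTED_THRESHOLD):
    """Return True when the summed trust of distinct recommenders exceeds the threshold."""
    by_user = {}
    for rec in recommendations:
        by_user[rec.user] = rec.trust  # each user counts once, however often they report
    return sum(by_user.values()) > threshold


veterans = [Recommendation("alice", 6.0), Recommendation("bob", 5.0)]
newcomers = [Recommendation(f"new{i}", 0.5) for i in range(10)]

print(is_untrusted(veterans))   # two well-trusted users: 11.0 exceeds 10.0
print(is_untrusted(newcomers))  # ten newcomers: 5.0 does not
```

Counting each user once is a design choice of this sketch, so that a single account cannot reach the threshold by repeating itself.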

The trust system provides a blocklist that allows the many good users of the Internet to mark things as trusted or untrusted very quickly. Unlike a lot of other RBL providers, Trustic also added a good web interface, email reports, and other features that make this system easy to use. As a result of these policies and the web interface, it was easy to entrust the blocking of my email to Mark's system. As an added bonus for me, he incorporated a lot of my feedback into the design before he fully launched in January of 2003.

To use his service, I logged in and got an account. The registration gave me a number to use when making an IP4R query of the system, when forwarding spam to the Trustic system, or when sending and receiving recommendations. The number allows Trustic to tailor its responses to the particular user. Each user can have his own blocklist rules. Originally, Trustic relied on the registrant's IP address for submissions or queries. This didn't work with the dynamic IP addresses assigned by most ISPs.
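To make the role of the account number concrete, here is a minimal sketch of how an IP4R query name is formed: the octets of the IPv4 address are reversed and the provider's zone is appended. The zone `1234567.query.trustic.com` uses the placeholder account number from the rblsmtpd command line later in this article; the function name `ip4r_query_name` is my own.

```python
import ipaddress


def ip4r_query_name(ip, zone):
    """Build the DNS name an IP4R client looks up for the given IPv4 address."""
    # IPv4Address() validates the input and raises ValueError on garbage.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")


# The zone embeds the (placeholder) account number, which is how the
# provider can tailor its answer to a particular user.
print(ip4r_query_name("200.163.45.155", "1234567.query.trustic.com"))
# prints 155.45.163.200.1234567.query.trustic.com
```

Because the number is part of the DNS name rather than tied to the querying host's address, the lookup works the same from a dynamic IP address.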

The Wonderful World of qmail

Now that I have a provider for blocklist information, how do I use it with my mail server? Well, before we get to that, we must go over qmail a bit in case there are any readers who aren't yet comfortable with qmail configuration. Once that little step is out of the way, it is a lot easier to understand the actual steps involved to integrate with Trustic or some other IP4R provider. In fact, it took me much more time to write this article than to install and test this setup. (If you don't run qmail, you can skip this section, but if you are curious, it shouldn't be a hard read.)

qmail, written by Dan Bernstein (DJB), follows a few simple principles in its design. Once you understand these, understanding what I'm about to describe becomes much easier. In order of priority, DJB wanted qmail to be secure, extensible, reliable, and fast. One of the first design decisions was to divide the email system into several smaller programs. This is the same successful technique that the original Unix architects used when building a text processing system from commands like grep, cut, or cat. Each small program that makes up qmail does just one particular task and usually runs as a less-privileged user. Each tries to do the absolute minimum required for that particular task. This keeps the programs from "becoming too many things to too many tasks", which usually causes programs to look like "spaghetti code".

Because the programs and code are smaller and simpler, the code is easier to both debug and audit. Also, when the programs do interact with each other, they use a simple pipeline API. Such a system allows you to insert or remove different pieces from the pipeline for your specific situation. Finally, the programs are designed to be efficient with the operating system's resources (CPU and memory). This has a nice side effect of making the system fast even on older systems.

To summarize,

  1. The programs do the minimum necessary for their roles, so they remain small and secure
  2. The programs conform to a pipeline dataflow model for extensibility and security containment
  3. Efficient resource usage keeps them small and fast

As a result of qmail's design, it was an easy choice for many of the Internet services I worked at in the past. It was easy to scale the package to handle enormous mail loads. It was free and it worked on our favorite operating systems. It also won the security contest hands down compared to any other mail package. Naturally, I would end up running the system for my own personal mail system. I have been running the same code for about 5 years without recompiling. I can't remember how many times I've seen other mail systems get listed on CERT or Bugtraq. I've never had any issues with qmail, and I like that a lot. It just runs and runs.

Incoming SMTP Mail

Now that we have some of that light background on the philosophy of qmail, let's see how it expresses itself through an incoming email.

qmail, like most email systems, is all about moving email into and out of various queues for delivery. qmail has one queue where it receives all incoming email to be analyzed for delivery. Email is injected into that queue via a program called qmail-queue. True to the DJB philosophy, that program ensures that email gets into the incoming queue safely and securely. It will not respond with a success code unless every step went properly. It is the responsibility of other small programs to receive mail via SMTP, QMQP, or the command line, and then properly feed the mail to qmail-queue. For our case, we're only concerned with the SMTP handler, since that is how systems receive mail from the Internet. The specific qmail program that handles an SMTP transaction is qmail-smtpd.

Since qmail programs are designed to do the minimum necessary, the qmail-smtpd program just handles the SMTP conversation on a socket. To be clear, it will not set up the socket for listening or wait for connections. It relies on some other program to do those things. This is an interesting design choice because it allows us to insert other programs to perform checks in front of qmail-smtpd.

In the past, most servers on Unix would use the inetd daemon to set up and listen on sockets. qmail, being a program from that era, was probably just conforming to that norm. These days, it is rare for a server to use inetd, since it is poor at handling lots of connections or a hostile Internet. The interesting thing here is that the older design style allows us to add new filtering capabilities to qmail without having to change any of the existing qmail code. So in order to solve the inetd deficiencies and to keep the system extensible, DJB wrote another small program called tcpserver. Its sole purpose is to perform socket setup, filtering, and listening. It has several parameters for setting TCP options, checking a simple list of IP addresses to block, or performing anti-spoofing checks. When a connection does arrive, tcpserver sets some environment variables and starts the program specified on its command line. The main environment variable we are concerned with is TCPREMOTEIP, which contains a string representation of the remote host's IP address.

Normally, in a qmail installation, tcpserver will run qmail-smtpd as its next program in the pipeline. However, we want to filter our incoming SMTP connections before we accept an email. Therefore we need something to query our Trustic account via IP4R before we accept an email via TCP. As luck would have it, DJB wrote a program to do just this. That program is called rblsmtpd.

The rblsmtpd program, as you'd expect, does one basic thing. It checks the incoming connection using IP4R. If all goes well, it runs a program specified on its command line. This is that same basic pipeline API of handing off the socket. In order for rblsmtpd to do its job, it checks the TCPREMOTEIP environment variable, making an IP4R query against some system specified on the command line. Depending on the outcome of that and the influence of a few command line arguments, it will either end the SMTP transaction or run the next program. In our case, the next program to run will be qmail-smtpd.
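The hand-off pattern itself is easy to sketch. The following is a hypothetical filter in the same style, not rblsmtpd's source: it reads TCPREMOTEIP from the environment, asks a caller-supplied check (`is_blocked`, a stand-in for the IP4R lookup) whether to refuse the connection, and otherwise replaces itself with the next program named on its command line. Rejecting when TCPREMOTEIP is missing is a choice made for this sketch only.

```python
import os
import sys


def plan_handoff(argv, environ, is_blocked):
    """Decide what a filter in the tcpserver chain should do.

    argv[1:] is the next program plus its arguments.
    is_blocked is a hypothetical callable standing in for the IP4R lookup.
    Returns ("reject", reason) or ("exec", command_list).
    """
    next_program = argv[1:]
    if not next_program:
        return ("reject", "no next program given")
    remote_ip = environ.get("TCPREMOTEIP")
    if remote_ip is None:
        return ("reject", "TCPREMOTEIP is not set")
    if is_blocked(remote_ip):
        return ("reject", "blocked " + remote_ip)
    return ("exec", next_program)


def run(argv, environ, is_blocked):
    """Carry out the plan: exec the next program or report why not."""
    action, detail = plan_handoff(argv, environ, is_blocked)
    if action == "exec":
        # exec replaces this process; the connected socket that tcpserver
        # placed on stdin/stdout is inherited by the next program.
        os.execvp(detail[0], detail)
    sys.stderr.write(detail + "\n")
    return 1
```

Keeping the decision in `plan_handoff` separate from the exec call makes the filter testable without a live connection.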

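rblsmtpd has a few command line arguments whose semantics matter, so I'll cover them in detail here. (These descriptions are paraphrased from the rblsmtpd documentation.) They control how your system responds when an IP4R lookup succeeds or fails, which in turn determines how and when mail will be rejected.

Command line options for rblsmtpd:

  -r base  Use base as an RBL source. An IP address a.b.c.d is listed if d.c.b.a.base has a TXT record; the contents of that record become the error message sent to the client.
  -b       Use a 553 error code for IP addresses listed in the RBL.
  -B       (Default.) Use a 451 error code for IP addresses listed in the RBL.
  -c       Fail closed: if an RBL lookup fails temporarily, treat the address as listed, but with a 451 error code.
  -C       (Default.) Fail open: if an RBL lookup fails temporarily, treat the address as not listed.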

For my system, I chose the following change to my tcpserver's startup line:

rblsmtpd -b -c -r 1234567.query.trustic.com \
	/var/qmail/bin/qmail-smtpd

Primer on Mail Bounces

In order to understand the effects of these, let me give a quick primer on mail rejection or bouncing. There are really only two forms of mail bounces, in layman's terms, "Hard Bounce" and "Soft Bounce". A Hard Bounce (an SMTP reply code in the 5xx range) causes the sending mail system to give up immediately on delivering an email, generating an error email or "bounce" to be delivered to the sender. A Soft Bounce (a reply code in the 4xx range) causes the sending mail system to give up temporarily on delivering an email. The sending mail server may then try again after an hour or some other specified timeout. If the mail continues to get a "Soft Bounce" code from the receiving mail server for some specified time (between two days and a week), the sending mail server will give up and deliver a bounce message. In both cases, the sending mail server takes responsibility for generating or delivering the bounce.
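From the sending server's point of view, the distinction comes down to the first digit of the SMTP reply code: 5xx is a permanent failure and 4xx is a temporary one. The sketch below shows that decision. The seven-day queue lifetime and the names `classify_reply` and `next_action` are assumptions for the example; real mail servers make the retry schedule configurable.

```python
# Assumed queue lifetime for illustration; the text above gives a range
# of two days to a week.
MAX_QUEUE_HOURS = 7 * 24


def classify_reply(code):
    """Map an SMTP reply code to the layman's terms used above."""
    if 500 <= code <= 599:
        return "hard bounce"
    if 400 <= code <= 499:
        return "soft bounce"
    if 200 <= code <= 399:
        return "no bounce"
    raise ValueError("not an SMTP reply code: %r" % (code,))


def next_action(code, hours_in_queue, max_queue_hours=MAX_QUEUE_HOURS):
    """What the sending server does after seeing this reply."""
    kind = classify_reply(code)
    if kind == "no bounce":
        return "done"
    if kind == "hard bounce":
        return "bounce to sender"
    # Soft bounce: keep retrying until the message has been queued too long.
    if hours_in_queue >= max_queue_hours:
        return "bounce to sender"
    return "retry later"


print(next_action(553, 0))    # hard bounce: the sender hears about it right away
print(next_action(451, 1))    # soft bounce: try again later
print(next_action(451, 200))  # soft bounce past the assumed lifetime: give up
```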

In the above configuration, I chose the -b and -c command line parameters. With -b, I'm choosing to Hard Bounce any email from an untrusted host. This is important because I want senders to know immediately if there is a problem. With the -c parameter, I'm choosing to Soft Bounce if Trustic should happen to have any downtime. I wouldn't want to allow spam into my system just because of a small outage, and I don't mind delaying that email. Besides, I monitor my systems, so I could change this if an outage ever lasted a long time. If someone were blocked, they would know immediately that the email delivery failed.
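The combined effect of these two flags can be written as a small decision table. This is a sketch of the behavior described in the rblsmtpd documentation, not its source code; the function name `rbl_decision` and the three lookup outcomes `"listed"`, `"clear"`, and `"tempfail"` are labels invented for the example.

```python
def rbl_decision(lookup, hard_fail=False, fail_closed=False):
    """Return the SMTP error code to send, or None to run the next program.

    lookup      -- "listed", "clear", or "tempfail" (illustrative labels)
    hard_fail   -- True models the -b flag (553 instead of 451 when listed)
    fail_closed -- True models the -c flag (block on temporary lookup failure)
    """
    if lookup not in ("listed", "clear", "tempfail"):
        raise ValueError("unknown lookup outcome: %r" % (lookup,))
    if lookup == "listed":
        return 553 if hard_fail else 451
    if lookup == "tempfail" and fail_closed:
        # A failed lookup is only ever a temporary rejection, even with -b.
        return 451
    return None


# The configuration chosen above: -b and -c.
for outcome in ("listed", "tempfail", "clear"):
    print(outcome, rbl_decision(outcome, hard_fail=True, fail_closed=True))
```

With both flags set, a listed host gets 553, a provider outage gets 451, and everything else is handed to qmail-smtpd.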

What Would a Blocked Sender See?

Finally, Trustic provides the optional TXT record that the IP4R protocol specifies (see previous article on the IP4R protocol). The rblsmtpd program will include that message in the SMTP error code. The sending mail servers will then copy that SMTP error into the bounce message to the sender. Trustic puts a URL into the message so the sender can go to a site and read the details on why the block occurred.

Here is an example from my logs:

553 Message rejected, please see
	http://www.trustic.com/help/bounce?ip=200.163.45.155

As always, test your configuration once it is in place. You wouldn't want to lose mail for half a day because you didn't test something.

Flipping the Switch

After I set the system up, not much happened. I sent recommendations and only some of them blocked spam. The service was too new to be effective. However, after about fifty users joined, things started to happen. Soon I was blocking about 8 to 10 spam messages a day without making those recommendations. After a few weeks, the real test came. I got hit with 1500 email attempts to my server. The attack wasn't as severe as the previous one, but it could have caused the same problems. This time, however, it produced zero load on my system. I was dropping all of the requests. I didn't even know it was occurring until I checked my logs for that day. Success!

After that, another interesting thing happened. After a while, my recommendations were having a good effect on the other mail systems using Trustic. I could see in the reports that some of my recommendations were blocking hundreds of spam emails from being delivered to other systems. It felt really good to give back to the others that helped block spam for me.

Conclusion

I hope these two articles have achieved their goal of providing some good coverage of network-level spam defenses. From my own recent experiences, I have seen how the use of some simple, existing protocols and a trust network could become a serious deterrent to the new spam attacks. In the future, these defenses may protect us from more than just spam. I look forward to seeing more people joining in and applying these systems to their own networks.

Dru Nelson has been on the Internet since 1988. After starting an ISP in Florida, he moved to the San Francisco Bay area and has been involved with large Internet infrastructure at companies like Four11 (Yahoo Mail), eGroups (Yahoo Groups), and Plaxo. He is now the CTO and co-founder of BrightRoll.com.



Copyright © 2009 O'Reilly Media, Inc.