Postfix: The Definitive Guide

, author of Postfix: The Definitive Guide

Postfix provides a number of ways to get information that can be helpful when you are trying to diagnose a problem. The standard logging often provides more than adequate information for understanding problems and determining their cause. This article discusses Postfix logging in general, how to find all of the relevant information in the logs, and finally, how to increase the amount of logging when more information is needed. Before we get to logging, let's consider a few configuration guidelines that might help prevent problems from the start.

Configuration Guidelines

Change One Thing at a Time

Postfix is a complex system. If you make several changes and your system stops working as expected, you'll have a difficult time figuring out which change caused the problem.

Save Known Working Configurations

Be sure that you can always revert to an earlier known working configuration. Either make copies of files before you work on them or, even better, use version control with your configuration files.

Use the soft_bounce Parameter for Testing Changes

When trying out a configuration change you're not sure about, you can enable soft_bounce, so that mail that would bounce is kept in the queue and can still be delivered once the problem is corrected. In your main.cf file, simply add the following:

soft_bounce = yes

Besides queuing problem messages, enabling soft_bounce causes Postfix to convert permanent rejection codes (5xx) to temporary ones (4xx). When you enable soft_bounce, it's important to check your logs frequently. When you find a warning or see that a message has been rejected, you can fix your configuration problem and successfully receive the message next time the delivery is attempted.

Use warn_if_reject with Relay Restrictions

Chapter 11 of Postfix: The Definitive Guide discusses anti-spam configuration information. As you build up your set of blocking rules, you can preface those you're not sure about with warn_if_reject to have the rule log a warning instead of rejecting the message.

For example, a restriction such as reject_unknown_hostname rejects too many legitimate messages, in most situations. If you want to see what effect it will have in your environment without actually rejecting messages, preface it with warn_if_reject:

smtpd_helo_restrictions = permit_mynetworks
   warn_if_reject reject_unknown_hostname
   ...

Messages that would have been rejected by the restriction are allowed to go through, but a warning is logged. As with soft_bounce, you have to watch your log file to know what effect a warn_if_reject-modified restriction will have.

Postfix Logging

Whenever something is not working as expected, the first step for troubleshooting the problem is checking your log. Postfix logs all operations whether successful or not. Postfix uses your system's syslog facility to write messages to your log file. Because system logging differs on different platforms, you should review your system documentation to learn about syslog. Most likely, you can find where Postfix logging occurs by checking your system's /etc/syslog.conf file. In the examples below, we'll assume that the log file is located at /var/log/maillog.

If Postfix doesn't seem to be logging anything, read over your syslogd documentation carefully. A common problem is that many syslogd implementations won't create a log file if it doesn't already exist. If your system logger is configured to write mail log entries to a file that doesn't exist, you should first create the file and then restart your logging daemon.

Another problem new administrators sometimes encounter is that some information is missing from the log. If any of your Postfix components are running chrooted, you have to take special steps to be sure that they can write entries to your log. In particular, the syslog socket file must be available within the Postfix chroot directory. If you are not familiar with Unix chroot, how it works, and what system resources Postfix needs, you're better off not using this feature. chrooting is configured in the master.cf file and is discussed in Chapter 4 of Postfix: The Definitive Guide.

Postfix provides a tremendous amount of valuable information as it runs. Get in the habit of reviewing your logs frequently. Look particularly for errors and warnings. Unix utilities such as grep, awk, and Perl are invaluable tools to manipulate, filter, and extract precise information from your log files. You can run the following simple command regularly to pull out any log entries reporting a problem.

$ egrep '(reject|warning|error|fatal|panic):' /var/log/maillog

Since Postfix does not run as a single process, but rather as several different components, each handling a different aspect of mail delivery (see Chapter 2 of Postfix: The Definitive Guide), each component writes its own log entries. For example, when Postfix is initialized, the master daemon records that it is starting up.

Nov 22 09:07:03 scallop postfix/master[932]:
 daemon started -- version 2.0.16

All Postfix components log entries following the same general pattern. The figure below shows the parts of the master initialization log entry broken into its components.

Troubleshooting with Postfix

The timestamp and hostname portions are obvious enough and I'll leave them out to simplify examples in the rest of this article. The Postfix component identification includes its process ID in square brackets. The last portion of the entry is the message itself and is basically freeform. Each component writes its own relevant information. Browsing through your own mail log entries should give you a good idea of the activity of your Postfix server.

When you are trying to solve a particular problem, you may need more than just one line from your log file. Often newcomers to Postfix don't look at the complete picture in the log file when trying to understand why something doesn't work or why a message wasn't delivered as expected. The next two sections discuss how to get the whole picture from your logs.

master Doesn't Know Everything

You may find messages reported by the master daemon indicating a failure that occurred earlier and wonder why they're not more informative:

postfix/master[3158]: warning:
 process /usr/libexec/postfix/smtpd pid 3163
 exit status 1
postfix/master[3158]: warning:
 /usr/libexec/postfix/smtpd:
 bad command startup -- throttling

Since the master daemon launches other Postfix components, its warnings report that a particular component had a problem, but the master daemon doesn't know what the problem was. In this example, you can see that it's the smtpd daemon, running under process ID 3163, that actually encountered a problem. With this information, you can find the error as reported by the component itself. More than likely, you will find earlier entries in your log file to see the description of the error from the component that reported it. Otherwise, grep can give you the exact lines you're looking for:

$ grep 3136 /var/log/maillog
postfix/smtpd[3163]: fatal: open database /etc/postfix/aliases.db:
 No such file or directory

In this case, the message clearly reports that a missing file, /etc/postfix/aliases.db, is the cause of the problem.

The Queue ID Tells All

If you want to know the fate of any particular message handled by Postfix, your log has the answer again. When a message enters the Postfix system, whether over the network or locally, it is immediately assigned a queue ID. You can trace its complete journey by looking at all the log entries that contain the queue ID. For example, when a client connects to your server to deliver a message, the smtpd daemon logs the following.

postfix/smtpd[1677]: connect from
 mail.example.com[192.168.100.45]
postfix/smtpd[1677]: 9634B16F7F7:
 client=mail.example.com[192.168.100.45]

The first line records the initial connection. The next line shows you the queue ID that has been assigned to this message, 9634B16F7F7. When other Postfix components handle the message, they reference the same queue ID and record it with their logging entries. To find out what happened to this message in your system, pull out all of the log entries that include its queue ID:

$ grep 9634B16F7F7 /var/log/maillog
postfix/smtpd[1677]: 9634B16F7F7:
 client=localhost[127.0.0.1]
postfix/cleanup[1679]: 9634B16F7F7:
 message-id=<20031227171249.9634B16F7F7@scallop.example.com>
postfix/qmgr[1673]: 9634B16F7F7:
 from=<kyle@oreilly.com>, size=368,
 nrcpt=1 (queue active)
postfix/smtp[1681]: 9634B16F7F7:
 to=<kdent@example.com>, relay=none,
 delay=46, status=deferred
 (connect to 216.150.150.131[216.150.150.131]:
 No route to host)

In this case, the smtp client tells us that it could not relay the message to its next hop because of a networking problem. Viewing this message in the queue shows the same queue ID:

$ mailq
Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
9634B16F7F7      346 Sun Nov 22 09:01:08  kyle@oreilly.com
             (connect to 216.150.150.131[216.150.150.131]:
	      No route to host)
                                         kdent@example.com

A successfully delivered message looks like the following:

postfix/smtpd[1434]: 4F9D116F7F7: client=localhost[127.0.0.1]
postfix/cleanup[1436]: 4F9D116F7F7:
 message-id=<20031228154034.4F9D116F7F7@scallop.example.com>
postfix/qmgr[1431]: 4F9D116F7F7: from=<kyle@oreilly.com>,
 size=344, nrcpt=1 (queue active)
postfix/smtp[1438]: 4F9D116F7F7: to=<kdent@example.com>,
 relay=mail.example.com[216.150.150.131], delay=11,
 status=sent (250 Ok: queued as 51DCA20DD70)

In this case, the message was accepted by the host mail.example.com. Once delivered, the message becomes the responsibility of the receiving system.

Increasing Information

If you experience a problem with a particular site or sites but find no errors in your log, you can have Postfix increase the level of logging for the problem site with debug_peer_list. Postfix will then record additional information, such as the entire conversation between the client and the server. The parameter takes a list of sites for which Postfix should increase the logging level according to the value set in debug_peer_level. To get more information on problems with a site that receives mail for the example.com domain, set the parameters as follows.

debug_peer_list = example.com
debug_peer_level = 2

After reloading Postfix and exchanging mail with the site, check your log to see what transpires between your site and example.com. You can increase debug_peer_level to get increasingly more information logged.

To increase logging for a particular component, add the -v option to entries in master.cf. For example, if you want to see more information when Postfix sends messages over the network, add -v to the smtp component. You can add multiple -v options to increase the level of logging.

smtp     unix  -     -     n     -     -     smtp -v -v

Because increased logging can generate a tremendous amount of information, don't set high logging levels during your normal operation. Increase your logging levels temporarily to solve a problem and then restore your configuration to normal logging levels once you have resolved the issue.

If you run into mysterious problems that aren't explained by log entries, Postfix makes it easy to run with a debugger. Problems with your hardware or kernel might manifest themselves in Postfix since it is a long-running, active program. You can add the -D option to components in master.cf. See the DEBUG_README file that comes with the Postfix distribution for detailed information on running tracers and debuggers with Postfix programs.

Email errors may seem mysterious when they crop up, but having good information is the key to understanding your problem. Working with the information Postfix provides, you can quickly isolate and solve problems.


O'Reilly & Associates recently (in December 2003) released Postfix: The Definitive Guide.


Return to ONLamp.com.