Python DevCenter

Processing Mailbox Files with mailbox.py

Headers are accessed by treating the Message object like a dictionary. The Message object preserves the case of header names, but headers are retrieved case-insensitively. Some usage examples:



print 'Number of headers:', len(msg)

# Retrieve the message ID
msg_id = msg['Message-ID']
print msg_id

# Equivalent: retrieval is case-insensitive.
msg_id = msg['message-id']
print msg_id

# Retrieve subject header, with a default value if
# the header isn't present.
subject = msg.get('Subject', 'No subject provided')

# Retrieve Cc header, returning None if it's not present.
cc = msg.get('Cc')

# Check if a header is present
if 'X-Virus-Scan' not in msg:
    print 'Doing virus scan...'
    # Add header value
    msg['X-Virus-Scan'] = 'OK'
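
The snippets above assume a msg object already exists. As a minimal self-contained sketch, the following builds one from a made-up message string (the RAW text and the variable names are invented for illustration) and stores each result in a variable instead of printing it:

```python
import email

# A tiny hand-written message; the addresses and IDs are made up.
RAW = (
    "Message-ID: <123@example.com>\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)

msg = email.message_from_string(RAW)

header_count = len(msg)               # number of header lines
by_exact_name = msg['Message-ID']
by_lower_name = msg['message-id']     # same header, different case
cc = msg.get('Cc')                    # None, because there is no Cc header
subject = msg.get('Subject', 'No subject provided')

# Add a header only if it isn't there yet.
if 'X-Virus-Scan' not in msg:
    msg['X-Virus-Scan'] = 'OK'

stored_names = list(msg.keys())       # names keep their original case
```

Lookups ignore case, but keys() still reports the names exactly as they were written.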

There can be multiple header lines using the same field name; the "Received" header is the most common example. When there are multiple header lines, the get() method returns only one of them, and which one is not defined. The get_all() method returns a list of all header values. Assigning with msg[name] = value never overwrites or deletes existing lines; it always adds a new header line.

Here are some examples using the Received header:

# Get list of received headers
recv_trail = msg.get_all('Received')
for line in recv_trail:
    print line

# Add a new received line; this line will come
# last when the message headers are converted to
# a string.
msg['Received'] = 'from host1 by host2'

# Delete all received headers
del msg['Received']

# Replace the Subject header. replace_header() raises
# KeyError if the message has no Subject header.
msg.replace_header('Subject', '***SPAM*** ' + subject)
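
To see these behaviors in one place, here is a small sketch that runs against a hand-written message (the RAW text, host names, and variable names are made up). It also shows two edge cases worth knowing about: get_all() returns None rather than an empty list when the header is absent, and replace_header() raises KeyError when there is nothing to replace.

```python
import email

# A made-up message with two Received lines.
RAW = (
    "Received: from hostA by hostB\n"
    "Received: from hostB by hostC\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)

msg = email.message_from_string(RAW)

before = msg.get_all('Received')           # both lines, in message order
msg['Received'] = 'from host1 by host2'    # appends; nothing is replaced
after = msg.get_all('Received')

missing = msg.get_all('X-No-Such-Header')  # None, not an empty list

msg.replace_header('Subject', '***SPAM*** ' + msg['Subject'])

try:
    msg.replace_header('X-No-Such-Header', 'value')
    replace_failed = False
except KeyError:
    replace_failed = True

del msg['Received']                        # removes every Received line
remaining = msg.get_all('Received')        # None again
```

Because get_all() can return None, code that loops over its result should either check for None first or pass a default, as in msg.get_all('Received', []).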

See the email package's documentation for a full list of methods and attributes.

Example: A Mailbox to RSS Converter

Putting everything together, the following script uses the mailbox module and Andrew Dalke's PyRSS2Gen to generate an RSS feed from one or more Maildir mailboxes.

#!/usr/bin/env python2.5

import sys, mailbox, datetime
from email import utils
import PyRSS2Gen

if len(sys.argv) == 1:
    print 'Usage: %s <maildir-1> <maildir-2> ...' % sys.argv[0]
    sys.exit(1)

# Create RSS feed
feed = PyRSS2Gen.RSS2(title='Mailbox feed',
                      link='http://maildir-feed.example.com',
                      description=('Contains mailboxes: ' +
                                   ' '.join(sys.argv[1:])
                                  ))

# Loop over specified mailboxes
for filename in sys.argv[1:]:
    mbox = mailbox.Maildir(filename)
    for msg in mbox:
        subject = msg.get('Subject', "")
        guid_hdr = msg['Message-ID']

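        # Parse the date, turning it into a datetime object.
        # parsedate_tz() returns None if the header can't be
        # parsed, so fall back to the current time in that case.
        date_hdr = msg.get('Date')
        parsed = date_hdr and utils.parsedate_tz(date_hdr)
        if not parsed:
            date = datetime.datetime.now()
        else:
            # Use the first six fields (year through second);
            # the timezone offset is ignored here.
            date = datetime.datetime(*parsed[:6])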

        # Create RSS item and add it to the feed
        item = PyRSS2Gen.RSSItem(pubDate=date, title=subject,
               guid=PyRSS2Gen.Guid(guid_hdr, isPermaLink=False))
        feed.items.append(item)

# Write generated RSS to stdout
feed.write_xml(sys.stdout, encoding='utf-8')
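
The script above builds its datetime from the fields as written in the Date header and ignores the timezone offset that parsedate_tz() returns. PyRSS2Gen, as far as I know, treats pubDate values as GMT, so dates from senders in other timezones may come out shifted. One possible refinement is to normalize to UTC with email.utils.mktime_tz(). The helper name header_date_to_utc is invented for this sketch:

```python
import datetime
from email import utils

EPOCH = datetime.datetime(1970, 1, 1)

def header_date_to_utc(date_hdr):
    """Return a naive UTC datetime for a Date header, or None.

    None is returned when the header is missing or can't be parsed.
    """
    if not date_hdr:
        return None
    parsed = utils.parsedate_tz(date_hdr)
    if parsed is None:
        return None
    # mktime_tz() applies the timezone offset and returns seconds
    # since the epoch in UTC.
    timestamp = utils.mktime_tz(parsed)
    return EPOCH + datetime.timedelta(seconds=timestamp)

utc_date = header_date_to_utc('Mon, 20 Nov 1995 19:12:08 -0500')
```

In the script, the caller could fall back to datetime.datetime.utcnow() when the helper returns None.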

Modifying Mailboxes

The examples so far have only examined mailboxes without changing their contents. Let's look at how to add, change, and remove messages from a mailbox.

Locking

Before making any alteration to a mailbox, always call the mailbox's lock() method to acquire a lock on it. When the changes are complete, call the flush() method to write them to disk and then the unlock() method to release the lock.

Different mailbox classes will make changes to the underlying disk files at different times. For the single-file mailbox formats, new messages are added immediately but deleted messages aren't removed until you call flush(). On the other hand, directory-based formats, such as Maildir and MH, make all their changes immediately and the flush() method doesn't actually do anything. Thanks to Maildir's lock-free design, lock() and unlock() also don't have to do anything.
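
The timing difference for single-file formats can be observed by watching the size of an mbox file. This is a rough sketch that uses a throwaway file in a temporary directory; the path, the message text, and the variable names are all made up:

```python
import mailbox
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.mbox')
box = mailbox.mbox(path)    # the file is created if it doesn't exist

box.lock()
try:
    key = box.add("From: a@example.com\nSubject: one\n\nFirst body\n")
    box.add("From: b@example.com\nSubject: two\n\nSecond body\n")
    size_after_add = os.path.getsize(path)      # messages already on disk

    box.remove(key)
    size_after_remove = os.path.getsize(path)   # file not rewritten yet

    box.flush()
    size_after_flush = os.path.getsize(path)    # deleted message is gone
    remaining = len(box)
finally:
    box.unlock()
    box.close()
```

The size stays the same after remove() and only shrinks once flush() rewrites the file.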

It's good practice to always call these methods, even if some or all of them are no-ops for the mailbox class you're using. Someone might come along and modify your code, or pass in an mbox object where you're expecting a Maildir object. People are very protective of their e-mail, so you should always be careful to avoid duplicating or, worse, deleting messages.
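
One way to make this habit hard to forget is to wrap the lock, flush, and unlock calls in a small helper and use try/finally so the lock is released even if the change fails. This is only a sketch: modify_mailbox and add_note are names invented here, not part of the mailbox module, and the mailboxes live in a temporary directory.

```python
import mailbox
import os
import tempfile

def modify_mailbox(box, change):
    """Run change(box) while holding the mailbox lock.

    Changes are flushed before the lock is released. Works the same
    way whether lock(), flush(), and unlock() do real work or nothing.
    """
    box.lock()
    try:
        result = change(box)
        box.flush()
        return result
    finally:
        box.unlock()

def add_note(box):
    # Hypothetical change: add one short message.
    return box.add("From: me@example.com\nSubject: note\n\nRemember this.\n")

base = tempfile.mkdtemp()
single_file = mailbox.mbox(os.path.join(base, 'inbox.mbox'))
directory = mailbox.Maildir(os.path.join(base, 'inbox-maildir'))

# The same helper is used for both formats.
for box in (single_file, directory):
    modify_mailbox(box, add_note)

counts = [len(single_file), len(directory)]
single_file.close()
directory.close()
```

The calling code never needs to know which mailbox class it was handed, which is the point of always making the calls.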
