Processing Mailbox Files with mailbox.py

Adding Messages

Messages are added by calling the mailbox's add(msg) method. The msg parameter can be one of several different types:



  • a string containing both headers and body of an RFC-2822 message.
  • a file-like object: the file will be read completely and its contents treated as an RFC-2822 message. (If the object is a disk file, it should have been opened in text mode.)
  • an instance of mailbox.Message or email.message.Message.
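
As a rough illustration of the three accepted types, the sketch below adds one message of each kind to a scratch mbox file. It assumes Python 3, where file-like objects passed to add() are expected to yield bytes; the temporary path and the sample messages are made up for the example.

```python
import email.message
import io
import mailbox
import os
import tempfile

# Hypothetical scratch location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo.mbox")
mbox = mailbox.mbox(path)
mbox.lock()
try:
    # 1. A string holding both headers and body.
    key_from_str = mbox.add("Subject: As a string\n\nBody text\n")

    # 2. A file-like object; add() reads it through to the end.
    #    (Assumption: Python 3, so the object yields bytes.)
    fileobj = io.BytesIO(b"Subject: From a file\n\nBody text\n")
    key_from_file = mbox.add(fileobj)

    # 3. A Message instance.
    msg = email.message.Message()
    msg["Subject"] = "From a Message"
    msg.set_payload("Body text\n")
    key_from_msg = mbox.add(msg)

    # Read the subjects back using the keys that add() returned.
    keys = (key_from_str, key_from_file, key_from_msg)
    subjects = [mbox[k]["Subject"] for k in keys]
finally:
    mbox.close()
```

Each call returns the key assigned to the new message, which is what later retrieval, replacement, or deletion would use.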

To copy messages from one mailbox to another, you could write:


count = 0
src_mbox.lock()         # Optional -- but a good idea
dest_mbox.lock()        # Not optional!
try:
    for msg in src_mbox:
        new_key = dest_mbox.add(msg)
        count += 1
finally:
    src_mbox.close()
    dest_mbox.close()
print('%d messages copied' % count)

The close() method does three things: it calls the flush() method to force any unwritten changes to disk, then calls the unlock() method to free the mailbox lock, and, finally, closes any open files.
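
A quick way to see the effect of close() is to reopen the file afterwards and check that the message is there. This sketch assumes Python 3 and a throwaway mbox file in a temporary directory; both the path and the message are invented for the example.

```python
import mailbox
import os
import tempfile

# Hypothetical scratch location for the example.
path = os.path.join(tempfile.mkdtemp(), "demo.mbox")

box = mailbox.mbox(path)
box.lock()
try:
    box.add("Subject: persisted\n\nBody\n")
finally:
    box.close()     # flushes, unlocks, then closes the underlying file

# A fresh instance reads whatever actually reached the disk.
reopened = mailbox.mbox(path)
try:
    count = len(reopened)
    subjects = [msg["Subject"] for msg in reopened]
finally:
    reopened.close()
```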

Deleting Messages

Messages can be deleted by a del mbox[key] statement or by calling the remove(key) method. The following example deletes all messages that have been marked as spam:

src.lock()
try:
    # Loop over a list of keys so that deleting cannot disturb the iteration.
    for key in src.keys():
        msg = src[key]
        subject = msg.get('Subject', 'No subject provided')
        if subject.startswith('***SPAM***'):
            print('Deleting %s' % subject)
            del src[key]
finally:
    src.close()
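
The two spellings of deletion behave alike, and both raise KeyError once the key is gone. A minimal sketch, assuming Python 3 and a scratch mbox file whose path and contents are made up here:

```python
import mailbox
import os
import tempfile

# Hypothetical scratch location for the example.
path = os.path.join(tempfile.mkdtemp(), "demo.mbox")
box = mailbox.mbox(path)
box.lock()
try:
    first = box.add("Subject: one\n\nBody\n")
    second = box.add("Subject: two\n\nBody\n")
    third = box.add("Subject: three\n\nBody\n")

    del box[first]        # dictionary-style deletion
    box.remove(second)    # equivalent method call

    try:
        box.remove(second)            # this key has already been removed
        missing_key_raises = False
    except KeyError:
        missing_key_raises = True

    remaining = [msg["Subject"] for msg in box]
finally:
    box.close()
```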

Changing Messages

Because Message instances are newly generated every time a message is retrieved, modifying the instance doesn't affect the contents of the mailbox. To change the contents of a message, you must use dictionary-style assignment (dest_mbox[key] = new_msg) to update the message. The following example removes Re: prefixes from subject lines in a mailbox:

try:
    src.lock()
    for key, msg in src.iteritems():
        subject = msg.get('Subject', '')
        if subject.startswith('Re: '):
            msg.replace_header('Subject', subject[4:])
            src[key] = msg
finally:
    src.close()
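
That a retrieved message is only a copy is easy to confirm: change the copy, look the message up again, and the stored version is unchanged until it is assigned back. The sketch assumes Python 3 and a temporary mbox file; the path and the sample message are invented for illustration.

```python
import mailbox
import os
import tempfile

# Hypothetical scratch location for the example.
path = os.path.join(tempfile.mkdtemp(), "demo.mbox")
box = mailbox.mbox(path)
box.lock()
try:
    key = box.add("Subject: Re: lunch\n\nBody\n")

    msg = box[key]                          # a fresh copy of the stored message
    msg.replace_header("Subject", "lunch")  # changes only the copy

    before_store = box[key]["Subject"]      # the mailbox still has the old subject

    box[key] = msg                          # write the modified copy back
    after_store = box[key]["Subject"]
finally:
    box.close()
```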

Format-specific Message Features

Some of the mailbox formats support additional information attached to each message:

  • In the mbox and MMDF formats, messages are separated by From lines that aren't part of the message headers or body. These From lines contain the envelope sender (the sender address supplied in the SMTP transaction) and the time the message was received. (These From lines may not necessarily have the same value as the RFC-2822 From header of the message.)

    The get_from() method returns the contents of the From line (not including the From prefix), and set_from(from_addr[, time_value]) sets a new value for the line. To write the change to disk, the modified message object must be stored in the mailbox again:

    msg = mbox_mailbox[key]            # Retrieve message
    from_ = msg.get_from()
    # Returns a value such as "amk@example.com Thu Jun 21 01:35:15 2007"
    
    # A value of True records the current time as the timestamp.
    msg.set_from('bjm@example.com', True)
    
    # Or you can supply a tuple suitable for passing to time.strftime().
    msg.set_from('bjm@example.com', (2007, 6, 21, 1, 48, 53, 3, 172, 0))
    mbox_mailbox[key] = msg            # Store message

    Some mail readers that use mbox format follow a convention of using either the Status or X-Status fields to record which messages have been read, answered, or marked as deleted. For example, D stands for deleted messages, R for read, and A for answered messages. Multiple flags can be set on a message at the same time. The get_flags() method returns a string of characters containing the flags that have been set. The set_flags(flag_string) method takes a string and sets the specified flags, unsetting all other flags. For example:

    flags = msg.get_flags()
    if 'R' not in flags:
        # Unread message
        print(msg)
        msg.set_flags('R' + flags)
        mbox_mailbox[key] = msg

  • The Maildir format also supports setting single-character flags on messages, but the flag characters are different: S is for seen messages, R is for replied, and T is for trashed. As with mbox messages, msg.get_flags() returns a string containing the currently set flags and msg.set_flags(flag_string) replaces the whole set. To change individual flags while leaving the others alone, msg.add_flag(flag_string) sets the supplied flags and msg.remove_flag(flag_string) removes them.

    When using the Maildir format, messages are initially written into a tmp/ subdirectory, and once the message file has been completely written, it's moved into either the new/ or cur/ subdirectory. The get_subdir() method of a MaildirMessage instance returns the name of the subdirectory containing the message, and the set_subdir(new_dir) method records a new directory for the message. You still must store the modified message in the Maildir instance by doing maildir_mbox[key] = msg.

  • MH mailboxes support the creation of sequences, which are subsets of the messages in the mailbox. You might have one sequence that lists personal e-mails and another that contains work-related messages, for example. Sequences are identified by strings. The MH format defines a few standard sequence names such as unseen, flagged, and replied.

    Messages are added to and removed from sequences by calling add_sequence(seqname) and remove_sequence(seqname) methods on the message objects. To write the change to disk, the modified message object must be stored in the mailbox again:

    msg = mh_mailbox[key]            # Retrieve message
    msg.add_sequence('work')
    msg.remove_sequence('unseen')
    mh_mailbox[key] = msg            # Store message
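
The Maildir flag and subdirectory methods described above can be combined as in the following sketch. It assumes Python 3 on a Unix-like system (the default Maildir file names contain a colon), and the temporary directory and the message are invented for the example.

```python
import mailbox
import os
import tempfile

# Hypothetical scratch location; Maildir() creates tmp/, new/ and cur/ here.
path = os.path.join(tempfile.mkdtemp(), "Maildir")
md = mailbox.Maildir(path)
key = md.add("Subject: hello\n\nBody\n")

msg = md[key]                       # Retrieve message
initial_subdir = msg.get_subdir()   # freshly delivered mail sits in new/

msg.add_flag("S")                   # mark as seen
msg.add_flag("R")                   # mark as replied...
msg.remove_flag("R")                # ...and take that flag off again
msg.set_subdir("cur")               # record that the message belongs in cur/
md[key] = msg                       # Store message

stored = md[key]
flags = stored.get_flags()
subdir = stored.get_subdir()
md.close()
```

As with the mbox and MH examples, nothing changes on disk until the modified message is assigned back under its key.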

Acknowledgements

The author would like to thank the following people for commenting on the first draft of this article: Aahz, Tal Einat, Jeffrey C. Jacobs, and Roy Smith. Any errors are the responsibility of the author.

A. M. Kuchling has 11 years of experience as a software developer and is a long-time member of the Python development community. Some of his Python-related work includes writing and maintaining several standard library modules, writing a series of "What's new in Python 2.x" articles and other documentation, planning the 2006 and 2007 PyCon conferences, and acting as a director of the Python Software Foundation. Andrew graduated with a B.Sc. in Computer Science from McGill University in 1995. His web page is at http://www.amk.ca.

