Processing Mailbox Files with mailbox.py
Adding Messages
Messages are added by calling the mailbox's add(msg) method. The msg parameter can be one of several different types:
- a string containing both headers and body of an RFC-2822 message.
- a file-like object: the file will be read completely and its contents treated as an RFC-2822 message. (If the object is a disk file, it should have been opened in text mode.)
- an instance of mailbox.Message or email.message.Message.
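As a minimal sketch of the three accepted argument types, the snippet below adds the same message three ways to a throwaway mbox file. It is written for Python 3, whereas the listings in this article use Python 2 syntax. The helper name add_three_ways and the file name demo.mbox are illustrative assumptions. A BytesIO object stands in for a disk file because Python 3's mailbox module prefers binary-mode file objects over the text-mode files described above.

```python
import email
import io
import mailbox
import os
import tempfile

RAW = "From: amk@example.com\nSubject: Hello\n\nBody text.\n"


def add_three_ways(path):
    """Add one message as a string, a file-like object, and a Message."""
    mbox = mailbox.mbox(path)
    try:
        mbox.lock()
        keys = [
            mbox.add(RAW),                              # string
            mbox.add(io.BytesIO(RAW.encode("ascii"))),  # file-like object
            mbox.add(email.message_from_string(RAW)),   # Message instance
        ]
    finally:
        mbox.close()  # flush, unlock, and close the file
    return keys


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.mbox")  # hypothetical file name
    keys = add_three_ways(path)
    # Reopen the mailbox to confirm that all three copies were written.
    check = mailbox.mbox(path)
    subjects = [msg["Subject"] for msg in check]
    check.close()
```

Each call to add() returns the key assigned to the new message, which is what later lookups, deletions, and updates use.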
To copy messages from one mailbox to another, you could write:
    count = 0
    src_mbox.lock()    # Optional -- but a good idea
    dest_mbox.lock()   # Not optional!
    try:
        for msg in src_mbox:
            new_key = dest_mbox.add(msg)
            count += 1
    finally:
        src_mbox.close()
        dest_mbox.close()
    print count, 'messages copied'
The close() method does three things: it calls the flush() method to force any unwritten changes to disk, then calls the unlock() method to free the mailbox lock, and, finally, closes any open files.
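To see the effect of close() end to end, here is a self-contained sketch in Python 3 (the article's listings use Python 2 syntax). It copies messages between two temporary mbox files and then reopens the destination to confirm that the changes reached the disk. The function name copy_messages and the file names are assumptions made for this illustration.

```python
import mailbox
import os
import tempfile


def copy_messages(src_path, dest_path):
    """Copy every message from one mbox file to another; return the count."""
    src_mbox = mailbox.mbox(src_path)
    dest_mbox = mailbox.mbox(dest_path)
    count = 0
    try:
        src_mbox.lock()
        dest_mbox.lock()
        for msg in src_mbox:
            dest_mbox.add(msg)
            count += 1
    finally:
        # close() flushes pending changes, releases the lock, closes the file.
        src_mbox.close()
        dest_mbox.close()
    return count


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "src.mbox")    # hypothetical file names
    dest_path = os.path.join(tmp, "dest.mbox")
    seed = mailbox.mbox(src_path)
    try:
        seed.lock()
        seed.add("From: a@example.com\nSubject: One\n\nFirst.\n")
        seed.add("From: b@example.com\nSubject: Two\n\nSecond.\n")
    finally:
        seed.close()
    copied = copy_messages(src_path, dest_path)
    # A fresh mailbox object only sees what was actually written to disk.
    check = mailbox.mbox(dest_path)
    dest_subjects = [msg["Subject"] for msg in check]
    check.close()
```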
Deleting Messages
Messages can be deleted by a del mbox[key] statement or by calling the remove(key) method. The following example deletes all messages that have been marked as spam:
    try:
        src.lock()    # Lock before modifying the mailbox
        for key, msg in src.iteritems():
            subject = msg.get('Subject', 'No subject provided')
            if subject.startswith('***SPAM***'):
                print 'Deleting', subject
                del src[key]
    finally:
        src.close()
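A Python 3 version of the same idea might look like the sketch below. The function name delete_spam, its marker parameter, and the file name are hypothetical. One difference from the Python 2 listing is that the loop uses items(), which returns a list, so removing entries inside the loop does not disturb the iteration.

```python
import mailbox
import os
import tempfile


def delete_spam(path, marker="***SPAM***"):
    """Remove messages whose subject starts with marker; return the count."""
    mbox = mailbox.mbox(path)
    deleted = 0
    try:
        mbox.lock()
        # items() builds a list up front, so deleting while looping is safe.
        for key, msg in mbox.items():
            subject = str(msg.get("Subject", "No subject provided"))
            if subject.startswith(marker):
                del mbox[key]   # equivalent to mbox.remove(key)
                deleted += 1
    finally:
        mbox.close()
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "inbox.mbox")  # hypothetical file name
    seed = mailbox.mbox(path)
    try:
        seed.lock()
        seed.add("From: x@example.com\nSubject: ***SPAM*** Buy now\n\nNo.\n")
        seed.add("From: y@example.com\nSubject: Lunch?\n\nNoon works.\n")
    finally:
        seed.close()
    deleted = delete_spam(path)
    check = mailbox.mbox(path)
    remaining = [msg["Subject"] for msg in check]
    check.close()
```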
Changing Messages
Because Message instances are newly generated every time a message is retrieved, modifying the instance doesn't affect the contents of the mailbox. To change the contents of a message, you must use dictionary-style assignment (dest_mbox[key] = new_msg) to update the message. The following example removes Re: prefixes from subject lines in a mailbox:
    try:
        src.lock()
        for key, msg in src.iteritems():
            subject = msg.get('Subject', '')
            if subject.startswith('Re: '):
                msg.replace_header('Subject', subject[4:])
                src[key] = msg
    finally:
        src.close()
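The point about freshly generated instances is easy to check. The following Python 3 sketch (names such as strip_re_prefix and threads.mbox are assumptions for illustration) first edits a retrieved message without storing it, showing that the mailbox still returns the old subject, and then applies the change properly with dictionary-style assignment.

```python
import mailbox
import os
import tempfile


def strip_re_prefix(path):
    """Remove a leading 'Re: ' from subjects; return how many were changed."""
    mbox = mailbox.mbox(path)
    changed = 0
    try:
        mbox.lock()
        for key, msg in mbox.items():
            subject = str(msg.get("Subject", ""))
            if subject.startswith("Re: "):
                msg.replace_header("Subject", subject[4:])
                mbox[key] = msg   # store the modified message
                changed += 1
    finally:
        mbox.close()
    return changed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "threads.mbox")  # hypothetical file name
    mbox = mailbox.mbox(path)
    try:
        mbox.lock()
        key = mbox.add("From: a@example.com\nSubject: Re: budget\n\nFine.\n")
        copy = mbox[key]
        copy.replace_header("Subject", "edited")  # changes only this object
        unchanged = mbox[key]["Subject"]          # a new instance is returned
    finally:
        mbox.close()
    changed = strip_re_prefix(path)
    check = mailbox.mbox(path)
    final_subjects = [msg["Subject"] for msg in check]
    check.close()
```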
Format-specific Message Features
Some of the mailbox formats support additional information attached to each message:
- In the mbox and MMDF formats, messages are separated by From lines that aren't part of the message headers or body. These From lines contain the envelope sender (the sender address supplied in the SMTP transaction) and the time the message was received. (These From lines do not necessarily have the same value as the RFC-2822 From header of the message.)
  The get_from() method returns the contents of the From line (not including the From prefix), and set_from(from_addr, [time_value]) sets a new value for the line. To write the change to disk, the modified message object must be stored in the mailbox again:
      msg = mbox_mailbox[key]     # Retrieve message
      from_ = msg.get_from()
      # Returns a value such as "amk@example.com Thu Jun 21 01:35:15 2007"
      # A value of True records the current time as the timestamp.
      msg.set_from('bjm@example.com', True)
      # Or you can supply a tuple suitable for passing to time.strftime().
      msg.set_from('bjm@example.com', (2007, 6, 21, 1, 48, 53, 3, 172, 0))
      mbox_mailbox[key] = msg     # Store message
  Some mail readers that use mbox format follow a convention of using either the Status or X-Status fields to record which messages have been read, answered, or marked as deleted. For example, D stands for deleted messages, R for read, and A for answered messages. Multiple flags can be set on a message at the same time. The get_flags() method returns a string of characters containing the flags that have been set. The set_flags(flag_string) method takes a string and sets the specified flags, unsetting all other flags. For example:
      flags = msg.get_flags()
      if 'R' not in flags:
          # Unread message
          print msg
          msg.set_flags('R' + flags)
          mbox_mailbox[key] = msg
- The Maildir format also supports setting single-character flags on messages, but the flag characters are different: S is for seen messages, R is for replied, and T is for trashed. msg.get_flags() still returns a string containing the currently set flags, and msg.set_flags(flag_string) still replaces them all. In addition, msg.add_flag(flag_string) sets the supplied flags and msg.remove_flag(flag_string) removes them, leaving any other flags untouched.
  When using the Maildir format, messages are initially written into a tmp/ subdirectory, and once the message file has been completely written, it's moved into either the new/ or cur/ subdirectory. The get_subdir() method of a MaildirMessage instance returns the name of the subdirectory containing the message, and the set_subdir(new_dir) method records a new directory for the message. You still must store the modified message in the Maildir instance by doing maildir_mbox[key] = msg.
- MH mailboxes support the creation of sequences, which are subsets of the messages in the mailbox. You might have one sequence that lists personal e-mails and another that contains work-related messages, for example. Sequences are identified by strings. The MH format defines a few standard sequence names such as unseen, flagged, and replied.
  Messages are added to and removed from sequences by calling the add_sequence(seqname) and remove_sequence(seqname) methods on the message objects. To write the change to disk, the modified message object must be stored in the mailbox again:
      msg = mh_mailbox[key]       # Retrieve message
      msg.add_sequence('work')
      msg.remove_sequence('unseen')
      mh_mailbox[key] = msg       # Store message
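The format-specific methods described in this section can be exercised together in one short Python 3 sketch (the article's listings use Python 2 syntax). Everything is created in a temporary directory, and the directory and file names are assumptions chosen for the illustration. The Maildir part assumes a POSIX file system, since Maildir file names contain a colon.

```python
import mailbox
import os
import tempfile

RAW = "From: amk@example.com\nSubject: Hello\n\nBody text.\n"

with tempfile.TemporaryDirectory() as tmp:
    # mbox: rewrite the From line and mark the message as read.
    mbox = mailbox.mbox(os.path.join(tmp, "demo.mbox"))
    try:
        mbox.lock()
        key = mbox.add(RAW)
        msg = mbox[key]
        msg.set_from("bjm@example.com", (2007, 6, 21, 1, 48, 53, 3, 172, 0))
        msg.set_flags("R" + msg.get_flags())
        mbox[key] = msg                      # store the modified message
        mbox_from = mbox[key].get_from()
        mbox_flags = mbox[key].get_flags()
    finally:
        mbox.close()

    # Maildir: add the "seen" flag and move the message from new/ to cur/.
    maildir = mailbox.Maildir(os.path.join(tmp, "demo-maildir"))
    md_key = maildir.add(RAW)
    msg = maildir[md_key]
    msg.add_flag("S")
    msg.set_subdir("cur")
    maildir[md_key] = msg                    # store the modified message
    maildir_flags = maildir[md_key].get_flags()
    maildir_subdir = maildir[md_key].get_subdir()
    maildir.close()

    # MH: file the message under a user-defined "work" sequence.
    mh = mailbox.MH(os.path.join(tmp, "demo-mh"))
    try:
        mh.lock()
        mh_key = mh.add(RAW)
        msg = mh[mh_key]
        msg.add_sequence("work")
        mh[mh_key] = msg                     # store the modified message
        mh_sequences = mh[mh_key].get_sequences()
        all_sequences = mh.get_sequences()   # maps sequence name to keys
    finally:
        mh.close()
```

In all three cases the pattern is the same: retrieve the message, change its format-specific state, and assign it back under its key so the mailbox records the change.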
Acknowledgements
The author would like to thank the following people for commenting on the first draft of this article: Aahz, Tal Einat, Jeffrey C. Jacobs, and Roy Smith. Any errors are the responsibility of the author.
A. M. Kuchling has 11 years of experience as a software developer and is a long-time member of the Python development community. Some of his Python-related work includes writing and maintaining several standard library modules, writing a series of "What's new in Python 2.x" articles and other documentation, planning the 2006 and 2007 PyCon conferences, and acting as a director of the Python Software Foundation. Andrew graduated with a B.Sc. in Computer Science from McGill University in 1995. His web page is at http://www.amk.ca.