Creating Custom Email Queries
Pages: 1, 2
Understanding the Job Ahead
If I've learned one thing, it's that you should never remove somebody else's code from an existing application. Adding new source code is a safer technique; I've found that I can still do what I want and avoid the monumental effort of understanding somebody else's work. It also makes it much easier to port my work to newer versions of the application.
You can do two things to help understand how DBmail works:
- Create an entity relationship diagram for the PostgreSQL database.
- Configure PostgreSQL to echo SQL commands to syslog and see what's going on.
An entity relationship diagram, or ERD, is a graphical representation of the database. An ERD typically shows the column names, their data types, and the relationships between the tables through the use of foreign keys. Figure 2 shows the ERD for the DBmail/PostgreSQL database.
Figure 2. Entity relationship diagram for the email database imap_mail
The easiest way to obtain the ERD is to use the
psql client to
read the relationships between the tables using the meta command
\d. Figure 3 shows a screenshot of the table description for
Figure 3. Using psql to obtain a table description
I need to work with only a few of the DBmail tables:
PostgreSQL provides you the ability to record all SQL statements via the
syslog daemon. To invoke it, adjust the data cluster's configuration file
postgresql.conf by inserting the following lines:
syslog = 2 syslog_facility = 'LOCAL0' syslog_ident = 'postgres' log_statement = true log_timestamp = true
The changes take effect once you've reloaded the service:
% /etc/init.d/postgresql reload
The DBmail error message levels are configured in the "dbmail.conf" file. Syslog places them in the mail log files ie:
You can read the syslog files using
% tail -n 100 /var/log/messages | grep postgres
The biggest challenge was understanding how DBmail worked; the rest was pretty easy.
Hack 1: Text Searching
This first hack searches all email for the text
roman. Once it finds a message that matches, the hack moves the
message from the INBOX to a subfolder named
The search begins when the user creates the subfolder
IMAP server recieves the instruction to create the folder from the client and
sends the appropriate SQL to the back end. The trigger
attached to the table
mailboxes, executes the function
f_v_1() when an INSERT attempts to create the folder
INBOX/v_1. The text search begins here. A SQL CASE statement
tests each email in the INBOX. When the function finds an email with the
desired text, it calls another function,
messages table with an UPDATE to the message's location from
tr_v_1 attaches the function
mailboxes table. The function will trigger on every INSERT
made to this table:
CREATE TRIGGER tr_v_1 AFTER INSERT ON mailboxes FOR EACH ROW EXECUTE PROCEDURE f_v_1();
f_v_1() starts by testing the new name of the new
IF NEW.name = ''INBOX/v_1'' THEN . . END IF;
If the name of the created folder is
INBOX/v_1), it performs a text search for the strings
I've used the PERFORM statement instead of SELECT because I'm interested in
the execution of the function
f_update_messages() rather than any
text string that the CASE statement may return. The CASE statement allows me to
test for the desired text string that's located in the
table by JOINing on the
PERFORM CASE WHEN messageblk ~* ''roman'' OR messageblk ~* ''rome'' THEN f_update_messages(NEW.mailbox_idnr,mblk.message_idnr) ELSE ''other'' END AS "search results" FROM messageblks mblk, messages m, mailboxes mb WHERE mblk.message_idnr = m.message_idnr AND m.mailbox_idnr = mb.mailbox_idnr;
This calls the function
f_update_messages, passing the mailbox
identity number and the messageblks identity number as parameters. It then
reassigns new identity numbers that effectively reassign the email's location
INBOX to the newly created subfolder
UPDATE messages SET mailbox_idnr=new_mailbox_idnr WHERE message_idnr=new_message_idnr;
Hack 2: Converting Word Attachments to Plain Text
This second hack converts all email with a Microsoft Word attachment to text and appends it to the original email.
This hack starts out the same way the first one does, as the attempt of the
DBmail server to INSERT data into the table
it. DBmail updates the email record located in this table when the client moves
email from one folder to another.
f_v_2 function performs multiple actions:
- Identifies whether a message has attachments
- Identifies whether the attachment is a Microsoft Word document
- Isolates and extracts the Word document from the email
- Converts it from its Base64 encoding to its binary form
- Converts the binary form to plain text
- Appends the plain-text transformation to the original email
The conversion operations use the command-line utilities
antiword, which execute within the
tr_v_2 attaches the function
to the table
messageblks. The function will execute for every
INSERT made to this table:
CREATE TRIGGER tr_v_2 BEFORE INSERT ON messageblks FOR EACH ROW EXECUTE PROCEDURE f_v_2();
f_v_2() starts by identifying the mailbox's
SELECT INTO mbx_name mailboxes.name FROM mailboxes, messages WHERE mailboxes.mailbox_idnr=messages.mailbox_idnr AND messages.message_idnr=NEW.message_idnr;
DBmail stores messages in two parts: the header and the message body.
This function identifies emails' Microsoft Word attachments while it moves them to
v_2. A positively identified message will fill in several
variables with information about all of the message's parts. This in essence
reconstructs the message:
SELECT INTO mail_header messageblk FROM messageblks WHERE messageblk_idnr=NEW.messageblk_idnr-1; IF mail_header ~ ''multipart/mixed'' AND NEW.messageblk ~ content_type THEN mime_boundary=trim(both ''"'' from substring(mail_header from ''".*"'')); NEW.messageblk = f_get_document(NEW.messageblk,mime_boundary); END IF;
f_get_document extracts the Microsoft Word MIME
CREATE OR REPLACE FUNCTION f_get_document(text,text) RETURNS TEXT AS' DECLARE mail_attachment ALIAS FOR $1; mime_boundary ALIAS FOR $2; mime_header TEXT; str TEXT;BEGIN mime_header = ''--''||mime_boundary||''\nContent-Type: text/plain; \ name=document.txt\n''; mime_header = mime_header||''Content-Transfer-Encoding: 7bit\n''; mime_header = mime_header||''Content-Disposition: inline; \ filename="document.txt"\n''; str=substring(mail_attachment from \ ''Content-Type: application/MS Word.*''||mime_boundary); str=substring(str from ''\\n\\n.*\\n\\n''); str=split_part(str,''\\n\\n'',2); RETURN mime_header||f_doc2txt(str)||''--''||mime_boundary||''--''; END; 'LANGUAGE plpgsql;
Notice that the function adds the header and MIME boundary to the Word
document, then turns the document into a text file by calling the function
f_doc2txt(). The Base64 encoding attachment,
the function's parameter:
f_doc2txt() is short. It performs minor
formatting to prepare the attachment for the uudecode utility which turns it
back into its binary form. It then pipes the binary stream into
antiword, which converts it into a text stream:
CREATE OR REPLACE FUNCTION f_doc2txt(text) RETURNS TEXT AS' use strict; my $str="begin-base64 600 zzz\n".$_."===="; $str= `echo "$str" | uudecode -o /dev/stdout | antiword - -i0`; return $str; ' LANGUAGE plperlu;
The function returns this where it replaces the Word attachment with a
plain-text translation in the
NEW.messageblk = f_get_document(NEW.messageblk,mime_boundary); . . . RETURN NEW;
Read the SQL source code at the same time you read this article.
The DBmail documentation can help you configure mail agents such as Postfix to pipe email directly into the database.
DBmail is changing rapidly, so refer to the home site for the latest news.
Don't be bashful! Go to the PostgreSQL site to learn more about triggers and the amazing number of currently supported programming languages.
There's so much you can do to get the most out of your email. By using triggers in your database engine, you can enhance any existing behavior into new and creative solutions. All it takes is a little imagination.
Here are a couple of ideas to whet your appetite:
- Sort and cross-index your email under several categories or searches, and place them in several folders at the same time.
- Translate email into another language by piping your correspondence through an online translation facility.
- Create and index your email, similar to Google, using PostgreSQL's tsearch advanced text-searching capabilities.
I began this article intending to highlight data-mining strategies, but as you can see it's hard to stick to this subject without also considering reverse-engineering techniques.
dbmailman pages, specifically:
man dbmail-adduser, which adds mail users to the Dbmail mail system
man dbmail-imapd, the imap4r1 front end for the Dbmail mail system
man dbmail-maintenance, which performs cleanup and integrity checks
man dbmail-pop3d, the POP3d front end for the Dbmail mail system
man dbmail-smtp, which inserts messages into the Dbmail mail system
- RFC 3501, the IMAP protocol
Robert Bernier is the PostgreSQL business intelligence analyst for SRA America, a subsidiary of Software Research America (SRA).
Return to ONLamp.com