LAMP and the Spread Toolkit

Messaging Strategies

I've personally used messaging in a number of different environments. One deployment that made particular sense was not about offloading expensive processing; rather, its purpose was to provide a message highway between different departments within the company, often in geographically disparate locations. In some cases, a message might be delivered to one department, repackaged, and pushed back onto the highway for the next department. In other cases, the same message went to more than one department for simultaneous processing. This corporation was relatively technology-agnostic: a legacy PowerBuilder system might process one message, a C/C++ application another, Perl scripts some, and a Java app the rest. (Over time, it seems, very few IT environments have any choice but to be technology-agnostic, despite what the evangelists for any particular language-du-jour might think.)

This idea is particularly attractive in a multi-server LAMP environment, given that certain types of processing make more sense in one language than another. As mentioned earlier, I don't believe it makes a lot of sense to receive messages in a PHP app (which is not to say it might never be necessary, just that I'd prefer otherwise). So, from a design perspective in a multi-language environment, while I might send messages from a PHP application, I would look at using Python daemons to handle writing the responses to those messages into a database; live notification to clients could then use Ajax polling, or--the lightest-weight approach--notifications included in a standard page response.
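To make the highway idea concrete, here is a minimal sketch of such a relay daemon, using the same Python spread module as the listings later in this article. The group names (dept_orders, dept_invoicing) and the trivial "repackaging" step are purely illustrative, not taken from any real system:

import spread

# Hypothetical relay on the "message highway": listen on one department's
# group, repackage whatever arrives, and forward it to the next department.
c = spread.connect('4804', 'relay', 0, 0)
c.join('dept_orders')

while True:
    smsg = c.receive()
    # "Repackage" the payload -- here just tagging it -- and push it back
    # onto the highway for the next department to pick up.
    forwarded = 'relayed: ' + smsg.message
    c.multicast(spread.RELIABLE_MESS, 'dept_invoicing', forwarded)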



Demonstration Application

For experimental purposes, I'm using a PHP application that is capable of receiving an uploaded file and then transmitting the received data to a Python app for further processing. Think of a document management system, where you might upload a document of some kind (whether a POST using multipart/form-data, or using WebDAV to PUT the content), which then goes to another server for parsing, indexing, and storage. For example, the document processor could use antiword to convert a Word document to text--on its own, not a particularly CPU-intensive task--but then parse, index, and store the document data, which may involve multiple database calls and writes, and therefore represent an unreasonable load on a client-facing server.

For simplicity's sake, in this case the Python application does no more than convert a Word document to text, log the receipt, and return a response. There are three components:

  1. A PHP page (upload.php.txt) containing a file dialog and a submit button, which sends a Spread message containing uploaded data.
  2. A Python application (uploadp.py), located on another server, which will log the message and return an acknowledgement containing the size of the data and the number of lines after antiword has converted the document.
  3. A second Python application on the PHP server (uploadr.py), which records the acknowledgement in the database.

Place upload.php in an Apache-accessible directory, start the Spread daemon on each server, and then start the Python scripts (running: python uploadp.py and python uploadr.py). The scripts are fairly similar. The main difference between them is that uploadr.py maintains a connection to the database to write responses. upload.php saves the filename of the upload to the database before sending the file on to the uploadfiles group.
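Both Python scripts also import a small Message helper from spreadutils, which bundles a dictionary of header fields and a content payload into a single string, and parses it back out on the receiving side. If you don't have that file to hand, a rough stand-in supporting the calls used below might look something like the following -- the key=value wire format here is my own assumption, not necessarily what the real spreadutils uses:

# spreadutils.py -- illustrative stand-in for the Message helper used below.
# Assumed wire format: "key=value" header lines, a blank line, then the raw
# content bytes.

class Message:
    def __init__(self, headers = None, parse_msg = None):
        self.headers = headers or {}
        self.content = ''
        if parse_msg is not None:
            head, body = parse_msg.split('\n\n', 1)
            for line in head.splitlines():
                key, value = line.split('=', 1)
                self.headers[key] = value
            self.content = body

    def __getitem__(self, key):
        return self.headers[key]

    def __str__(self):
        head = '\n'.join(['%s=%s' % (k, v) for k, v in self.headers.items()])
        return head + '\n\n' + self.content

    def debug(self):
        return repr(self.headers)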

uploadp.py connects to the daemon, joins the group uploadfiles, logs the converted data to stdout, and then returns the size of the message content (and the number of lines) as a message response:

import os
import popen2
import spread
from spreadutils import *
import tempfile

# Connect to the local Spread daemon and join the group that upload.php
# multicasts uploaded documents to.
c = spread.connect('4804', 'uploadp', 0, 0)
c.join('uploadfiles')

while True:
    smsg = c.receive()
    print 'received %s ' % smsg
    recmsg = Message(parse_msg = smsg.message)
    print 'Received message with id #%s' % recmsg['f_id']

    # Write the uploaded document to a temporary file so antiword can read it.
    fname = tempfile.mktemp()
    fout = file(fname, 'wb')
    fout.write(recmsg.content)
    fout.close()

    # Run antiword over the file and capture the converted text.
    r, w, e = popen2.popen3('antiword %s' % fname)
    text = r.readlines()
    for line in text:
        print line,
    os.unlink(fname)

    # Multicast an acknowledgement (original id, byte size, line count) back
    # to the group that uploadr.py is listening on.
    resmsg = Message({ 'f_id' : recmsg['f_id'], 'size' : len(recmsg.content), 'lines' : len(text) })
    c.multicast(spread.RELIABLE_MESS, 'uploadresponses', str(resmsg))

uploadr.py (which receives the response back from uploadp.py) connects to the database as well as to the daemon, and writes the file size and number of lines to the database:

import MySQLdb
import spread
import sys
from spreadutils import *

# Database connection used to record acknowledgements against the upload
# record that upload.php saved earlier.
db = MySQLdb.connect(host = "localhost",
                     user = "spread",
                     passwd = "password",
                     db = "spreaddb")
cursor = db.cursor()

# Connect to the local daemon under its own private name and join the group
# that uploadp.py multicasts its acknowledgements to.
c = spread.connect('4804', 'uploadr', 0, 0)
c.join('uploadresponses')

while True:
    smsg = c.receive()
    recmsg = Message(parse_msg = smsg.message)

    try:
        f_id = recmsg['f_id']
        size = recmsg['size']
        lines = recmsg['lines']
        print 'Received message response with id #%s' % f_id
        # Parameterised query; MySQLdb handles the quoting.
        cursor.execute("update uploaded_files set size = %s, num_lines = %s where id = %s",
                       (size, lines, f_id))
        db.commit()
    except:
        print 'invalid message %s' % recmsg.debug()
        sys.exit(1)

I've chosen this approach to mimic a distributed setup where, perhaps for load reasons, you may not want the processing servers connecting to a database. uploadp.py could run on a large number of (potentially) geographically remote servers, performing the CPU-intensive processing; uploadr.py merely writes the response data to the database.

The idea of distributed servers brings me to my final topic.

Advanced Usage

Some of the nice-to-haves (in some cases, essentials) that you don't get for free with Spread are store-and-forward, "durable" messages, distributed queues, and the transactional, storage, and logging facilities that come built into other solutions. It's important to bear this in mind if you're going to use Spread in an enterprise, as some of these features are likely to appear on a non-functional requirements list (somewhere) for your project. While building the ability to participate in distributed transactions, handle contention between message processors, and store/reprocess messages is not a massively complicated task, it still represents real development effort, not to mention ongoing support costs. From this point of view, be realistic about rolling Spread into a large-scale project--you don't get all of the features of a TIBCO Rendezvous out of the box.
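As a taste of the sort of plumbing involved, the sketch below persists every incoming message to a holding table before doing any work, and marks the row done afterwards, so a crashed processor can replay whatever is still pending. The pending_messages table, its columns, and the 'storer' private name are invented for illustration; this is nowhere near a real durable queue, but it shows the shape of the effort:

import MySQLdb
import spread
from spreadutils import *

# Crude store-and-reprocess: write each message to a holding table before
# processing, so unfinished ('pending') rows can be replayed after a crash.
db = MySQLdb.connect(host = "localhost", user = "spread",
                     passwd = "password", db = "spreaddb")
cursor = db.cursor()

c = spread.connect('4804', 'storer', 0, 0)
c.join('uploadfiles')

while True:
    smsg = c.receive()
    recmsg = Message(parse_msg = smsg.message)
    # Store first...
    cursor.execute("insert into pending_messages (f_id, body, status) values (%s, %s, 'pending')",
                   (recmsg['f_id'], smsg.message))
    db.commit()
    # ...then do the actual processing (omitted), and only then mark it done.
    cursor.execute("update pending_messages set status = 'done' where f_id = %s",
                   (recmsg['f_id'],))
    db.commit()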

Conclusion

The Spread toolkit is a powerful mechanism for handling the transmission of messages between application components that might be within a local domain or geographically separate. The daemon is mature, stable, performant, and best of all, straightforward to install and to use. There are a variety of tools and third-party libraries for transmitting messages over Spread; hence, there is complete flexibility in how you bolt application components together.

That said, the PHP extension for Spread is, admittedly, not production-ready. There are some stability issues; in particular, if the Spread daemon restarts after the PHP extension has made a connection, a reconnection can cause a persistent crash (not Apache httpd, just the extension itself). Therefore, I would invest in some serious development time with a C-and-PHP guru before relying on the extension for a mission-critical system. On the other hand, I would feel confident using the Python module in my applications, so a short-term solution for any PHP problems might be closer integration between Python and PHP components.
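One low-tech route to that closer integration--my own suggestion, not something from the sample code--is a tiny Python bridge that reads a payload from standard input and multicasts it, which a PHP page could invoke via exec() or proc_open() rather than talking to the daemon through the extension:

# sendmsg.py -- hypothetical stdin-to-Spread bridge for a PHP front end.
import sys
import spread

group = sys.argv[1]            # target group, e.g. 'uploadfiles'
payload = sys.stdin.read()     # message body piped in from the PHP page

# An empty private name lets the daemon assign a unique one, so concurrent
# invocations don't collide.
c = spread.connect('4804', '', 0, 0)
c.multicast(spread.RELIABLE_MESS, group, payload)
c.disconnect()

Spawning a process per request is clearly not free, but it sidesteps the extension's reconnection problem until that is resolved.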

Whether you're putting together distributed systems or connecting different departments, it may be worth looking at LAMPS: Linux, Apache, MySQL, Python/Perl/PHP--and Spread.


Jason R. Briggs is an architect/developer working in the United Kingdom.

