 Published on ONLamp.com (http://www.onlamp.com/)


LAMP and the Spread Toolkit

by Jason R. Briggs
11/30/2006

Strip away the veritable verbosity of voluminous framework verbiage (I've wanted to get that out of my system since watching V for Vendetta) from your average application server, and the remaining skeleton will be suspiciously similar to a standard Linux distribution running Apache and a few default programming languages. Look through the features you get for free with an app server, and you'll realize you get most of them for free with Linux/Apache as well--or at least a close approximation. Of course, you're reading ONLamp, so you already know this.

There is, however, one glaring omission. One feature of the app server stack doesn't come by default in your average open source LAMP bundle, and it's particularly important in many enterprise environments: messaging. To be more precise, it's the concept of enterprise messaging--the efficient and reliable routing of business-critical data from one server to another. Corporate environments might use messaging for multiple reasons: delivering data to multiple departments for processing, offloading CPU-intensive processing tasks to distributed servers, modularizing enterprise components, etc. The words "robustness," "reliability," "scalability," and "flexibility" invariably come up when anyone is talking about messaging in the enterprise.

While enterprise messaging might not come by default, it is relatively easy to add this to a LAMP stack. One of the technologies to accomplish the task is the Spread toolkit.

Spread is an open source toolkit that satisfies the goals most people have in mind when they think about this kind of messaging. From the Spread home page:

Spread is an open source toolkit that provides a high performance messaging service that is resilient to faults across local and wide area networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point-to-point support. Spread services range from reliable messaging to fully ordered messages with delivery guarantees.

Spread has language bindings for Python, PHP, and Perl (not to mention C/C++, Java, Ruby, Squeak, Lua, Tcl, and others), and is available for a variety of platforms. It admirably fits most flexibility requirements. Spread can reliably deliver and order messages, allowing you to tailor Spread applications to meet the reliability and robustness needs of many corporate environments.

In a LAMP environment, you might want to use messaging to split your application load, without affecting the performance of specific components. For example, generating complex reports (perhaps using XSLT to transform XML to PDF) may reduce your web server or database performance, so you might want to shift this processing onto a different server. You could connect those machines in a number of different ways: write data into a database and use a regular cron job to process it, call web services, or even make a direct socket connection between the two. Spread provides another alternative--a truly asynchronous request, potentially also allowing you to add more servers for scaling the load, without requiring complicated changes at the client (web server) end of the connection.

In addition, companies such as Zope Corporation use the Spread toolkit for high availability; Zope uses it in its Zope Replication Services. Indeed, while researching this article, one of the more common ideas I came across for the use of Spread was database replication. Early versions of replication for the Postgres database (the pgreplication project), for example, used Spread.

Installing Spread

The latest version of Spread, at time of writing, is Version 4, release candidate 2. As you might expect, third-party library support isn't quite so up to date. The stable release is Spread version 3.17.3. To install, extract the contents of the distribution, change into the directory created (spread-src-3.17.3), and then run the commands:

./configure
make
sudo su  (or just su to root)
make install

Among the apps installed as part of this process are the spread daemon itself and spuser, a command-line test client; the following sections use both.

Running with Daemons

To start up the Spread daemon, you need a configuration file. There's an example in /usr/local/etc/spread.conf. Basic Spread configuration is straightforward: specify a broadcast or multicast address for a subnet and the associated port, followed by a list of machine names and IP addresses:

Spread_Segment 192.168.0.255:4804 {
        machine1                192.168.0.3
        machine2                192.168.0.200
}

This example uses the broadcast address for my local network, runs on port 4804, and specifies two machines (machine1, machine2) along with their IP addresses to run in this segment. Once you've changed the configuration file to reflect your own environment (a single machine will work fine), set up a runtime directory and user/group for the daemon:

groupadd spread
useradd -g spread spread
mkdir /var/run/spread
chown spread:spread /var/run/spread/
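
If you look after more than a couple of machines, you may prefer to generate the Spread_Segment block shown earlier rather than edit it by hand. Here's a minimal sketch of that idea in Python 3. The function name render_segment is my own invention, not part of Spread, and it only covers the single-segment layout used in this article:

```python
def render_segment(address, port, machines):
    """Return a Spread_Segment block as text.

    machines is a list of (name, ip) pairs, in the order they should
    appear in the configuration file.
    """
    lines = ["Spread_Segment %s:%d {" % (address, port)]
    for name, ip in machines:
        # Pad the name so the IP addresses line up in a column.
        lines.append("        %-24s%s" % (name, ip))
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical usage: reproduce the two-machine example from the text.
    print(render_segment("192.168.0.255", 4804,
                         [("machine1", "192.168.0.3"),
                          ("machine2", "192.168.0.200")]), end="")
```

The name you pass to spread -n must match one of the machine names in the generated block.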

Start the Spread daemon using: spread -c spread.conf -n machine1. Test that Spread functions correctly by running spuser -s 4804; then type j test to join the group test. You should see something like this:

Received REGULAR membership for group test with 1 members, where I am member 0:
        #user#machine1
grp id is -1062731773 1158449838 1
Due to the JOIN of #user#machine1

If you send a message to the test group, as a member you will also receive your message back:

User> s test
enter message: hello test group

User> 
============================
received SAFE message from #user#machine1, of type 1, (endian 0) to 1 groups 
(17 bytes): hello test group

There is a range of other configuration options for operating the Spread daemon (see the example in /usr/local/etc for more detail), including controlling the level of logging, but for the moment this simple configuration will work.

Programmatic Access

Now that you have the Spread daemon running, you can try to access it from code. The latest version of the Spread module for Python (1.5) is available from Zope, and older versions from the original Python Spread page. Download and extract the contents of the distribution (see Resources), then from the created directory, run:

python setup.py build
sudo python setup.py install

Test that the installation was successful by following the same steps as you did with spuser (join a group, send a message to the group):

Python 2.4.3 (#2, Apr 27 2006, 14:43:58) 
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import spread
>>> c = spread.connect('4804')
>>> c.join('test')
>>> c.receive()
<MembershipMsg object at 0xb7d342c0>
>>> c.multicast(spread.RELIABLE_MESS, 'test', 'test message from python')
24
>>> msg = c.receive()
>>> msg.message
'test message from python'
>>> c.disconnect()

Note that you may get an error message when importing the spread module:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: libtspread.so: cannot open shared object file: No such file or directory

If so, try setting your LD_LIBRARY_PATH to the location of the Spread library files. For example:

export LD_LIBRARY_PATH=/usr/local/lib
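
In the interactive session above, the first receive() after joining the group returned a MembershipMsg rather than the text that was sent. A receiving loop therefore has to cope with both kinds of object. The following Python 3 sketch shows one way to skip membership notifications. It assumes that only regular messages carry a message attribute, and it uses stand-in classes (FakeConnection and friends, all hypothetical) so that it runs without a Spread daemon:

```python
def receive_regular(conn):
    """Call conn.receive() until a message with a `message` attribute arrives."""
    while True:
        msg = conn.receive()
        # Assumption: membership notifications carry no `message` attribute.
        if hasattr(msg, "message"):
            return msg


class FakeMembershipMsg:
    """Stand-in for the MembershipMsg object seen in the session above."""


class FakeRegularMsg:
    """Stand-in for a regular message; only the `message` attribute is modelled."""

    def __init__(self, message):
        self.message = message


class FakeConnection:
    """Replays a fixed list of messages, imitating a joined connection."""

    def __init__(self, messages):
        self._messages = list(messages)

    def receive(self):
        return self._messages.pop(0)


if __name__ == "__main__":
    conn = FakeConnection([FakeMembershipMsg(),
                           FakeRegularMsg("test message from python")])
    print(receive_regular(conn).message)
```

With a real connection you would pass the object returned by spread.connect() instead of the fake one.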

PHP is a tad more challenging, given that I was completely unsuccessful in getting the PHP Spread extension package (available from PECL) to compile, build, run, or even look at me slightly askance. If you're a PHP extension expert, you'll probably immediately see the problem and have the package installed in a few moments. For others, unable to find 15 sacrificial PHP virgins to dance the Rites of Extension Installation with, I've included a reworked version of the module in the Resources section of this article. Given that the last time I had to touch C code in anger (and it was in anger, I recall) was slightly more than a decade ago, please either ignore the rather embarrassing attempt at a Makefile or just snicker quietly from behind a small pot-plant somewhere. In addition, I've only tested it with PHP 5, so I'm keen to hear if anyone has any success on earlier versions of PHP (and what changes it needs to work).

To build and install this extension, you need to know the directory of the Spread include and library files, the location of includes for PHP, and the directory in which PHP expects its extensions. In the case of my Kubuntu machine, the Makefile variables look like:

INSTALL_TO              = /usr/lib/php5/20051025

SPREAD_INCLUDE          = /usr/local/include
SPREAD_LIB              = /usr/local/lib
PHP_INCLUDE             = /usr/include/php5

Compile and install the extension by running:

make
sudo make install

PHP also needs to know about the shared object, so find the php.ini for your distribution (mine is in /etc/php5/apache2, or /etc/php5/cli if you want to add the extension for command-line execution of PHP scripts). Add the line extension=spread.so. (If you can't find the extension directory, look for other references to extension in the same file.) Assuming a successful build and install, you can now try an integration test--sending a message from Python to PHP. Restart Apache httpd and try a PHP page:

<html>
<body>
<?PHP
$id = spread_connect('4804', 'PHPtest');

if ($id != null) {
        spread_join($id, 'test');
        $msg = spread_receive($id, 120000);
        echo "<p>received message " . $msg['message'] . "</p>";
        spread_leave($id, 'test');
        spread_disconnect($id);
}
else {
        echo "<p>Failed to connect</p>";
}
?> 
</body>
</html>

While the browser hangs, waiting for a message, open another console, and try:

Python 2.4.3 (#2, Apr 27 2006, 14:43:58) 
>>> import spread
>>> c = spread.connect('4804', 'mytest', 0, 0)
>>> c.multicast(spread.RELIABLE_MESS, 'test', 'hello there from python')

With any luck, the browser should display your message.

Free of Restriction

It should be obvious from the previous examples that Spread is refreshingly free of restrictions, which means you can choose your own strategies for its use. Joining a group and sending a message corresponds to the publish/subscribe cycle of messaging, and point-to-point is available using the private name of a connection. For example, in one Python console type:

Python 2.4.3 (#2, Apr 27 2006, 14:43:58) 
>>> import spread
>>> c = spread.connect('4804', 'testname1', 0, 0)
>>> print c.receive().message

In another console:

Python 2.4.3 (#2, Apr 27 2006, 14:43:58) 
>>> import spread
>>> c = spread.connect('4804', 'testname2', 0, 0)
>>> c.multicast(spread.RELIABLE_MESS, '#testname1#machine1', 'this is a point-to-point message')

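Here, "#testname1#machine1" is the unique private name given to the first connection: the name passed to connect(), followed by the name of the daemon it is connected to, each prefixed with a # character.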
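
Because a private name such as "#testname1#machine1" combines the connection name with the daemon name, it can be handy to build and take apart these strings in one place. This is a small Python 3 sketch; the helper names private_group and split_private_group are hypothetical and not part of the Spread module:

```python
def private_group(private_name, daemon_name):
    """Build the private name used for point-to-point messages."""
    return "#%s#%s" % (private_name, daemon_name)


def split_private_group(group):
    """Split '#name#daemon' into (name, daemon); reject anything else."""
    if not group.startswith("#") or group.count("#") != 2:
        raise ValueError("not a private group name: %r" % group)
    private_name, daemon_name = group[1:].split("#")
    return private_name, daemon_name


if __name__ == "__main__":
    target = private_group("testname1", "machine1")
    print(target)
    print(split_private_group(target))
```

A sender can then address a reply to the connection that a message came from without hard-coding the machine name.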

Defining a Basic Protocol

Where something like JMS specifies the format of a transmitted message, with Spread you are free to choose your own protocol. Because HTTP is such a well-known protocol, it makes sense, to me at least, to use a similar format for messaging. Thus I've decided to include headers at the beginning of a message, with the body containing whatever content I need to transmit. The first steps, then, are to define the libraries for creating and consuming messages in this format. You can then use these libraries to send an uploaded file from a PHP app to a Python app, which, for the simple purposes of this article, will just document the receipt.

For my Python applications, I use a custom Message class, defined in spreadutils.py, that is capable of both creating and parsing messages sent via Spread (once again, see the Resources section for the source to this and other code mentioned in this article). My test code is:

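import spread
from spreadutils import *

# The trailing 0, 0 are the priority and membership arguments; with
# membership set to 0 this connection receives no membership notifications.
c = spread.connect('4804', 'mytest', 0, 0)
msg = Message({ 'header1' : 'val1', 'header2' : 'val2' },
    'this is a test message with headers')
# Address the message to this connection's own private name.
c.multicast(spread.RELIABLE_MESS, '#mytest#machine1', str(msg))

# Read the message back and parse it with the same class.
smsg = c.receive()
recmsg = Message(parse_msg=smsg.message)
print 'sent == received == %s' % (recmsg == msg)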

This script sends a message back to the same connection (point-to-point), using the Message class both to create and to consume the message, and then checks that the sent and received messages are equal. The PHP version of the class is only capable of message creation, as I'm not convinced it makes sense to receive messages in my PHP applications (despite the fact that the Spread extension for PHP allows for this behavior). The concept of sending an asynchronous message only to hang while awaiting a response doesn't sit well in my personal view of the world. For the moment, you can test sending a message from a PHP page and receiving it in a Python app, using two scripts.

For Python (test2.py):

import spread
from spreadutils import *

c = spread.connect('4804', 'mytest', 0, 0)
c.join('testgroup')
smsg = c.receive()
recmsg = Message(parse_msg = smsg.message)
print str(recmsg)

For PHP (spreadtest.php.txt):

<?PHP
require_once('spreadutils.php');

$id = spread_connect('4804', 'PHPtest');

if ($id != null) {
    $msg = new Message();
    $msg->set_header('test1', 'test2');
    $msg->set_header('test2', 'test3');
    $msg->set_content('this is a test PHP message');

    spread_multicast('testgroup', $msg->str());
    spread_disconnect($id);
}
else {
    echo "<p>Failed to connect</p>";
}
?>

Run the Python script first--python test2.py--and then run the PHP script--php spreadtest.php. (Don't forget to start the Spread daemon first, if it's not already running.) The output from the Python script should be something like:

test1: test2
test2: test3

this is a test PHP message
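
The source of spreadutils.py isn't reproduced here, but the output above suggests the wire format: one "name: value" line per header, a blank line, and then the body. The following Python 3 sketch shows what a class for that format might look like. SimpleMessage is a hypothetical stand-in, not the article's Message class, and it assumes that header names and values contain no newlines and that the body is text:

```python
class SimpleMessage:
    """Headers, a blank line, then the body; loosely modelled on HTTP."""

    def __init__(self, headers=None, content=""):
        self.headers = dict(headers or {})
        self.content = content

    def __str__(self):
        head = "".join("%s: %s\n" % (k, v) for k, v in self.headers.items())
        return head + "\n" + self.content

    def __eq__(self, other):
        return (isinstance(other, SimpleMessage)
                and self.headers == other.headers
                and self.content == other.content)

    @classmethod
    def parse(cls, text):
        """Rebuild a message from its string form."""
        headers = {}
        rest = text
        while True:
            line, _, rest = rest.partition("\n")
            if line == "":
                # Blank line (or end of text): the headers are finished.
                break
            name, _, value = line.partition(": ")
            headers[name] = value
        return cls(headers, rest)


if __name__ == "__main__":
    msg = SimpleMessage({"test1": "test2", "test2": "test3"},
                        "this is a test PHP message")
    print(str(msg))
    print(SimpleMessage.parse(str(msg)) == msg)
```

Note that parsing returns every header value as a string, so a numeric header such as a size has to be converted by the receiver.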

Messaging Strategies

I've personally used messaging in a number of different environments. In the one where it made particular sense, the purpose was not to offload expensive processing but to provide a message highway between different departments within the company, often in geographically disparate locations. In some cases, a message might be delivered to one department, repackaged, and pushed back on the highway to the next department. In other cases, the same message went to more than one department for simultaneous processing. This corporation, in particular, was relatively technology-agnostic: a legacy Powerbuilder system might process one message, a C/C++ application might process another, Perl scripts might get some, and a Java app some others. (Over time, it seems very few IT environments have any choice but to be technology-agnostic, despite what the evangelists for any particular language-du-jour might think.)

This idea is particularly attractive in a multi-server LAMP environment, given that certain types of processing make more sense in one language than another. As mentioned earlier, I don't believe it makes a lot of sense to receive messages in a PHP app (which is not to say it is never necessary, just that I'd prefer otherwise). So, from a design perspective in a multi-language environment, while I might send messages from a PHP application, I would look at using Python daemons to write the responses to those messages into a database, and then either use Ajax polling for live notification to clients, or--the lightest-weight approach--include notifications in a standard page response.

Demonstration Application

For experimental purposes, I'm using a PHP application that is capable of receiving an uploaded file, and then transmitting the received data to a Python app for further processing. Think of a document management system, where you might upload a document of some kind (whether a POST using multipart/form-data, or using WebDAV to PUT the content), which then goes to another server for parsing, indexing, and storage. For example, the document processor could use antiword to convert a Word document to text (on its own, not a particularly CPU-intensive task) and then parse, index, and store the document data, which may involve multiple database calls and writes and therefore represent an unreasonable load on a client-facing server.

For simplicity's sake, in this case the Python application does no more than convert a Word document to text, log the receipt, and return a response. There are three components:

  1. A PHP page (upload.php.txt) containing a file dialog and a submit button, which sends a Spread message containing uploaded data.
  2. A Python application (uploadp.py), located on another server, which will log the message and return an acknowledgement containing the size of the data, and the number of lines after antiword has converted the document.
  3. A second Python application on the PHP server (uploadr.py), which records the acknowledgement in the database.
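
To trace how a single upload moves through the three components above, here is an in-memory sketch in Python 3. It is a stand-in only: FakeBus imitates named groups with simple queues, a dictionary plays the part of the database, and splitting the text into lines takes the place of antiword. None of these names come from Spread or from the article's scripts:

```python
from collections import defaultdict, deque


class FakeBus:
    """In-memory stand-in for the daemon: each group is a FIFO queue."""

    def __init__(self):
        self.groups = defaultdict(deque)

    def multicast(self, group, message):
        self.groups[group].append(message)

    def receive(self, group):
        return self.groups[group].popleft()


def upload_page(bus, db, f_id, filename, data):
    """Component 1: record the filename, then send the data on."""
    db[f_id] = {"filename": filename, "size": None, "num_lines": None}
    bus.multicast("uploadfiles", {"f_id": f_id, "content": data})


def processor(bus):
    """Component 2: process one upload and acknowledge it."""
    msg = bus.receive("uploadfiles")
    text = msg["content"].splitlines()  # stands in for the antiword step
    bus.multicast("uploadresponses", {"f_id": msg["f_id"],
                                      "size": len(msg["content"]),
                                      "lines": len(text)})


def recorder(bus, db):
    """Component 3: write one acknowledgement to the database."""
    msg = bus.receive("uploadresponses")
    db[msg["f_id"]].update(size=msg["size"], num_lines=msg["lines"])


if __name__ == "__main__":
    bus, db = FakeBus(), {}
    upload_page(bus, db, 1, "report.doc", "one\ntwo\n")
    processor(bus)
    recorder(bus, db)
    print(db)
```

The point of the arrangement is that only the first and third components ever touch the database; the processor needs nothing but a connection to the message bus.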

Place upload.php in an Apache-accessible directory, start the Spread daemon on each server, and then start the Python scripts (running: python uploadp.py and python uploadr.py). The scripts are fairly similar. The main difference between them is that uploadr.py maintains a connection to the database to write responses. upload.php saves the filename of the upload to the database before sending the file on to the uploadfiles group.

Thus uploadp.py connects to the daemon, joins the group uploadfiles, logs converted data to stdout, and then returns the size of the message content (and number of lines) as a message response:

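import os
import subprocess
import tempfile

import spread
from spreadutils import *

c = spread.connect('4804', 'uploadp', 0, 0)
c.join('uploadfiles')

while True:
    smsg = c.receive()
    print 'received %s ' % smsg
    recmsg = Message(parse_msg = smsg.message)
    print 'Received message with id #%s' % recmsg['f_id']

    # mkstemp creates the file itself, avoiding the race in mktemp;
    # write in binary mode because a Word document is not text.
    fd, fname = tempfile.mkstemp()
    fout = os.fdopen(fd, 'wb')
    fout.write(recmsg.content)
    fout.close()
    try:
        # An argument list avoids any shell quoting of the filename.
        proc = subprocess.Popen(['antiword', fname], stdout=subprocess.PIPE)
        text = proc.stdout.readlines()
        proc.wait()
    finally:
        # Remove the temporary file whether or not the conversion worked.
        os.remove(fname)
    for line in text:
        print line,

    # Acknowledge with the size of the upload and the converted line count.
    resmsg = Message({ 'f_id' : recmsg['f_id'], 'size' : len(recmsg.content), 'lines' : len(text) })
    c.multicast(spread.RELIABLE_MESS, 'uploadresponses', str(resmsg))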

uploadr.py, which receives the response sent by uploadp.py, connects to the database as well as to the daemon, and writes the file size and number of lines to the database:

import sys

import MySQLdb
import spread
from spreadutils import *

db = MySQLdb.connect(host = "localhost",
                     user = "spread",
                     passwd = "password",
                     db = "spreaddb")
cursor = db.cursor()
# Use a private name distinct from the one uploadp.py connects with.
c = spread.connect('4804', 'uploadr', 0, 0)
c.join('uploadresponses')

while True:
    smsg = c.receive()
    recmsg = Message(parse_msg = smsg.message)

    try:
        f_id = recmsg['f_id']
        size = recmsg['size']
        lines = recmsg['lines']
        print 'Received message response with id #%s' % f_id
        # Pass the values as parameters so the driver quotes them; they
        # arrive over the network and shouldn't be pasted into the SQL.
        cursor.execute("update uploaded_files set size = %s, num_lines = %s where id = %s", ( size, lines, f_id ))
        db.commit()
    except Exception:
        # A missing header or a failed update both end up here.
        print 'invalid message %s' % recmsg.debug()
        sys.exit(1)
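
The values in the update statement above arrive in a message from another machine, so it's worth binding them as query parameters instead of formatting them into the SQL text. The sketch below shows the idea in Python 3, using the standard library's sqlite3 module as a stand-in for MySQL so that it runs anywhere. sqlite3 writes placeholders as ?, whereas MySQLdb uses %s with a separate tuple of values. The filename column and the function names are assumptions of mine:

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table shaped like the one the scripts update."""
    db = sqlite3.connect(":memory:")
    db.execute("create table uploaded_files ("
               "id integer primary key, filename text, "
               "size integer, num_lines integer)")
    db.execute("insert into uploaded_files (id, filename) "
               "values (1, 'report.doc')")
    db.commit()
    return db


def record_response(db, f_id, size, lines):
    """Store the acknowledgement values using bound parameters.

    Header values arrive as strings, so int() is applied first; anything
    that is not a plain number raises ValueError before reaching SQL.
    """
    cur = db.execute(
        "update uploaded_files set size = ?, num_lines = ? where id = ?",
        (int(size), int(lines), int(f_id)))
    db.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = make_demo_db()
    print(record_response(db, "1", "2048", "40"))
    print(db.execute("select size, num_lines from uploaded_files").fetchone())
```

The return value is the number of rows changed, which gives the caller a cheap way to notice an acknowledgement for an id that was never recorded.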

I've chosen this approach to mimic the idea of a distributed server where, perhaps for load reasons, you may not want to connect to a database. uploadp.py could run on a large number of (potentially) geographically remote servers, performing the CPU-intensive processing. uploadr.py merely writes response data to the database.

The idea of distributed servers brings me to my final topic.

Advanced Usage

Some of the nice-to-haves (indeed, in some cases, essentials) that you don't get for free with Spread are things like store-and-forward, "durable" messages, distributed queues, and some of the transactional, storage, and logging facilities that come built into other solutions. It's important to bear this in mind if you're going to use Spread in an enterprise, as some of these features are likely to be on a non-functional requirements list (somewhere) for your project. While providing the ability to participate in distributed transactions, handle contention between message processors, and store/reprocess messages is not a massively complicated task, it still represents a potentially significant amount of development effort, not to mention the ongoing support costs. From this point of view, be realistic about rolling Spread into a large-scale project--you don't get all of the features of a Tibco Rendezvous out of the box.
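
To give a feel for the kind of effort involved, here is a very small store-and-forward sketch in Python 3. It keeps each outgoing message in an SQLite table until a send succeeds, so messages written while the daemon is down can be retried later. Everything here (the Outbox class, its table, the send callable) is hypothetical, and "delivered" only means the send call returned without error, not that a consumer processed the message:

```python
import sqlite3


class Outbox:
    """Keep each message in a table until a send attempt succeeds."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "create table if not exists outbox ("
            "id integer primary key autoincrement, "
            "grp text not null, body text not null, "
            "delivered integer not null default 0)")
        self.db.commit()

    def store(self, group, body):
        """Persist a message before any attempt to send it."""
        cur = self.db.execute(
            "insert into outbox (grp, body) values (?, ?)", (group, body))
        self.db.commit()
        return cur.lastrowid

    def pending(self):
        """Return undelivered messages, oldest first."""
        return self.db.execute(
            "select id, grp, body from outbox "
            "where delivered = 0 order by id").fetchall()

    def flush(self, send):
        """Try to send every pending message; stop at the first failure."""
        sent = 0
        for msg_id, group, body in self.pending():
            try:
                send(group, body)
            except Exception:
                break  # leave this and later messages for the next flush
            self.db.execute(
                "update outbox set delivered = 1 where id = ?", (msg_id,))
            self.db.commit()
            sent += 1
        return sent


if __name__ == "__main__":
    box = Outbox()
    box.store("uploadfiles", "first message")
    print(box.flush(lambda group, body: print("sending to", group)))
```

In a real application, the send callable would wrap a multicast call on a live connection, and the path would point at a file rather than an in-memory database. Contention between several senders and duplicate delivery after a crash are exactly the kinds of problems this sketch leaves unsolved.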

Conclusion

The Spread toolkit is a powerful mechanism for handling the transmission of messages between application components that might be within a local domain or geographically separate. The daemon is mature, stable, performant, and best of all, straightforward to install and to use. There are a variety of tools and third-party libraries for transmitting messages over Spread; hence, there is complete flexibility in how you bolt application components together.

That said, the PHP extension for Spread is, admittedly, not production-ready. There are some stability issues; in particular, if the Spread daemon restarts after the PHP extension has made a connection, a reconnection can cause a persistent crash (not Apache httpd, just the extension itself). Therefore, I would invest in some serious development time with a C-and-PHP guru before relying on the extension for a mission-critical system. On the other hand, I would feel confident using the Python module in my applications, so a short-term solution for any PHP problems might be closer integration between Python and PHP components.

Whether you're putting together distributed systems or connecting different departments, it's perhaps worthwhile looking at LAMPS: Linux Apache MySQL Python/Perl/PHP--and Spread.

Resources

Jason R. Briggs is an architect/developer working in the United Kingdom.



Copyright © 2009 O'Reilly Media, Inc.