What is WSGI, and why should you care? WSGI, or the Web Server Gateway Interface, is a Python Enhancement Proposal (PEP 333) authored by Phillip J. Eby in an attempt to address the lack of standardization among Python web frameworks. Think of it as the servlet spec for the Python world, only simpler. Although the WSGI specification is primarily aimed at framework developers, you can also develop application components to it. One of the aims of PEP 333 is ease of implementation, which consequently carries through to the development of those components. For example, the "Hello world" example given in the WSGI specification could hardly be any simpler to implement:
    def simple_app(environ, start_response):
        status = '200 OK'
        response_headers = [('Content-type','text/plain')]
        start_response(status, response_headers)
        return ['Hello world!\n']
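Because a WSGI application is just a callable, you can exercise it without any server at all. The following is a minimal sketch, written for Python 3 (where response bodies are bytes rather than the plain strings used in this article's Python 2 listings). The helper name call_app is hypothetical; setup_testing_defaults comes from the standard wsgiref.util module.

```python
from wsgiref.util import setup_testing_defaults


def simple_app(environ, start_response):
    """Python 3 rendering of the hello-world app; the body is bytes."""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello world!\n']


def call_app(app, method='GET', path='/'):
    """Invoke a WSGI app once and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required CGI and wsgi.* keys
    environ['REQUEST_METHOD'] = method
    environ['PATH_INFO'] = path

    captured = {}

    def start_response(status, headers, exc_info=None):
        # A real server would also return a write() callable here;
        # this sketch only records what the application sent.
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_app(simple_app)
print(status, body)
```

The two arguments are all the server and the application ever share: a dictionary describing the request, and a callable the application uses to announce the status line and headers.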
With the recent release of Python 2.5, the reference implementation of WSGI is available as a standard module (wsgiref), meaning that there is a direct path from developing application components to test and production environments. It's easy to imagine rapid development of components using wsgiref, then using something more scalable for testing and production. That said, WSGI doesn't provide many features you might expect or want for webapp development; for sessions, cookies, and the like, you'd also need a suitable web framework (of which there are many), or perhaps middleware or utilities to provide those services.
As to why you should care: if you're a low-level person who prefers to bolt on utilities and modules to keep your development effort as free of constraint as possible, WSGI will hold considerable attraction. Even if you prefer the benefits offered by higher-level frameworks, chances are they'll be built on top of WSGI, and it's always useful to know what happens behind the scenes.
For those (like myself) running Python 2.4.x, the good news is that the wsgiref module will still function. You can browse wsgiref in the Python Subversion repository, or check it out from the command line via:
svn co http://svn.python.org/projects/python/trunk/Lib/wsgiref
Copy the wsgiref directory into the site-packages directory of your Python distribution (in my case, /usr/lib/python2.4/site-packages/) and check whether you can import the module. If so, you should be able to type import wsgiref in the Python console with no errors reported.
Testing the "Hello world" application shown earlier requires a few extra lines of code (see test1.py):
    if __name__ == '__main__':
        from wsgiref import simple_server
        httpd = simple_server.make_server('', 8080, simple_app)
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            pass
This code uses simple_server (built on the standard BaseHTTPServer module) to provide basic web server facilities, and passes the simple_app function as an argument to the make_server function. Run this program (python test1.py) and direct your browser to http://localhost:8080 to see it in action.
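If you would rather check the server from a script than from a browser, something like the following sketch can serve a single request and fetch it. It is written for Python 3 and makes a few assumptions of its own: port 0 asks the operating system for any free port, QuietHandler and fetch_once are hypothetical names, and the client is the standard http.client module.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


def simple_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b'Hello world!\n']


class QuietHandler(WSGIRequestHandler):
    """Suppress the per-request log line that wsgiref writes to stderr."""

    def log_message(self, format, *args):
        pass


def fetch_once(app):
    """Serve exactly one request from app and return (status, body)."""
    httpd = make_server('127.0.0.1', 0, app, handler_class=QuietHandler)
    httpd.timeout = 5  # handle_request() gives up after 5 seconds
    port = httpd.server_port
    worker = threading.Thread(target=httpd.handle_request)
    worker.start()
    try:
        conn = http.client.HTTPConnection('127.0.0.1', port, timeout=5)
        conn.request('GET', '/')
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        worker.join()
        httpd.server_close()


status, body = fetch_once(simple_app)
print(status, body)
```

Using handle_request() instead of serve_forever() keeps the example self-terminating, which suits a quick smoke test better than a long-running process.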
You don't have to stick to simple functions for your applications; WSGI also accepts a class as the application, with a new instance created to handle each request. To do so, create a class that implements the __init__ and __iter__ methods. For example, I've abstracted out some basic utilities in the following class. The __iter__ method checks for a do_ method matching the type of HTTP request (GET, PUT, etc.) and either calls that method to process the request, or sends an HTTP 405 in response. In addition, I've added a parse_fields method to parse the application/x-www-form-urlencoded parameters in the body of a request using the standard cgi module. Note that, for both object instantiation and simple function calls, the arguments (environ, start_response) are positional: the order is important, not the name.
    import cgi

    class BaseWSGI:
        def __init__(self, environ, start_response):
            self.environ = environ
            self.start = start_response

        def __iter__(self):
            method = 'do_%s' % self.environ['REQUEST_METHOD']
            if not hasattr(self, method):
                status = '405 Method Not Allowed'
                response_headers = [('Content-type','text/plain')]
                self.start(status, response_headers)
                yield 'Method Not Allowed'
            else:
                m = getattr(self, method)
                yield m()

        def parse_fields(self):
            s = self.environ['wsgi.input'].read(int(self.environ['CONTENT_LENGTH']))
            return cgi.parse_qs(s)
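The dispatch idea carries over to Python 3 with one change: the iterable must yield bytes. Here is a minimal sketch under that assumption; BaseWSGI3, Hello, and run are hypothetical names used only for illustration, and the environ passed in is deliberately stripped down to the one key the dispatcher reads.

```python
class BaseWSGI3:
    """Python 3 sketch of per-request dispatch to do_<METHOD> handlers."""

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        name = 'do_%s' % self.environ['REQUEST_METHOD']
        handler = getattr(self, name, None)
        if handler is None:
            self.start('405 Method Not Allowed',
                       [('Content-type', 'text/plain')])
            yield b'Method Not Allowed'
        else:
            # Handlers return text; encode it once on the way out.
            yield handler().encode('utf-8')


class Hello(BaseWSGI3):
    def do_GET(self):
        self.start('200 OK', [('Content-type', 'text/plain')])
        return 'hello'


def run(app, method):
    """Call the app class as a server would and collect the result."""
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    body = b''.join(app({'REQUEST_METHOD': method}, start_response))
    return seen[0], body


print(run(Hello, 'GET'))
print(run(Hello, 'DELETE'))
```

Calling the class with (environ, start_response) creates the instance, and the server's iteration over that instance is what triggers the handler.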
I can then subclass BaseWSGI to create a simple number-guessing application (test2.py):
    import random

    number = random.randint(1,100)

    class Test(BaseWSGI):
        def __init__(self, environ, start_response):
            BaseWSGI.__init__(self, environ, start_response)
            self.message = ''

        def do_GET(self):
            status = '200 OK'
            response_headers = [('Content-type','text/html')]
            self.start(status, response_headers)
            return '''
    <html>
    <body>
    <form method="POST">
    <p>%s</p>
    <p><input type="text" name="myparam" value="" /></p>
    <p><input type="submit" /></p>
    </form>
    </body>
    </html>
    ''' % self.message

        def do_POST(self):
            global number
            fields = self.parse_fields()
            if 'myparam' not in fields:
                self.message = "You didn't guess"
                return self.do_GET()
            # parse_qs maps each field name to a list of values
            guess = int(fields['myparam'][0])
            if guess == number:
                self.message = 'You guessed correctly'
                number = random.randint(1,100)
            elif guess < number:
                self.message = 'Try again, the number is higher than your guess'
            else:
                self.message = 'Try again, the number is lower than your guess'
            return self.do_GET()
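One detail of form parsing is easy to trip over: parse_qs maps every field name to a list of values, because a name may appear more than once in a form body. The sketch below shows this with Python 3's urllib.parse.parse_qs (the successor to cgi.parse_qs), using an in-memory stream as a stand-in for wsgi.input. The field names are made up for the example.

```python
import io
from urllib.parse import parse_qs


def parse_fields(environ):
    """Read a form-encoded request body and return {name: [values]}."""
    length = int(environ.get('CONTENT_LENGTH') or 0)
    raw = environ['wsgi.input'].read(length)
    return parse_qs(raw.decode('ascii'))


body = b'myparam=42&tag=a&tag=b'
environ = {
    'CONTENT_LENGTH': str(len(body)),
    'wsgi.input': io.BytesIO(body),  # stand-in for the server's input stream
}

fields = parse_fields(environ)
guess = int(fields['myparam'][0])  # take the first value before converting
print(fields, guess)
```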
You may be thinking that all of this is somewhat like reinventing the wheel--which is true, to a point. However, the low-level nature of WSGI is designed to make implementing frameworks a straightforward process--and more standardized. If you don't want to reinvent the wheel from an application perspective, look to a higher-level web framework, but do read on for some alternatives.
To extend these simple examples into something a little more realistic, I'll implement an extremely basic blogging application along RESTful lines: using HTTP GET to retrieve a single entry or a list of entries, PUT to add or update an entry, and DELETE to remove one.
The first step is to extend the BaseWSGI class slightly to handle GET requests in one of two ways: GET / should return a list of all entries, while GET [name] should return a named entry. To provide this, I've added code to the __iter__ method so that when the path requested is /, the text ALL gets appended to the method (meaning a subclass now needs to implement both do_GET and do_GETALL):

    if request_method == 'GET' and self.environ['PATH_INFO'] == '/':
        method = method + 'ALL'
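Pulled out into a standalone function, the naming rule looks like the following sketch. The function name handler_name is hypothetical, and treating a missing PATH_INFO as / is an assumption made here for convenience.

```python
def handler_name(environ):
    """Return the name of the do_ method that should handle this request."""
    request_method = environ['REQUEST_METHOD']
    method = 'do_%s' % request_method
    # Only a GET on the root path is redirected to the "list all" handler.
    if request_method == 'GET' and environ.get('PATH_INFO', '/') == '/':
        method = method + 'ALL'
    return method


print(handler_name({'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}))
print(handler_name({'REQUEST_METHOD': 'GET', 'PATH_INFO': '/test1'}))
print(handler_name({'REQUEST_METHOD': 'PUT', 'PATH_INFO': '/'}))
```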
At this point, I've decided to store the weblog entries as plain-text files, with nothing in the way of metadata for ordering or filtering. Obviously, in a real application you'd want to be able to search for entries based on particular criteria--perhaps by exposing more meaningful or useful resource URLs (for example, something like /2006/08/my-entry-name)--but for the purposes of this basic application, file-system storage will suffice. Thus, data access for a blog entry is as simple as:
    import os

    class Entry:
        def __init__(self, path, filename, load=True):
            self.filename = os.path.join(path, filename.replace('+', '-')) + '.txt'
            self.title = filename.replace('-', ' ')
            # start empty so callers can test entry.text even if no file exists
            self.text = ''
            if load and os.path.exists(self.filename):
                self.text = file(self.filename).read()

        def save(self):
            f = file(self.filename, 'w')
            f.write(self.text)
            f.close()
Presenting entries needs some kind of templating. Python has an abundance of choices, such as Cheetah, Kid, and Myghty, not to mention numerous others bundled with the various frameworks. To keep things simple, I'm using a homegrown templating engine that simply injects dynamic content based on the IDs in an XML document. (Given the constraint that all IDs must be unique, this is probably the simplest approach to templating XML, at least from a usage perspective.) Thus, the do_GET method of my application becomes:
    def do_GET(self):
        pathinfo = self.environ['PATH_INFO'][1:]
        entry = Entry(blogdir, pathinfo)
        if entry.text:
            (ext, content_type) = self.get_type()
            response_headers = [('Content-type', content_type)]
            if self.status_override:
                status = self.status_override
            else:
                status = '200 OK'
            self.start(status, response_headers)
            tmp = self.engine.load('blog-single.' + ext)
            tmp['entry:title'] = entry.title
            tmp['entry:text'] = entry.text
            tmp['entry:link'] = template3.Element(None,
                href='http://localhost:8080/%s?type=%s' % (entry.title.replace(' ', '-'), ext))
            return str(tmp)
        else:
            self.start('404 Not Found', [('Content-type', 'text/html')])
            return '%s not found' % pathinfo
Using the PATH_INFO variable provided by WSGI, I load an entry, then check to see if the text exists; if not, the blog file was not present, so I return a standard 404 Not Found. If the entry loaded successfully, the get_type() method returns the extension to use for the template (and the content type) based on a type parameter passed in the URL. I create the response headers (just content type, for the moment), and start the response process by calling self.start. At this point I've also checked for the presence of status_override, which is a field used when another method calls do_GET (see the do_PUT method later). Finally, I set the content in the template using the IDs entry:title, entry:text, and entry:link. (I'll return to the do_GETALL method shortly.)
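The template engine itself isn't listed in this article, so the following is only a guess at the general idea of ID-based injection, built on the standard xml.etree.ElementTree module. The function fill_by_id and the sample template are hypothetical: a string value replaces an element's text, and a dict value is merged into its attributes.

```python
import xml.etree.ElementTree as ET

TEMPLATE = '''<div>
  <h1 id="entry:title"/>
  <p id="entry:text"/>
  <a id="entry:link">permalink</a>
</div>'''


def fill_by_id(template, values):
    """Inject content into every element whose id appears in values."""
    root = ET.fromstring(template)
    for element in root.iter():
        value = values.get(element.get('id'))
        if value is None:
            continue  # no id on this element, or nothing to inject
        if isinstance(value, dict):
            element.attrib.update(value)  # e.g. set href on a link
        else:
            element.text = value
    return ET.tostring(root, encoding='unicode')


html = fill_by_id(TEMPLATE, {
    'entry:title': 'My first post',
    'entry:text': 'Hello from the blog.',
    'entry:link': {'href': 'http://localhost:8080/My-first-post?type=xhtml'},
})
print(html)
```

The appeal of this style is that the template stays a plain, well-formed XML file with no embedded control syntax.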
The most important method from the WSGI perspective is start. It takes a response code and message, as well as the response headers as a list of tuples. I assigned it from the start_response positional parameter in BaseWSGI.
Creating a blog entry calls the do_PUT method, which performs several steps:

1. Check for a pathinfo and for a content-length greater than 0.
2. Create an Entry object, using the pathinfo.
3. If the Entry does not contain text, then this is a new blog post, so set the status override variable with "201 Created."
4. Read the entry text from the request body and save it.
5. Call the do_GET method to return something meaningful to the caller.
    def do_PUT(self):
        pathinfo = self.environ['PATH_INFO'][1:]
        if pathinfo == '':
            self.start('400 Bad Request', [('Content-type', 'text/html')])
            return 'Missing path name'
        elif not self.environ.has_key('CONTENT_LENGTH') or self.environ['CONTENT_LENGTH'] == '' \
                or self.environ['CONTENT_LENGTH'] == '0':
            self.start('411 Length Required', [('Content-type', 'text/html')])
            return 'Missing content'
        entry = Entry(blogdir, pathinfo)
        if not entry.text:
            self.status_override = '201 Created'
        entry.text = self.environ['wsgi.input'].read(int(self.environ['CONTENT_LENGTH']))
        entry.save()
        return self.do_GET()
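The validation at the top of do_PUT can be isolated and tested on its own. This sketch restates those two checks as a pure function for Python 3; the name validate_put and the convention of returning None for a valid request are assumptions made for illustration.

```python
def validate_put(environ):
    """Return (status, message) for a rejected PUT, or None if it may proceed."""
    # PATH_INFO starts with '/', so strip it to get the entry name.
    if environ.get('PATH_INFO', '')[1:] == '':
        return '400 Bad Request', 'Missing path name'
    # A missing, empty, or zero Content-Length means there is no body to store.
    if environ.get('CONTENT_LENGTH', '') in ('', '0'):
        return '411 Length Required', 'Missing content'
    return None


print(validate_put({'PATH_INFO': '/'}))
print(validate_put({'PATH_INFO': '/test1'}))
print(validate_put({'PATH_INFO': '/test1', 'CONTENT_LENGTH': '12'}))
```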
For a DELETE, I just do the basics: check to see if the entry exists, delete it, and return a 204 No Content:

    def do_DELETE(self):
        pathinfo = self.environ['PATH_INFO'][1:]
        blogfile = os.path.join(blogdir, pathinfo.replace('+', '-')) + '.txt'
        if os.path.exists(blogfile):
            os.remove(blogfile)
            # a 204 response carries no body, so return an empty string
            self.start('204 No Content', [ ])
            return ''
        else:
            self.start('404 Not Found', [('Content-type', 'text/html')])
            return '%s not found' % pathinfo
The do_GETALL method, which is the only one of the subclass methods that doesn't actually correspond to an HTTP verb, is also the only one that differs from the validation+response cycle established by the other methods. do_GETALL will always return 200 OK, and will read in all .txt files in the specified blog directory, reusing the blog-single template (used in the do_GET method). The main differences between this method and do_GET revolve around templating (and are not particularly relevant to WSGI).
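Since do_GETALL isn't listed here, the following Python 3 sketch covers only the file-reading half of what it is described as doing: collect every .txt file in the blog directory and turn each filename back into a title. The function list_entries and the sort-by-filename ordering are assumptions; the sample files are created in a temporary directory purely for the demonstration.

```python
import glob
import os
import tempfile


def list_entries(blogdir):
    """Return (title, text) pairs for every .txt file, sorted by filename."""
    entries = []
    for path in sorted(glob.glob(os.path.join(blogdir, '*.txt'))):
        name = os.path.splitext(os.path.basename(path))[0]
        with open(path, encoding='utf-8') as f:
            # Same convention as Entry: hyphens in the filename become spaces.
            entries.append((name.replace('-', ' '), f.read()))
    return entries


with tempfile.TemporaryDirectory() as blogdir:
    for name, text in [('second-post', 'B'), ('first-post', 'A')]:
        with open(os.path.join(blogdir, name + '.txt'), 'w', encoding='utf-8') as f:
            f.write(text)
    entries = list_entries(blogdir)

print(entries)
```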
If I were creating a typical GET/POST web application, testing would be straightforward: use a browser. Because I've used REST semantics, I need to use another tool--in this case, Curl--to test all my application's features. The first step is to start up the blog using python blog.py, and then:

curl -v -X PUT http://localhost:8080/test1 -d @- will add an entry with the title "test1" (-d @- takes input from STDIN; hit Ctrl+D to stop).
curl -v -X PUT http://localhost:8080/test1 -d @- will update that entry. (Notice that the 201 return code should change to a 200.)
curl -v http://localhost:8080/ will return a list of all entries.
curl -v -X DELETE http://localhost:8080/test1 will delete the entry previously created.
I've included three template types: .xhtml for HTML viewing, .xml for simple XML output, and .atom to produce an Atom feed. Test these different templates by calling:
curl -v http://localhost:8080/?type=xml
curl -v http://localhost:8080/?type=xhtml
curl -v http://localhost:8080/?type=atom
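The get_type method that reads this parameter isn't listed in the article. As a rough sketch of how such a lookup could work in Python 3, the function below reads type from QUERY_STRING and maps it to a template extension and content type. The content types in the table and the fallback to xhtml are assumptions for illustration, not the article's actual values.

```python
from urllib.parse import parse_qs

# Assumed mapping from template extension to response content type.
TYPES = {
    'xhtml': 'text/html',
    'xml': 'application/xml',
    'atom': 'application/atom+xml',
}


def get_type(environ, default='xhtml'):
    """Return (extension, content_type) based on the ?type= parameter."""
    params = parse_qs(environ.get('QUERY_STRING', ''))
    ext = params.get('type', [default])[0]
    if ext not in TYPES:
        ext = default  # unknown types fall back rather than fail
    return ext, TYPES[ext]


print(get_type({'QUERY_STRING': 'type=atom'}))
print(get_type({'QUERY_STRING': ''}))
print(get_type({'QUERY_STRING': 'type=pdf'}))
```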
So far I've only demonstrated how to set up a basic, stateless application by extending the foundations provided by WSGI. If you're thinking about larger-scale web application development, the recommended approach is undoubtedly to choose a suitable framework. This is not to say that developing such a webapp is impossible using basic WSGI, but you'll need to add (by hand) a lot of the technology that you get for free with a framework--either by writing your own, or plugging in third-party middleware.
The WSGI perspective on middleware is an important part of the specification. Adding middleware involves wrapping layers of utility code around a base app to provide additional functionality; the PEP calls this a middleware stack. For example, to provide authentication facilities, you might wrap your application with BasicAuthenticationMiddleware; to compress responses, you might wrap it with another middleware component called CompressionMiddleware; and so on.
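To make the wrapping concrete, here is a minimal Python 3 sketch of a middleware component and a two-layer stack. HeaderMiddleware is a hypothetical component invented for this example (it is not part of Paste or the PEP); it intercepts start_response and appends one header before passing the call along.

```python
class HeaderMiddleware:
    """Wrap a WSGI app and append one extra response header."""

    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        def wrapped_start(status, headers, exc_info=None):
            # Add our header, then hand off to the next layer out.
            return start_response(status, list(headers) + [self.header], exc_info)

        return self.app(environ, wrapped_start)


def simple_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b'Hello world!\n']


# Each wrapper is itself a WSGI app, so layers nest freely.
stack = HeaderMiddleware(HeaderMiddleware(simple_app, 'X-Inner', '1'),
                         'X-Outer', '2')

recorded = {}


def start_response(status, headers, exc_info=None):
    recorded['status'] = status
    recorded['headers'] = headers


body = b''.join(stack({}, start_response))
print(recorded['headers'], body)
```

Because middleware looks like a server to the application and like an application to the server, neither side needs to know how many layers sit between them.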
The Python Paste project provides WSGI middleware and various other useful utilities. As an example of how powerful the concept of middleware is, consider the use of Paste's SessionMiddleware (see test3.py for more details):
    from paste.session import SessionMiddleware

    class myapp2:
        def __init__(self, environ, start_response):
            self.environ = environ
            self.start = start_response

        def __iter__(self):
            session = self.environ['paste.session.factory']()
            if 'count' in session:
                count = session['count']
            else:
                count = 1
            session['count'] = count + 1
            self.start('200 OK', [('Content-type','text/plain')])
            yield 'You have been here %d times!\n' % count

    app2 = SessionMiddleware(myapp2)
In this example, SessionMiddleware wraps myapp2. When a request comes in, SessionMiddleware adds the session factory to the environ with the key paste.session.factory, and when invoked in the first line of the __iter__ method, the session is returned as a simple dict. A stack of middleware components added to a basic WSGI application means you can have the benefits provided by many of the frameworks, without necessarily having to constrain yourself to a framework.
I've shown how to run web applications within the simple environment provided by wsgiref, but what about launching something on a live site? The WSGI wiki lists multiple servers that support WSGI, including (but not limited to) CherryPy, python-fastcgi, and Paste; chances are, if you're using a framework, your production choices will be very easy.
I've decided to use one of the simpler approaches: mod_python coupled with a modified version of wsgi_handler.py. Nicolas Borko wrote this script based on his reading of the PEP. It allows you to publish WSGI applications under Apache easily.
Consult the mod_python documentation for help installing mod_python in your environment, but certainly in the case of K/Ubuntu, the process is straightforward:
    $ apt-get install libapache2-mod-python
    $ cd /etc/apache2/mods-enabled
    $ ln -s ../mods-available/mod_python.load
The handler needs to be accessible by mod_python before you go any further. You have two choices: either append the location of wsgi_handler.py to the Python path, or copy the file into site-packages (again, mine is /usr/lib/python2.4/site-packages/). For the moment, I've opted to copy. Once wsgi_handler is in place, create a configuration file (mod_python.conf) in the directory /etc/apache2/mods-enabled (or the location of module configuration files for your Apache setup) and insert at least some basic configuration:
    <Directory /var/www/test>
        PythonHandler wsgi_handler
        PythonOption WSGI.Application test1::simple_app
        AddHandler python-program .py
        PythonPath "sys.path + [ '/var/python' ]"
    </Directory>
This configuration directs any requests to the test directory of my webroot (/var/www) with a .py extension to the wsgi_handler. The WSGI application I want to run is once again in the script test1.py (with the function simple_app). I've placed this file in /var/python (and the configuration adds this directory to the Python path). Restart Apache httpd and, with any luck, you'll be able to browse to http://localhost/test/test.py.
My slightly modified version of wsgi_handler provides the ability to specify just a script in the mod_python configuration, rather than a script and function. This allows a more powerful configuration:
    <Location "/test/foo">
        PythonHandler wsgi_handler
        PythonOption WSGI.Application test1
        SetHandler python-program
        PythonPath "sys.path + [ '/var/python' ]"
    </Location>
Rather than a directory, this setting configures a location relative to the web server root. I've used SetHandler, which does not require the file extension. Also, the PythonOption now includes only a reference to the test1.py script. If you add these directives to your mod_python configuration file, you can use the URL http://localhost/test/foo/simple_app, which means you'll now be able to add more than one WSGI application to the script. Whether this is a good idea in production code is debatable, but it's certainly useful for development.
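The modified handler isn't reproduced in this article, so the following Python 3 sketch is only an illustration of the lookup rule described above, not the handler's real code. The function resolve_target is hypothetical: given the WSGI.Application option and the path that follows the configured location, it decides which module and callable to use.

```python
def resolve_target(option, path_info=''):
    """Return (module_name, callable_name, remaining_path)."""
    if '::' in option:
        # "module::callable" names the application outright.
        module_name, callable_name = option.split('::', 1)
        return module_name, callable_name, path_info
    # Module only: take the callable name from the first path segment.
    callable_name, _, rest = path_info.lstrip('/').partition('/')
    return option, callable_name, ('/' + rest if rest else '')


print(resolve_target('test1::simple_app'))
print(resolve_target('test1', '/simple_app'))
print(resolve_target('test1', '/simple_app/extra'))
```

A real handler would then import the module and fetch the callable by name, which is exactly the step that makes exposing every function in a script questionable for production use.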
WSGI exposes one of the simplest APIs I've seen in a while, and I believe that very simplicity underlies its power. For a framework or utility developer, the middleware concept is an attractive approach to layering features without having to bolt everything in at the lowest level, while an application developer keen on "keeping it simple, stupid" can work with an extremely basic interface. With a growing number of the higher-level frameworks supporting WSGI, and with the addition of the wsgiref module to Python 2.5, you can easily roll WSGI into your own projects--you may even be using it already without knowing it. Hopefully this article has pointed you in a few directions for further reading and experimentation of your own.
A couple of the better-known frameworks are (in no particular order):
Jason R. Briggs is an architect/developer working in the United Kingdom.
Copyright © 2009 O'Reilly Media, Inc.