Introducing mod_python
by Gregory Trubetskoy
Back in January I was asked to write an article about mod_python. I thought it was a great idea, but the realities of life have kept me from being able to devote the necessary effort to writing about mod_python, as all of my free time was devoted to writing mod_python itself.
Finally, I had a spare minute or two, and the main question seemed to be how to introduce mod_python. I've decided to focus on a high level overview of mod_python's features and benefits.
What is Mod_python?
Mod_python is actually several things:
- a loadable Apache module which embeds the Python interpreter (libpython), thereby providing the ability to execute Python code in-process by Apache.
- a handler of Apache's request processing phases, allowing for any phase of the request to be implemented in Python. It also allows filters and connection handlers to be implemented in Python.
- an interface to a subset of the Apache API, making it possible to call internal Apache functions from Python. This provides access to internal server information and the ability to take advantage of Apache server facilities such as logging.
- a collection of tools for developing web applications. It provides a set of standard handlers (Publisher, PSP, and CGI), each providing an alternative development framework, as well as a set of utility objects and functions for cookie processing, session management, and other things common in web development.
Let's look individually at these benefits.
Python interpreter embedded within Apache
Aside from mod_python, there are two ways of executing Python code by Apache. The first is through the CGI model, where a separate process is spawned and its standard output is redirected to the browser. The second is by proxying request information to another server sitting behind Apache by various methods such as FastCGI or proprietary protocols.
CGI, although the most universally supported interface, is by far the least efficient. Every CGI request causes a process to be created and the Python interpreter loaded and initialized, which results in unnecessary CPU and I/O activity.
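To make the per-request cost concrete, here is a minimal sketch of a CGI program in Python. The function name build_response is purely illustrative. The point is that the web server starts a fresh process, and therefore a fresh Python interpreter, every time this script is requested, and whatever the script writes to standard output is sent back to the browser.

```python
#!/usr/bin/env python
# Minimal CGI sketch. The server runs this file as a new process per request.
import os
import sys


def build_response(environ):
    """Return a complete CGI response: headers, a blank line, then the body."""
    # PATH_INFO is one of the standard CGI environment variables.
    path = environ.get("PATH_INFO", "/")
    body = "Hello from CGI, you asked for %s\n" % path
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    # The interpreter was loaded and initialized just to run these lines.
    sys.stdout.write(build_response(os.environ))
```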
The second approach of handing the request to a separate application server (typically written in Python) is considerably more efficient than CGI and is used by a number of popular Python Web development frameworks (Webware, for example). It is also very common in the Java world. One shortcoming of this approach is the added complexity. You have to configure and run an additional server besides Apache, which at least doubles the system administration overhead and complicates debugging and troubleshooting. Another shortcoming is that it misses out on the amazing scalability and efficiency of Apache, since the requests are being serviced only as quickly as the backend server can process them.
Mod_python addresses both of these limitations by embedding the Python interpreter within Apache. Python code executes directly within the Apache server, eliminating any need to spawn external processes or run additional servers. At the same time, mod_python takes advantage of Apache's highly optimized ability to accept and process incoming requests. The result is a platform that can process requests faster than any other currently available web application framework for Python.
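For comparison, a mod_python content handler might look roughly like the sketch below. It assumes Apache has been configured to route requests to this module with the PythonHandler directive. The small stand-in for the apache module is not part of mod_python; it is there only so the function can be exercised outside the server.

```python
try:
    from mod_python import apache
except ImportError:
    # Stand-in used only when running outside Apache (an assumption of
    # this sketch, not something mod_python provides).
    class apache(object):
        OK = 0


def handler(req):
    """Called by mod_python, inside the Apache process, for each request."""
    req.content_type = "text/plain"
    req.write("Hello from Python running inside Apache\n")
    # Returning apache.OK tells Apache the request was handled.
    return apache.OK
```

No process is spawned here: the interpreter is already loaded, and the module stays imported between requests.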
As a side note, there is a commonly expressed opinion that, given the speed of current hardware, there is no need to devote much attention to optimizing software performance. Even CGI is fast enough on my multi-gigahertz machine, so why bother with mod_python? The answer is that any site can unexpectedly become of interest to very large groups of people, often rendering the site inaccessible due to overload (the infamous Slashdot effect). Because of the continuous growth of the Internet, along with the ever-increasing richness of content, even the fastest servers available today are often unable to keep up with demand. Therefore, any reasonable effort to get maximum performance out of your hardware is well worth it.
Ability to handle request phases, filters and connections
The ability to handle request phases, filters, and connections is a very
Apache-specific feature, but the general concept of phased processing is not
uncommon. If you have had to program Java Servlets, you will find Apache
request phases very much like the Servlet interface methods. Each Apache
request is processed in phases, and modules have an opportunity to register
their own handler functions for each phase. As an example,
mod_auth_dbm registers for the authentication phase, thereby
providing the ability to use dbm files to store passwords.
Mod_python provides the ability to register for any phase and write the processing function in Python. This is a very powerful feature, because it opens the door for many innovative and exciting ways to use Apache. For example, you can write Python code to do authentication processing or custom logging (perhaps sending logs to a database while maintaining real-time statistics).
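As an illustration of registering for a phase, the following is a sketch of an authentication handler, assuming it is enabled with the PythonAuthenHandler directive together with Basic authentication. The USERS table and its contents are hypothetical, and the apache stand-in exists only so the function can be tried outside the server.

```python
try:
    from mod_python import apache
except ImportError:
    # Stand-in for running outside Apache (an assumption of this sketch).
    class apache(object):
        OK = 0
        HTTP_UNAUTHORIZED = 401

# Hypothetical credentials; a real handler would consult a database,
# a directory service, or some other store.
USERS = {"alice": "wonderland"}


def authenhandler(req):
    """Decide, during the authentication phase, whether to let the user in."""
    # get_basic_auth_pw() has to be called before req.user is read.
    password = req.get_basic_auth_pw()
    user = req.user
    expected = USERS.get(user)
    if expected is not None and password == expected:
        return apache.OK
    return apache.HTTP_UNAUTHORIZED
```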
In addition to phases, mod_python also allows for filters to be implemented in Python. Filters receive output from or input to the server and have an opportunity to alter it. Filters can also be stacked, so that output from one filter is processed by the next filter. A clever filter could automatically detect stock ticker symbols and replace them with a link to a site providing stock information.
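The stock ticker idea could be sketched as an output filter along these lines. The symbol list and the quote URL are made up for the example, and the sketch assumes that the filter object hands over text strings. It is deliberately naive: a symbol split across two chunks would be missed, and symbols that already sit inside HTML tags would be rewritten too.

```python
import re

# Hypothetical symbols and quote site, for illustration only.
TICKERS = ("IBM", "RHAT")
QUOTE_URL = "http://quotes.example.com/?symbol="

_PATTERN = re.compile(r"\b(" + "|".join(re.escape(t) for t in TICKERS) + r")\b")


def link_tickers(text):
    """Wrap each known ticker symbol in a link to the quote site."""
    def repl(match):
        symbol = match.group(1)
        return '<a href="%s%s">%s</a>' % (QUOTE_URL, symbol, symbol)
    return _PATTERN.sub(repl, text)


def outputfilter(flt):
    """Rewrite the response chunk by chunk as it leaves the server."""
    chunk = flt.read()
    while chunk:
        flt.write(link_tickers(chunk))
        chunk = flt.read()
    if chunk is None:
        # read() returning None signals the end of the stream.
        flt.close()
```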
Lastly, mod_python allows you to create connection handlers. A connection handler handles the connection from the moment it is received, allowing you to bypass Apache's HTTP handling. Connection handlers can be used to implement servers for protocols other than HTTP. Although this feature is rarely used, it is quite powerful for custom protocol implementation or prototyping where performance is important. Many years of work have gone into developing Apache's code to efficiently handle large numbers of incoming connections. A mod_python connection handler leverages this efficiency as well as Apache server facilities such as logging, allowing the developer to focus on the protocol handling.
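A connection handler can be as small as the line-echo sketch below, which assumes it has been enabled with the PythonConnectionHandler directive. Ending the conversation on an empty line is a convention invented for this example, and the apache stand-in exists only for running the function outside the server.

```python
try:
    from mod_python import apache
except ImportError:
    # Stand-in for running outside Apache (an assumption of this sketch).
    class apache(object):
        OK = 0


def connectionhandler(conn):
    """Echo each line back to the client until an empty line arrives."""
    while True:
        line = conn.readline()
        if not line.strip():
            # Blank line (or nothing left to read): end the conversation.
            break
        conn.write(line)
    return apache.OK
```

Apache still accepts the connection and manages the process or thread; only the protocol logic lives in Python.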
Interface to Apache API
The interface to the Apache API can be used to retrieve various kinds of server information and use internal server facilities. Available server information includes typical data available to CGI programs as well as various interesting bits such as number of bytes sent, server document root directory, the phase being processed, the file handler of the file being requested, and more. Additionally, Apache-specific information is available such as the configuration tree, the MPM being used, and MPM parameters such as the maximum number of processes and threads.
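Some of this information can be read straight off the request object. The sketch below is a logging-phase handler, assumed to be enabled with the PythonLogHandler directive, that writes a one-line summary to the Apache error log. The wording of the log message is arbitrary, and the apache stand-in is only there for running the function outside the server.

```python
try:
    from mod_python import apache
except ImportError:
    # Stand-in for running outside Apache (an assumption of this sketch).
    class apache(object):
        OK = 0
        APLOG_NOTICE = 5


def loghandler(req):
    """Log the phase, bytes sent, URI, and document root for this request."""
    summary = "%s: sent %d bytes for %s (document root %s)" % (
        req.phase, req.bytes_sent, req.uri, req.document_root())
    # log_error() writes to the Apache error log at the given level.
    req.log_error(summary, apache.APLOG_NOTICE)
    return apache.OK
```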
Aside from getting information, the exposed portion of the API can be used to take some very powerful actions, such as dynamically registering additional mod_python handlers, creating internal redirects, and registering cleanups to be executed after the request is finished or before the server is shut down.
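Two of these actions, internal redirects and request cleanups, are sketched below. The URIs and the scratch file are invented for the example, and the apache stand-in is only there for running the function outside the server.

```python
import tempfile

try:
    from mod_python import apache
except ImportError:
    # Stand-in for running outside Apache (an assumption of this sketch).
    class apache(object):
        OK = 0


def release(resource):
    """Cleanup callback: invoked with the data passed to register_cleanup()."""
    resource.close()


def handler(req):
    # Hypothetical URIs: quietly serve /report when /old-report is requested.
    if req.uri == "/old-report":
        req.internal_redirect("/report")
        return apache.OK

    # A scratch file that must be released once the request is finished.
    scratch = tempfile.TemporaryFile()
    req.register_cleanup(release, scratch)
    scratch.write(b"intermediate data")

    req.content_type = "text/plain"
    req.write("report ready\n")
    return apache.OK
```

Registering the cleanup right after the resource is created means it is released even if the rest of the handler fails.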
Web Development Toolset
Early versions of mod_python provided few tools for web development. This made it difficult for developers who were not interested in delving into Apache internals, yet who wanted an efficient and scalable platform for developing applications in Python. Although mod_python included the Publisher handler, which implements a clever way of mapping URI paths to functions and objects within Python modules (originally inspired by Zope's ZPublisher, back then known as Bobo), it did not provide a native way of dealing with cookies and sessions. The latter proved to be a particularly complex issue due to Apache's multi-process architecture, which makes sharing data and locking between processes rather challenging for an average programmer.
Luckily, this has changed in the upcoming (as of the time of writing)
version 3.1. This release introduces native handling of cookies, including
support for cryptographic signing of cookie data (using HMAC), as well as the
ability to marshal (serialize) simple objects into cookie values. There is
support for session management with fairly thorough random session id
generation logic and the ability to take advantage of signed
cookies. Sessions can be stored in either a dbm file or directly in memory
depending on whether Apache runs in multi-process or threaded mode. Sessions
support session locks using Apache's internal global mutex interface to provide
mutual exclusion across all processes and threads. The
class is extensible, so that it is easy to implement custom session objects
which use alternative persistent storage, such as a relational database.
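The idea behind signed cookies can be illustrated with nothing but the Python standard library. The sketch below is not mod_python's implementation and makes no attempt to reproduce its cookie format. The secret, the function names, and the choice of SHA-256 are all assumptions made for the example. It shows only the principle: the server attaches an HMAC to the value, and later rejects any value whose HMAC does not match.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical server-side secret
DIGEST_LEN = 64        # length of a SHA-256 digest written in hex


def _digest(value, secret):
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_value(value, secret=SECRET):
    """Return the value prefixed with its HMAC, ready to store in a cookie."""
    return _digest(value, secret) + value


def verify_value(signed, secret=SECRET):
    """Return the original value, or None if the signature does not match."""
    digest, value = signed[:DIGEST_LEN], signed[DIGEST_LEN:]
    expected = _digest(value, secret)
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8")):
        return value
    return None
```

A client can read a signed value, but it cannot alter it without knowing the secret, which is what makes signed cookies a reasonable place to keep a session id.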
Last but not least, version 3.1 introduces mod_python's own
implementation of PSP (Python Server Pages). This is a framework that allows
embedding Python code within HTML similar to the way it is done in PHP, JSP, or
ASP. The core parser implementation for mod_python PSP was initially written
and graciously contributed to mod_python by Sterling Hughes, a core PHP
developer. The PSP parser is generated using
, one of the
fastest scanning and parsing tools in existence. It also integrates nicely with
other tools provided by mod_python such as session handling, altogether
resulting in one of the fastest Server Page implementations available.
The only real constraint of mod_python is that it is Apache-specific. Mod_python applications are not easily portable to other web server platforms. In my opinion this is not a concern, because Apache is already supported by a larger number of operating systems than any other web server and is by far the most popular web server in the world, and still gaining market share. By committing to Apache as an integral part of your application, you gain an amazing amount of performance and versatility at the expense of not having the ability to use a (less capable) web server down the road. I think this is a no-brainer.
Mod_python is an ASF project
Finally, I'd like to mention that shortly before the release of version 3.0, mod_python was adopted by the Apache Software Foundation and became an official subproject of the Apache HTTP Server project. This gave mod_python wider recognition and the necessary foundation of sound development practices and peer review by a very talented team of developers. Mod_python's popularity is steadily growing as more and more developers recognize it as a sound and stable platform for Web application development. As an example, Red Hat 9 now has mod_python enabled by default.
Mod_python is a fast, versatile and scalable way to develop Web applications in Python. If you're convinced, give it a try.
Gregory Trubetskoy is the lead developer of mod_python and a member of the Apache Software Foundation.