Rethinking the Linux Distribution
by George Belotsky
This article ties together a number of exciting ideas from the Free/Open Source Software (FOSS) community to suggest a new direction for the Linux distribution. Many of these ideas are also applicable to BSD-based systems.
Although there are several mature, high quality distributions available, Linux has had a very hard time breaking through in certain markets, such as the desktop. In addition, the internet, which has already dramatically transformed the environment for other content-creating industries, may now alter the established methods for software packaging and installation.
The activities around Web 2.0 are giving rise to Software as a Service (SaaS). For example, Google claims that more than 100,000 small businesses, as well as a few large ones, have signed up for Google Apps. Of course, Microsoft is trying to build its own SaaS offering, Windows Live. Meanwhile, many categories of Web applications are already mature, including email, social networking, and e-commerce. The next step is the Web OS, a race that, in the end, may go to a startup company.
As I hope to demonstrate in this article, FOSS tools are the right technology to define the post-PC software era, and not merely as a backend platform for someone else's proprietary SaaS suite. Today's typical Linux distribution, however, follows a design that resembles a legacy Unix system with a Windows-style front end bolted on. This is a competitor to products such as Vista, which may actually be the last of its kind, even for Microsoft. It would be unfortunate indeed to suddenly find ourselves stuck with yesterday's business model.
It is important to realize that the current approach to packaging FOSS is not the only possibility. The FOSS universe is by far the most diverse codebase in the world, and a capacity for diversity is the key to resilience as well as innovation. This is true in ecosystems, economies, investment portfolios, and even the skills of an individual. So, let us see how we can rearrange the traditional Linux distribution, to meet the challenge of the emerging new era.
The central purpose of this article is to try and start a genuinely fruitful discussion on the future of the Linux (or *BSD) distribution, a discussion which may help inform the direction of existing projects, and perhaps spawn new ones. Please continue this discussion by contributing ideas, references to other articles, and especially links to anything that includes actual code (even tiny examples).
Above all, I hope we treat this as an opportunity for shared exploration, not a contest about who is the "smartest guy in the room." This is a matter of proper governance (see the "A Note on Governance" section below), which is perhaps of equal importance to the future of FOSS as actual running code.
Here is a list of topics covered in the article.
- Reconsidering system administration and related tools, the "glue" that holds a distribution together.
- Combining local and remote applications under a single UI.
- The emergence of a free Web OS.
- A model for governance of FOSS projects, and online collaboration generally.
In addition, two appendices cover important supplementary topics for Linux distribution developers and SaaS providers. The first concerns packaging, hosting and delivery of Linux distributions. The second proposes a strategy to deal with the costly (and environmentally damaging) waste resulting from large server farms.
Now, let us begin our (re)exploration of the Linux distribution.
Before you can build for the future, you must take care of the past. Years ago, a wonderful toolset evolved around Unix. awk, sed, find and others, used together in shell scripts, could accomplish a great deal of work very quickly. Compare this approach to writing everything in C!
Today, however, the state of the art has advanced considerably. High-level languages (of which Perl, Python, and Ruby are the most popular) are far more capable than the old patchwork of small tools bound together with shell scripts. They incorporate elegant programming constructs, supply comprehensive standard libraries, and support rapidly growing communities of extension projects.
Yet, system administration is still tied to the shell and the old toolset, despite the astounding advantages of moving to a modern high-level language. Programming with such languages is faster and less error prone. The code is much more reusable, portable and upgradeable. Developers can write readable programs, which makes collaboration easier. Likewise, finding and training developers is a simpler task. Customizing server farms, clusters, corporate desktop rollouts and even novice tinkering all benefit.
Any one of Perl, Python, or Ruby (as well as a few others) could become the primary system administration tool, displacing the shell. In my experience, however, Python's highly readable, compact and consistent syntax makes it the ideal choice for this sort of work.
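To make the comparison concrete, here is a minimal sketch of a task that would traditionally be handled by a pipeline of find, awk, and sort: totalling disk usage by file extension under a directory. It assumes Python 3 and uses only the standard library; the function names `usage_by_extension` and `report` are hypothetical, chosen for this illustration.

```python
import os
from collections import defaultdict


def usage_by_extension(root):
    """Return {extension: total_bytes} for regular files under root."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def report(totals, limit=10):
    """Format the largest entries as text lines, biggest first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [f"{size:>12d}  {ext}" for ext, size in ranked[:limit]]
```

Calling `print("\n".join(report(usage_by_extension("/var/log"))))` would print the ten extensions that occupy the most space. Because the result is an ordinary dictionary, the same function can be reused by other tools, which is harder to arrange with text passed between shell commands.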
Many well-known Linux distributions already use Python in their key tools. Red Hat's Anaconda installer and Gentoo's Portage package manager are two examples. Ubuntu (the top distribution for the last 12 months, according to DistroWatch) "... prefers the community to contribute work in Python."
The next logical step is to create a complete system administration environment using a high-level language. In large measure, this project is already underway. A relatively recent, highly rated (see review links on DistroWatch) distribution, Pardus, uses Python across many of its core tools. The Pardus team, recognizing that "High Thoughts Must Have High Language", has even written a new init framework in Python. Here is how they explain their choice.
Among the high level languages, Python seemed to be the best choice, since we already use it in many places like package build scripts, package manager, control panel modules, and installer program YALI. Python has small and has clean source codes. Standard library is full of useful modules. Learning curve is easy, most of the developers in our team picked up the language in a few days without prior experience.
Certainly, there exists a requirement for backwards compatibility with the old shell-based code. It is not difficult, however, to make these legacy tools available as optional packages, to ease the transition to the new system. Even here, augmentation with high-level languages provides significant benefits. The IPython project is an interactive Python shell that also retains much traditional shell functionality. In addition, IPython supports the Matplotlib graphing package.
Figure 1 shows an example of what IPython and Matplotlib can do in about nine lines of code, excluding comments. Each circular patch on the graph represents a single process; larger patches indicate greater virtual memory size (vsz format specifier to the ps command). Percent CPU (%cpu) use is on the vertical axis; you can see that most processes are using little CPU. More red color in a patch indicates a process whose resident size uses a greater percentage of the machine's memory.
Figure 1: Graph of process CPU and memory use, generated with IPython and Matplotlib
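The script behind Figure 1 is not reproduced here. What follows is a rough sketch of how a graph of this kind might be produced, not the author's code. It assumes Python 3, a procps-style ps that accepts `-eo vsz,%cpu,%mem`, and an installed Matplotlib. The choice of `%mem` for the color, the use of a simple process index for the horizontal axis, and the function names are all assumptions made for illustration.

```python
import subprocess


def parse_ps(text):
    """Turn `ps -eo vsz,%cpu,%mem` output into (vsz, cpu, mem) tuples."""
    rows = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            rows.append((int(fields[0]), float(fields[1]), float(fields[2])))
        except ValueError:
            continue  # skip anything that is not three numbers
    return rows


def read_processes():
    """Run ps once and parse its output (assumes a procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "vsz,%cpu,%mem"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


def plot(rows, path="processes.png"):
    """Draw one circle per process and save the figure to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    if not rows:
        raise ValueError("no process data to plot")
    vsz = [r[0] for r in rows]
    cpu = [r[1] for r in rows]
    mem = [r[2] for r in rows]
    largest = max(vsz) or 1  # avoid dividing by zero if every vsz is 0
    sizes = [20 + 600 * v / largest for v in vsz]  # bigger vsz, bigger patch

    fig, ax = plt.subplots()
    points = ax.scatter(range(len(rows)), cpu, s=sizes, c=mem,
                        cmap="Reds", alpha=0.6, edgecolors="black")
    ax.set_xlabel("process (arbitrary order)")
    ax.set_ylabel("% CPU")
    fig.colorbar(points, ax=ax, label="% memory")
    fig.savefig(path)
```

Calling `plot(read_processes())` writes the image to `processes.png`. Keeping the parsing separate from the plotting means the data can be inspected interactively, for example in IPython, before anything is drawn.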