If there weren't enough reasons to use Linux in the enterprise these days -- beyond the cost savings, speed, and performance, and the growing list of applications that range from embedded Linux devices like the TiVo personal video recorder through enterprise RDBMSs like DB2 and Oracle -- Linux has now firmly established itself in the rarefied air at the top of the computing world: bleeding-edge supercomputers.
Pioneered by Thomas Sterling and Donald Becker while both were at NASA's Center of Excellence in Space Data and Information Sciences (CESDIS), the technique of building "parallel clusters" out of large arrays of off-the-shelf Linux systems now powers several of the fastest computers in the world.
Parallel Linux clusters, better known as "Beowulf clusters" (after the Old English epic hero), are being used for weather prediction, high-energy physics, and code breaking, as well as more down-to-earth applications such as data mining, computer-generated animation, and massive multiuser games.
Linux clusters are, in their simplest form, a pile of machines running Linux plus some specialized software that spreads a program across all of the computers in the cluster simultaneously and keeps every machine as close to 100 percent busy as possible, all of the time.
A simple cluster can be constructed out of three Intel-based PCs connected with a 10BaseT hub and clustering software. One machine acts as the "master server" where data is stored, and the other computers act as "slave" or "compute" nodes that work on some portion of the problem being solved and return their results to the master server, where they are integrated into the final result.
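In a real cluster the master and the compute nodes are separate machines talking over the network, but the division of labor described above can be sketched on a single machine with Python's standard multiprocessing pool. The chunking scheme, node count, and sum-of-squares workload here are illustrative choices, not part of any cluster package:

```python
from multiprocessing import Pool

def compute_chunk(chunk):
    # Stand-in for the work one compute node does on its slice of the problem.
    return sum(x * x for x in chunk)

def run_cluster(data, nodes=4):
    # The "master server" splits the problem, one chunk per compute node...
    chunks = [data[i::nodes] for i in range(nodes)]
    with Pool(processes=nodes) as pool:
        partials = pool.map(compute_chunk, chunks)
    # ...then integrates the partial results into the final answer.
    return sum(partials)

if __name__ == "__main__":
    print(run_cluster(list(range(1000))))  # 332833500
```

The pattern to notice is that each node only ever sees its own chunk; the master alone holds the full data set and the final result.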
More complex clusters can range from dozens of nodes up to thousands of nodes; be connected together with dozens of sophisticated network switches; and sport specialized, dedicated networks just for moving the data that keeps the compute nodes 100 percent busy. Several of these monster machines have shown up on the Top500 Supercomputer list, which represents a new achievement for Linux-based systems.
Linux clusters are a novel application of standard programming techniques writ very large. Most modern software is written in a "divide and conquer" fashion: a problem is analyzed and broken down into easily solvable parts, and then program elements (functions, subroutines, or methods, depending on your terminology or style of programming) are built to solve each piece of the problem. Usually this process executes on a single CPU, or sometimes on a symmetric multiprocessor (SMP) system using a technique called "multithreading" that allows a single program to spin off subtasks that execute some of the subroutines concurrently.
Multithreading is a good way to squeeze performance and efficiency out of a computer, but performance gains are limited by the absolute amount of computer power you have in a given machine, and there are a limited number of processors that can be put into a single system, usually fewer than 64.
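As a sketch of the multithreading idea, the toy program below uses Python's standard threading module to spin off two subtasks that each sum half of a list. The thread count and the interleaved chunking are arbitrary assumptions, and CPython's global interpreter lock limits true CPU parallelism, but the structure is the same in any threaded language:

```python
import threading

def partial_sum(chunk, results, slot):
    # Each thread executes one subroutine on its piece of the problem.
    results[slot] = sum(chunk)

data = list(range(100))
results = [0, 0]
threads = [
    threading.Thread(target=partial_sum, args=(data[i::2], results, i))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()          # wait for both subtasks to finish
total = sum(results)
print(total)  # 4950
```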
This works very well for problems that are fairly linear in their use of processor time, but large-scale computation needs a more powerful solution. Some classes of problems, including weather forecasting, the design and simulation of semiconductors or aviation systems, and data mining (such as looking for patterns in customer buying habits), require immense amounts of computing time and often both consume and generate many gigabytes of data. Problems such as these cannot be solved with any accuracy in a reasonable amount of time (i.e., before the results would be rendered worthless) on regular off-the-shelf computer systems. They also require specialized programming techniques: not only must the problem-solving program be broken down into bite-sized chunks, but the data must be structured so that each chunk of the program can easily access its share of it.
In order to make a pile of PCs into a supercomputer, software is needed to manage the flow of information between nodes in the cluster and to distribute the work. The most common packages used to do this are the Parallel Virtual Machine (PVM) and the Local Area Multicomputer (LAM), developed at Oak Ridge National Laboratory and the University of Notre Dame, respectively. Both are programming libraries that implement a message-passing system, allowing nodes on a network to work cooperatively on a problem.
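PVM and LAM are C libraries with their own send and receive calls, so the sketch below only mimics the message-passing pattern, using Python's standard multiprocessing queues in place of the network. The sentinel shutdown message and the squaring workload are illustrative assumptions, not part of either package:

```python
from multiprocessing import Process, Queue

def worker(tasks, results):
    # A compute node: receive work messages, compute, send answers back.
    while True:
        msg = tasks.get()
        if msg is None:          # sentinel: the master says shut down
            break
        results.put(msg * msg)

def run_master(workload):
    tasks, results = Queue(), Queue()
    node = Process(target=worker, args=(tasks, results))
    node.start()
    for item in workload:
        tasks.put(item)          # "send" a work message to the node
    tasks.put(None)
    answers = sorted(results.get() for _ in workload)
    node.join()
    return answers

if __name__ == "__main__":
    print(run_master([1, 2, 3]))  # [1, 4, 9]
```

Real message-passing libraries add what this sketch leaves out: addressing many nodes across a network, message tags, and fault handling.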
As mentioned, Beowulf clusters are being used for a large variety of applications, including weather forecasting and high-energy physics problems such as the modeling of black holes. On a more down-to-earth level, Beowulf clusters can be used to create lifelike animation and other computer-generated graphics -- films such as The Matrix, Titanic, and Toy Story all made extensive use of clustered computers to generate the huge amounts of imagery they required. Clusters are also being used more and more for applications such as data mining, simulation of semiconductors, CAD systems for developing everything from packaging to sneakers, and even the sequencing of the human genome.
As the cost of hardware and storage continues to decline, the use of clusters will increase dramatically. Already most database vendors support SMP machines and several, including Oracle, are starting to release versions of their software for Linux clusters. As these mainstay packages become available, they will drive the availability of a whole new class of applications that run on these machines, ranging from serious business applications to entertainment applications such as online gaming systems.
Another effect of the ever-increasing capability of computer hardware is its ever-decreasing size. Traditional supercomputers are very large beasts, and, until recently, clusters were even larger since they are, by definition, collections of rack-mounted servers. As system sizes decrease, the physical size of a cluster will shrink as well, while its overall computational capability increases. Right now a 48-node cluster made of 2U (3 ½" high) rack-mount servers would take one and a half 70" rack enclosures and weigh over 1000 pounds -- and that doesn't even include the networking gear! With the availability of 1U (1 ¾" high) systems, a 48-node cluster will soon fit in a closet, and with upcoming single-board computers, the same cluster will fit under your desk. All of these developments will bring more and more compute power to bear on your problems.
One of the nice things about having a regular column is that you get a soapbox for your own projects! If you're interested in getting your feet wet in the world of clusters and parallel computing, my upcoming O'Reilly book, Building Linux Clusters, is a good place to start. It is a hands-on introduction to cluster building that will help you decide what hardware to use, whether all-new systems or older machines you may have "lying around," and will help you get your cluster up and running in a matter of hours with a customized version of Red Hat Linux geared for cluster building.
David HM Spector is President & CEO of Really Fast Systems, LLC, an infrastructure consulting and product development company based in New York.
Copyright © 2009 O'Reilly Media, Inc.