ONLamp.com

Distributed Computing: Distributed Communities

by Howard Feldman
05/22/2003

In the last few years, distributed computing over the Internet has rapidly gained in popularity, largely due to media coverage and word-of-mouth advertising of projects such as SETI@Home and distributed.net. This article examines what motivates people to join distributed computing (DC) projects--donating their spare CPU cycles for the benefit of others--and the social groups that have sprung up among users of such projects.

What is Distributed Computing?

The idea of performing complicated computational tasks on multiple CPUs is not new. Cluster computing through the use of MPI (Message Passing Interface) or PVM (Parallel Virtual Machine) has been possible for years. The first Beowulf cluster computer was built nearly 10 years ago at NASA, and this form of affordable cluster computing has spread to the academic research environment. Although Beowulf clusters are cheap enough that most research labs can afford them, they are limited in size by the space available, and still have ongoing costs of electricity and cooling.

This is where DC really differs from cluster computing. No longer do the researchers need to purchase the computers: the CPU time is donated by users across the world. This was simply not feasible ten years ago. With the recent rapid growth of the Internet worldwide and the abundance of high-speed connected users becoming available, there is truly an enormous untapped amount of computing potential out on the Internet, making it effectively the world's largest supercomputer. On the other hand, it seems absurd that people would let strangers use their CPUs. Or does it?

Growth of a Community

The first DC project to make it into the spotlight was probably distributed.net, back in 1997. This project attempted to crack a 56-bit RC5 encrypted message by brute force. They offered a cash prize to the person whose computer cracked the encryption key. By donating your computer time, you had a chance of winning the cash pot, if you were lucky enough to get the chunk of work containing the correct key from amongst the 2^56 possibilities. For many, this was incentive enough. It was somewhat like playing the lottery, only you did not have to spend money on tickets, but rather run some software on your computer in the background.
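To make the "chunk of work" idea concrete, here is a minimal sketch of how a brute-force key search might be carved into independent blocks. It is not distributed.net's actual client or protocol: the function names, the block size, and the `is_correct_key` callback (standing in for a real RC5 trial decryption) are all assumptions made for illustration.

```python
def split_keyspace(key_bits, block_bits):
    """Yield half-open (start, end) ranges that together cover all 2**key_bits keys.

    block_bits is a hypothetical work-unit size: each block holds 2**block_bits keys.
    """
    block_size = 1 << block_bits
    total_keys = 1 << key_bits
    for start in range(0, total_keys, block_size):
        yield start, start + block_size


def search_block(start, end, is_correct_key):
    """Try every key in one block; return the key if found, otherwise None."""
    for key in range(start, end):
        if is_correct_key(key):  # stand-in for a real trial decryption
            return key
    return None


def brute_force(key_bits, block_bits, is_correct_key):
    """Hand out blocks one by one (a real project would send them to volunteers)."""
    for start, end in split_keyspace(key_bits, block_bits):
        found = search_block(start, end, is_correct_key)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    # Toy 16-bit keyspace so the example finishes instantly.
    secret = 0xBEEF
    print(hex(brute_force(16, 12, lambda k: k == secret)))
```

Because no block depends on any other, blocks can be handed to volunteers in any order, and only the one lucky block ever reports a hit. That is what gives the search its lottery-like feel.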

In fact, although the winner would only receive $1,000, far less than a lottery jackpot, many participants tended to get carried away, upgrading equipment, buying new computers, or "borrowing" ones at work, in the hope that they would solve the problem. The more CPU power, the greater the chances of winning. People who worked together, or groups of friends, formed informal teams to win the contest. Over time this spread beyond immediate acquaintances. As discussion forums and newsgroups grew around the nascent DC project, small online communities began to grow where people could discuss the project and get help. Distributed.net has had well over 300,000 users since inception and presently has about 10,000 users actively crunching away on 72-bit encryption.

Where did the other 290,000 users go? Some simply got bored and left. Some became concerned with issues such as security or gameplay slowing down with a DC client running on their machine. Many went and joined other, newer DC projects. But when they left, they took something with them. They took the spirit of competitiveness and camaraderie and kept their communities. Many had grown into larger online groups, with their own sites, statistics tracking, and message forums.

These groups are often united by similar world views and principles, rather than geographical or cultural boundaries. In a sense, they are "distributed communities". The largest of these DC groups have 5000 or more members: the Dutch Power Cows, Ars Technica, The Knights who say Ni!, and FreeDC. Some of these groups, such as Ars Technica, an Internet news resource for PC enthusiasts, existed prior to becoming involved in DC, but many did not.

Choosing a Project

Eventually more groups began to recognize the potential of distributed computing, creating projects to take advantage of it. Any project that could be trivially broken up into smaller problems, which could be solved independently, and then the answer to the bigger problem assembled from the answers to the smaller ones, was suitable for DC. Such a task is sometimes called "trivially parallelizable". It includes tasks such as searching for prime numbers, analyzing large amounts of data looking for patterns, and more recently even folding proteins and designing drug molecules.
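The split, solve, and assemble pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: the prime search, the work-unit size, and the function names are invented for the example, and a local thread pool stands in for volunteer machines on the Internet.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Simple trial division; adequate for a small demonstration."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def make_work_units(limit, unit_size):
    """Break the range [0, limit) into independent half-open sub-ranges."""
    return [(lo, min(lo + unit_size, limit)) for lo in range(0, limit, unit_size)]


def solve_unit(unit):
    """Solve one work unit in isolation; it needs nothing from any other unit."""
    lo, hi = unit
    return [n for n in range(lo, hi) if is_prime(n)]


def find_primes(limit, unit_size, workers=4):
    """Farm the units out, then assemble the partial answers in order."""
    units = make_work_units(limit, unit_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(solve_unit, units))
    return [p for part in partial_results for p in part]


if __name__ == "__main__":
    print(find_primes(30, 7))
```

The key property is that `solve_unit` never needs to talk to another worker. Problems whose pieces must exchange data constantly are a much poorer fit for machines scattered across the Internet.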

With an abundance of such projects available, users who wanted to donate their computer time now had something they never had before: a choice. Since any given project usually uses all of your spare CPU cycles, you can generally run only one project per CPU. In essence, each DC project competes for the same set of veteran DCers, while trying to bring in new users to join the DC community.

Each DC group tested out new projects as they became available. If the evaluators liked a project, they would tell their comrades. In some cases, there would be a massive team exodus from one project to another. So this brings up the question, what do evaluators look for in a new project? What does it take to persuade individuals and teams to leave one project for another?

Money is generally not an important incentive. None of the projects so far has offered a large enough prize to make it worthwhile, especially if you have to split the pot with your whole team. The factors that seem to be important are much simpler. Is the software stable? Does it run smoothly in the background, not interfering with other programs, not hogging the CPU or system resources? Although many people have high-speed connections these days, many still dial up to the Internet over modems as well, so if a project is "modem-friendly" that is a bonus. However, any well-made piece of software should have those qualities, so there has to be something more. Another important characteristic is the responsiveness of the project leaders to questions, concerns, and suggestions. But what seems to be the single most powerful driving force motivating DCers is the production of statistics.

Just the Stats, Ma'am, Just the Stats

Although most contemporary DC projects no longer offer physical prizes of any sort, the competitive spirit still remains. The incentive now stems from ego, the goal now being to prove that "my computer is bigger than yours". By participating and overtaking other individuals or teams, members get a boost to their self-esteem like no other. They can be recognized and cheered on by their peers and chastised (but always in a good-natured way) by the "enemy" teams. In effect they are drag racing their computers.

Project statistics keep track of how much each user has contributed to the project as a whole, be it in work units, CPU time, or some other measure. Though such statistics are not directly useful or important for a project's goals, it is of the utmost importance to the users that accurate records be kept of who contributed how much work and when. Statistics must be fair, offering an equal amount of credit for an equal amount of work. Statistics also encourage people to constantly upgrade slow computers or purchase additional ones, since more computing power equals higher stats.

Stats are taken very seriously in the DC subculture, and any project with unfair stats will likely be ignored. Most teams have gone so far as to write their own Perl scripts to parse official project statistic web pages and create their own local stats pages. These pages often contain more detailed and complex information than is available from the main project site, including projected dates one user will pass another, complete ranking from first all the way down to last place, and so on.
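The teams described here wrote their stats scripts in Perl; purely as an illustration, the "projected date one user will pass another" figure could be computed along the following lines. This sketch is in Python, and the function name, its parameters, and the assumption that each user's daily output stays constant are all hypothetical rather than taken from any team's actual script.

```python
import math
from datetime import date, timedelta


def projected_overtake(today, chaser_total, chaser_rate, leader_total, leader_rate):
    """Estimate the date on which the chaser's total first exceeds the leader's.

    Totals are work units credited so far; rates are work units per day,
    assumed constant (a simplification). Returns None if the chaser is not
    closing the gap.
    """
    gap = leader_total - chaser_total
    if gap < 0:
        return today  # the chaser is already ahead
    closing_rate = chaser_rate - leader_rate
    if closing_rate <= 0:
        return None  # the gap never shrinks at these rates
    # First whole day on which the chaser is strictly ahead.
    days = math.floor(gap / closing_rate) + 1
    return today + timedelta(days=days)


if __name__ == "__main__":
    # Made-up numbers: a 100-unit gap closing at 30 units per day.
    print(projected_overtake(date(2003, 5, 22), 1000, 80, 1100, 50))
```

A real stats page would estimate the rates from recent history, for example a seven-day average, rather than take them as given.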

Larger teams frequently set up their own mini-competitions as well. For example, they might see who can produce the most work in a seven-day period, with each team pooling all their resources into one project for that period. Again, it is all for fun, with the winners getting bragging rights until the next match. By making little games out of the statistics, DC projects become more fun for the participants and more attractive to potential new users.

But What About the Science?

This does not mean that everybody running DC projects is doing so for the stats. Every individual has his or her own reasons. Most DC projects today are attempting to solve some fundamental mathematical or scientific problem, such as computing the first 2 billion digits of pi or finding a cure for anthrax or cancer. Many people show a genuine interest in the problem being worked on. They may not be able to contribute intellectually to the problem, but they are more than happy to help by donating their spare computer time. This is especially true of those who have lost loved ones to a disease such as cancer. These people are happy to contribute in any way they can to help research and find a cure for these illnesses. When a DC client comes in the form of a screensaver, people may download it simply because they like the screensaver. Still others run a project on their machine(s) at home simply because they cannot stand the thought of their shiny new CPU sitting around wasting cycles while they check their e-mail and type letters.

It is clear that distributed computing is here to stay, and as it continues to mature and prosper, these people will make the biggest difference. While several thousand people may devote a good deal of time and effort to DC projects, checking their production daily and enjoying the camaraderie of those with similar interests in high-end computer hardware, there are millions connected to the Internet that still have never heard the term "distributed computing", let alone participated in such a project. It is these people that must be reached, as they are the future of distributed computing.

At present only a small fraction of the potential of DC has been tapped. As more and more projects begin to use DC, it will become important to make the general public aware of it and to encourage people to donate their spare CPU cycles to projects with worthy goals. The future of distributed computing is bright, and as more people get turned on to the idea, project managers must prove themselves, convincing people to join one project instead of another. We can hopefully reach the average computer user, training them to recognize which projects are worthy of their computer time.

Resources

  • Internet-based Distributed Computing Projects
    A near exhaustive list of available DC projects, with info on each.

  • SETI@home
    Still one of the most popular and well-known projects, SETI searches through radio telescope data from outer space for any signs of intelligent life in the universe — none found yet.

  • Folding@home
    One of the first life science DC projects to become well-known to the public, they attempt to fold small peptides using molecular dynamics; some results have already been published in high-caliber scientific journals.

  • Distributed Folding Project
    A relatively new project, it attempts to tackle the protein structure prediction problem, one of the most difficult biological problems of the century, and it shows some promise.

  • distributed.net
    Possibly the first and likely the best-known DC project, it has focused on brute-force cracking of encryption keys of various lengths--unfortunately of little practical value.

  • Seventeen-or-bust
    A very new project, this one looks for Proth primes in an attempt to settle the Sierpinski problem. It is one of a class of mathematics-based DC projects that require enormous computational resources previously unavailable.

  • GIMPS
    Another mathematics DC project, this one looks for (and has found) new Mersenne Primes.

Howard Feldman is a research scientist at the Chemical Computing Group in Montreal, Quebec.

