Brian Hayes. American Scientist. Volume 86, Issue 2. Mar/Apr 1998.
The most powerful computer in the world, according to a recent ranking, is a machine called Janus, which has 9,216 Pentium Pro processors. That’s a lot of Pentia, but it’s a pretty puny number in comparison with the 20 million or more processors attached to the global Internet. If you have a big problem to solve, recruiting a few percent of the CPUs on the Net would gain you more raw power than any supercomputer on earth.
Of course the trick is to get all those millions of scattered machines working on your problem. The 9,216 Pentiums are all conveniently housed in a single room at the Sandia National Laboratory in Albuquerque. Setting them to work on the task of your choice is a simple matter; all you need is an account on the machine, a password, an allocation of CPU time, possibly a security clearance, and a little knowledge of programming in a specialized dialect of FORTRAN or C. Persuading the Internet to do your bidding is not so easy.
And yet it can be done. Consider the hunt for trophy-quality prime numbers. For two decades, the weapon of choice in this elite sport was a supercomputer-preferably the latest model from Cray Research. Beginning in 1979, the prime-number pursuit was dominated by David Slowinski and his colleagues at Cray (which is now a division of Silicon Graphics). The Cray team had to keep topping their own records, because they had so little competition elsewhere. In 1996, however, a new largest prime was found with an unpretentious desktop PC. The discoverer was a member of an Internet consortium who attacked the problem collectively with a few thousand computers. In August of 1997 another member of the same group found a still larger prime, which stands as the current record. Slowinski, being a good sport, offered one of his supercomputers to verify the discoveries.
The rise of cooperative-computing projects on the Internet is both a technical and a social phenomenon. On the technical side, the key requirement is to slice a problem into thousands of tiny pieces that can be solved independently, and then to reassemble the answers. The social or logistical challenge is to find all those widely dispersed computers and persuade their owners to make them available.
What’s most charming about collective computing is that it relies entirely on resources that would otherwise go to waste. The computing is done with spare CPU cycles on idling machines.
It is one of the wonders of our age that we squander vast quantities of computational labor. Forty years ago, when electronic computers were rare and expensive, CPU time was scheduled and billed by the millisecond. Now, computers spend most of their time displaying zooming multicolored windowpanes or simulating an aquarium. These “screen saver” programs, which compute nothing, whose only purpose is to stir up pixels on the display screen, probably consume more of the world’s computational capacity than any other kind of software. Go into almost any office and you’ll find machines busily saving their screens all night and all weekend.
Even when a computer is ostensibly in use, it is mostly idle. Typing furiously, you might produce 10 keystrokes per second; that’s not much of a distraction for a processor that can execute 100 million instructions in a second. Under these conditions the processor spends most of its time going around in a tight little loop, asking over and over, like a fidgety toddler, “What can I do now?”
This waste of computational machinery is not something we need be ashamed of. The CPU cycles we fritter away today will not be deducted from the legacy bequeathed to our grandchildren. Still, every waste is also an opportunity, and the cycles you have no use for may prove valuable to someone else.
The idea of scavenging unused cycles arose almost as soon as computers were linked by networks. A few early experiments with distributed computing, including a pair of programs called Creeper and Reaper, ran on the ARPAnet, the 1970s predecessor of today’s Internet. Later, when the Xerox Palo Alto Research Center (PARC) installed the first Ethernet, a program cruised the network at night, commandeering idle computers for CPU-intensive tasks. This early cycle recycler was the creation of John E Shoch and Jon A. Hupp, who called it a “worm,” citing the inspiration of John Brunner’s novel Shockwave Rider. (A colleague, noting the program’s nocturnal habits, suggested the alternative name “vampire.”) A later scavenger system called Condor, developed by Miron Livny and his colleagues at the University of Wisconsin at Madison, is now running at more than a dozen universities and other sites. Condor roams within clusters of Unix workstations, usually confined to a single laboratory or department.
Going out over the public Internet to scrounge cycles is more difficult because the machines are more diverse and the network connecting them is more tenuous. Furthermore, communicating with the machines is only part of the problem; making connections with the machines’ owners is also harder. Nevertheless, Internet “metacomputing” has been going on for at least a decade. Here are some of the notable projects.
Factoring. Finding the prime factors of a composite integer is a classic among computationally hard problems. The inverse process-multiplication-has an efficient algorithm we expect children to master, but factoring seems to be intractable when numbers get big. Adding three decimal digits to the length of a number doubles the effort needed to factor it.
But if factoring is hard, it is also ideally suited to parallel computation. Splitting the work among k computers produces an answer very nearly k times faster.
The first Internet factoring project was organized in 1988 by Arjen K. Lenstra (now of Citibank) and Mark S. Manasse of the DEC System Research Center in Palo Alto. They and their colleagues had written software to distribute factoring tasks among workstations within the DEC laboratory, and they extended this system so that computers elsewhere could contribute to the effort. The infrastructure was simple: Factoring tasks were parceled out by electronic mail, and results came back the same way
As early as 1989 Lenstra and Manasse had already given an astute analysis of the economics of collective computing. They could get equivalent performance, they estimated, from 300 workstations or 1,200 PCs or a single high-speed machine designed especially for factoring. If they had to buy all the hardware, the last option was clearly the best choice. But if the owners of workstations or PCs could be induced to donate CPU cycles free of charge, that price would be hard to beat. By 1990 Lenstra and Manasse and about a hundred e-mail collaborators from around the world were routinely factoring numbers of 100 decimal digits. In 1993 a larger group was assembled to take on a number known as RSA129. Some 600 volunteers labored for eight months to factor this 129-digit number, winning a prize of $100 for their efforts. Two years later RSA-130 (a digit longer) succumbed to another army of factorers. This time some of the work was coordinated not by e-mail but through a World Wide Web interface designed by James Cowie of Cooperating Systems Corporation.
Primes. A counterpoint to factoring is the search for primes-numbers that cannot be factored. Primes of 100 or 200 decimal digits no longer attract much notice; with the right software any modern computer can find examples of such numbers in a few seconds. The frontier of knowledge for primes is now out near a million digits. Even the mathematically jaded may marvel at a prime of such magnitude: Think of all the smaller numbers that might divide it, but don’t.
Most of the largest known primes are Mersenne primes, named for the 17th-century friar Marin Mersenne. A Mersenne number has the form 2n-1, but not all such numbers are prime. In the first place, for 2n-1 to be prime, n itself must be prime, but even that is only a necessary condition, not a sufficient one. (Try n = 11.) To find Mersenne primes, you must first calculate 2n-1 for each prime value of n, then test the result for primality. Algorithms based on the work of Edouard Lucas and D. H. Lehmer greatly speed up the primality tests.
The two largest known primes were found by participants in the Great Internet Mersenne Prime Search, or GIMPS. The founder of this project is George Woltman, a computer scientist who wrote efficient software for Lucas-Lehmer primality tests and made it available on a Web site. (You need not understand the mathematics of the Lucas-Lehmer test to run the software.) Some 4,000 volunteers have contributed to the search so far. In November 1996 Joel Armengaud discovered that 21,398,269-1 is prime. Then in August 1997 Gordon Spence proved the primality of 22976,221-1. This number, which has 895,932 decimal digits, is the 36th Mersenne prime to be discovered. (It may not be 36th in the numerical sequence of Mersenne primes, however, because there are exponents less than 2,976,221 that have not yet been checked.)
Code-breaking. The collective-computing projects that have attracted the most participants have been attempts to decipher encrypted messagesbut the volunteers are not snooping into anyone’s confidential e-mail. RSA Data Security, a company in the secrecy business, has posted a number of cryptographic puzzles, with cash prizes for those who solve them. The company’s aim is to test the security of their own products and to demonstrate the vulnerability of encryption schemes they consider inadequate. The factoring of RSA-129 and RSA-130 was part of this program. Other RSA challenges don’t involve factoring but call for a direct attack on an encrypted text.
In one challenge the message was encoded with DES, the Data Encryption Standard, a cipher developed in the 1970s under U.S. government sponsorship. The key that unlocks a DES message is a binary number of 56 bits. In general the only way to crack the code is to try all possible keys, of which there are 256, or about 7 x 1016. This task was undertaken by an Internet collaboration called DESCHALL, organized by Rocke Verser, Matt Curtin and Justin Dolske. In June of 1997 they got lucky, discovering the correct key after checking just 18 percent of the possibilities, and won the $10,000 prize.
Another RSA challenge also employed a 56-bit key, but with an encryption algorithm called RC5. Three groups, known as Bovine, Infinite Monkeys and Cyberian, all began recruiting volunteers for the RC5 attack. Bovine eventually attracted the most helpers, and it was they who found the key and deciphered the message in October 1997, after exhausting 47 percent of the key space, or 34 quadrillion keys. Bovine was organized by Adam L. Beberg, Jeff Lawson and David McNett.
Compared with earlier distributed-computing projects, the RC5 efforts were not only technically sophisticated but also reached a new level of promotional and motivational slickness. For example, the Bovine software kept statistics on the contributions of individuals and teams, adding an element of competition within the Bovine ranks as well as between Bovine and the other groups. In the team standings, Macintosh militants finally prevailed over partisans of the Linux operating system. By the end of the contest some 4,000 active teams were processing 7 billion keys per second, a rate equivalent to the
work of 26,000 Pentium computers. On completing RC5-56, the Bovine collaboration turned to RC5-64, a cipher with a 64-bit key. The effort needed to break this code will be 256 times greater, which suggests it could be a labor of decades. It’s worth pausing to ask whether the brute-force testing of 18,446,744,073,709,551,616 cryptographic keys is really a better use of resources than displaying pretty fish in an aquarium. Beberg and his colleagues are considering other possible projects. Meanwhile, RSA has announced a new challenge. This time the message is encoded with the same DES algorithm broken last spring, but the contest rules are altered to reward speed of decryption. The initial prize of $10,000 drops to zero after 67.5 days. Bovine has taken up the challenge.
Golomb rulers. Imagine a six-inch ruler with marks inscribed not at the usual equal intervals but at 0, 1, 4 and 6 inches. Taking all possible pairs of marks, you can measure six distinct distances: 1, 2, 3, 4, 5 and 6 inches. A ruler on which no two pairs of marks measure the same distance is called a Golomb ruler, after Solomon W. Golomb of the University of Southern California, who described the concept 25 years ago. The 0-14-6 example is a perfect Golomb ruler, in that all integer intervals from 1 to the length of the ruler are represented. On rulers with more than four marks, perfection is not possible; the best you can do is an optimal Golomb ruler, which for a given number of marks is the shortest ruler on which no intervals are duplicated.
The world is not panting in desperate need of better Golomb rulers, and yet these combinatorial curiosities do have practical uses. For example, in setting up an interferometer for radio astronomy, placing the antennas on the marks of a Golomb ruler maximizes the recovery of information about the phases of the signals received.
Many good Golomb rulers have been found by trial-and-error, but proving them optimal (or finding a better ruler if one exists) is computationally expensive. In 1995 a dozen workstations took four CPU-years to prove there is no 19-mark ruler shorter than 246 units. The computation was done by Apostolos Dollas, William T. Rankin and David McCracken of Duke University.
In 1996 David Vanderschel and Mark Garry, who had both worked on Golomb rulers independently, merged their ideas in a program called GVANT, which turned out to be significantly more efficient than earlier programs. They quickly confirmed the known results for rulers of up to 19 marks, but even with their highly optimized algorithm a search for the best 20-mark ruler was a formidable undertaking. They therefore sought Internet collaborators. With seven helpers it took about half a year to prove that a 20-mark ruler of length 283 is optimal.
In the spring of 1997 Vanderschel and Garry turned to the 21-mark ruler, for which the shortest known arrangement is 333 units long. For this ruler a naive algorithm would have to check more than 1030 arrangements of the marks; GVANT prunes the number of cases to about 1015. Roughly 100 volunteers have pitched in to help, but after a year’s searching there is still plenty of work left for latecomers. Out of 1,200 trillion arrangements to be checked, fewer than 200 trillion have been examined so far.
Aliens. If the prospect of finding a bigger Mersenne prime or a smaller Golomb ruler won’t induce you to pledge your CPU, perhaps you would like to discover an extragalactic civilization. A proposal called [email protected] would put thousands of computers to work sifting for signs of life in signals recorded with the radio telescope of the Arecibo Observatory in Puerto Rico.
A search for extraterrestrial intelligence has been going on at Arecibo for almost two decades. The telescope slowly scans the sky, detecting emissions over a broad range of radio wavelengths. A special-purpose signal-processing computer applies a Fourier transform to the data to pick out narrow-bandwidth signals, which could be the alien equivalent of “I Love Lucy.” The astronomers would like to put the data through a more thorough analysis, but computing capacity is a bottle neck. That’s where [email protected] comes in.
With enough computers on the job, the Arecibo data could be sliced into finer divisions of bandwidth and frequency. Moreover, the analysis software could check for other kinds of regularities, such as signals that pulsate or that “chirp” through a sequence of frequencies. The task is well suited to Internet computing in that only small blocks of data need to be passed back and forth over the network, but a great deal of computing is needed on each block.
[email protected] is the project of Dan Werthimer of the University of California at Berkeley and several colleagues. Their plans are ambitious; they seek a mass audience. The client software they envision would run as a screen saver, starting up automatically whenever a machine is left idle. As the program churned away on the data analysis, it would also display a series of images related to the project, such as a representation of the data currently being examined or a map showing the progress of the sky survey. Some 70,000 people have signed up for [email protected] Unfortunately, the project is on hold for lack of funding.
Factors, primes, codes, rulers-some of these projects sound like they might belong in the Guinness Book of World Records. They’re not frivolous, but they’re not quite in the mainstream either. There are plenty of other areas of science and engineering that could benefit from cheap and abundant computing. The traditional big consumers of CPU cycles include the analysis of seismic data, simulations of many-body systems, studies of protein folding and other kinds of computational chemistry, studies of turbulent fluid flow, and lattice models of quantum field theories. Could such tasks be shared over the Net? When viewed as a massively parallel computer, the Internet has a peculiar architecture. It is extraordinarily rich in raw computing capacity, with tens of millions of processors. But the bandwidth for communication between the processors is severely constrained. The 9,216 Pentiums of the Janus computer can talk to one another at a rate of 6.4 billion bits per second; for a node connected to the Internet by modem, the channel is slower by a factor of 100,000.
The limits on bandwidth determine what kinds of algorithms run smoothly when spread out over the Net. Consider the case of an n-body simulation, which describes the motion of particles in a force field, such as stars in a galaxy or atoms in a fluid. One parallel n-body algorithm assigns each particle to its own processor, which then tracks the particle’s path in space. The trouble is, each processor needs to consult all the other processors to calculate the forces acting on the particle, and so the volume of communication goes up as n2. That won’t fly on the Net.
Yet n-body problems are not necessarily unsuited to network computing. There are other n-body algorithms, and other ways of partitioning the problem. In particular, “tree codes” organize the computation hierarchically. At the bottom of the hierarchy a processor calculates motions inside a small cluster of particles, without reference to the rest of the universe. At the next level several clusters are combined, ignoring their internal dynamics and looking only at the motions of their centers of mass. Then clusters of clusters are formed, and so on. Tree codes are popular for n-body computations, but whether they can be adapted to Internet computing remains to be seen.
Memory capacity is another serious constraint. Computer owners who are willing to give away CPU cycles may be less eager to let someone fill up their machine’s memory or disk drive. Both the bandwidth and the memory limits will be difficult hurdles for programs that operate on large volumes of data, as in seismic analysis or weather prediction.
And yet there are powerful incentives for clearing those hurdles. In round orders of magnitude, a typical personal computer will soon execute 100 million instructions per second; it will have 100 megabytes of memory and a gigabyte of disk storage; it will consume 100 watts of electricity and cost $1,000;100 million of these machines will be attached to the Internet. Multiply it out: 10 quadrillion instructions per second, 10 billion megabytes of memory, 100 million gigabytes of disk storage, 10 gigawatts of electric-power demand, a price tag of $100 billion. It’s probably worth rewriting your software to gain access to such a machine.
Collectivists and Capitalists
And what about incentives for the owners of that $100 billion distributed computer? The spirit of volunteerism is a wonderful thing, but it doesn’t always scale well. If Internet computing ever catches on in a big way, we’ll probably hear less about cooperatives and collectives, and more about return on investment.
One way of paying for Internet computing is through a commodity market in CPU cycles. If you have 100 computers with nothing to do nights and weekends, you offer the spare capacity at an asking price expressed in millicents per trillion instructions or dollars per Pentium-year, or some such fabricated unit. Meanwhile someone with a big batch of numbers to crunch enters a bid for a stated quantity of computation, measured in the same units. An automated clearinghouse matches up buyers and sellers.
It could be fun to watch the fluctuations of such a market. When Lucasfilms needs half the processors in the galaxy to render scenes for the next Star Wars saga, prices shoot up. Over academic holidays, excess capacity brings on a seasonal slump. The market is likely to be volatile, because CPU cycles are like electricity or fresh asparagus: You can’t stockpile them for later use.
Trading in CPU cycles is not a new idea. As early as 1968 Ivan Sutherland wrote of a “futures market in computer time”-although his market consisted of only a whiteboard and colored pens. The market mechanism was explored in greater depth at Xerox PARC, in an experiment described by Carl A. Waldspurger, Tad Hogg, Bernardo A. Huberman, Jeffrey 0. Kephart and W. Scott Stornetta. For the most part their market worked as economists would predict-each job’s share of the total machine time was proportional to its funding-but they did observe some interesting undamped oscillations in prices.
Could a real market, backed by real money, evolve on the Internet? Questions of security and confidentiality would need to be addressed. When I send you my program to run, how do I know you won’t pry it open and steal my secrets? How do you know my program won’t steal your secrets? (The answer to both questions is probably Java.) Another essential precondition is that CPU cycles have to become what economists call a fungible asset, meaning that cycles on one computer are readily converted into those on any other. This is a hard problem but not insurmountable.
In the end, though, the economics of the global metacomputer are probably less interesting than the operation of the machine itself. Given a free flow of computations through all those 100 million nodes, what would such a device wind up computing? I hope that we as a civilization can find a better use for this machine than we have for Janus, whose primary function is to simulate the explosion of nuclear weapons.