Distributed Computing

Sharing the Workload

Consider a typical anthill. Depending on its size, it likely consists of millions, if not billions, of tiny grains of dirt or clay, each one carefully excavated from underground and individually hauled up to the surface where it is deposited. How long would it take to build such a structure? A single ant moving one grain of dirt at a time would take ages. But many anthills, even very large ones, seem to pop up practically overnight. How does such a monumental job happen so quickly?

The answer, of course, is that no single ant builds an entire anthill by itself. Each hill is the product of an entire colony, with each ant moving only a single grain of dirt at a time. But collectively, the entire colony is capable of moving thousands of grains at a time, vastly reducing the time needed to accomplish the greater task.

This same principle—distributing a large or complex task among many workers that operate in parallel—applies equally well to large and complex computational problems. Wait, per style rules no em-dashes: This same principle, distributing a large or complex task among many workers that operate in parallel, applies equally well to large and complex computational problems.

Thanks to the development of the Internet and other large-scale networking environments, the processing power of multiple computers can be harnessed to work together in solving complex computations that would otherwise be impractical, if not impossible, to solve using only a single computer.

Many such problems involve the processing of large amounts of raw data or the simulation of millions, if not billions, of various scenarios. Rather than asking a single computer to process the entire data set, it is possible to connect to a large array of additional computers and systematically farm out smaller segments of the larger data set to each computer. Each computer then only processes a small fragment of the overall problem, but collectively, this “colony” of networked computers can crunch through the data and achieve a result in a fraction of the time.
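The divide-and-combine idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a real distributed system: the "data set" and the per-chunk computation are stand-ins, and the worker processes here all run on one machine rather than across a network.

```python
from multiprocessing import Pool

def process_chunk(chunk):
    """Each worker handles only a small fragment of the overall problem.
    The sum-of-squares below is a placeholder for real analysis work."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))        # the full "data set"
    chunk_size = 100_000
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)]

    with Pool() as pool:                 # one worker process per CPU core
        partial_results = pool.map(process_chunk, chunks)

    total = sum(partial_results)         # combine the partial answers
    print(total)
```

Each worker sees only its own chunk, yet combining the partial results yields the same answer as processing the whole data set on one machine, just faster.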


In recent years, a number of projects have emerged that employ this process of distributed computing to tackle problems that would otherwise be too resource-intensive to solve in a reasonable amount of time. One of the first such projects to gain widespread popularity was SETI@Home, launched in May 1999 by the University of California, Berkeley.

SETI (Search for Extraterrestrial Intelligence) is a scientific area whose goal is to detect intelligent life outside Earth. One approach, known as radio SETI, uses radio telescopes to listen for narrow-bandwidth radio signals from space. Such signals are not known to occur naturally, so a detection would provide evidence of extraterrestrial technology.

The SETI@Home project searches through the massive amounts of astronomical data collected by radio telescopes in an attempt to identify such signals. Finding one would provide intriguing evidence for the possibility of some form of intelligence beyond Earth.

First conceived in 1995, around the same time that the World Wide Web was emerging and growing in popularity, the project aimed to harness the processing power of all of these newly networked home computers around the world to assist in searching through and analyzing the large volumes of radio telescope data.

Individual users who wish to participate in SETI@Home and donate their idle computer time can download the project’s BOINC (Berkeley Open Infrastructure for Network Computing) software and run it on their own computers. The program connects to the SETI@Home servers, which deliver an ongoing series of processing jobs that the computer crunches through in the background (the program can even run as a screensaver). As each assigned task completes, the computer sends the results back to the project’s servers and receives its next batch of data to process.
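The fetch/process/report cycle just described can be sketched as a simple loop. The `FakeServer` class below is a hypothetical stand-in for the real project servers; the actual BOINC protocol is far more involved (scheduling, result validation, redundant computation), so treat this only as an illustration of the overall flow.

```python
class FakeServer:
    """Hypothetical stand-in for a project server handing out work units."""
    def __init__(self, work_units):
        self.pending = list(work_units)
        self.results = []

    def get_work(self):
        # Hand out the next batch of data, or None when nothing is left.
        return self.pending.pop(0) if self.pending else None

    def report(self, result):
        # Record a completed result sent back by a volunteer's computer.
        self.results.append(result)

def run_client(server, analyze):
    """Repeat: fetch a work unit, process it, send the result back."""
    while (unit := server.get_work()) is not None:
        server.report(analyze(unit))

server = FakeServer([[1, 2], [3, 4], [5, 6]])
run_client(server, analyze=sum)   # each "analysis" is just a sum here
print(server.results)             # -> [3, 7, 11]
```

The key design point is that the client never needs the whole data set: it only ever holds one small work unit at a time, which is what lets millions of volunteer machines cooperate on a single problem.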

SETI@Home is just one of many distributed computing projects that users can participate in. The BOINC software alone allows users to join over three dozen scientific research projects spanning disciplines such as physics, mathematics, chemistry, cryptography, evolutionary biology, astronomy, medicine, and computer science.


Unfortunately, massively distributed computing can be put to malicious as well as beneficial uses. Such is the case with botnets: large collections of networked computers that have been infected by a worm, virus, or other form of malicious software that enables them to be remotely controlled or utilized without their owners’ knowledge.

Much like the way the SETI@Home servers deliver payloads of data to be processed on users’ computers, botnets deliver commands to their networks of “zombie” computers, instructing them to perform a variety of less altruistic actions, such as sending massive amounts of spam or launching coordinated DDoS (Distributed Denial of Service) attacks on unsuspecting sites.

In fact, these botnets are one of the primary reasons that spam and DDoS attacks are as pervasive as they are. First, the sheer number of “zombie” computers in a botnet accounts for the massive scale of the spam and attacks. Second, the botnet’s distributed nature makes the originating computers difficult to trace, because the traffic does not come from a single, centralized source.

Other botnets have been known to hijack the processing cycles of unsuspecting victims’ computers to engage in bitcoin mining.

“What would you do if you could harness the processing power of a very large array of computers?”