Darwin’s Dangerous Algorithm

In Daniel Dennett’s Darwin’s Dangerous Idea: Evolution and the Meanings of Life, the dangerous idea is natural selection, Charles Darwin’s name for the mechanism governing the evolution of species that he codiscovered with Alfred Russel Wallace. Natural selection is the “process by which individual characteristics that are more favorable to reproductive success are ‘chosen,’ because they are passed on from one generation to the next, over characteristics that are less favorable”.1

Natural selection explains how characteristics that promote reproductive success persist and under certain circumstances can come to dominate less favorable characteristics, but it doesn’t explain how those characteristics are passed on or how new characteristics come into being. It was Gregor Mendel, living around Darwin’s time, who suggested that heritable characteristics are packaged in the discrete units we now call genes and that offspring inherit a combination of their parents’ genes.

When you combine Darwin’s and Mendel’s ideas, you have the basis for genetic algorithms, an interesting class of algorithms generally attributed to John Holland that take their inspiration from evolutionary biology. These algorithms simulate some aspects of biological evolution — while ignoring others — to tackle a wide range of problems from designing circuits to scheduling trains. Genetic algorithms allow the algorithm designer to specify the criterion for reproductive success by supplying a fitness function, alluding to the idea that in a competitive environment only the fittest survive to reproduce.

In biological evolution, the selection of favorable characteristics is determined by the environment, including the individuals living in that environment. Evolution does not move toward any grand design, nor is it likely to agree with our notions of progress except insofar as we can engineer our environment to select for those characteristics we deem valuable. In a genetic algorithm, however, the algorithm designer has complete control over reproduction. Genetic algorithms illustrate how computer programmers can simulate any natural process that they can articulate precisely and even modify their simulations to create processes not found in nature. Sound dangerous to you?

15.1  Competing hypotheses

A few years back, six students asked me to be the faculty sponsor for a group independent study project (GISP) in the following semester. The proposed GISP’s subject matter was all over the map: connections among such topics as evolution, natural selection, computational complexity, and combinatorial optimization. In particular, they wanted to explore genetic algorithms and genetic programming (basically, a methodology involving the use of genetic algorithms to automatically produce programs for particular purposes).

I was skeptical of both the material and the value of spending an entire semester studying it. From my perspective, genetic algorithms and the related genetic programming are simply methods for searching in a very large set of hypotheses, and not necessarily the best methods at that. This idea of searching among hypotheses is important in computer science and artificial intelligence in particular, and while the words “searching” and “hypotheses” are probably familiar to you, I wouldn’t be surprised if you find their combination a bit puzzling.

Depending on the context, the term hypothesis might refer to a possible answer to a question, a possible explanation for an observed event, or a possible solution to a problem. The set of all such possible (or competing) answers, explanations or solutions for a given question, event or problem is called a hypothesis space. In many computational problems, you proceed by looking at or searching such a space of hypotheses. If you’re lucky, the hypothesis space has some inherent structure or regularity, so that, having examined one hypothesis and found it wanting, you can extract clues about how to transform it into one or more alternative hypotheses that might, just might, correspond to better answers.

This approach of searching in a large hypothesis space is often applied to computationally hard problems like the traveling salesman problem (TSP) of Chapter 12, in which we’re looking for a minimal-length tour — a path in a graph that begins and ends at the same node and visits every other node exactly once. In terms of searching among hypotheses, the hypothesis space for solving an instance of TSP is the set of all tours. There is an exponential number of such tours, and so for any reasonably large n the hypothesis space is very large indeed.2

In searching in the space of possible tours for a minimal (or at least relatively short) tour, imagine an algorithm examining its current hypothesis, the shortest tour it’s come up with so far. Some portions of this tour may have the salesman ineffectively bouncing back and forth across the country. A clever algorithm might modify the current tour to have the salesman visit all the locations on the east coast before those on the west coast. Alternatively, it might take several of the tours it’s encountered and combine the best parts of each to create a new tour that’s better than any seen so far: for example, one tour might visit the west-coast locations efficiently and a second tour do a better job on the east-coast locations.

Search algorithms implement various strategies for systematically searching large hypothesis spaces by combining, transforming, and otherwise improving on the hypotheses they’ve seen so far. Genetic algorithms are search algorithms that exploit ideas from evolutionary biology to orchestrate their search. My skepticism about the subject matter of the GISP was simply that, however intriguing the mechanism of natural selection, I was reasonably confident that other search methods would be more effective for any given problem we might wish to solve. Nevertheless, I was interested to see if nature could teach us new computational tricks, and the very idea of simulated evolution seemed fascinating enough to occupy us for a semester, if not longer. I agreed to sponsor the GISP.

15.2  Genetic algorithms

Genetic algorithms (GAs) are a general method (framework, really, as I’ll explain shortly) for searching in very large hypothesis spaces. The “very large” qualifier is added because if the space is small enough, then we can search it exhaustively by simply enumerating every hypothesis and picking out the best one. Such general search methods can be used to solve so-called combinatorial optimization problems in which the component pieces of solutions can be combined in a large number of ways, or for learning problems like the spam-classification problem in Chapter 8 and automatic programming problems like synthesizing a program to guide a robot through a maze. It certainly would be nice to get a computer to write all your programs for you. Indeed, genetic algorithms are sometimes scathingly called the second-best method for solving just about any problem.

So I was skeptical, but I got caught up in my students’ enthusiasm. GAs operate on whole collections or populations of hypotheses encoded as strings of digital DNA, that is, as data structures that programmers often think of in terms of genes, codons, and other terms familiar to molecular biologists. Hypotheses are subjected to simulated mutation and the “fittest” hypotheses are allowed to “reproduce”, yielding offspring that combine features of the best hypotheses.

Once you get into the spirit, you can borrow just about anything you like from evolutionary biology. You could even incorporate ideas from currently unpopular theories such as that of Jean-Baptiste Lamarck (1744–1829), who proposed that evolution is directly influenced by the experiences of individual organisms over their lifetimes. Since you are writing the program that implements the simulation, you have a great deal of freedom. (It’s a bit like playing God, which may explain some of its attraction.) More often than not, however, programmers tend to follow nature’s suggestions, hoping, I suppose, that nature has stumbled upon some effective ways of searching in very large hypothesis spaces. Keep in mind, however, that nature isn’t in any particular hurry, and also that natural selection isn’t out to optimize anything.

They had me reading everything: Richard Dawkins’s The Selfish Gene, James Watson’s The Double Helix: A Personal Account of the Discovery of the Structure of DNA, Robert Axelrod’s The Evolution of Cooperation, John Maynard Smith’s Evolution and the Theory of Games and Evolutionary Genetics, to name just a few of the books on my night-time reading table. Genetic, molecular and evolutionary biology provide a wealth of ideas on which to base search heuristics. We were looking for anything to make our algorithms better or perhaps supply some missing piece, thereby letting us tap the power and mystery of natural selection. Why settle for some ho-hum low-fidelity version of simulated evolution when you can have a supercharged high-fidelity version with all the bells and whistles?

In addition to all the readings in biology and related fields, we also read Melanie Mitchell’s An Introduction to Genetic Algorithms, a very readable introduction to the field. I had met Melanie two years earlier during a sabbatical stay at the Santa Fe Institute and she had managed to dispel some of my skepticism about the practical value of genetic algorithms. During my stay in Santa Fe, I had gained grudging respect for what a good programmer could do with GAs. It’s important to point out, however, that they’re not a ready-made solution to any particular problem; they offer a general framework for solving problems, but the programmer has to supply the most important pieces.

If we want to try to solve an instance of the traveling salesman problem using GAs, the first thing we need to think about is how to represent a hypothesis, that is, a tour. Once you have the basic ideas from biology, it’s relatively easy to translate them into algorithms and data structures. For our purposes, a string of digital DNA corresponds to a list of simple tokens.3 For the tokens in our TSP instance, we’ll use the standard three-letter codes for the airports in six cities: BOS for Boston, BWI for Baltimore, DFW for Dallas–Fort Worth, ORD for Chicago, PVD for Providence and SFO for San Francisco.

A tour, then, is a list of six tokens. The final leg of a tour is a flight from the city corresponding to the last token in the list back to the city corresponding to the first. An example tour starting and ending in Boston would be (BOS PVD BWI SFO ORD DFW). We also need a matrix representing the distance between each pair of cities:

BOS BWI DFW ORD PVD SFO
BOS 0000 0371 1550 0867 0043 2704
BWI 0228 0000 1100 0621 0328 2400
DFW 1550 1100 0000 0802 1503 1464
ORD 0867 0621 0802 0000 0849 1846
PVD 0043 0328 1503 0849 0000 2654
SFO 2704 2400 1464 1846 2654 0000

Using this distance matrix, we can calculate that our example tour has length 6969. In terms of minimizing the distance traveled by our traveling salesman, the first part of the tour makes a certain amount of sense: we travel down the eastern seaboard from Boston (BOS) to Providence (PVD) to Baltimore (BWI). But then we shoot over to San Francisco (SFO), back across the country to Chicago (ORD), and down to Dallas–Fort Worth (DFW) before flying back to Boston. If we reverse the order of tokens corresponding to Chicago and San Francisco in the original tour, we get the tour (BOS PVD BWI ORD SFO DFW) of length 5852, a significant improvement.
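
We’ll rely on a procedure called tour-length for this calculation later in the chapter. Here’s a minimal sketch of how it might be written, assuming a helper procedure distance (an illustrative name, not shown elsewhere) that looks up the directed distance between two cities in the matrix above, so that (distance 'BOS 'PVD) returns 43:

(define (tour-length tour)
  ;; Sum the legs between consecutive cities, then add the final
  ;; leg from the last city back to the first.
  (let loop ((cities tour) (total 0))
    (if (null? (cdr cities))
        (+ total (distance (car cities) (car tour)))
        (loop (cdr cities)
              (+ total (distance (car cities) (cadr cities)))))))

Applied to our example tour, (tour-length '(BOS PVD BWI SFO ORD DFW)) returns 6969.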

You might imagine two tokens swapping positions as being a mutation of sorts. And you might implement this form of mutation by using a subroutine for generating random numbers. With such a subroutine, it’s very easy to write a program that examines a list representing a tour and, with a given fixed probability, either reverses the order of two consecutive tokens or leaves them as they were. Such a program is called a genetic operator in the parlance of genetic algorithms. Is this form of mutation biologically plausible? Perhaps, but it doesn’t really matter; we’re calling the shots and we’re free either to slavishly emulate natural selection as far as we understand it or to depart from reality as we see fit.
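
Here’s a minimal sketch of such a mutation operator; mutate is an illustrative name rather than code from elsewhere in the chapter, and random-p is a procedure, defined later in the chapter, that returns a random real number between 0.0 and 1.0:

(define (mutate tour rate)
  ;; With probability rate, reverse the order of each pair of
  ;; consecutive tokens as we walk down the tour.
  (cond ((or (null? tour) (null? (cdr tour))) tour)
        ((< (random-p) rate)
         (cons (cadr tour)
               (mutate (cons (car tour) (cddr tour)) rate)))
        (else (cons (car tour) (mutate (cdr tour) rate)))))

With a small rate, most tours pass through unchanged; occasionally one emerges with two neighboring cities swapped, as in the improvement we just saw.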

Other forms of mutation might not work out so well for our purposes. If our mutation routines were free to substitute any one token for any other token with some probability, a more plausible biological model, a tour might end up visiting the same city twice or, worse, inserting a city we didn’t have to visit at all. Luckily we can tame natural selection to avoid genetic misfits, or, in our case, tours that lead us in circles or to destinations we’d rather not visit.

We can also arrange for programs to simulate sex and thereby enable hypotheses represented as strings of digital DNA to share their genetic heritage. Our current winner, (BOS PVD BWI ORD SFO DFW) (length 5852), has a good east-coast solution and a so-so finish. On the other hand, (BOS BWI PVD DFW SFO ORD) (length 6379) wanders up and down the east coast but takes a nice swing down through the south (DFW), over to the west coast (SFO) and then back across the country, stopping in Chicago (ORD) before returning to Boston (BOS). Suppose these two were to get together and swap their best characteristics.

We break (BOS PVD BWI ORD SFO DFW) into two equal parts, (BOS PVD BWI) and (ORD SFO DFW), and do the same for (BOS BWI PVD DFW SFO ORD), resulting in (BOS BWI PVD) and (DFW SFO ORD). Now we swap parts to form two new sequences: (BOS BWI PVD ORD SFO DFW) of length 6408, which ain’t so great, and a second, golden child, (BOS PVD BWI DFW SFO ORD) of length 5648, the best we’ve seen so far. Does this have any analog in natural selection? Yes, during reproduction the strands of DNA from two parents are lined up and segments of the paired chromosomes are swapped in a process known as crossing over. In the terminology of genetic algorithms, this exchange of genetic information is accomplished by a crossover genetic operator.4 Unlike the simple mutation operator we discussed earlier, crossover operators require two strings of digital DNA to tango.
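
Here’s a minimal sketch of this midpoint crossover; it returns a list of the two offspring. The names crossover and first-n are illustrative, and, as footnote 4 explains, this naive version can produce invalid tours when the parents’ halves don’t contain the same cities:

(define (first-n lst n)                 ; the first n elements of lst
  (if (= n 0)
      '()
      (cons (car lst) (first-n (cdr lst) (- n 1)))))

(define (crossover tour1 tour2)
  ;; Split both parents at the midpoint and swap second halves.
  (let ((half (quotient (length tour1) 2)))
    (list (append (first-n tour1 half) (list-tail tour2 half))
          (append (first-n tour2 half) (list-tail tour1 half)))))

Applied to the two parents above, crossover yields exactly the two offspring in our example.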

So, how do we simulate sex and reproduction in a genetic algorithm? First of all, we’re quite clear about what it means for tours, corresponding to strings of digital DNA, to improve: shorter tours are more desirable and we’re searching for the shortest possible tour. In natural selection, individuals are not necessarily getting “better”, even on average; natural selection simply selects for genes (which are basically programs for building individuals or pieces of individuals) that manage to do a good job at reproducing themselves. If such genes give rise to individuals that are stronger or smarter, that’s interesting but it needn’t happen. It’s certainly possible for natural selection to stumble on a program for building small, fast, mindless individuals so superior to us at reproducing that they wipe us off the planet. But, as designers of genetic algorithms, we are free to play fast and loose with what we know about natural selection. In particular, we get to define the criterion that governs who gets to reproduce and how much.

At any point in its operation, a GA has an existing population corresponding to the current generation, a set of tours in the case of TSP, that it analyzes to determine which entities will get an opportunity to reproduce. Using the results of its analysis, the GA then constructs the next generation by mutating and pairing selected entities from the current generation. The generation so produced becomes the current generation and the GA repeats this process. If an entity is found that meets some specified termination criterion, the GA halts; otherwise it continues until it has examined a specified number of generations. Exactly how entities are selected for reproduction is at the heart of designing a GA.
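
In outline, that loop might look like the following sketch. None of this is meant as definitive code: shortest-tour is a helper introduced here, next-generation stands for a selection-mutation-crossover step of the kind sketched later in the chapter, and good-enough? is whatever termination criterion the designer supplies.

(define (shortest-tour population)      ; the best tour in a population
  (let loop ((best (car population)) (rest (cdr population)))
    (cond ((null? rest) best)
          ((< (tour-length (car rest)) (tour-length best))
           (loop (car rest) (cdr rest)))
          (else (loop best (cdr rest))))))

(define (run-ga population max-generations good-enough?)
  (let loop ((population population) (generation 0))
    (let ((best (shortest-tour population)))
      (if (or (good-enough? best) (= generation max-generations))
          best
          (loop (next-generation population 0.05) ; 5% mutation rate
                (+ generation 1))))))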

It shouldn’t be hard to convince yourself that not all mutations and crossovers produce improvements; indeed, in our crossover example, only one of the two new tours was an improvement on its parents. We probably don’t want to give all entities an equal opportunity to reproduce, nor can we expect that all entities resulting from reproduction will be equally desirable for future reproduction. We use the term fitness as our measure of how desirable or fit a string of digital DNA is for reproduction. The higher the fitness of a string, the more likely it will be selected for reproduction.

Since shorter tours are better than longer ones, we want a fitness function that’s inversely related to tour length. Since we’re interested in conferring a reproductive advantage within a given population, we focus on the entities within that population. Let’s define the fitness of a tour as the difference between the length of the longest (worst) tour in the population and the length of that tour. With this definition the longest tour is the least fit (it has a fitness of zero) and the shortest tour is the most fit. How can we use this measure of fitness to select tours for reproduction?

15.3  Survival of the fittest

Randomness seems to play a role in natural selection. Occasionally fit entities fail to reproduce and not-so-fit entities reproduce more than one would expect. Randomness helps nature avoid heading down a reproductive dead end. In GAs, the process of selection is carried out like a lottery in which fit entities get more lottery tickets than less fit entities, but even a not-so-fit entity can hit the jackpot and reproduce a whole gaggle of offspring. Typically the probability that an entity will be allowed to reproduce is proportional to its fitness. A probability distribution assigns such a probability to each individual in a population. In Scheme, we might calculate the fitness and reproductive probability of the entities in a population using

(define (distribution population)
  (let* ((worst (apply max (map tour-length population)))  ; length of the longest tour
         (fitness (lambda (tour) (- worst (tour-length tour))))
         (total (apply + (map fitness population)))        ; sum of the fitnesses
         (probability (lambda (tour) (/ (fitness tour) total))))
    (map probability population)))

where (let* (variable-value-pairs) body) is like (let (variable-value-pairs) body) except that in the former variables defined earlier in the list of variable-value pairs can be referred to in defining variables appearing later in the list. We explored the map operator in Chapter 13.

This is a simplified version of what’s done in practice. In practice, a distinction is typically made between individuals (referred to as phenotypes) and classes of individuals sharing the same genetic code (or genotype). For example, in a population of TSP tours, the same tour could appear multiple times. In designing a GA, you have to think not only about the fitness of a particular code but also about how often the code appears in the population. Having more than one individual with the same code gives some insurance that a particularly fit code won’t be lost due to random selection.

But we won’t worry about such fine points here. Let’s consider the tiny population corresponding to the five tours we’ve mentioned so far. Here’s the distribution corresponding to this population (technically, the distribution is the five numbers in the last column; these should sum to 1.0 but don’t because they’ve been rounded for readability to the nearest 0.001):

> (display-distribution (distribution initial-population))
(BOS PVD BWI SFO ORD DFW) - Length: 6969, Fitness: 0000, Probability: 0.000
(BOS PVD BWI ORD SFO DFW) - Length: 5852, Fitness: 1117, Probability: 0.311
(BOS BWI PVD DFW SFO ORD) - Length: 6379, Fitness: 0590, Probability: 0.164
(BOS BWI PVD ORD SFO DFW) - Length: 6408, Fitness: 0561, Probability: 0.156
(BOS PVD BWI DFW SFO ORD) - Length: 5648, Fitness: 1321, Probability: 0.368

How do we go about picking the next generation? A typical GA might choose a set of entities in accord with the above distribution. This means that, on average, 31.1% of the entities would have (BOS PVD BWI ORD SFO DFW) for a code, 16.4% would have (BOS BWI PVD DFW SFO ORD), and so on. Before we show exactly how to accomplish this in implementing a genetic algorithm, it’s worth a short detour to explain how to simulate a process that makes selections in accord with a probability distribution.

Most programming languages support generating sequences of so-called random numbers. Roughly speaking, a sequence of numbers is said to be random if it’s impossible to predict with certainty the next number in the sequence. Since computer procedures for generating random numbers are just implementations of algorithms, they can’t, strictly speaking, generate random numbers — if you know the algorithm then, at least in principle, you can just run it to predict the next “random” number in the sequence. In recognition of this fact, the numbers generated by computer random-number generators are called pseudo-random numbers. With this caveat, the sequences of numbers generated by a well-designed pseudo-random-number generator are about as random (that is to say, unpredictable) as you could want.

The Scheme invocation (random k) generates a (pseudo-) random number, an integer, in the range 0 to k - 1 according to a uniform distribution. This just means that any integer in the range is as likely to appear as any other. Here’s a little Scheme program that generates a sequence of m random numbers in the range 0 to k - 1 and keeps track of the frequency with which each integer appears in the sequence. The histogram for such a sequence lists the frequencies for all the numbers. I won’t explain the code in detail except to note that the function histogram-uniform uses the Scheme iteration construct (described in Chapter 5) to generate the random numbers and keeps track of frequencies using a vector of running counts.

(define (histogram-uniform m k)
  (do ((i 0 (+ i 1))
       (n (random k) (random k))       ; a fresh random number each iteration
       (counts (make-vector k 0)))     ; one running count per integer
      ((= i m) (display-histogram counts))
    (vector-set! counts n
                 (+ 1 (vector-ref counts n)))))

For completeness, here’s the procedure that displays the frequency information, also implemented using the Scheme iteration construct:

(define (display-histogram vec)
  (do ((i 0 (+ i 1)))
      ((= i (vector-length vec)))
    (fprintf (current-output-port) 
             "~A occurs ~A times ~%"
             i (vector-ref vec i))))

Here we see how random works in generating random bits (0 or 1):

> (histogram-uniform 10 2)
0 occurs 4 times
1 occurs 6 times
> (histogram-uniform 100 2)
0 occurs 58 times
1 occurs 42 times
> (histogram-uniform 1000 2)
0 occurs 479 times
1 occurs 521 times

As expected, the number of 1s is not exactly equal to the number of 0s; however, as the lengths of the random sequences increase, the deviation of the observed fraction of 1s from one half drops precipitously, from 10% to 8% and then to 2.1%.

We can use random to make selections according to a distribution such as the one our GA computed above. First we use random to generate an approximation to a uniform distribution on the real numbers between 0.0 and 1.0. In this definition, we use 2147483647 = 2^31 - 1, the largest integer that random can handle:

(define (random-p)
  (/ (random 2147483647) 2147483647))

Now we’re ready to define a function that takes a distribution, represented as a list of numbers that sum to 1.0, and returns an index into a list (a list of tours in our case):

(define (random-select dist)
  (do ((p (random-p))                  ; random threshold, fixed for the whole loop
       (i 0 (+ i 1))
       (q (car dist)))                 ; running sum of the probabilities
      ((<= p q) i)
    (set! q (+ q (list-ref dist (+ i 1))))))

To understand how random-select works, think about how it modifies q. Suppose that the single argument to random-select is (0.000 0.311 0.164 0.156 0.368), the distribution we produced from our initial population of tours (again, with the numbers rounded off). Then suppose that p is initialized to 0.465. In the first iteration, i is 0, q is 0.000 and the test (<= p q) evaluates to false. In the second iteration, i is 1, q is 0.311 and the test evaluates to false. In the third iteration, i is 2, q is 0.475 = 0.311 + 0.164, the test evaluates to true and random-select returns the index 2. Let’s test our selection function on our initial population of tours:

> (define initial-distribution (distribution initial-population))
> (list-ref initial-population (random-select initial-distribution))
(BOS PVD BWI DFW SFO ORD)

Well, that’s not very informative: a single randomly generated tour isn’t going to tell us much of anything. So to test random-select I wrote a variant of the histogram-uniform function that works with random-select. This function, called histogram-nonuniform, takes two arguments, an integer indicating the number of times to call random-select and a distribution, and then prints out the number of times each index is selected. This run of histogram-nonuniform shows how random-select approximates the distribution (0.000 0.311 0.164 0.156 0.368):

> (histogram-nonuniform 1000 initial-distribution)
0 occurs 0 times
1 occurs 312 times
2 occurs 162 times
3 occurs 165 times
4 occurs 361 times
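
The definition of histogram-nonuniform is a one-line variation on histogram-uniform; here’s a sketch consistent with the description above:

(define (histogram-nonuniform m dist)
  ;; Just like histogram-uniform, but draw each index using
  ;; random-select instead of random.
  (do ((i 0 (+ i 1))
       (n (random-select dist) (random-select dist))
       (counts (make-vector (length dist) 0)))
      ((= i m) (display-histogram counts))
    (vector-set! counts n
                 (+ 1 (vector-ref counts n)))))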

So now we have the necessary machinery to select a population according to a distribution weighted to take into account our measure of fitness, shortness of tours. This is not the final population that will constitute the next generation; it’s simply the set of individuals that will be allowed to reproduce. In a GA these individuals would be paired up, individually subjected to mutation, pair-wise subjected to crossover, and perhaps modified by other genetic operators as well. The resulting mutations and sexual combinations would then comprise the next generation, perhaps with some culling (again with random variation) to produce a generation of the same size as the previous generation. That’s basically it: the GA produces generation after generation of entities until it stumbles on one we’re satisfied with.
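
Here’s a minimal sketch of one such generational step, combining the distribution and random-select procedures defined above with the illustrative mutate and crossover operators sketched earlier. Real GAs differ in how they pair parents and cull offspring; this version simply selects two parents at a time, crosses them over, and mutates the results until the new generation is at least as large as the old one:

(define (next-generation population rate)
  (let ((dist (distribution population))
        (size (length population)))
    (define (select)                    ; pick a parent according to dist
      (list-ref population (random-select dist)))
    (let loop ((new '()))
      (if (>= (length new) size)
          new                           ; a real GA might cull back to size
          (loop (append (map (lambda (tour) (mutate tour rate))
                             (crossover (select) (select)))
                        new))))))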

Often enough, lessons from biology can point the way to improved GAs. The above distribution assigned zero probability to the first tour we looked at. It could be, however, that an otherwise uninteresting (low-fitness) entity has a nice solution to some small part of the overall problem that isn’t otherwise represented in the population. If that entity has no opportunity to reproduce, then that piece of the overall solution could be lost. There are lots of good reasons to maintain a diverse pool of individuals. Crossover operators can, at least occasionally, construct from two inferior entities an entity of high quality and high fitness.

A problem facing the designers of GAs as well as the designers of search algorithms concerns when to stop. We typically don’t know what the best hypothesis looks like, and so we have to guess when to stop. For our little problem involving six cities we can check by performing an exhaustive search of all possible tours:

> (display-optimal-tour (exhaustive-search cities))
(PVD ORD SFO DFW BWI BOS) - Length: 5530

We aren’t far off from an optimal solution, but, in general, there is no computationally tractable way for us to know this. Here there are only 6! = 6 * 5 * 4 * 3 * 2 * 1 = 720 possible tours. But the factorial function increases faster than any exponential function — compare 10! = 3628800 with 2^10 = 1024, or note that 100! has over 150 digits while 2^100 has only 31.
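
For six cities, exhaustive search is easy to write. Here’s a sketch, assuming the tour-length procedure sketched earlier; permutations and remove-one are illustrative helpers introduced for the occasion:

(define (remove-one x lst)              ; lst minus its first occurrence of x
  (cond ((null? lst) '())
        ((equal? x (car lst)) (cdr lst))
        (else (cons (car lst) (remove-one x (cdr lst))))))

(define (permutations lst)              ; every ordering of the elements of lst
  (if (null? lst)
      '(())
      (apply append
             (map (lambda (x)
                    (map (lambda (p) (cons x p))
                         (permutations (remove-one x lst))))
                  lst))))

(define (exhaustive-search cities)      ; the shortest of all possible tours
  (let loop ((tours (permutations cities)) (best #f))
    (cond ((null? tours) best)
          ((or (not best)
               (< (tour-length (car tours)) (tour-length best)))
           (loop (cdr tours) (car tours)))
          (else (loop (cdr tours) best)))))

On six cities this examines all 720 orderings; on even thirty cities it would never finish.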

You might think you could just keep track of statistics on how the entities in populations change over time. If the most fit entities in the last twenty populations have the same fitness within some small tolerance, then perhaps you can’t get any better. Unfortunately, GAs, like other search algorithms, tend to get stuck on “fitness plateaus” where the entities in populations are all just about the same in terms of fitness and in terms of the availability of useful variations corresponding to pieces of the sought-after solution. The entities in these populations have neither an incentive nor a basis to evolve. Sometimes the GA designer can borrow from nature to create an incentive and thus inject variation into a moribund population. You can create complex simulated environments and determine fitness by having entities fight for survival in them. Some GAs use the digital analogs of parasites, symbiotes, competition, predators, and catastrophic events to spur evolution and encourage novel solutions.

The six students in the GISP studied the literature, invented new algorithms and reinvented old ones, struggled to apply the general GA framework to particular problems, and wrote code and performed lots of computational experiments. They were elated at times and frustrated at others. Sometimes the way seemed so clear, and sometimes the mountains of books and relevant papers were pretty daunting. I think they certainly deepened their understanding of computer science. They discovered that genetic algorithms are not a panacea for hard computational problems. They learned that the metaphors and lessons from biology can be useful, but that other metaphors and mathematical approaches are just as useful. And they discovered that nothing replaces really understanding the computational problem that you’re trying to solve.

I can certainly understand why they were attracted to the general area and to GAs in particular. Whether you buy into the metaphors and methodology of GAs or choose some other approach to searching very large hypothesis spaces, some way of simulating Darwin’s dangerous idea is likely to prove useful in building intelligent machines. And keep in mind that simulated evolution is keeping pace with advances in computing hardware and accelerating exponentially. I’d be remiss not to point out that scientists studying natural evolution use computers to simulate their theories, thus providing interesting insights into biological systems. I especially like the accounts of such computer studies in Matt Ridley’s The Origins of Virtue (on understanding the evolutionary basis for altruism and cooperation) and The Red Queen (on understanding the advantages of sex in reproduction). But studying naturally occurring evolution with computer simulations is only one possible avenue of inquiry. What about simulating alternative universes? What about tinkering with the mechanisms of life and even devising new life forms?

Scientists have long been fascinated with the connection between computation and life. In the late 1940s, John von Neumann devised a class of computations called cellular automata that were sufficiently powerful to support self-replication. In 1970, the mathematician John Conway designed a class of cellular automata, now called “Conway’s Game of Life”, that has been the basis for numerous experiments and conjectures on the possibility of life forms that live entirely within the confines of computer memory.

The field of artificial life is based on the idea that programs can simulate various life forms; as computers become more powerful, experiments simulating artificial life have become more ambitious. An early pioneer in artificial life, Tom Ray, created a simulated world called Tierra in which to conduct experiments involving evolution and artificial life, and now experimenters the world over are tinkering with their own variations of Tierra as well as other simulated worlds. This may seem premature, since we don’t yet understand natural life forms, but simulations allow us to explore the implications of our theories and weed out those that don’t agree with experimental evidence.

So far, artificial life forms are largely confined to computers. However, there is nothing in principle stopping someone from using computers to accelerate the evolution of life forms and then producing bodies for the fittest individuals with computer-aided design and manufacturing. It’s already scary enough that programmers are tinkering with software life forms that can be set loose in the computer networks our lives and livelihoods depend on. As is so often the case, with every use of advanced technology that improves our lives come other uses that are threatening and worrisome.

Biology may suggest a way to deal with rogue programs in our computer systems. Perhaps operating systems of the future will look more like biological systems, with complex defenses and immune systems that adapt to such threats as computer viruses. In the future a computer or a network may “feel under the weather” as its immune system mounts a counterattack on a malicious virus or malevolent hackers. Perhaps operating systems will use a form of accelerated simulated evolution to create effective antibodies against computer attacks and cooperating computer systems will share antiviral vaccines to keep their networks healthy and running smoothly. Indeed, all these suggestions already exist in prototype and will likely become commonplace in the near future.

Darwin’s dangerous algorithm is indeed a mixed blessing. But once you think about the relationship between life and computation, the idea of simulating evolution is inevitable. And if nature is wondrous in its variation and fecundity, what more might we expect of artificial variants of evolution boosted by human imagination and accelerated by an exponentially increasing computational ability?


1 Louis Menand, in this passage from The Metaphysical Club (p. 122), makes it clear that natural selection is blind to any particular design or plan, and that the choice or selection of characteristics is determined entirely by reproductive success.

2 Actually, the number of tours depends on the structure of the underlying graph. If, however, we assume that a link connects each pair of nodes in the graph, then every ordering of these nodes constitutes a tour. You should be able to convince yourself that there are n! (n factorial) possible orderings of n nodes and furthermore that n! > 2^n for n > 3.

3 If you know about genes you can think of them as tokens, but the analogy is a little rough. Genes are not atomic; rather, they can be broken down into sequences of codons that code for the twenty amino acids used to synthesize proteins. Each codon is a triple composed of the four chemical bases, adenine, cytosine, guanine and thymine, that comprise DNA. Most genetic algorithms make only superficial use of what is known about genetics and molecular biology.

4 This crossover example conveniently used two tours whose first and second halves contained the same cities, BOS, PVD and BWI in the first halves and DFW, ORD and SFO in the second. Clearly it’s possible to choose a pair of tours that doesn’t have this property. The crossover operator for TSP must be designed to avoid creating tours with multiple occurrences of the same city — an interesting design problem.