The general procedure in a genetic algorithm is to work iteratively toward a solution to a problem. At any given iteration, a family of possible solutions is evaluated for their “fitness,” that is, how close they come to satisfying the conditions of the problem. The algorithm selects the fittest solutions and then recombines these solutions to produce a new generation of solutions. In this way a solution to the problem “evolves.”
References to genetic algorithms [Holland 1992], genetic programming [Koza 1992], and further general questions of evolution and order [Kauffman 1993] can be found at the conclusion of this notebook.
As an example, consider the ruler for
The fitness of this ruler is measured by comparing the number of marks in the ruler with the number of integers the ruler will measure. The number of marks in the ruler (counting 0) is just the Mathematica Length function. The set of all numbers that the ruler will measure is found with the defined function span. One way to find this set is to map the absolute value function across all differences generated by a given ruler list.
For the example r,
span[r]
The function fitness can now be defined as a function that will increase when either the number of marks in the ruler decreases or the number of integers measured by the ruler increases.
For the example given here, the fitness value is about 2.
A ruler containing marks at every point from 0 to 36 would have a fitness equal to 0. It is known that a sparse ruler with ten marks exists for
The example below shows a generation of five new rulers resulting from the combination of two parent rulers. The fitness function tells you which two to select.
Since the ruler labeled newBetaRuler[i] will have the best fitness value, a plot of these values will indicate how well the generations of rulers are evolving toward a solution.
The graph shows that the rulers are evolving in terms of increasing fitness. The evolution is obviously not monotonic, however. The best ruler in this set of generations is newBetaRuler[19].
The ruler is reasonably short.
The ruler will measure all but seven of the numbers between 0 and 36.
You can see from the fitness value that there is still considerable room for improvement. (Remember that you should be able to find fitness values up to 27.)
This fitness function seems to select quickly the rulers that will measure all or most of the numbers between 0 and a given value p. Here is a typical plot of 100 generations.
Here is a selection of eleven rulers from this set of generations. These will measure the complete set of numbers between 0 and 36.
The above list, with sequences of identical rulers, is typical. The evolutionary process goes through periods of stability and periods of fluctuation, as you can see by examining a plot of 1,000 generations.
To create an image of the rulers themselves, you first convert them to binary strings. Thus the ruler
You can use the Mathematica ListDensityPlot function to transform numeric lists into “actual” rulers.
For example, if you view the first 15 best rulers from a particular series using ListDensityPlot[15], you get the following picture.
This plot shows at a glance that the first ruler is
;
also, that the mark 24 is found on each of the second through the fifteenth
rulers. The marks on any other ruler in the set, and the persistence of
marks across several generations, is easily visualized.
If you turn this plot sideways, you can match each ruler to its fitness value.
For an entire set of 1,000 generations, you have the following picture.
The fitness values of the set of all rulers for a given p can be plotted as a “landscape” that shows how the fitness varies across a set of rulers. For example, when
The table also shows that for
there are multiple rulers with the maximum fitness value. The Position
function shows the location of these rulers. The list of locations can
be used as an index to show all the sparse rulers for
.
Running a new experiment for
(Reinitialize the functions rulerGeneration and newBetaRuler.)
You can show how the evolving rulers migrate into the fitness landscape. betaRulerListReduced produces an abbreviated list of the best rulers, removing successive identical rulers while maintaining the order in which the rulers have evolved.
The set sites list the horizontal values of the positions of the rulers, while sitePoints pairs each horizontal value with its corresponding fitness. siteLines traces the evolution of the rulers through the landscape.
Lightening the color of the landscape makes it easier to distinguish the evolutionary path.
Since the values of
rapidly become too large, the entire fitness landscape for large values
of p cannot be pictured. However, “sketches”
of the landscape can provide useful information. For example, every fitness
landscape will include the fitness value of the ruler containing all possible
marks. This is the “zero valley” located in the graph
above at the point 341 on the horizontal axis. There are also zones that
are relatively rich in high fitness values, such as the region from 442
to 502 on the horizontal axis. Instead of always beginning with the zero
generation rulers as
,
collections of rulers could be seeded in the most promising zones of the
landscape.
The numbers of offspring and the mutation rate could vary with the fitness
of the evolving rulers. If generation size and mutation rate were to vary
inversely with fitness, then the population should tend to evolve within
an increasingly smaller zone. The appropriate metric here is the Hamming
distance, which defines the distance between two binary strings of length
n as the number of positions in which the two strings are different.
Two strings could be a small Hamming distance apart and still not look
close together in a graphical representation of the fitness landscape.
For example, in the representation of the landscape above, the first fitness
measure is for the ruler
and the last measure is for the ruler
--the
rulers at opposite sides of the landscape diagram are only one unit apart
in Hamming Distance. This is a result of the way the Gray code generates
the list of rulers.
An alternative algorithm could contain a timing device that would periodically raise and lower the mutation rate. Such an algorithm could produce occasional bursts of mutant rulers that would wander into unexplored areas of the landscape. Finally, the fitness function itself might change, in which case the landscape would be different at different stages of evolution.
There are 68,719,476,736 (about 70 billion) possible rulers of length 36, and only one of minimal length 9. Where in hyperspace does this unique solution live? How long would it take to find it? How good are the 36 adjacent rulers? Can the genetic algorithm find this absolute maximum? How good are the 94,143,280 other rulers of length 9? How are the best rulers of length 10, 11, and 12 distributed through hyperspace?
Holland, John H. Adaptation in Natural and Artificial Systems: An Introductory Analysis With Applications to Biology, Control, and Artificial Intelligence. MIT Press, 1992.
Kauffman, Stuart A. The Origins of Order: Self-Organization and Selection in Evolution. Oxford Press, 1993.
Koza, John R. Genetic Programming: On the Programming of Computers by Means of Natural Selection (Complex Adaptive Systems). MIT Press, 1992.
Skiena, Steven S. The Algorithm Design Manual. TELOS-Springer Verlag, 1998.