Automatic programming for the lazy

You never know… a random walk may lead to serendipity.

Today’s xkcd comic is well-timed because just yesterday I sent out an announcement of the availability of my implementation of Cartesian Genetic Programming for ECJ, a Java-based evolutionary computing software framework. Genetic programming (GP) is a problem-solving technique in computer science that is inspired by evolution in biology. You start with a population of randomized computer programs and measure the “fitness” of each program. The fitness is a measurement of how well a program solves a particular problem. You can think of the program itself as the “gene”.

In this simulation of evolution, the best programs in the bunch are selected for “breeding” for the next generations. Breeding is done by exchanging pieces of genetic material – in this case, we exchange pieces of the computer programs themselves. Good programs have different parts that are useful, and these parts are combined or exchanged in the hope of producing “child” programs that are even better than their parents. Then, every so often, random mutations are introduced into the programs to promote diversity. Good mutations survive to future generations, and bad ones die out.
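The select–breed–mutate loop described above can be sketched in a few dozen lines. The following is a toy illustration (the classic “OneMax” problem, where fitness is simply the number of 1-bits in a bit string), not the author’s ECJ/CGP code – all names and parameters here are made up for the sketch:

```java
import java.util.Arrays;
import java.util.Random;

// A minimal generational evolutionary loop on bit-string "genes".
// Toy OneMax problem: fitness is the count of 1 bits, so evolution
// should drive the population toward the all-ones string.
public class TinyEvolution {
    static final Random RNG = new Random(42);
    static final int GENE_LEN = 32, POP = 50, GENERATIONS = 200;

    static int fitness(boolean[] gene) {
        int f = 0;
        for (boolean b : gene) if (b) f++;
        return f;
    }

    // Tournament selection: pick the fitter of two random individuals.
    static boolean[] select(boolean[][] pop) {
        boolean[] a = pop[RNG.nextInt(POP)], b = pop[RNG.nextInt(POP)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    // One-point crossover: the child takes a prefix from one parent
    // and the suffix from the other -- "exchanging pieces" of the genes.
    static boolean[] crossover(boolean[] p1, boolean[] p2) {
        int cut = RNG.nextInt(GENE_LEN);
        boolean[] child = new boolean[GENE_LEN];
        for (int i = 0; i < GENE_LEN; i++) child[i] = i < cut ? p1[i] : p2[i];
        return child;
    }

    // Rare random bit flips promote diversity.
    static void mutate(boolean[] gene) {
        for (int i = 0; i < GENE_LEN; i++)
            if (RNG.nextDouble() < 1.0 / GENE_LEN) gene[i] = !gene[i];
    }

    public static void main(String[] args) {
        boolean[][] pop = new boolean[POP][GENE_LEN];
        for (boolean[] g : pop)
            for (int i = 0; i < GENE_LEN; i++) g[i] = RNG.nextBoolean();
        for (int gen = 0; gen < GENERATIONS; gen++) {
            boolean[][] next = new boolean[POP][];
            for (int i = 0; i < POP; i++) {
                boolean[] child = crossover(select(pop), select(pop));
                mutate(child);
                next[i] = child;
            }
            pop = next;
        }
        int best = Arrays.stream(pop).mapToInt(TinyEvolution::fitness).max().getAsInt();
        System.out.println("best fitness = " + best + " / " + GENE_LEN);
    }
}
```

Real GP systems like ECJ replace the bit string with a program representation and add elitism, configurable operators, and parallel evaluation, but the skeleton is the same.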

Parent programs producing offspring by exchanging pieces of themselves. Credit: Michael Adam Lones, Enzyme Genetic Programming

This sounds like a completely random way to solve a problem, but it is surprisingly effective for many kinds of problems, such as learning mathematical expressions that can describe some data set, discovering winning game-playing strategies, making forecasts and predictions from data sets, learning decision trees to classify data, evolving emergent behavior, and optimizing complex systems. Of great interest to me in applying genetic programming is the emergence of unique and fascinating solutions that are not likely to be conceived by human minds.

Cartesian Genetic Programming (CGP), the genetic programming technique I implemented for ECJ, is a variation of genetic programming invented by Julian Miller. CGP uses simple lists of numbers to represent the programs (most GP implementations use some kind of explicit tree representation). It has some interesting benefits over traditional tree-based genetic programs, such as improved search performance, reduced overhead, and less “bloat” in generated programs. My implementation includes some sample problems: regression (fitting a bunch of data to an equation), classification (identifying a species of iris flower based on simple measurements of its parts, or predicting whether a breast cancer tumor will be benign or malignant), and parity (determining whether the number of “on” bits in a binary string is even or odd).
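To give a feel for the “list of numbers” representation, here is a toy sketch of how a 1D CGP genotype might be decoded and evaluated. The genome is a flat integer list: each node contributes a function index followed by two connection indices (pointing at program inputs or earlier nodes), and a final gene names the output. The function set, arity, and indexing scheme here are simplified inventions for illustration – they are not the ECJ implementation’s conventions:

```java
// Toy decoder/evaluator for a 1D CGP-style genome.
public class TinyCgp {
    static final int INPUTS = 2, ARITY = 2;

    // Simplified function set: 0 = add, 1 = sub, anything else = mul.
    static double apply(int fn, double a, double b) {
        switch (fn) {
            case 0: return a + b;
            case 1: return a - b;
            default: return a * b;
        }
    }

    // Evaluate the genome on the given inputs.
    static double eval(int[] genome, double[] in) {
        int genesPerNode = 1 + ARITY;
        int nodes = (genome.length - 1) / genesPerNode;
        // value[] holds the program inputs followed by each node's output.
        double[] value = new double[INPUTS + nodes];
        System.arraycopy(in, 0, value, 0, INPUTS);
        for (int n = 0; n < nodes; n++) {
            int g = n * genesPerNode;
            int fn = genome[g];
            double a = value[genome[g + 1]]; // connection genes point at inputs
            double b = value[genome[g + 2]]; // or at earlier (feed-forward) nodes
            value[INPUTS + n] = apply(fn, a, b);
        }
        return value[genome[genome.length - 1]]; // last gene selects the output
    }

    public static void main(String[] args) {
        // Node at index 2: x0 + x1; node at index 3: node2 * node2;
        // output gene = 3, so the program computes (x0 + x1)^2.
        int[] genome = {0, 0, 1,   2, 2, 2,   3};
        System.out.println(eval(genome, new double[]{2, 3})); // (2+3)^2 = 25.0
    }
}
```

Because the genome is just integers, mutation is a matter of changing a number in the list – no tree surgery required, which is part of why CGP avoids the bloat problem.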

The iris classification problem is a classic machine learning problem, dating all the way back to 1936. Given a set of measurements taken from an iris flower, your task is to figure out which species it belongs to: iris virginica, iris versicolor, or iris setosa.

From left to right: iris virginica, iris versicolor, and iris setosa.

Starting with randomized programs, one of my CGP tests evolved the following programs (expressions) which correctly classified about 95% of the irises:

virginica = nand (> (- (+ 1.7777395 (/ sepalwidth sepallength)) (if 0.0053305626 petalwidth -0.6896746)) (/ -0.8330147 (neg 1.6308627))) (- (* (+ (- (+ 1.7777395 (/ sepalwidth sepallength)) (- petallength petalwidth)) 1.7777395) (- petallength petalwidth)) petalwidth)

versicolor = * (nand (> (- (+ 1.7777395 (/ sepalwidth sepallength)) (if 0.0053305626 petalwidth -0.6896746)) (/ -0.8330147 (neg 1.6308627))) (- (* (+ (- (+ 1.7777395 (/ sepalwidth sepallength)) (- petallength petalwidth)) 1.7777395) (- petallength petalwidth)) petalwidth)) (- (+ 1.7777395 (/ sepalwidth sepallength)) (- petallength petalwidth))

setosa = - (+ 1.7777395 (/ sepalwidth sepallength)) (- petallength petalwidth)

I don’t expect you to be able to read and understand the expressions – they are in a format that isn’t easy to read! What’s more important is that I didn’t have to create them myself – the CGP algorithm discovered them for me using only the input data (iris measurements) and the fitness function (a measurement of how many irises were correctly identified).
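The fitness function mentioned above is conceptually simple: score each candidate program by the fraction of samples it labels correctly. A hypothetical sketch (the names, layout, and toy data are mine, not taken from the ECJ/CGP source):

```java
import java.util.function.Function;

// Fitness of a classifier = fraction of samples it labels correctly.
// The evolved program is passed in as an ordinary function.
public class ClassifierFitness {
    static double fitness(double[][] samples, int[] labels,
                          Function<double[], Integer> classifier) {
        int correct = 0;
        for (int i = 0; i < samples.length; i++)
            if (classifier.apply(samples[i]) == labels[i]) correct++;
        return (double) correct / samples.length; // 1.0 = perfect classification
    }

    public static void main(String[] args) {
        // Toy data: label 1 when the first feature exceeds 1.0, else 0.
        double[][] samples = {{0.5, 2.0}, {1.5, 0.1}, {2.0, 3.0}, {0.2, 0.2}};
        int[] labels = {0, 1, 1, 0};
        double f = fitness(samples, labels, s -> s[0] > 1.0 ? 1 : 0);
        System.out.println("fitness = " + f);
    }
}
```

In the actual runs, the “classifier” argument would be an evolved expression like the ones above, evaluated against the 150-sample iris data set.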

CGP also had good results when I tested it on the Wisconsin breast cancer data set. This data set contains measurements taken from fine needle biopsies of suspicious breast lumps. The task is to predict whether the lumps are benign or malignant using only the measurements.

Fine needle aspiration. A thin needle is used to sample material from a suspicious lump in a breast. The data set contains the microscopic measurements taken from the sampled material.

CGP evolved the following program that correctly diagnoses the tumors 95% of the time:

malignant = not (nor (+ (+ (* cellShapeUniformity 1.5756812) bareNuclei) (<= (> (= cellSizeUniformity 0.08695793) -1.9803793) normalNucleoli)) (nor (> (- (iflez (if blandChromatin mitoses 0.75769496) bareNuclei normalNucleoli) (* 1.97491 (+ cellSizeUniformity clumpThickness))) (> 0.08695793 singleEpiCellSize)) (nor (>= marginalAdhesion (not (- (= (> (or clumpThickness 1.5756812) (= cellSizeUniformity 0.08695793)) (+ cellSizeUniformity clumpThickness)) (* (* 1.97491 (+ cellSizeUniformity clumpThickness)) mitoses)))) (iflez (if blandChromatin mitoses 0.75769496) bareNuclei normalNucleoli))))

Again, using only a fitness function and some test data, we are able to evolve a highly accurate cancer diagnosis tool.

Seems too good to be true? Well, it can be. Overfitting is a big problem when evolving these kinds of classifiers. For example, if each of the malignant tumor patients happened to be wearing red shoes during the biopsy (and the shoe type was included in the data set), a machine might be inclined to think that wearing red shoes was what determined the diagnosis. So, the evolved classifier is going to be very sensitive to whatever data set you unleash it upon.
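The usual first defense against this kind of overfitting is a holdout split: evolve against one portion of the data, then report accuracy only on the unseen remainder (a classifier that latched onto “red shoes” would score well on the first portion and collapse on the second). A sketch of the splitting step, with illustrative names of my own choosing:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Shuffle sample indices and split them into train/test portions.
// Fitness during evolution uses the train indices; the reported
// accuracy should come from the test indices only.
public class HoldoutSplit {
    static int[][] split(int nSamples, double trainFraction, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < nSamples; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed)); // avoid ordering bias
        int cut = (int) (nSamples * trainFraction);
        int[] train = new int[cut], test = new int[nSamples - cut];
        for (int i = 0; i < nSamples; i++) {
            if (i < cut) train[i] = idx.get(i);
            else test[i - cut] = idx.get(i);
        }
        return new int[][]{train, test};
    }

    public static void main(String[] args) {
        int[][] parts = split(150, 0.7, 1L); // e.g. the 150-sample iris set
        System.out.println("train = " + parts[0].length
                + ", test = " + parts[1].length);
    }
}
```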

Oh and then there’s the whole “no free lunch” thing. But that’s a depressing topic for another time.

If you want to find out more about my CGP implementation, check out the documentation. If you want to give it a whirl yourself, grab the distribution (you need at least Java 1.5). Check out the ECJ project page for more info about the evolutionary software framework my CGP implementation uses.

10 responses to “Automatic programming for the lazy”

  1. This is just what I needed to read right now.

    Awesome stuff. Do you do all this for fun, or is this something you initially did on the job?

    Regardless, it’s quite impressive.

    And don’t discount the red shoes — Maybe a preference for red is flagged by a genetic marker that has a high correlation with another genetic marker for breast cancer.

    So I ask: Is discounting the unexplainable or seemingly ridiculous always for the best? And can you develop a genetic algorithm to tell me the answer to my question? Muahahahah! 🙂

    Seriously though — VERY impressive.

    Question 2) Are genetic algorithms what is used to determine what measurements facial recognition systems should be taking? It seems like this is just a superior way to solve a problem; almost biology based.

    Question 3 (sorry for so many): Is there a “Kucinich Factor” here? I.E. When breeding, do the “genes”/programs sometimes pick the runt/undesirable of the “litter” to “breed” with, just as another way of fostering diversity?

    I call this the “Kucinich Factor”, because Jesus man, have you seen his wife? I don’t think she selected him based on his fitness rating. Or her function for evaluating fitness must be heavily weighted by political power 🙂

  2. This is very cool. Thanks for posting links to so much source material and the code. I’ve been very curious about genetic programming since I first heard of the concept in Emergence theory. I know some fantastically efficient algorithms have been developed from it. With the code you’ve posted I’ll be able to learn a little Java (which I have to do for some SOA apps at work) and learn some concrete details of how this is implemented.

    One question. Is it called Cartesian Genetic Programming because of something it does with sets, or (more likely) because it’s inferring its own algorithms?

  3. Found the answer this morning in Miller’s paper, Cartesian Genetic Programming:

    “CGP is Cartesian in the sense that the method considers a grid of nodes that are addressed in a Cartesian coordinate system.”

    Now I have to go look up what that means. : )

  4. I get it now. Cartesian coordinates allow for defining shapes, so you can write applications to identify irises. As opposed to Binary Genetic Programming, which only has application in solving boolean problems. Very cool.

    Sorry for thinking out loud on your comments thread. : )

  5. Ryan: No problem, I enjoy the thought process. 🙂

    I think the “Cartesian” aspect comes from the fact that CGP’s representation originated as a two-dimensional coordinate system. Each element of this 2D arrangement refers to a node in the implicit program tree that the 2D structure represents. My understanding is that it has more to do with the layout of the “gene” (the representation of the evolved program) than anything solution-specific. In subsequent research, Miller found that the 2D layout of the gene performs no better than the simpler 1D layout. My CGP implementation uses the simpler 1D layout because of this.

    BTW – while you are learning Java with this implementation, be sure to check out the general ECJ tutorials mentioned here:

    Those tutorials are a great first step towards understanding how ECJ works (ECJ is the underlying framework that my CGP code runs in).

  6. Clint: I do this all for fun. But I did have one occasion to apply this on the job: I took a bunch of trial case data and attempted to evolve a forecasting program that could predict whether a defendant would be found guilty or innocent based on defendant demographics and other case information. I had mixed success. One experiment reached 90% prediction accuracy, but only because the evolved program “discovered” that people with very many court appearances tended to be guilty of their crimes (possibly a “hindsight is 20/20” type of thing). When I removed the “appearance count” from the data set, I still managed to evolve a classifier with 82% accuracy. I don’t remember which attributes were significant to this classifier. But here’s a sample GP, with accuracy of around 85%, evolved using a mix of attributes:

    ((num_defendant_appearances > 14) and (not (is("defendant,p1") or not (num_defendant_appearances > 28)) and ((num_defendant_appearances < 21) or is("month,Jun")))) or (((num_other_sentencings < 9) or (((age < 32) or (num_defendant_appearances > 28)) and ((is("defendant,p1") or ((num_defendant_appearances > 70) or ((num_defendant_appearances > 14) or (age > 25)))) or ((((num_defendant_appearances > 14) and (not ((num_defendant_appearances > 49) or ((num_defendant_appearances > 70) or (age < 32))) and ((num_defendant_appearances < 21) or is("month,Jun")))) or (not (not (num_other_cases > 8)) and not (num_defendant_appearances < 14))) or (num_other_sentencings < 9))))) and not (num_defendant_appearances < 14))

    “Is discounting the unexplainable or seemingly ridiculous always for the best?” — One can never know this answer in advance. But a rule of thumb that keeps appearing in the literature is: the more complicated the model, the more likely your solution will be overfitted. I think this is because evolutionary searches are greedy and will latch onto ANYTHING that improves fitness. This is especially true if there is an error in your model. By “model” I’m referring to how you represent your problem, and how you measure the advantages of one solution over another.

    “Are genetic algorithms what is used to determine what measurements facial recognition systems should be taking? It seems like this is just a superior way to solve a problem; almost biology based.” — GAs have been used for this in the past. An example:

    And here’s one titled “A Genetic Algorithm Based Feature Selection Approach for 3D Face Recognition” which seems to talk about what you describe:

    “Is there a ‘Kucinich Factor’ here?” — Yes, some GAs explicitly include runts to promote diversity, because you never really know where good solutions will come from. You can think of the space of all solutions as a landscape full of hills and valleys. You want to find the highest peak but often you are stuck on smaller hills, unable to see the high peak in the distances. GAs try to scatter the search far and wide to improve the chances of finding the highest peak. And the highest peak might be surrounded by the deepest valleys, fooling any greedy searches that are stuck there. Whether Kucinich himself lives in a valley or a peak is a highly subjective matter. 🙂
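    One simple way a GA can include “runts” is to make selection occasionally ignore fitness altogether. A hypothetical sketch (this is not from the ECJ implementation; the parameter names are mine):

```java
import java.util.Random;

// Tournament selection with a twist: with small probability a completely
// random individual is chosen regardless of fitness, so even the "runt"
// sometimes gets to breed and keeps the population diverse.
public class RuntySelection {
    static int select(double[] fitness, double runtRate, Random rng) {
        if (rng.nextDouble() < runtRate)
            return rng.nextInt(fitness.length);  // anyone can win this round
        int a = rng.nextInt(fitness.length), b = rng.nextInt(fitness.length);
        return fitness[a] >= fitness[b] ? a : b; // otherwise, fitter of two
    }

    public static void main(String[] args) {
        double[] fitness = {0.1, 0.9, 0.5, 0.2};
        Random rng = new Random(7);
        int[] picks = new int[fitness.length];
        for (int i = 0; i < 10_000; i++) picks[select(fitness, 0.1, rng)]++;
        // The fittest individual dominates, but every index gets picked sometimes.
        System.out.println(java.util.Arrays.toString(picks));
    }
}
```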

  7. Cool — thanks for the answers!

  8. Ah! Thanks for clarifying the “Cartesian” part of this. That totally makes sense now.

    Your case-study evolving algorithms to predict whether a defendant will be found guilty is particularly fascinating. Thanks for introducing me to this concept of over-fitting, something to take into account in learning about machine learning.

    Thanks for the link! I’ve started toying with this in PHP, but the concept has advanced much further in Java.

  9. Ryan: I highly recommend Koza’s books which provide good introductions to genetic programming:

    I found a PDF version of his first GP book here:

  10. Pingback: » Blog Archive » Patterns in the PHP Random Function
