Genetic Programming is an awesome way to tackle machine learning problems


I don’t know how I missed out on genetic programming. I’m still trying to pick my jaw up off the ground. It reminds me of The Sopranos when Paulie walks into a Starbucks clone and asks “how did we miss out on this?”

In all honesty, I have no idea exactly how all of this works. The foundational premise makes total sense, but why it works is hard to say exactly. And yet, it works. Neural networks are kind of the same. Most data scientists understand the underlying premise of deep learning but couldn’t tell you exactly why it works. It is a bit like the Wizard of Oz and not knowing what is behind the curtain. We all use computers every day and log onto the internet, but very few of us understand how all of that technology comes together.

What is genetic programming?

One of the central challenges of computer science is to get a computer to do what needs to be done, without telling it how to do it. Genetic programming addresses this challenge by providing a method for automatically creating a working computer program from a high-level statement of the problem. Genetic programming achieves this goal of automatic programming (also sometimes called program synthesis or program induction) by genetically breeding a population of computer programs using the principles of Darwinian natural selection and biologically inspired operations. The operations include reproduction, crossover (sexual recombination), mutation, and architecture-altering operations patterned after gene duplication and gene deletion in nature.

Genetic programming is a domain-independent method that genetically breeds a population of computer programs to solve a problem. Specifically, genetic programming iteratively transforms a population of computer programs into a new generation of programs by applying analogs of naturally occurring genetic operations. The genetic operations include crossover (sexual recombination), mutation, reproduction, gene duplication, and gene deletion.
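
To make the breeding loop above concrete, here is a minimal sketch in plain Python. It assumes a toy symbolic regression task (rediscovering a formula from sample points), and every name in it (random_tree, crossover, mutate, evolve, MAX_NODES, the function set, the operator probabilities) is a hypothetical choice for illustration. It is not taken from gplearn or any other library, and it leaves out gene duplication and deletion. Programs are expression trees stored as nested tuples.

```python
import operator
import random

# Hypothetical function and terminal sets for this sketch.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMINALS = ["x", 1.0, 2.0]
MAX_NODES = 60  # size cap so trees cannot bloat without limit


def random_tree(rng, depth):
    """Grow a random expression: a terminal, or (op_name, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(OPS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Run the program represented by `tree` on the input value x."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return OPS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def paths(tree, prefix=()):
    """Return the path (tuple of child indices) to every node in the tree."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i in (1, 2):
            found.extend(paths(tree[i], prefix + (i,)))
    return found


def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the node at `path` swapped for `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, a, b):
    """Replace a random subtree of parent a with a random subtree of parent b."""
    donor = subtree(b, rng.choice(paths(b)))
    return replace(a, rng.choice(paths(a)), donor)


def mutate(rng, tree):
    """Replace a random subtree with a freshly grown random one."""
    return replace(tree, rng.choice(paths(tree)), random_tree(rng, 2))


def tournament(rng, population, scores, k=3):
    """Pick k individuals at random and return the one with the lowest error."""
    picks = rng.sample(range(len(population)), k)
    return population[min(picks, key=lambda i: scores[i])]


def evolve(target, seed=0, pop_size=100, generations=20):
    """Breed programs toward `target`; return (best_tree, best_error_per_generation)."""
    rng = random.Random(seed)
    xs = [i / 10 for i in range(-10, 11)]

    def error(tree):
        return sum((evaluate(tree, x) - target(x)) ** 2 for x in xs)

    population = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [error(t) for t in population]
        history.append(min(scores))
        next_gen = [population[scores.index(min(scores))]]  # keep the best as-is
        while len(next_gen) < pop_size:
            parent = tournament(rng, population, scores)
            r = rng.random()
            if r < 0.7:
                child = crossover(rng, parent, tournament(rng, population, scores))
            elif r < 0.9:
                child = mutate(rng, parent)
            else:
                child = parent  # plain reproduction
            if len(paths(child)) > MAX_NODES:
                child = parent  # reject oversized offspring
            next_gen.append(child)
        population = next_gen
    best = min(population, key=error)
    history.append(error(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve(lambda x: x * x + x + 1, pop_size=60, generations=10)
    print(best, history[-1])
```

Nothing in the code says how to build the formula. The only guidance is the error score, and selection plus crossover and mutation do the rest. Because the best program is always copied unchanged into the next generation, the best error can never get worse from one generation to the next, though there is no guarantee the loop finds an exact solution.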

I’m going to be adding code to GitHub with some examples of genetic programming in the future. I’ll post as I learn. I can tell you it works amazingly well. See my results below on the earthquakes competition on Kaggle, which used genetic programming. I also attempted to stack gplearn (a genetic programming library with a scikit-learn-style API) with XGBoost, but it wasn’t as good as genetic programming on its own. I’ll be tweaking the features and posting information on that as well.