Hemant Kumar
Mar 27, 2021

Predicting Neural Networks with the Use of a Genetic Algorithm

Nowadays, it’s relatively easy to train neural networks, but it’s still difficult to figure out which network architectures and other hyperparameters to use: how many neurons, how many layers, and which activation functions? In the long term, of course, neural networks will learn how to architect themselves, without human intervention. Until then, the speed of developing application-optimized neural networks will remain limited by the time and expertise required to choose and improve hyperparameters. Our project is designed to help solve this problem by rapidly returning good hyperparameters for datasets and classification problems. The code supports hyperparameter discovery for MLPs (i.e., fully connected networks) and convolutional neural networks, and uses the Keras library to build, train and validate them.

If you had infinite time and infinite computing resources, you could brute-force the problem and just compare all parameter combinations. However, in most real-world applications of neural networks, you will probably have to balance competing demands (time, cost, desire to continuously optimize AI performance in dynamic environments), and you may have a strong interest in being able to rapidly generate good networks for diverse datasets. In that case, genetic algorithms will be useful.

Here, we try to improve upon the brute force method by applying a genetic algorithm to evolve a network, with the goal of achieving optimal hyperparameters in a fraction of the time a brute force search would take.

Introduction

Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection. — Wikipedia

At its core, a genetic algorithm…

1. Creates a population of (randomly generated) members.

2. Scores each member of the population based on some goal. This score is called a fitness function.

3. Selects and breeds the best members of the population to produce more like them.

4. Mutates some members randomly to attempt to find even better candidates.

5. Kills off the rest (survival of the fittest and all), and

6. Repeats from step 2. Each iteration through these steps is called a generation.

Repeat this process enough times and you should be left with the very best possible members of a population. Sounds a lot like evolution, right? Same deal.
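Before we get to the real problem below, the whole loop can be sketched end to end on a toy task. Everything here (the bit-string members, the `fitness`, `breed`, and `mutate` helpers, and the retention rate) is illustrative, not the code we build later:

```python
import random

# Toy fitness: maximize the number of 1s in a bit string.
def fitness(member):
    return sum(member)

def make_member(length=8):
    return [random.randint(0, 1) for _ in range(length)]

def breed(mother, father):
    # Child takes the first half from one parent, second half from the other.
    half = len(mother) // 2
    return mother[:half] + father[half:]

def mutate(member):
    # Re-roll one randomly chosen position.
    i = random.randrange(len(member))
    member[i] = random.randint(0, 1)

def evolve_once(population, retain=0.4, mutate_chance=0.1):
    # Steps 2-3: score everyone and keep the best as parents.
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:int(len(ranked) * retain)]
    # Step 4: occasionally mutate a parent to explore new candidates.
    for member in parents:
        if random.random() < mutate_chance:
            mutate(member)
    # Steps 5-6: the rest die; breed parents back up to full size.
    children = []
    while len(parents) + len(children) < len(population):
        mother, father = random.sample(parents, 2)
        children.append(breed(mother, father))
    return parents + children

population = [make_member() for _ in range(20)]
for generation in range(30):
    population = evolve_once(population)
best = max(population, key=fitness)
```

After a few dozen generations the best member is usually all (or nearly all) ones.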

Genetic algorithms are a mysterious-sounding technique in a mysterious-sounding field: artificial intelligence. This is the problem with naming things appropriately. When the field was labelled artificial intelligence, it meant using mathematics to artificially create the semblance of intelligence, but self-aggrandizing researchers and Isaac Asimov redefined it as robots [2].

The name genetic algorithm does sound complex and has a faintly magical ring to it, but it turns out that genetic algorithms are one of the simplest and most intuitive concepts you will encounter in A.I.

Capability of Genetic Algorithms

In a word, genetic algorithms optimize. They can find better answers to a question, but not solve new questions. Given the definition of a car, they might create a better car, but they will never give you an airplane [2].

Defining a Problem to Optimize

Now we are going to put together a simple example of using a genetic algorithm in Python. We are going to optimize a very simple problem: trying to create a list of N numbers that equal X when summed together.

If we set N = 5 and X = 200, then these would all be appropriate solutions.

lst = [40, 40, 40, 40, 40]
lst = [50, 50, 50, 25, 25]
lst = [200, 0, 0, 0, 0]

(I have been told that randomly stating Pretty easy, right? at random junctions of articles is not helpful. Otherwise, I would have uselessly added one here.)

Ingredients of The Solution

Each suggested solution for a genetic algorithm is referred to as an individual. In our current problem, each list of N numbers is an individual.

>>> from random import randint
>>> def individual(length, min, max):
...     'Create a member of the population.'
...     return [randint(min, max) for x in range(length)]
...
>>> individual(5, 0, 100)
[79, 0, 20, 47, 40]
>>> individual(5, 0, 100)
[64, 1, 25, 84, 87]

The collection of all individuals is referred to as our population.

>>> def population(count, length, min, max):
...     """
...     Create a number of individuals (i.e. a population).
...
...     count: the number of individuals in the population
...     length: the number of values per individual
...     min: the min possible value in an individual's list of values
...     max: the max possible value in an individual's list of values
...     """
...     return [individual(length, min, max) for x in range(count)]
...
>>> population(3, 5, 0, 100)
[[51, 55, 73, 0, 80], [3, 47, 18, 65, 55], [17, 64, 77, 43, 48]]

Next we need a way to judge how effective each solution is; that is, to judge the fitness of each individual. Predictably enough, we call this the fitness function. For our problem, we want the fitness to be a function of the distance between the sum of an individual’s numbers and the target number X.

We can implement the fitness function as follows:

>>> from functools import reduce
>>> from operator import add
>>> def fitness(individual, target):
...     """
...     Determine the fitness of an individual. Lower is better.
...
...     individual: the individual to evaluate
...     target: the sum of numbers that individuals are aiming for
...     """
...     total = reduce(add, individual, 0)
...     return abs(target - total)
...
>>> x = individual(5, 0, 100)
>>> fitness(x, 200)
165

Personally, I’d prefer a high fitness score to correlate with a fit individual, rather than the current implementation, where a perfectly fit individual has a fitness of 0 and higher scores are worse. Ah well; regardless, keep that detail in mind while following this code.
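If you do prefer higher-is-better, one possible variant (not used in the rest of this article) maps the same distance into the range (0, 1], with 1.0 meaning a perfect individual. Note that any sorting logic would then need `reverse=True`:

```python
def fitness_maximizing(individual, target):
    """Alternative fitness where higher means fitter.

    A perfect individual scores 1.0; the score falls toward 0
    as the sum drifts away from the target.
    """
    distance = abs(target - sum(individual))
    return 1.0 / (1.0 + distance)
```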

It’s also helpful to create a function that will determine a population’s average fitness.

>>> def grade(pop, target):
...     'Find average fitness for a population.'
...     summed = reduce(add, (fitness(x, target) for x in pop), 0)
...     return summed / (len(pop) * 1.0)
...
>>> x = population(3, 5, 0, 100)
>>> target = 200
>>> grade(x, target)
116

Now we just need a way to evolve our population; to advance the population from one generation to the next.

Evolution

This is the secret sauce of genetic algorithms, where secret means obvious, and sauce means sauce. Consider a population of elk which are ruthlessly hunted by a pack of wolves. With each generation the weakest are eaten by the wolves, and then the strongest elk reproduce and have children. Abstract those ideas a bit, and we can implement the evolution mechanism.

1. For each generation we will take a portion of the best performing individuals as judged by our fitness function. These high performers will be the parents of the next generation.

We will also randomly select some lesser-performing individuals to be parents, because we want to promote genetic diversity. Abandoning the metaphor, one of the dangers of optimization algorithms is getting stuck at a local maximum and consequently being unable to find the global maximum. By including some individuals who are not performing as well, we decrease our likelihood of getting stuck.

2. Breed together parents to repopulate the population to its desired size (if you take the top 20 individuals in a population of 100, then you would need to create 80 new children via breeding). In our case, breeding is basic: take the first N/2 digits from the father and the last N/2 digits from the mother.

>>> father = [1,2,3,4,5,6]
>>> mother = [10,20,30,40,50,60]
>>> child = father[:3] + mother[3:]
>>> child
[1, 2, 3, 40, 50, 60]

It’s okay to have one parent breed multiple times, but one parent should never be both the father and mother of a child.

3. Merge together the parents and children to constitute the next generation’s population.

4. Finally, we mutate a small random portion of the population. In practice this means giving each individual a small probability of being randomly modified.

>>> from random import random, randint
>>> chance_to_mutate = 0.01
>>> for i in population:
...     if chance_to_mutate > random():
...         place_to_modify = randint(0, len(i)-1)
...         i[place_to_modify] = randint(min(i), max(i))
...

This — just like taking individuals who are not performing particularly well — is to encourage genetic diversity, i.e. avoid getting stuck at local maxima.

Putting it all together, the code to evolve a generation can be implemented like this:

def evolve(pop, target, retain=0.2, random_select=0.05, mutate=0.01):
    graded = [(fitness(x, target), x) for x in pop]
    graded = [x[1] for x in sorted(graded)]
    retain_length = int(len(graded) * retain)
    parents = graded[:retain_length]
    # randomly add other individuals to promote genetic diversity
    for individual in graded[retain_length:]:
        if random_select > random():
            parents.append(individual)
    # mutate some individuals
    for individual in parents:
        if mutate > random():
            pos_to_mutate = randint(0, len(individual)-1)
            # this mutation is not ideal, because it
            # restricts the range of possible values,
            # but the function is unaware of the min/max
            # values used to create the individuals
            individual[pos_to_mutate] = randint(min(individual), max(individual))
    # crossover parents to create children
    parents_length = len(parents)
    desired_length = len(pop) - parents_length
    children = []
    while len(children) < desired_length:
        male = randint(0, parents_length-1)
        female = randint(0, parents_length-1)
        if male != female:
            male = parents[male]
            female = parents[female]
            half = len(male) // 2
            child = male[:half] + female[half:]
            children.append(child)
    parents.extend(children)
    return parents

Now we’ve written all the pieces of a genetic algorithm, and we just have to try it out and see if it works.

Testing It Out

Here is a simple way to use the code we have written:

>>> target = 371
>>> p_count = 100
>>> i_length = 5
>>> i_min = 0
>>> i_max = 100
>>> p = population(p_count, i_length, i_min, i_max)
>>> fitness_history = [grade(p, target)]
>>> for i in range(100):
...     p = evolve(p, target)
...     fitness_history.append(grade(p, target))
...
>>> for datum in fitness_history:
...     print(datum)
...

Running that code, you will get to watch as each generation’s fitness gradually (but non-deterministically) approaches zero. The output of one of my runs looked like this:

['76.06', '32.13', '26.34', '18.32', '15.08', '11.69', '14.05', '9.460', '4.950', '0.0', '0.0', '0.0', '0.0', '0.0', '0.800', '0.0', '0.239', '0.780', '0.0', '0.0', '0.0', '0.0', '1.48', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.149', '0.239', '0.12', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '0.149', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '4.200', '0.0', '2.049', '0.0', '0.200', '0.080', '0.0', '1.360', '0.0', '0.0', '0.0', '0.0', '1.399', '0.0', '0.0', '0.149', '1.389', '1.24', '0.0', '0.16', '0.0', '0.680', '0.0', '0.0', '1.78', '1.05', '0.0', '0.0', '0.0', '0.0', '1.860', '4.080', '3.009', '0.140', '0.0', '0.38', '0.0', '0.0', '0.0', '0.0', '0.0', '2.189', '0.0', '0.0', '3.200', '1.919', '0.0', '0.0', '4.950', '0.0', '0.0', '0.0', '0.0', '0.0', '0.0']

With 20% survival (plus an additional 5% of other individuals) and 1% mutation, it took only nine generations to reach a perfect solution. After that, the algorithm joyfully runs in circles for as long as you let the mutations continue, but that is a good feeling. If it only took us half an hour to solve a problem of this magnitude, imagine what we could do with a day. A genetic algorithm for optimizing your Apache2 configuration file’s number of child processes? Easy as pie.

There are a nearly endless variety of techniques for and variations of genetic algorithms, but all of them rest on this straightforward foundation. We will look more at those sometime in the future, but for now you know enough to go out and throw together something interesting.

Applying Genetic Algorithms to Neural Networks

We will attempt to evolve a fully connected network (MLP). Our goal is to find the best parameters for an image classification task.

We will tune four parameters:

· Number of layers (or the network depth)

· Neurons per layer (or the network width)

· Dense layer activation function

· Network optimizer

The steps we will take to evolve the network, like those described above, are:

1. Initialize N random networks to create our population.

2. Score each network. This takes some time: We will train the weights of each network and then see how well it performs at classifying the test set. Since this will be an image classification task, we will use classification accuracy as our fitness function.

3. Sort all the networks in our population by score (accuracy). We will keep some percentage of the top networks to become part of the next generation and to breed children.

4. We will also randomly keep a few of the non-top networks. This helps find potentially lucky combinations between worse-performers and top performers and helps keep us from getting stuck in a local maximum.

5. Now that we have decided which networks to keep, we randomly mutate some of the parameters on some of the networks.

6. Here comes the fun part: Let us say we started with a population of 20 networks, we kept the top 25% (5 nets), randomly kept 3 more loser networks, and mutated a few of them. We let the other 12 networks die. To keep our population at 20 networks, we need to fill 12 open spots. It is time to breed!
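Steps 1 and 5 above are easy to sketch if each network "genome" is just a dict of the four parameters. The parameter choices below are illustrative, not the exact search space used in the experiment:

```python
import random

# Illustrative search space; the experiment's exact choices may differ.
PARAM_CHOICES = {
    'nb_layers': [1, 2, 3, 4],
    'nb_neurons': [64, 128, 256, 512, 768, 1024],
    'activation': ['relu', 'elu', 'tanh', 'sigmoid'],
    'optimizer': ['adam', 'adamax', 'rmsprop', 'sgd'],
}

def random_network():
    """Step 1: create one random network genome."""
    return {key: random.choice(values) for key, values in PARAM_CHOICES.items()}

def mutate_network(network):
    """Step 5: re-roll one randomly chosen parameter."""
    key = random.choice(list(PARAM_CHOICES))
    network[key] = random.choice(PARAM_CHOICES[key])
    return network

# Step 1: a starting population of 20 random networks.
population = [random_network() for _ in range(20)]
```

Scoring (step 2) would then build and train a Keras model from each dict and report its test accuracy.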

Breeding

Breeding is where we take two members of a population and generate one or more children, where each child represents a combination of its parents.

In our neural network case, each child is a combination of a random assortment of parameters from its parents. For instance, one child might have the same number of layers as its mother and the rest of its parameters from its father. A second child of the same parents may have the opposite. You can see how this mirrors real-world biology and how it can lead to an optimized network quickly.
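Assuming each network is represented as a dict of parameter name to value, that parameter mixing might look like this sketch (the helper name and two-children convention are my own, not from the original code):

```python
import random

def breed_networks(mother, father):
    """Create two children; each parameter comes from a random parent."""
    children = []
    for _ in range(2):
        child = {key: random.choice([mother[key], father[key]])
                 for key in mother}
        children.append(child)
    return children
```

With this scheme, one child can inherit its depth from the mother and its optimizer from the father, while its sibling inherits the opposite mix.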

Dataset

We will use the relatively simple but not easy CIFAR10 dataset for our experiment. This dataset gives us a big enough challenge that most parameters will not do well, while also being easy enough for an MLP to get a decent accuracy score.

Note: A convolutional neural network is certainly the better choice for a 10-class image classification problem like CIFAR10. But a fully connected network will do just fine for illustrating the effectiveness of using a genetic algorithm for hyperparameter tuning.

Comparison with Traditional approach

Let us say it takes five minutes to train and evaluate a network on your dataset. And let us say we have four parameters with five possible settings each. To try them all would take (5**4) * 5 minutes, or 3,125 minutes, or about 52 hours [1].

Now let us say we use a genetic algorithm to evolve 10 generations with a population of 20, with a plan to keep the top 25% plus a few more, so ~8 per generation. This means that in our first generation we score 20 networks (20 * 5 = 100 minutes). Every generation after that only requires around 12 runs, since we do not have to score the ones we keep. That is 100 + (9 generations * 5 minutes * 12 networks) = 640 minutes, or about 11 hours.
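A quick back-of-the-envelope check of these two estimates:

```python
minutes_per_network = 5

# Brute force: four parameters with five possible settings each.
grid_size = 5 ** 4                      # 625 combinations
brute_force = grid_size * minutes_per_network   # 3125 minutes, ~52 hours

# Genetic algorithm: score 20 networks in generation one, then
# ~12 fresh networks in each of the remaining 9 generations.
ga = 20 * minutes_per_network + 9 * 12 * minutes_per_network   # 640 minutes, ~11 hours

reduction = 1 - ga / brute_force        # roughly an 80% saving
```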

We have just reduced our parameter tuning time by almost 80%! That is, assuming it finds the best parameters…

Results

Brute force algorithm: We will start by running the brute force algorithm to find the best network. That is, we will train and test every possible combination of the network parameters we provided.

Our top result using brute force achieved 56.03% accuracy with the following parameters:

· Layers: 2

· Neurons: 768

· Activation: relu

· Optimizer: adamax

This took 63 hours to run on Google Colab.

Genetic algorithm: Now we will run our genetic algorithm, starting with a population of 20 randomly initialized networks, and we will run it for 10 generations.

It achieved 56.56% accuracy in seven hours, and the resulting network was almost identical:

· Layers: 2

· Neurons: 512

· Activation: elu

· Optimizer: adamax

The only difference is the genetic algorithm preferred 512 to 768 neurons. (In the brute force run, the 512-network achieved 55.65%. Should have set a random seed.)

The genetic algorithm gave us essentially the same result in 1/9th the time: seven hours instead of 63. And it is likely that as the parameter complexity increases, the genetic algorithm’s speed advantage grows even larger.

Prediction on MNIST dataset with various populations
Prediction on CIFAR10 dataset with various populations