MPIKAIA - Parallel Genetic Algorithm

If you are interested in using the genetic algorithm based FORTRAN-77 optimization subroutine PIKAIA, and the modeling fitness-function that you want to maximize is computationally intensive, you have come to the right place. Here you will find all that you need to run PIKAIA in parallel on the supercomputers at the National Center for Atmospheric Research.

MPIKAIA is a slightly modified version of the PIKAIA 1.2 code developed by Paul Charbonneau and Barry Knapp at the High Altitude Observatory. It uses MPI code to parallelize the "Full Generational Replacement" evolutionary strategy. This package comes with a sample fitness function (userff.f), which you should replace with your own function/model after you successfully compile and run the sample code. The code has been verified to compile and run properly on several NCAR supercomputers (frost, lightning, bluesky). It also will run on multi-processor Linux machines, if they have MPI installed and configured properly.

If you use MPIKAIA and publish the results, please consider citing the paper that describes this code (Metcalfe & Charbonneau, 2003, Journal of Computational Physics, vol. 185, pp. 176-193). A freely accessible preprint version of this article is available at arXiv:astro-ph/0208315.

Quick Start

Background

Splitting up the workload

MPI Code

Customization

Quick start

Download the code for MPIKAIA. It will unpack into a new directory called mpikaia/
Compile the code
cd mpikaia/src/
gmake

The Makefile should automatically determine which architecture you are compiling on, and set the options appropriately. The default build is mpikaia which will be placed in the mpikaia/ directory.
Run the code
cd mpikaia/
runjob.csh <seed_int>

The runjob.csh script should automatically determine which machine you are running on, and submit a test job through the local batch queue system (<seed_int> is an integer to seed the random number generator).

Background

In 1998 we began a project to adapt some computationally intensive models to interface with PIKAIA. On the fastest processors available at the time, a single model would run in about 45 seconds. We knew that the optimization would require ~10⁶ model evaluations, so it was clear that a serial version of PIKAIA would require many months to finish on a single processor. To tackle the problem on a reasonable timescale, we decided to incorporate the message passing routines of the MPI software into the "Full Generational Replacement" evolutionary strategy of PIKAIA. This software package allows a collection of networked computers to cooperate on a problem as if they were a single multi-processor parallel machine. All of the software and documentation is free. We had no trouble installing it on our Linux cluster and the sample programs that came with the distribution made it easy to learn and use. The trickiest part of the procedure was deciding how to split up the workload among the various computers.

Splitting up the workload

A genetic algorithm based fitting procedure quite naturally divides into two basic functions: evaluating the modeling fitness-function, and applying the genetic operators to each generation once the fitnesses have been calculated. When we profiled the distribution of execution time for each part of our code, this division became even more obvious. As with the vast majority of real-life applications, fitness evaluation is by far the most computationally demanding step. For our application, 93% of CPU time is spent carrying out fitness evaluation, 4% carrying out breeding and GA internal operations (such as mutation rate adjustment), and 3% for system and I/O. It thus seemed reasonable to allocate slave tasks to perform the model calculations, while a master task took care of the GA-related operations.

In addition to decomposing the function of the code, a further division based on the data was also possible. Fitness evaluation across the population is inherently a parallel process, since each model can be evaluated independently of the others. Moreover, it requires minimal transfer of information, since all that the user-supplied function requires is the n-dimensional floating-point array of parameters defining a single instance of the model, and all it needs to return is the floating-point value corresponding to the model's fitness. It is then natural to send one model to each available processor, so the number of machines available would control the number of models that could be calculated in parallel. Maximal use of each processor is then assured by choosing a population size that is an integer multiple of the number of available processors.

In practice, this recipe for dividing the workload between the available processors proved to be very scalable. Since very little data is exchanged between the master and slave tasks, our 64-node Linux cluster provided a speedup factor of about 53 over the performance on a single processor.

MPI Code

Dual-Function Front End
MPI requires the user to specify the number of available processors manually at the beginning of the run. It then spawns that many instances of the specified program: the first one acts as the master program, and all the rest run as slaves. This dual-functionality requires a front end program (mpi_pikaia.f) to determine whether a given instance is the master or the slave, to call the appropriate code as a subroutine, and to terminate all of the jobs when PIKAIA is done.

Master Subroutine
Starting with the source code for PIKAIA 1.2, we used the message passing routines of MPI to create a parallel fitness evaluation subroutine. The original code evaluates the fitnesses of the population one at a time in a DO loop. We replaced this procedure with a single call to a new subroutine that evaluates the fitnesses in parallel on all available processors. This parallel version of PIKAIA (pikaia_master.f) along with the parallel fitness evaluation subroutine (mpi_fitness.f) constitutes the master program.

After starting the slave program on every available processor, mpi_fitness.f sends an array containing scaled values of the parameters to each slave job over the network. In the first generation, these values are completely random; in subsequent generations, they are the result of the selection, crossover, and mutation of the previous generation, performed by the non-parallel portions of PIKAIA. Next, the subroutine listens for responses from the network and sends a new set of parameters to each slave job as it finishes the previous calculation. When every set of parameters in the generation have been assigned a fitness value, the subroutine returns to the main program to perform the genetic operations, resulting in a new generation of models to calculate. The process continues for a fixed number of generations, chosen to maximize the success rate of the search.

Slave Subroutine
To allow many instances of our code to run simultaneously, we added a front end that communicates with the master program through MPI routines. This code (ff_slave.f) combined with the user-defined fitness function (userff.f) constitutes the slave program, which runs on every available processor. The operation of each slave task is relatively simple. Once it is started, it waits to receive a set of parameters from the master task over the network. It then calls the fitness function (the sample fitness function twod in this distribution) with these parameters as arguments, and returns a fitness value to the master task.

Compiling and Running
This version of MPIKAIA has been compiled and tested under the MPI distributions available on NCAR supercomputers, and with MPICH 1.2.5 running on a multi-processor Linux system. You can compile the sample program using gmake with the provided Makefile. The code can then be run in parallel with the runjob.csh script, by issuing a command like: runjob.csh <seed_int> (the command line argument is an integer to seed the random number generator).

Customization

The beginning of pikaia_master.f contains some parameters that you may need to change when adapting MPIKAIA for use with your own function/model. In particular, you may need to adjust the number of parameters (n), and the values of the ctrl(1-12) array that affect the convergence properties of the genetic algorithm. Note that the value of ctrl(10) must be 1 for the code to run in parallel. All modifications to the original PIKAIA 1.2 code are indicated with comments that begin cMPI.

Production runs may also need to modify some of the parameters that control the batch queues in the runjob.csh script, like the number of processors to use and the estimated runtime. To run with maximum efficiency, the number of processors should be set to an integer fraction of ctrl(1), plus 1 for the master program.

Contact Travis Metcalfe with questions and comments about this page