Yes, this is clearly nonsense. Sorting is not a hard problem: standard algorithms such as quicksort and mergesort run in O(n log n) time (O(n²) in quicksort's worst case). But let me scratch this itch of sorting numbers using an evolutionary algorithm, specifically the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Technically, we will use what I believe is the original library by the inventor of the method, Nikolaus Hansen.

In Python, we will make use of these two libraries:

```python
import cma          # pip install cma
import numpy as np  # pip install numpy
```

## Solving without constraints

CMA-ES, like other metaheuristics, uses the concept of a fitness function to search for good or optimal solutions to a problem. The algorithm does not need to know the structure of your problem, as all knowledge is encapsulated in the fitness function. The algorithm generates candidate solutions and evaluates them with the fitness function. The fitness of those solutions is used to generate the next batch, and the process repeats until convergence (ideally, fitness = 0).

While you can define feasibility of solutions by providing constraints, this is not a requirement. Therefore, we first will try to solve the toy problem without constraints.

### Fitness function and initial solution

For CMA-ES to work, you must provide a fitness function that is used to evaluate solutions. For sorting, we define our fitness function as the euclidean distance between a solution *x* and the optimal solution *xopt*, which is a sorted list.

Clearly, it is nonsense that we must first have the optimal solution in order to define the fitness function. Why search if we already have the answer? But again, this is just a simple example chosen so we can focus on the method, not the application.

In addition to a fitness function, you must also provide a *seed* solution, which we will call *x0*. The algorithm will start from *x0* and search for better solutions using that as a starting point. Conceptually, a bunch of "neighbours" are evaluated in each step and the direction of search is determined by computing their fitness. Most metaheuristics will intensify search in promising neighbourhoods and ignore the less promising ones. You can read more about CMA-ES on Wikipedia.

Below we set up the problem by defining the fitness function *ff*, an initial solution *x0*, and an initial standard deviation *sigma0*.

```python
# Optimal solution (used in the fitness function)
n = 40
xopt = np.arange(n).astype(float)  # [0, 1, ..., 38, 39]

# Fitness function: the euclidean distance from x to xopt
ff = lambda x: np.linalg.norm(xopt - x)

# Initial solution, a random permutation of the optimal solution
x0 = np.random.permutation(xopt)  # e.g. [26, 16, ..., 38, 12]

# Initial standard deviation
sigma0 = 0.5
```

Now that we have defined the fitness function, we can forget that we ever knew the optimal solution; it is, however, embedded in the fitness function. For your own problem, you would have some meaningful way of defining the fitness of a solution. Keep in mind that a value of *0* means perfect fitness, while larger values mean worse fitness.
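As a quick sanity check of that convention (pure numpy, repeating the definitions above so the snippet is self-contained):

```python
import numpy as np

n = 40
xopt = np.arange(n).astype(float)
ff = lambda x: np.linalg.norm(xopt - x)

print(ff(xopt))        # 0.0: the sorted list has perfect fitness
print(ff(xopt[::-1]))  # much larger: the reversed list is far from sorted
```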

### Optimise: using wrapper API

First, we can optimise using the wrapper functions provided by *cma*. For some reason, using these wrappers results in slower convergence and I don't know why. There are several ways to use the wrapper API. Below you see two different ways, which I believe are equivalent:

```python
# Method 1
es = cma.CMAEvolutionStrategy(x0, sigma0)
es.optimize(ff)
xbest = es.result.xbest

# Method 2
xbest, es = cma.fmin2(ff, x0, sigma0)

print(xbest.round(0))
```

### Optimise: using stop-ask-tell

Next, we will solve the problem without the wrappers. The *cma* library uses a stop-ask-tell protocol.

- Stop: returns true if the algorithm has converged
- Ask: the algorithm returns the current pool of solutions
- Tell: the user provides a fitness value for each solution

While the following code is slightly longer, it converges faster for some reason. Again, I don't know why. It is however an equivalent way to solve the problem and also finds the optimal solution.

```python
es = cma.CMAEvolutionStrategy(x0, sigma0)

fvals = []  # used for plotting later

while not es.stop():
    solutions = es.ask()
    fitness = np.array([ff(x) for x in solutions])
    fvals.append(fitness.min())
    es.tell(solutions, fitness)

xbest = es.result_pretty().xbest
print(xbest.round(0))
```

The program finds the solution in less than one second on my laptop (MacBook Pro, M2). Is that impressive? To some degree, yes. The solution space is essentially any combination of 40 real numbers, since we did not specify any constraints on the values. Specifying constraints in the *cma* library is exactly what we will do next.

## Adding constraints

You may provide constraints to CMA-ES as a vector-valued function *g*, which defines a solution *x* as feasible if and only if *g_i(x) ≤ 0* for all *i*. The following code is based on the example notebook from the pycma website. The structure of the code is almost identical to what we had before. The only difference is that we combine the old fitness function and the new constraint function into a "special" fitness function that is used during optimisation.

For our problem of sorting numbers, we want to enforce the constraint that any number must be less than or equal to all numbers to the right of it. If you think about it, that is the same as saying the numbers must be sorted. This means that our initial solution, which is just a random permutation of sorted numbers, will be infeasible with near 100% probability.

Modified example that adds a constraint:

```python
# Constraint function
def constraints(x):
    # x[i] must be less than or equal to every element to its right,
    # so compare each x[i] against the minimum of the tail x[i+1:]
    return [x[i] - x[i+1:].min() for i in range(len(x) - 1)]

# Combine the old fitness function and the constraints.
# This is used in place of the old fitness function.
ffc = cma.ConstrainedFitnessAL(ff, constraints)

es = cma.CMAEvolutionStrategy(x0, sigma0)

while not es.stop():
    solutions = es.ask()
    fitness = np.array([ffc(x) for x in solutions])
    es.tell(solutions, fitness)
    ffc.update(es)  # adapt the augmented Lagrangian coefficients

xbest = es.result.xbest
```
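To see how restrictive this is, here is a quick, self-contained check (pure numpy, using a tail-based formulation of the sortedness constraint) showing that a sorted list is feasible while a shuffled one essentially never is:

```python
import numpy as np

def constraints(x):
    # x[i] must be less than or equal to every element to its right
    return [x[i] - x[i+1:].min() for i in range(len(x) - 1)]

def feasible(x):
    # feasible iff g_i(x) <= 0 for all i
    return all(g <= 0 for g in constraints(x))

xopt = np.arange(40).astype(float)

print(feasible(xopt))        # True: a sorted list satisfies every constraint
print(feasible(xopt[::-1]))  # False: the reversed list violates them
# A random permutation of 40 numbers is sorted with probability 1/40!
print(any(feasible(np.random.permutation(xopt)) for _ in range(1000)))  # almost surely False
```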

For this particular problem, adding the constraint seemingly does not help the problem converge faster. Maybe it already converges as fast as it can, and the constraint just adds overhead and an initial scramble for a feasible solution?

## Early stopping

Plotting the fitness value as it evolves over time, it is clear that we could have stopped earlier with a pretty good solution. A "pretty good" solution may not make sense for sorting, but it would in many other scenarios, such as financial optimisation, where there is a significant amount of uncertainty.

Plot fitness over time:

```python
import matplotlib.pyplot as plt

plt.plot(fvals)
plt.xlabel('Time')
plt.ylabel('Fitness')
plt.title('Fitness over time')
plt.show()
```