How to sort numbers with an evolutionary algorithm (CMA-ES)

Yes, this is clearly nonsense. Sorting is not a hard problem, and standard algorithms such as quicksort and mergesort run in O(n^2) (worst case) and O(n log n) time. But let me scratch this itch of sorting numbers using an evolutionary algorithm, specifically the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Technically, we will use what I believe is the original library by the inventor of the method, Nikolaus Hansen.

In Python, we will make use of these two libraries:

import cma  # pip install cma
import numpy as np  # pip install numpy

Solving without constraints

CMA-ES, like other metaheuristics, uses the concept of a fitness function to search for good or optimal solutions to a problem. The algorithm does not need to know the structure of your problem; all knowledge is encapsulated in the fitness function. The algorithm generates candidate solutions, evaluates them with the fitness function, and uses their fitness to generate the next batch of solutions, repeating until convergence (fitness = 0).

While you can define the feasibility of solutions by providing constraints, this is not a requirement. Therefore, we will first try to solve the toy problem without constraints.

Fitness function and initial solution

For CMA-ES to work, you must provide a fitness function that is used to evaluate solutions. For sorting, we define our fitness function as the Euclidean distance from a solution x to the optimal solution xopt, which is a sorted list.

Clearly, it is nonsense that we must first have the optimal solution in order to define the fitness function. Why search if we already have the answer? But again, this is just a simple example chosen so we can focus on the method, not the application.

In addition to a fitness function, you must also provide a seed solution, which we will call x0. The algorithm will start from x0 and search for better solutions using that as a starting point. Conceptually, a bunch of "neighbours" are evaluated in each step and the direction of search is determined by computing their fitness. Most metaheuristics will intensify search in promising neighbourhoods and ignore the less promising ones. You can read more about CMA-ES on Wikipedia.

Below we will set up the problem by defining the fitness function ff and an initial solution x0.

# Optimal solution (used in fitness function)
n = 40
xopt = np.arange(n).astype(float)
# [0, 1, ..., 38, 39]
 
# fitness function, the euclidean distance x -> xopt
ff = lambda x: np.linalg.norm(xopt-x)
 
# Initial solution, a random permutation of the optimal solution
x0 = np.random.permutation(xopt)
# [26, 16, ..., 38, 12]
 
# initial standard deviation
sigma0 = 0.5

Now that we have defined the fitness function, we can forget that we ever knew the optimal solution. It is, however, embedded in the fitness function. For your problem, you would have some meaningful way of defining the fitness of a solution. Keep in mind that a value of 0 means perfect fitness, while larger values mean worse fitness.
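As a quick sanity check, we can evaluate the fitness of the shuffled start and of the target itself, using the ff, x0 and xopt defined above:

print(ff(x0))    # positive, typically around 100 for this setup
print(ff(xopt))  # 0.0, i.e. perfect fitness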

Optimise: using wrapper API

First, we can optimise using the wrapper functions provided by cma. For some reason, using these wrappers results in slower convergence, and I don't know why. There are several ways to use the wrapper API. Below you see two different ways, which I believe are equivalent:

# method 1: create the strategy object, then optimise
es = cma.CMAEvolutionStrategy(x0, sigma0)
es.optimize(ff)
xbest = es.result.xbest

# method 2: fmin2 creates and runs the strategy in one call
xbest, es = cma.fmin2(ff, x0, sigma0)

print(xbest.round(0))

Optimise: using stop-ask-tell

Next, we will solve the problem without the wrappers. The cma library uses a stop-ask-tell protocol.

  • Stop: returns true if the algorithm has converged
  • Ask: the algorithm returns the current pool of solutions
  • Tell: the user provides a fitness value for each solution

While the following code is slightly longer, it converges faster for some reason; again, I don't know why. It is, however, an equivalent way to solve the problem and also finds the optimal solution.

es = cma.CMAEvolutionStrategy(x0, sigma0)

fvals = []  # best fitness per generation, used for plotting later
while not es.stop():
    solutions = es.ask()
    fitness = np.array([ff(x) for x in solutions])
    fvals.append(fitness.min())
    es.tell(solutions, fitness)

xbest = es.result_pretty().xbest
print(xbest.round(0))

The program finds the solution in less than a second on my laptop (MacBook Pro M2). Is that impressive? Well, it is to some degree. The solution space is essentially any combination of 40 real numbers, since we did not specify any constraints on the values. You can specify constraints in the cma library, which brings us to the next section.

Adding constraints

You may provide constraints to CMA-ES as a vector-valued function g, which defines a solution x as feasible if and only if g_i(x) ≤ 0 for all i. The following code is based on the example notebook from the pycma website. The structure of the code is almost identical to what we had before. The only difference is that we now combine the old fitness function and the new constraint function into a "special" fitness function that is used during optimisation.

For our problem of sorting numbers, we want to enforce the constraint that any number must be less than or equal to all numbers to its right. If you think about it, that is the same as saying the numbers must be sorted. This means that our initial solution, which is just a random permutation of the sorted numbers, will be infeasible with near 100% probability.

Modified example that adds a constraint:

# Create constraint function
def constraints(x):
    # x_i must be less than or equal to all x's to the right of it:
    # g_i(x) = x_i - min(x_i, ..., x_{n-1}) <= 0 for all i
    # holds exactly when x is sorted
    return [xi - x[i:].min() for i, xi in enumerate(x)]

# Combine old fitness function and constraints
# This is used in place of the old fitness function
ffc = cma.ConstrainedFitnessAL(ff, constraints)

es = cma.CMAEvolutionStrategy(x0, sigma0)

while not es.stop():
    solutions = es.ask()
    fitness = np.array([ffc(x) for x in solutions])
    es.tell(solutions, fitness)
    ffc.update(es)  # adapt the augmented Lagrangian coefficients, as in the pycma example

xbest = es.result.xbest
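As a quick check, we can reuse the constraints function to confirm the claim above: the shuffled initial solution is infeasible, while the sorted target is feasible (any g_i > 0 means constraint i is violated):

print(sum(g > 0 for g in constraints(x0)))    # > 0 with near 100% probability
print(sum(g > 0 for g in constraints(xopt)))  # 0, the sorted list is feasible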

For this particular problem, adding the constraint seemingly does not help the algorithm converge faster. Maybe it already converges as fast as it can, and the constraint just adds overhead and an initial scramble for a feasible solution?

Early stopping

Plotting the fitness value as it evolves over time makes it clear that we could have stopped earlier with a pretty good solution. A pretty good solution may not make sense for sorting, but it would in many other scenarios, such as financial optimisation, where there is a significant amount of uncertainty.
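If I read the pycma options correctly, the library supports this directly via the ftarget option, which makes es.stop() trigger once a solution reaches the given fitness value. A minimal sketch:

# stop at a "pretty good" fitness of 1.0 instead of running to full convergence
es = cma.CMAEvolutionStrategy(x0, sigma0, {'ftarget': 1.0})
es.optimize(ff)
print(es.result.xbest.round(0))  # approximately sorted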

Plot fitness over time:

import matplotlib.pyplot as plt  # pip install matplotlib

plt.plot(fvals)
plt.xlabel('Generation')
plt.ylabel('Fitness')
plt.title('Fitness over time')
plt.show()
