Simulated Annealing Optimization

First of all, as a warning: if you want to read a good blog about QuantLib, you should probably go to other places, like this one http://implementingquantlib.blogspot.de. However, I thought it could be fun to relax from coding all the time by also writing a bit about it. For this first post I take one of the “Erlkönige” from my presentation at the QuantLib User Meeting 2014 http://quantlib.org/qlws14.shtml (hey, why am I not on the picture ?), namely a new optimizer class https://github.com/lballabio/quantlib/blob/master/QuantLib/ql/experimental/math/simulatedannealing.hpp. (If you don’t know what an Erlkönig is, look at slide two of my presentation.) But let’s start by defining a target function. This is done by deriving from CostFunction and implementing (at least) the value and values methods (which are quite similar in our one dimensional example here)

    class target : public CostFunction {
      public:
        target() {}
        Real value(const Array &a) const {
            Real x = a[0];
            return x * x / 100.0 + 0.3 * std::sin(x) * std::sin(x);
        }
        Disposable<Array> values(const Array &x) const {
            Array res(1);
            res[0] = value(x);
            return res;
        }
    } aTarget;

This function looks like this (you should click on the image for a better view)

[Figure: simann_target]

It has many local minima, e.g. near -6, -3, +3 and +6, and one unique global minimum at 0. Many classical optimizers won’t find this global minimum unless you give them a good start value lying near the solution. Our new experimental global optimizer in QuantLib follows chapter 10.9 of Numerical Recipes in C. It is based on the Nelder-Mead algorithm (aka Simplex in the library, not to be confused with the simplex method from linear programming) and adds exponentially distributed noise to the target function, which does the trick of finding the global minimum with probability one when applied carefully.
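
A quick sanity check on where those minima sit: setting the derivative of the target to zero gives

    f'(x) = x/50 + 0.3 sin(2x) = 0

and since the sine term dominates for moderate x, the minima lie close to the zeros of sin(x), i.e. near multiples of pi: 0, ±3.14 and ±6.28, matching the valleys in the plot.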

But let’s start with the deterministic version of the optimizer. You can create the optimizer with

    SimulatedAnnealing<> sa(0.2,t0,decay,steps);

where I set t0=0.0, decay=0.0 and steps=1 (which simply means: no noise). The first parameter is the simplex lambda used to construct the vertices of the simplex, which in general are the start value of the optimization (see below), this value plus (1,0,0,…) times lambda, this value plus (0,1,0,…) times lambda, and so on. Since we have a one dimensional problem, our simplex consists of two vertices only.
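
As a small illustration of this construction (my own sketch, not what the library does internally), the vertices for our one dimensional setup come out like this:

    // sketch: initial simplex vertices from a start value x0 and lambda
    Array x0(1);
    x0[0] = 4.8;                      // the start value we will use below
    Real lambda = 0.2;
    std::vector<Array> vertices(x0.size() + 1, x0);
    for (Size i = 0; i < x0.size(); ++i)
        vertices[i + 1][i] += lambda; // vertex i+1 = x0 + lambda * e_i
    // in one dimension: two vertices, (4.8) and (5.0)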

I start the optimization at x=4.8, a bad start value as we will see soon.

    Array initial(1);
    initial[0] = 4.8;

I put no constraint on the problem

    NoConstraint nc;

allow for a lot of iterations (1000000, that’s 1 million, yes !), require 500 stationary state iterations before assuming convergence, and set the epsilons for root, function value and gradient to 1E-8

    EndCriteria ec(1000000,500,1E-8,1E-8,1E-8);

As a side note, it is up to the actual optimizer which of these criteria (if any at all !) are used. The simulated annealing optimizer checks the max iterations criterion and detects stationary state iterations by comparing the simplex size with the root epsilon; the function and gradient epsilons are ignored. As another example, the Levenberg Marquardt implementation takes no less than three tolerance parameters in its own constructor, while only using max iterations and function epsilon from the end criteria and ignoring stationary state iterations and the other two tolerances.
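
For illustration, this is how Levenberg Marquardt is set up with its own three tolerances (I am quoting the constructor parameters epsfcn, xtol, gtol from memory, so double check against your library version):

    // Levenberg Marquardt carries its own tolerances in the constructor
    LevenbergMarquardt lm(1E-8, 1E-8, 1E-8); // epsfcn, xtol, gtol
    // from the EndCriteria it only honors max iterations and the
    // function epsilon; the rest is ignored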

Now create a problem out of the target function, the constraint and the start value

    Problem p(aTarget,nc,initial);

and we are ready to run the optimization

    EndCriteria::Type ret = sa.minimize(p,ec);

You can retrieve the best value found for the independent variable via p.currentValue() and the corresponding function value via p.functionValue(). The variable ret contains the status of the optimization, which can be one of the following enum values defined in EndCriteria

        enum Type {None,
                   MaxIterations,
                   StationaryPoint,
                   StationaryFunctionValue,
                   StationaryFunctionAccuracy,
                   ZeroGradientNorm,
                   Unknown};

Self-explanatory in a way, aren’t they ?
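
For example, a minimal way to react to the returned status (just a sketch) could be:

    // sketch: inspect the status returned by minimize()
    if (ret == EndCriteria::StationaryPoint) {
        std::cout << "converged: x = " << p.currentValue()[0]
                  << ", f(x) = " << p.functionValue() << std::endl;
    } else if (ret == EndCriteria::MaxIterations) {
        std::cout << "stopped at the iteration limit" << std::endl;
    }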

We visualize the optimization run by plotting one (the first) vertex of the simplex (plotting both wouldn’t be instructive) for each step in the optimization.

[Figure: simann_det]

I only plot the first 25 steps of the optimization because after that it gets boring (you see the pattern, don’t you). As I said, the starting value was bad, just a bit to the left of the top of the second hill to the right of the global minimum. Function values are color-coded here: the minimum (at 0) is black, the next best local minima are blue (near ±3), the next ones purple (near ±6). The algorithm (the green dots) walks over the top of the hill and then runs into the valley to the right of +6, converging nicely there, but only to a local minimum.

The green dots lie exactly on the surface of the target function, indicating that the plain function value, without any add-on, is used for the optimization.

Now the idea of simulated annealing comes into play. Instead of zero noise we start with a temperature of 0.05. This means that noise is added to the target function value during the optimization. The noise is defined to be exponentially distributed with parameter 1 / temperature, i.e. with expectation and standard deviation both equal to the temperature.
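
To make that concrete, here is a minimal sketch (my own, not the library implementation) of how such noise can be produced from a U(0,1) draw by inverting the exponential distribution function:

    // minimal sketch (not the library code): exponential noise with
    // parameter 1/temperature, i.e. mean and standard deviation equal
    // to the temperature, via inversion of the distribution function
    Real exponentialNoise(const MersenneTwisterUniformRng &rng,
                          Real temperature) {
        Real u = rng.next().value;         // uniform draw in (0,1)
        return -temperature * std::log(u); // Exp(1/temperature) variate
    }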

This lifts our green dots off the surface in a random way and in particular helps them get out of valleys that might turn out to be local ones only. During the optimization the temperature is decreased, converging to zero, so that the deeper valleys become visible to the algorithm while the more shallow ones are still filled up by the noise. If we start with a temperature high enough and decrease it slowly enough, we can hope to find the global minimum this way. Of course the art here is to find a good scheme for the cooling and to balance this scheme against the number of necessary function evaluations.

Back to our first try with starting temperature 0.05. We choose to decrease this temperature by a factor of 1 - decay, with decay = 0.0001, every step (steps = 1). We plot the first 2000 steps of the algorithm’s run (wow, note how many steps are performed compared to the deterministic case).
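
Spelled out, the temperature after n steps is t0 * (1 - decay)^n, so a quick sanity check (my own two lines, nothing from the library) gives:

    // geometric cooling: temperature after 2000 steps with the
    // parameters above (t0 = 0.05, decay = 0.0001, one decay per step)
    Real temperature = 0.05 * std::pow(1.0 - 0.0001, 2000); // ~ 0.041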

[Figure: simann_1]

After around 500 steps, thanks to the noise, the dots jump over to the main valley and stay there, thus finding the global minimum ! The light blue grid floating over the target function surface visualizes the expected noise added to the original target function.

I should mention here how you can explicitly specify the random number generator used to generate the noise (it has to be one with a next() method returning a U(0,1) random variable).

    SimulatedAnnealing<> sa(0.2,t0,decay,steps,MersenneTwisterUniformRng(42));

would choose the Mersenne Twister RNG (which is also the default if nothing is given), with seed 42. The default seed is 0, meaning that the system timer is used for initialization, which is bad if you want to reproduce results.

Let’s do the same again, but now cooling down the whole thing a bit faster (decay = 0.0003). The result is as follows

[Figure: simann_2]

Oops. The dots jump over again, but only to the next valley, again yielding only a local minimum. We cooled down too fast to find the global minimum in this case.

In practice one would probably start with a much higher temperature, like 0.25. With that and the same decay rate of 0.0001 per step we get

[Figure: simann_3]

We are converging to the global minimum again ! However, we already have 30000 steps plotted here, and we need even more to actually converge (like 400000 with these parameters !). We should surely fine-tune our parameters before using them in a real life application.

Looking at the same picture more from a bird’s eye view, we see the structure more clearly

[Figure: simann_4]

While all 5 valleys are covered in the first few hundred steps, after a while only the three inner ones are visited, and finally the algorithm stays in the innermost, best valley.

That’s all for today, it was fun to write this. Anyone read until here ?

Appendix. Here is the full code example, if you want to try it by yourself.

    #include <ql/quantlib.hpp>
    #include <iostream>

    using namespace QuantLib;

    int main(int, char *[]) {

        // the target function from above: many local minima,
        // one global minimum at x = 0
        class target : public CostFunction {
          public:
            target() {}
            Real value(const Array &a) const {
                Real x = a[0];
                return x * x / 100.0 + 0.3 * std::sin(x) * std::sin(x);
            }
            Disposable<Array> values(const Array &x) const {
                Array res(1);
                res[0] = value(x);
                return res;
            }
        };

        target aTarget;

        // annealing parameters: start temperature, decay rate,
        // steps per temperature level
        double t0 = 0.25;
        double dec = 0.0001;
        int steps = 1;

        SimulatedAnnealing<> sa(0.2, t0, dec, steps, MersenneTwisterUniformRng(42));

        // start value x = 4.8
        Array initial(1);
        initial[0] = 4.8;

        NoConstraint nc;
        EndCriteria ec(1000000, 500, 1E-8, 1E-8, 1E-8);

        Problem p(aTarget, nc, initial);

        EndCriteria::Type ret = sa.minimize(p, ec);

        std::cout << "found x=" << p.currentValue()[0]
                  << " / y=" << p.functionValue() << " with ret code " << ret
                  << std::endl;

        return 0;
    }

The output of running this should look like this

    found x=-7.10543e-14 / y=1.5651e-27 with ret code StationaryPoint