Welcome back. I hope you all had some nice Christmas days. I surely had. Still some time to quantlib a bit. In my last post (https://quantlib.wordpress.com/2014/12/20/adjoint-greeks/) I wrote about automatic differentiation (AD) and applied it to delta and vega computation in the Black76 model. Since the results were promising I started to work more seriously on my adjoint branch (https://github.com/pcaspers/quantlib/tree/adjoint). It is still in a proof-of-concept state. The first milestone would be an engine that computes adjoint interest rate deltas for QuantLib::Swap instruments. And guess what, I am almost there. It is a hell of a lot of work though, and if someone is interested in helping, please contact me. You will get intimate with large parts of the QuantLib code and get your hands dirty at the same time, with a specific goal to reach.

The essential work is to replace all quantities w.r.t. which we potentially want sensitivities by a generic type T. Then we can plug in a double if we want to work in classic mode or the active type of an AD framework like the one we already know from the CppAD library

CppAD::AD<double>


in order to leverage automatic differentiation in QuantLib. By the way, CppAD's author Brad Bell has replaced the error function with the one from the std:: namespace (it has become part of the C++11 standard). He also helped me with some cmake problems, very nice. At the moment I only work with CppAD, but the changes in the library are not specific to this framework, so other tools (at least those relying on operator overloading) could be used as well. An important point to keep in mind: don't make yourself dependent on a single third-party library. It is enough that we are all dependent on QuantLib, isn't it? Anyway, all the CppAD-specific things I added are currently collected in one small, single header file ql/qlcppad.hpp.

To get a flavour of what it means to template’ize QuantLib, look at the example of a simple quote. The public part now looks as follows

template <class T = Real> class SimpleQuote_t : public Quote_t<T> {
  public:
    SimpleQuote_t(T value = Null<Real>());
    T value() const;
    bool isValid() const;
    T setValue(T value = Null<Real>());
    void reset();
    ...


The templated version of the class is marked by an appended _t. Actually the old class does not exist any more, but is rather retrieved by a typedef

typedef SimpleQuote_t<Real> SimpleQuote;
...


to keep existing code compiling without any changes (well, not quite in all cases, unfortunately, but see below for this). On the other hand you can now instantiate a simple quote using an AD type with another typedef

typedef SimpleQuote_t<CppAD::AD<double>> SimpleQuoteAD;
...


We can then put this T – simple quote into a T – rate helper and create a T – piecewise yield curve, which will bootstrap the curve using a T – brent solver so that you can finally retrieve zero rates as AD doubles (using a T – linear interpolation, not to forget) and at the same time the sensitivity of zero rates to the input market quotes.

This is what I am trying to demonstrate today. It is not exactly the milestone example from above, but technically very close. The many T's from above already suggest that along the way you have to adapt a lot of code. And this is not always fun. If you have an error in your templated code, the compiler will often just vomit a big flood of long, unreadable error messages over your compiler window. If you are lucky, at some secret point in that ugly thick soup there is a tender hint of what might actually be the true source of the problem. And once the code compiles with double‘s, this does not at all mean that it does so with AD<double>‘s.

What is more, we have a lot of special things in QuantLib, like the Null type, which for double‘s is implicitly converted to a certain value (namely std::numeric_limits<float>::max()) to mark invalid values. This has to be made consistent with an AD double null type. Taking this example, we have to introduce a Null<CppAD::AD<double>>, which starts not too difficult with

template <class Base> class Null<CppAD::AD<Base> > {
  public:
    Null() {}
    ...
};


but since inside CppAD conversion from any type T to CppAD::AD<Base> (replace Base by double if you want) is done like this

template <class Base>
template <class T>
inline AD<Base> &AD<Base>::operator=(const T &t) { return *this = Base(t); }


i.e. via the Base type, we need also a conversion from our new Null type above to Base, which can be done as follows

...
// this is needed, because in ad_assign.hpp line 124ff
// assignment from T to AD<Base> is done via conversion from T
// to Base and then to AD<Base>. If for example T is our
// Null<CppAD::AD<double>>, it must therefore be convertible
// to double so that then conversion to AD<double> from this
// works.
operator Base() const { return static_cast<Base>(Null<Base>()); }
...


and which does the trick in the end.

Some structural problems arise, because when templating you effectively turn parts of the library into a header-only version, that is, the implementation moves from the cpp to the hpp files. This is because the templated code can only be instantiated when compiling actual client code. Or you instantiate a list of specific types in the library itself, but then you tie yourself to exactly these implementations (e.g. support for CppAD in our context).

At least for iborcoupon.hpp, cmscoupon.hpp and couponpricer.hpp I arrived at circular references which were not solvable by forward declarations. So I had to split each of these files into two parts: a base part, which is included internally at some places in the library instead of the whole thing, and a second part with the same name as the original file, which in turn includes the base part, to stay backward compatible.

Another source of many headaches was the already highly generic territory around the piecewise yield curve. I really admire this part of the library, it is beautiful and works incredibly robustly. But it is challenging to adapt this piece of art to enable AD. In the end I still wanted to write something like


PiecewiseYieldCurve<ZeroYield, Linear,
                    IterativeBootstrap, T>
    curve(referenceDate, instruments, Actual365Fixed());
...


to initialize a curve. The only change is the last template parameter T, which can optionally be set to an AD double type (if not given, it defaults to double, replicating the old behaviour).

One real non-backward-compatible change I introduced here is that factory classes for interpolations are now templated. The expression Linear therefore has a different meaning than before, namely taking a template type T that should be used for the calculations. The pro is that in generic classes like the piecewise yield curve we can still just write Linear, and behind the scenes the type T (taken from the last template parameter of PiecewiseYieldCurve) is plugged into the factory template so that it can do what has to be done. The con is that if the factory is explicitly used to create an interpolation object, we have to write

Linear<Real>


now instead of Linear alone. But assuming that the factory classes are mostly used as template parameters and not to actually instantiate classes in client code, this is hopefully not a huge change. Well, let’s see what Luigi has to say about this … 😉

Here is the new piecewise yield curve declaration, compared to the original version now an even nicer example of what you can do with templates in C++ …

template <template <class> class Traits, template <class> class Interpolator,
          template <class, class> class Bootstrap = IterativeBootstrap,
          class T = Real>
class PiecewiseYieldCurve : public Traits<T>::template curve<Interpolator>::type,
                            public LazyObject {
    ...


Not only the Interpolator, but also the Traits (which are the Discount, Forward, ZeroYield kind of specifications) and the Bootstrap class (IterativeBootstrap or LocalBootstrap) now have a new template parameter.

But to summarize, from outside everything looks and is used as before. If you want to use an AD double type you just have to add a new template parameter to the PiecewiseYieldCurve construction, that’s all.

Another limitation I came across is that there are no templated typedefs in C++. So for example the RateHelper typedef changes to

template <class T> struct RateHelper_t {
    typedef BootstrapHelper<YieldTermStructure_t<T>, T> Type;
};


and the final rate helper types (classic and AD’ized versions respectively) are retrieved via

typedef RateHelper_t<Real>::Type RateHelper;
...


i.e. with an additional ::Type suffix.

Regarding templated functions, the type can generally be deduced from the function parameters, so that we can still write code like

Real z = 1.0;
if (close(z, 0.0)) {}


(an exceptionally dumb code example), although the declaration of close has changed to

template<class T> bool close(T x, T y);


However this is not totally backward compatible, because the expression close(z, 0) was legal code before, and now the compiler goes “hey, I need to deduce a type T here and you are giving me an int and a double, I am confused, I can not work like this, please focus a bit on what you are doing, will you”. This was silently resolved before by an implicit conversion of 0 to a double. There was actually one line in the test suite (or the examples, I don't remember) which had exactly this problem. Easy to solve (change 0 to 0.), but code that compiled before now doesn't, which is always a major thing. We should say that the code was illegal ever since, but accidentally compiled in the past.

A deeper and more interesting issue in the same direction is this one here:

template <class T = Real>
void setCouponPricer(
    const typename Leg_t<T>::Type &leg,
    const boost::shared_ptr<FloatingRateCouponPricer_t<T> > &pricer) {
    ...


A utility function which sets coupon pricers. You are already not scared anymore by the first argument's typename and ::Type stuff, are you? Now look in the Gaussian1dModel.cpp example file, lines 549ff:

...
const Leg &leg0 = underlying4->leg(0);
const Leg &leg1 = underlying4->leg(1);
boost::shared_ptr<CmsCouponPricer> cmsPricer =
    boost::make_shared<LinearTsrPricer>(swaptionVol, reversionQuote);
boost::shared_ptr<IborCouponPricer> iborPricer =
    boost::make_shared<BlackIborCouponPricer>();

setCouponPricer<Real>(leg0, cmsPricer);
setCouponPricer<Real>(leg1, iborPricer);
...


Everything looks usual here, although Leg, CmsCouponPricer, IborCouponPricer, BlackIborCouponPricer are again all merely typedef‘s for the new template versions with T = Real as above. Maybe interesting to mention: LinearTsrPricer is not converted yet, so old code can be mixed with new code, even if both worlds are connected by an inheritance relationship. Which is good.

But why does setCouponPricer need the explicit specification of the double type? This is because the compiler does not recognize that a boost::shared_ptr<U> is convertible to a boost::shared_ptr<V>, even if U (CmsCouponPricer) derives from V (FloatingRateCouponPricer). He (or she? … C++ compilers are more likely female creatures, aren't they?) doesn't know that boost::shared_ptr is some kind of a pointer, eh, it is just some user-defined class. You may wonder (at least I did) why this kind of code worked until now anyway in other, non-templated-function contexts. The reason is that implicit conversions are defined for these cases in the boost::shared_ptr class. But these are not even noticed by the compiler when inferring template types. Bad luck, she is a bit picky here. Actually this problem can be worked around by using a generic boost::shared_ptr<G> second argument in setCouponPricer (which accepts everything that is a boost shared pointer) and using some very fancy generic programming stuff to check at compile time that G actually derives from FloatingRateCouponPricer_t. This is something for later thoughts and at least one blog post of its own.

Time for the full code example. I bootstrap a toy curve from 3 deposit quotes with maturities 1m, 2m, 3m and compute the sensitivity of the 3m zero rate to the underlying deposits' market quotes. Depending on the self-documenting macro YES_I_WANT_TO_USE_AD, either AD or a simple finite difference approximation will be used in our example.

#include <ql/quantlib.hpp>
#include <boost/assign/std/vector.hpp>

using namespace QuantLib;
using namespace boost::assign;

// comment or uncomment this macro
// #define YES_I_WANT_TO_USE_AD

// include the cppad utilities if AD is enabled
#ifdef YES_I_WANT_TO_USE_AD
#include <ql/qlcppad.hpp>
#endif

int main() {

    // define the double type to be used

#ifdef YES_I_WANT_TO_USE_AD
    std::cout << "Example with AD enabled" << std::endl;
    typedef CppAD::AD<double> dbltype;
#else
    std::cout << "Example with AD disabled, use finite differences"
              << std::endl;
    typedef double dbltype;
#endif

    // some typedefs to keep notation simple

    typedef RateHelper_t<dbltype>::Type RateHelperAD;
    typedef DepositRateHelper_t<dbltype> DepositRateHelperAD;
    typedef SimpleQuote_t<dbltype> SimpleQuoteAD;
    typedef Quote_t<dbltype> QuoteAD;

    // the reference date

    Date referenceDate(2, January, 2015);
    Settings::instance().evaluationDate() = referenceDate;

    // declare the independent variables (sample deposit quotes)
    std::vector<dbltype> x(3);
    x[0] = 0.0035;
    x[1] = 0.0090;
    x[2] = 0.0121;

#ifdef YES_I_WANT_TO_USE_AD
    CppAD::Independent(x);
#endif

    auto rate1m = boost::make_shared<SimpleQuoteAD>(x[0]);
    auto rate2m = boost::make_shared<SimpleQuoteAD>(x[1]);
    auto rate3m = boost::make_shared<SimpleQuoteAD>(x[2]);

#ifndef YES_I_WANT_TO_USE_AD
    // shifted quotes for the finite difference approximation
    Real h = 1e-4;
    auto rate1mp = boost::make_shared<SimpleQuoteAD>(x[0] + h);
    auto rate2mp = boost::make_shared<SimpleQuoteAD>(x[1] + h);
    auto rate3mp = boost::make_shared<SimpleQuoteAD>(x[2] + h);
#endif

    // build a piecewise curve

    RelinkableHandle<QuoteAD> quote1m(rate1m), quote2m(rate2m),
        quote3m(rate3m);

    auto dp1m = boost::make_shared<DepositRateHelperAD>(
        quote1m, 1 * Months, 2, TARGET(), ModifiedFollowing, false,
        Actual360());

    auto dp2m = boost::make_shared<DepositRateHelperAD>(
        quote2m, 2 * Months, 2, TARGET(), ModifiedFollowing, false,
        Actual360());

    auto dp3m = boost::make_shared<DepositRateHelperAD>(
        quote3m, 3 * Months, 2, TARGET(), ModifiedFollowing, false,
        Actual360());

    std::vector<boost::shared_ptr<RateHelperAD> > instruments;
    instruments += dp1m, dp2m, dp3m;

    PiecewiseYieldCurve<ZeroYield, Linear, IterativeBootstrap, dbltype> curve(
        referenceDate, instruments, Actual365Fixed());

    std::vector<dbltype> y(1);

    Real t3 = curve.timeFromReference(dp3m->latestDate());
    // Real t3 = 2.5 / 12.0;

    y[0] = curve.zeroRate(t3, Continuous).rate();

    std::cout << std::setprecision(16);
    std::cout << "zero rate = " << y[0] << std::endl;

#ifdef YES_I_WANT_TO_USE_AD
    // define the operation sequence and compute the gradient in reverse mode
    CppAD::ADFun<Real> f(x, y);
    std::vector<Real> dw(3), w(1, 1.0);
    dw = f.Reverse(1, w);
    std::cout << "gradient = (" << dw[0] << "," << dw[1] << "," << dw[2] << ")"
              << std::endl;
#else
    // finite difference values: shift one quote at a time and reprice
    quote1m.linkTo(rate1mp);
    Real y0_1 = curve.zeroRate(t3, Continuous).rate();
    quote1m.linkTo(rate1m);
    quote2m.linkTo(rate2mp);
    Real y0_2 = curve.zeroRate(t3, Continuous).rate();
    quote2m.linkTo(rate2m);
    quote3m.linkTo(rate3mp);
    Real y0_3 = curve.zeroRate(t3, Continuous).rate();
    quote3m.linkTo(rate3m);
    std::cout << "gradient = (" << (y0_1 - y[0]) / h << "," << (y0_2 - y[0]) / h
              << "," << (y0_3 - y[0]) / h << ")" << std::endl;
#endif

    return 0;
}


Note that the bootstrap process itself (which is a zero search using the Brent solver) is differentiated when using AD !

The output of this code with and without AD enabled is


Example with AD enabled
zero rate = 0.01188296345959088

Example with AD disabled, use finite differences
zero rate = 0.01188296345959088



The interpretation is that the 3m (continuously compounded, Actual/365Fixed) zero rate grows by 0.04, 0.0, 0.97 basis points if the 1m, 2m, 3m deposit market quotes (simply compounded, Actual/360) grow by 1 basis point respectively. Note that the sensitivity to the 1m quote comes from the two fixing days involved in the deposits.

Let's check if the linear interpolation is correctly taken into account in the derivatives, too. This is also something that really requires AD: while you could do it by hand for linear interpolations, you wouldn't really want to differentiate the value of, say, a cubic spline at some arbitrary point by the spline's values on its pillars, even if this is still realistic to write down. For our test, we replace t3 by two and a half months (see the commented line in the code). We get


Example with AD enabled
zero rate = 0.01003550346419732

Example with AD disabled, use finite differences
zero rate = 0.01003550346419732



CppAD is doing a very good job here, it seems. See you again in two weeks. Think about contributing to my adjoint branch !

In this second post I want to write about my very first steps in automatic differentiation (AD). Of course applied to quantitative finance, where derivatives are called greeks. And with QuantLib, of course. AD is a topic Ferdinando Ametrano mentioned at dinner during the workshop in Düsseldorf a few weeks ago, and indeed it sounded interesting: take a swap, which can have hundreds of deltas nowadays, and compute all of them in just one sweep with the complexity of at most 4 (four, vier, quattro) present value calculations. Sounds like sorcery, but actually it is nothing more than an application of the chain rule of differentiation. This says

$\frac{d}{dx} (f \circ g) = \left( \frac{d}{dy} f \circ \frac{d}{dx} g \right)$

which means if you want to compute a derivative of the result of a chain of computations by an input variable you can do so by computing the derivatives of the single computations and combining the results appropriately. Note that the formula is more general than it might look at first sight. It is true for functions of several variables, possibly with several outputs also. Derivatives are then matrices (Jacobi matrices) and the $\circ$ means matrix multiplication.

Actually my analysis professor at university introduced the derivative as a bounded linear operator between Banach spaces (which can have infinite dimensions, i.e. infinitely many input and output variables, which do not even need to be countable, boom, brain overflow in the second year …) approximating the function in question with order at least $o(\lVert h \rVert)$. Pure fun, only outperformed by his colleague who started the first lesson in linear algebra by defining what a semi-group is. It is only now that I have been working in banks for more than 15 years that I really appreciate this kind of stuff and wish I had stayed in academia. Well, my problem, not yours.

Anyway. There are a lot of AD frameworks for C++ and a good starting point is surely the website http://www.autodiff.org/. What they all do is take your (C++) program, look at each of the operations and how they are chained together, and then compute derivatives using exact formulas in each step.

This is exactly what we all learned at school, namely how to differentiate sums, products, quotients of functions, what the derivative of $x^n$ is, or how to differentiate more complicated functions like $e^x$ or $\sin(x)$ and so on. And how to put everything together using the chain rule! In AD language this is more precisely called forward mode differentiation. There is also a backward (or reverse) mode working from the outside to the inside of a chain of functions. This is a bit unusual, and it is useful to work through some examples to get the idea, but in the end it is also nothing more than applying the chain rule. The decision which mode should be used depends on the dimensions $n$ and $m$ of the function

$f: \mathbb{R}^n \rightarrow \mathbb{R}^m$

If $m$ is big and $n$ is small, you should use the forward mode to compute the derivatives of the $m$ output values by the $n$ input values in the most efficient way. If $m$ is small and $n$ is big, you should use the reverse mode. In our application above, computing hundreds of interest rate deltas for a swap, $m$ is one and $n$ is a few hundreds, so this is a problem for the reverse mode.

There are two main procedures by which the frameworks do automatic differentiation: One way is source code transformation (SCT). I did not look into this, but as far as I understood, the idea is that your source code is enriched in order to gather the necessary information for derivatives computation. The other way is operator overloading (OO). This means that your standard numeric type, typically our beloved 53-bit

double

is replaced by a special type, say (notation stolen from the framework I will introduce below)

AD<double>

and each operation, like

+, -, *, /

is overloaded for this new type, so that during computation of the original code the operation sequence can be taped and afterwards used for derivatives computation. For the younger readers among you who do not know what “taping” means, this refers to this beautiful kind of device http://upload.wikimedia.org/wikipedia/commons/c/c7/TG1000.ogv (I could watch this movie for hours …). The red button at the lower right corner of the device is for recording, I guess. Usually you would have to press the “record” and “play” buttons at the same time to start recording. Only pressing the record button would not work: you had to press the record button a little bit earlier than the play button, because in play mode the record button was blocked.

Now the hard part of this post begins: I am trying to get some AD running for a relevant example. The framework of my choice is CppAD (http://www.coin-or.org/CppAD/). An exciting yet easy enough first example is probably the Black76 formula. This is implemented in blackformula.hpp and the interface looks as follows

/*! Black 1976 formula
    \warning instead of volatility it uses standard deviation,
             i.e. volatility*sqrt(timeToMaturity)
*/
Real blackFormula(Option::Type optionType, Real strike, Real forward,
                  Real stdDev, Real discount = 1.0,
                  Real displacement = 0.0);


The first step is to do something to allow for our AD-double-type. A possible solution is to turn the original implementation into a templated one like this

/*! Black 1976 formula, templated */
template <class T = Real>
T blackFormula(Option::Type optionType, T strike, T forward, T stdDev,
               T discount = 1.0, T displacement = 0.0) {
    /* ... */
}


That’s not all, unfortunately. In the function body we have a line

T d1 = log(forward / strike) / stdDev + 0.5 * stdDev;


In order to have the logarithm differentiated correctly, we have to make sure that if T is the AD type, the log function is taken to be the special implementation in the CppAD library (and the std implementation otherwise). To do so I made both implementations visible by importing them into the current namespace with

using std::log;


Now depending on T being the standard double or the AD type, the appropriate implementation is used. First problem solved.

There is another function, the cumulative normal, used in the Black formula, with no implementation in CppAD. Well, no problem, there is the error function at least, so I can just replace the cumulative normal with the error function. The first tests were disappointing: for a simple call I got slightly different premia (about 0.01 basis points different), depending on whether I used the conventional double or the AD double. The reason became clear soon (after some hours of testing in the wrong direction): in the CppAD implementation, the error function uses an approximation which is fast but inaccurate (relative error of $10^{-4}$).

Not that I care very much for super precise results in the context of finance, but the acceptance of the implementation would probably suffer a bit when Black prices do not match reference results, I guess …

Ok, I wrote an email to the author of CppAD and complained. He promised to do something about it. In the meantime I decided to help myself by converting the QuantLib implementation of the error function into a templated version as well and using this instead of the CppAD error function. The conversion process already felt a bit more routine now, so I guess the rest of the library will just go smoothly. The code snippet in the Black formula then looks as follows:

ErrorFunction<T> erf;
T nd1 = 0.5 + 0.5 * erf(optionType * d1 * M_SQRT1_2);
T nd2 = 0.5 + 0.5 * erf(optionType * d2 * M_SQRT1_2);


Note the error function also gets the template parameter T, which will be double or AD double in the end. Now, before we start to write some client code using the new Black formula, there is some subtlety I want to mention. Functions are often not simple chains of operations but sometimes contain conditional expressions, like

z = x > y ? a : b;


The max function is another example of a (disguised) expression of this kind. Loops too, but they are actually an even more complex variant.

The thing about these conditional expressions is the following: you can just leave them as they are under automatic differentiation, no problem. But as I already mentioned, the process has two steps: the first is to tape the operation sequence. This is done during an evaluation of the function with certain fixed input parameters. The second step is the derivative computation. The result is the derivative evaluated at the original input parameters.

But you can evaluate the derivative at a different point of input parameters without retaping the operation sequence! This is possible only if the function evaluation is completely captured by the CppAD tape recorder, including all possible paths taken depending on conditional expressions. This is supported, but you need to adapt your code and replace the conditionals by special functions. In the Black function we have for example

if (stdDev == 0.0)
    return std::max((forward - strike) * optionType, Real(0.0)) * discount;


This needs to be replaced by

T result0a = max<T>((forward - strike) * optionType, 0.0) * discount;


(in particular, not just terminating by returning the value for the border case of zero standard deviation) and at the end of the function

T ret = CondExpEq(stdDev, T(0.0), result0a,
                  CondExpEq(T(strike), T(0.0), result0b, result));
return ret;


Here another conditional is nested for the special case of zero strike. Finally, the max function above is also implemented using a CppAD conditional, because it is not present natively in CppAD:

namespace CppAD {

template <class Base>
inline AD<Base> max(const AD<Base> &x, const AD<Base> &y) {
    return CondExpGt(x, y, x, y);
}

}


In order to keep the code running without CppAD types, we have to add implementations of the conditionals for regular double types, like this

// used for cppad-ized function implementations
inline double CondExpGt(double x, double y, double a, double b) {
    return x >= y ? a : b;
}
inline double CondExpEq(double x, double y, double a, double b) {
    return x == y ? a : b;
}


The nice thing about all this is that later on we could tape the operation sequence once and store it for reuse in all following computations of the same function.

I just started trying to make QuantLib cooperate with CppAD, so the things above are initial ideas to keep QuantLib backward compatible on the one hand and avoid code duplication on the other hand. And it is a hell of a lot of work, I guess, with many more complications than can be seen so far in this little example, to be realistic.

But let’s try out what we have done so far. Here is a code example that works with my new adjoint branch https://github.com/pcaspers/quantlib/tree/adjoint

#include <iostream>
#include <vector>
#include <ql/quantlib.hpp>

using namespace QuantLib;

int main(void) {

    // parameters

    Real tte = 10.0;
    Real sqrtt = std::sqrt(tte);
    Real strike = 0.03, forward = 0.03, volatility = 0.20;

    // declare the independent variables for which
    // we want derivatives

    std::vector<CppAD::AD<Real> > x(2);
    x[0] = forward;
    x[1] = volatility * sqrtt;
    CppAD::Independent(x);

    // tape the operation sequence

    std::vector<CppAD::AD<Real> > y(1);
    y[0] = blackFormula2<CppAD::AD<Real> >(Option::Call,
                                           strike, x[0], x[1]);
    CppAD::ADFun<Real> f(x, y);

    // compute the partial derivatives using reverse mode

    std::vector<Real> dw(2), w(1, 1.0);
    dw = f.Reverse(1, w);

    // output the results

    std::cout << "price = " << y[0] << " delta = " << dw[0]
              << " vega = " << dw[1] / sqrtt * std::sqrt(100.0)
              << std::endl;

    // cross check the results against the classic
    // implementation

    BlackCalculator c(Option::Call, strike, forward,
                      volatility * sqrtt);
    std::cout << "price = " << c.value() << " delta = "
              << c.deltaForward()
              << " vega = " << c.vega(tte) << std::endl;

    return 0;
}


I kept the original blackFormula function as is for the moment (for testing purposes) and implemented the new, templated version as blackFormula2. I declared the forward and the standard deviation inputs as independent variables with respect to which I want partial derivatives (i.e. forward delta and vega). The strike input parameter (as well as the discount and displacement, which are invisible here) is kept as a regular double variable. Note that in the Black formula implementation it is converted to the CppAD double type, but not included as an independent variable in the operation sequence. We could include it though, getting then the derivative of the call price by the strike (which is useful also, because this is the negative of the digital price).

The results are compared against the original implementation in BlackCalculator. The output of the test code is

price = 0.007445110978624521 delta = 0.6240851829770754 vega = 0.03600116845290408
price = 0.007445110978624521 delta = 0.6240851829770755 vega = 0.03600116845290408


Good, works. And we did not need to implement the analytical formulas for forward delta and vega to get exactly the same result up to machine epsilon. Sorcery. Needless to say, CppAD can compute higher order derivatives as well, so gamma, vanna, volga, speed, charm and color are just a stone's throw away.

Have a nice Christmas everyone, see you back healthy in 2015!