Adjoint Greeks III – Vanilla Swaps

Time for a post because next week I will be in Dublin over the weekend. Three days with my beloved wife, just the two of us. Without our four little ones, who will have a great time too, with their granny. But also without my laptop, which makes me a bit nervous to be honest. Anyway. Good news from my adjoint greeks project: I reached the first bigger milestone, which is to compute the delta vector for a plain vanilla swap.

Or rather for a whole portfolio of vanilla swaps. The setting should be realistic and allow for first performance checks, so I look at portfolios of 1, 10, 100, 1000, 5000 and 10000 swaps with random maturities in a horizon of 10, 20, 30, 50 and 70 years, giving 30 test cases for all combinations in total.

The underlying curve (only one, but it would work with separate discounting and forwarding curves as well) consists of 10 deposits, 5 forward rate agreements and as many swaps as the horizon suggests with yearly spacing, i.e. we have 25, 35, 45, 65 and 85 points on the curve for each of the above horizon scenarios respectively.

The goal is to

  • compute the NPV of the swap portfolio
  • compute bucket deltas with respect to all instruments in the underlying curve

both in the traditional way (shifting the market quotes of the curve instruments and revaluing the portfolio on the new curve) and with AD.

You can see the whole example code here https://github.com/pcaspers/quantlib/blob/adjoint/QuantLib/Examples/AdjointSwapDeltas/adjoint.cpp. Note that no special engine is used to generate the adjoint deltas. We just do a vanilla swap pricing (almost) as always, collecting the market variables of interest in a vector xAD and the result (the NPV) in yAD and then say

CppAD::ADFun<Real> f(xAD, yAD);                    // build the function mapping the market quotes to the NPV
std::vector<Real> deltasAD(xAD.size()), w(1, 1.0); // weight 1.0 for the single output
deltasAD = f.Reverse(1, w);                        // one reverse sweep yields the full delta vector
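
To see what these three lines do in isolation, here is a self-contained CppAD toy example of the same pattern, outside of QuantLib (the toy function and all names here are made up, of course):

#include <cppad/cppad.hpp>
#include <iostream>
#include <vector>

int main() {
    std::vector<CppAD::AD<double> > x(2), y(1);
    x[0] = 1.0; x[1] = 2.0;
    CppAD::Independent(x);            // start taping the operation sequence
    y[0] = x[0] * x[1] + x[1];        // toy "NPV"
    CppAD::ADFun<double> f(x, y);     // stop taping, f maps x to y
    std::vector<double> w(1, 1.0), grad(2);
    grad = f.Reverse(1, w);           // one reverse sweep gives the whole gradient
    std::cout << grad[0] << " " << grad[1] << std::endl; // prints 2 2
    return 0;
}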

Let’s start by looking at one single swap. The delta pillars are simply numbered from 1 to 45, meaning overnight, tom/next, spot/next, spot week, 1m, …, 6m, 1m-7m fra, 2m-8m, …, 5m-11m, 1y swap, 2y, …, 30y. The random swap below apparently has a maturity between 23y and 24y from today. The notional of the swap is 100 million (in general the portfolio’s total notional is 100 million in all cases, regardless of the number of swaps in it).

The finite difference deltas in the column labeled double are computed with a step size of 1E-6 here. The AD values are in the next column and the rightmost column displays the difference between the two. The CppAD framework once more does not disappoint us and gives reliable results. Note that the deltas are expressed w.r.t. a one basis point (1E-4) shift.
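
In symbols, the finite difference deltas in the table are (assuming a plain one-sided difference; the post only fixes the step size and the one basis point scaling)

\Delta_i \approx \frac{NPV(q_i + h) - NPV(q_i)}{h} \cdot 10^{-4}, \quad h = 10^{-6}

where q_i is the i-th market quote of the curve instruments.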

results:                double     AD<double>     difference
   NPV            -64037022.66   -64037022.66           0.00
   Delta #    1          53.94          53.93           0.00
   Delta #    2          17.98          17.98           0.00
   Delta #    3           0.00          -0.00           0.00
   Delta #    4           0.00           0.00          -0.00
   Delta #    5           0.00           0.00          -0.00
   Delta #    6           2.28           2.28          -0.00
   Delta #    7       -2439.17       -2439.17           0.00
   Delta #    8           0.00          -0.00           0.00
   Delta #    9           0.00           0.00          -0.00
   Delta #   10           0.00          -0.00           0.00
   Delta #   11           0.00           0.00          -0.00
   Delta #   12           7.13           7.13          -0.00
   Delta #   13         182.54         182.54          -0.00
   Delta #   14           0.00          -0.00           0.00
   Delta #   15           0.00           0.00          -0.00
   Delta #   16          36.26          36.26           0.00
   Delta #   17         495.51         495.51          -0.00
   Delta #   18         731.54         731.54          -0.00
   Delta #   19         994.67         994.68          -0.00
   Delta #   20        1238.63        1238.63          -0.00
   Delta #   21        1488.41        1488.42          -0.00
   Delta #   22        1745.32        1745.33          -0.01
   Delta #   23        2013.84        2013.85          -0.01
   Delta #   24        2234.96        2234.97          -0.01
   Delta #   25        2535.77        2535.79          -0.01
   Delta #   26        2769.90        2769.92          -0.02
   Delta #   27        3040.39        3040.41          -0.02
   Delta #   28        3321.43        3321.45          -0.02
   Delta #   29        3543.09        3543.12          -0.03
   Delta #   30        3861.57        3861.65          -0.08
   Delta #   31        4109.57        4109.51           0.06
   Delta #   32        4364.59        4364.68          -0.09
   Delta #   33        4659.00        4659.04          -0.05
   Delta #   34        4958.68        4958.73          -0.05
   Delta #   35        5166.84        5166.90          -0.06
   Delta #   36        5529.58        5529.64          -0.06
   Delta #   37        5745.63        5745.69          -0.07
   Delta #   38       60419.04       60419.23          -0.19
   Delta #   39      167048.46      167051.01          -2.55
   Delta #   40           0.00           0.00           0.00
   Delta #   41           0.00           0.00           0.00
   Delta #   42           0.00           0.00           0.00
   Delta #   43           0.00           0.00           0.00
   Delta #   44           0.00           0.00           0.00
   Delta #   45           0.00           0.00           0.00

Now let’s look at timings, in a bigger example with a 70y horizon (85 curve pillars) and 10000 swaps in the portfolio. On my laptop,

maximum maturity           70 years
portfolio size          10000 swaps
delta vector size          85 pillars

timings (ms)            double     AD<double>         factor  eff#NPVs
   pricing                1070           2690
   deltas                99030           2700
   total                100100           5390        18.5714   4.57692

This says that the pricing takes 1070ms with standard double computations and 2690ms with AD. Approximately, of course, because you cannot really trust single timings, in particular not for small problems.

The delta computation under the naive bump and revalue approach takes another fatty 99 seconds, while CppAD only consumes 2.7 seconds on top of the 2.7 seconds from the pricing. Obviously the AD pricing step does more work than a plain double computation (it records the operation sequence that is later used in the reverse sweep), so ultimately we have to look at the sum of pricing and delta computation. For this we get a speed up of 18.5x with AD.

The last number eff#NPVs is the effective number of NPV calculations during the AD run which is the total time of the AD calculation divided by the average time for one pricing with classic double‘s. From theory we know that this should be bounded by 5, here it is 4.6.
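
In symbols (just restating that definition):

\text{eff\#NPVs} = T^{AD}_{total} / \bar{T}^{double}_{pricing}

So it answers the question: how many classic NPV runs does the complete AD calculation, pricing plus all 85 deltas, cost.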

The magic of AD is that with bigger problems, the 4.6 stays constant, not the speed up factor. If we look at 115 pillars instead of the 85 for example,

maximum maturity          100 years
portfolio size          10000 swaps
delta vector size         115 pillars

timings (ms)            double     AD<double>         factor  eff#NPVs
   pricing                1610           4060
   deltas               204630           4100
   total                206240           8160        25.2745   4.55004

Note that swaps with longer terms are generated for this example, so a single pricing takes longer than before. Otherwise we stay at 4.6 times one NPV computation for now 115 bucket deltas and get a speed up of over 25 !

It does not matter how many greeks you want, you just get them for free. I thought I had already understood that, but seeing it in reality is quite cool. Let your problem grow as it wants, your complexity stays constant. In other circumstances you are glad to be linear. Sometimes you are lucky-logarithmic. Here complexity is constant. Crazy.

Running all the examples mentioned above and plotting the effective number of NPV calculations against the problem size (defined as the product of the delta vector and the portfolio size) we get this picture.

(Figure: effective number of NPV calculations plotted against the problem size)

The effective number of NPV calculations seems to stabilize slightly above 4.5. Cough, probably we should produce more points first … but with a sharp eye …

Next steps: volatility term structures (which I already scratched involuntarily when converting the ibor coupon pricers), Black pricers, Hull White and the Markov functional model. Both of the latter are slow and totally and utterly inaccessible for bump-and-revalue sensitivities. And of course filling in all the holes I left open so far. Still, help on the conversion process is highly appreciated (including not only pull requests you can send, but also expensive gifts, plain money (no bitcoins please), or a too-good-to-reject QuantLib job offer).


Tag Dispatching, ZABR and Where Do You Start

Recently I added an experimental implementation of Paul Doust’s No arbitrage SABR model to QuantLib. As the name suggests this is an arbitrage free approximation to the famous stochastic \alpha\beta\rho model proposed in 2002 by Patrick Hagan

dF = F^\beta \sigma dW \\  d\sigma = \nu \sigma dV \\  dW dV = \rho dt

for a forward rate F, stochastic volatility \sigma with volatility of volatility \nu starting at an initial value \alpha, CEV parameter \beta and correlation \rho, and – in Doust’s case – assuming an absorbing barrier at zero forward level.

I am not going to talk about low interest rates and possible remedies for your models here, it is the weekend after all, so let’s relax and not worry about the outside world too much – I think it was just yesterday that the Euro Sabr Volatility Cube in QuantLib ceased to work, telling me that the 1m into 1y forward swap rate against Euribor 3m is negative, so no further activity seems meaningful any more. Thursday was also quite interesting thanks to our Swiss neighbours. I mean how are you supposed to stay focussed when everybody around you freaks out ? Really annoying.

Once the no arbitrage model core itself was implemented I needed to integrate it into the existing infrastructure. For the classic SABR implementation there are a lot of useful classes, like an interpolation class and associated factory, a smile section, a swaption cube (the one from above). The quick and dirty solution is too obvious: Copy’n’paste the existing SABR code, rename the classes and replace the SABR formula by the new Doust version. I think, ultimately, there are only three very simple criteria for good code

  • No redundant code – don’t do the same thing in two places in your program
  • Easy to maintain and extend – bug fixing and adding new functionality is easy
  • Understandable – the average colleague of yours is able to read and understand the code

The better approach for integrating the new SABR version seems therefore to take the existing code, extract everything specific to the classic implementation and use what is left to build both SABR versions on top of it. I called the common basis XABR, so for example there is now a class

template <class I1, class I2, typename Model>
class XABRInterpolationImpl : 
           public Interpolation::templateImpl<I1, I2>,
           public XABRCoeffHolder<Model> {
/* ... */

with a generic Model class giving the specification of Hagan’s 2002 SABR approximation, Doust’s 2012 no arbitrage SABR approximation, and possibly other models. Turns out that for example the SVI model also fits into the framework without any difficulties. So the label XABR is already not general enough to cover all use cases. Never mind, the main thing is that the code is generic. The Model class looks like this

    struct SABRSpecs {
        Size dimension() { return 4; }
        void defaultValues(std::vector<Real> &params, std::vector<bool> &,
                           const Real &forward, const Real expiryTime) {
        /* ... */
        }
        void guess(Array &values, const std::vector<bool> &paramIsFixed,
                   const Real &forward, const Real expiryTime,
                   const std::vector<Real> &r) {
        /* ... */
        }
        Real eps1() { return .0000001; }
        Real eps2() { return .9999; }
        Real dilationFactor() { return 0.001; }
        Array inverse(const Array &y, const std::vector<bool>&,
                      const std::vector<Real>&, const Real) {
        /* ... */
        }
        Array direct(const Array &x, const std::vector<bool>&,
                     const std::vector<Real>&, const Real) {
        /* ... */
        }
        typedef SABRWrapper type;
        boost::shared_ptr<type> instance(const Time t, const Real &forward,
                                         const std::vector<Real> &params) {
            return boost::make_shared<type>(t, forward, params);
        }
    };

telling you the number of parameters of the model, default values to be used as starting values for calibration (if you do not specify any), a guessing algorithm giving meaningful alternative starting values if the first calibration fails or runs into a local minimum, some internal constants (not part of the Model concept directly) and inverse and direct transformation methods that map the full real hypercube to the admissible parameter space to facilitate unconstrained optimization of the model. Last but not least, the instance method creates an object of the model that can be used to ask for implied volatilities.
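
To make the Model concept a bit more tangible, here is a stripped-down, self-contained toy following the same pattern: a spec struct providing a nested wrapper type and an instance method, consumed by a generic class. All names are made up, this is not the QuantLib code:

#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

// plays the role of SABRSpecs / NoArbSabrSpecs: it knows its parameter count
// and how to create a model instance from a parameter vector
struct FlatVolSpecs {
    struct Wrapper {                          // plays the role of SABRWrapper
        explicit Wrapper(double vol) : vol_(vol) {}
        double volatility(double /*strike*/) const { return vol_; }
        double vol_;
    };
    typedef Wrapper type;
    std::size_t dimension() const { return 1; }
    std::shared_ptr<type> instance(const std::vector<double> &params) const {
        return std::make_shared<type>(params[0]);
    }
};

// generic consumer, playing the role of XABRInterpolationImpl / the smile section
template <class Model> class ToySmileSection {
  public:
    explicit ToySmileSection(const std::vector<double> &params)
        : model_(Model().instance(params)) {}
    double volatility(double strike) const { return model_->volatility(strike); }
  private:
    std::shared_ptr<typename Model::type> model_;
};

int main() {
    ToySmileSection<FlatVolSpecs> smile(std::vector<double>(1, 0.20));
    std::cout << smile.volatility(0.03) << std::endl;   // prints 0.2
    return 0;
}

The point is that the consumer never mentions a concrete model; swapping one spec for another is a pure template argument change.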

This kind of generalization is also done for volatility cubes. Now there is a

    template<class Model>
    class SwaptionVolCube1x : public SwaptionVolatilityCube {
/* ... */

again taking a generic spec for the desired underlying model. The original Hagan cube and the new Doust cube are then retrieved as

    struct SwaptionVolCubeSabrModel {
        typedef SABRInterpolation Interpolation;
        typedef SabrSmileSection SmileSection;
    };

    typedef SwaptionVolCube1x<SwaptionVolCubeSabrModel> 
                                          SwaptionVolCube1;

    struct SwaptionVolCubeNoArbSabrModel {
        typedef NoArbSabrInterpolation Interpolation;
        typedef NoArbSabrSmileSection SmileSection;
    };

    typedef SwaptionVolCube1x<SwaptionVolCubeNoArbSabrModel> 
                                              SwaptionVolCube1a;

Pretty pretty pretty good. But what does that have to do with “Tag Dispatching” ? And is ZABR just a typo in the headline of this post ? No. I don’t know what the Z stands for (probably for nothing), but it labels a model invented by ingenious Jesper Andreasen aka Qwant Daddy at Danske Bank. I once had the luck to attend a presentation he gave. Very impressive. Anyway, the model is this

dF = F^\beta \sigma dW \\  d\sigma = \nu \sigma^\gamma dV \\  dW dV = \rho dt

so introducing CEV dynamics for the volatility. With that you can control the wings of your smile in a very flexible way, which is nice if you are into CMS pricing for example. Although I have to mention that too many degrees of freedom are not always helpful. Currently I make do with the classic SABR model for the Euro CMS market, and ZABR is not more (and not less) than an interesting bird of paradise.

I implemented the ZABR paper (it is called expansions for the masses, for reasons about as clear as the Z in ZABR) back in 2012 in our summer holidays on Sardinia. Brutally hot three weeks, forest fires near our house still burning just the day before we arrived. I called our landlord, who told me everything was safe; guess then it is ok to go there with your wife and four little kids, ain’t it. Well after all it was nice on Sardinia and the worst thing that happened was that this little rocker of the L key on my beloved laptop broke. Heroically I finished the implementation nevertheless.

The special thing about the implementation is that the paper suggests more than one solution of the model. For a start there are short maturity expansions in lognormal or normal volatility, comparable in computational complexity to the classic Hagan expansions. Then there is a Dupire style local volatility approach producing an arbitrage free smile yet at the same time being fast to evaluate. And, not in the paper, I added a brute force precise full finite difference solution, which is slow but useful to benchmark the approximations.

Originally I used an enum parameter in the constructor of the model to specify the way of model evaluation. But that doesn’t fit in the above XABR framework. Extend the XABR concept to take an additional parameter that is not needed otherwise ? Not very nice. A light variant of Tag Dispatching can help ! The idea is to have a template parameter that denotes the evaluation and then write the spec for the model like this

template <typename Evaluation> struct ZabrSpecs {
    Size dimension() { return 5; }
    Real eps() { return 0.000001; }
    void defaultValues(std::vector<Real> &params,
                       std::vector<bool> &paramIsFixed, const Real &forward,
                       const Real expiryTime) {
    /* ... */
    }
    /* ... */
    typedef ZabrSmileSection<Evaluation> type;
    boost::shared_ptr<type> instance(const Time t, const Real &forward,
                                     const std::vector<Real> &params) {
        return boost::make_shared<type>(t, forward, params);
    }
/* ... */

The good thing about this is that the spec itself is templated by the evaluation, so that the evaluation does not need to become a parameter in the spec methods (e.g. the instance method), they can just stay the same as before. In this way this approach is not intrusive at all, fully respecting the existing code structure.

To mimic the functionality of an enum I chose the Evaluation parameter to be a tag meaning that it is just a label with no other deeper meaning in its simple and superficial yet useful (so I assume ultimately happy) existence. That is I declare empty classes

struct ZabrShortMaturityLognormal {};
struct ZabrShortMaturityNormal {};
struct ZabrLocalVolatility {};
struct ZabrFullFd {};

which can be plugged into the interpolation class quite naturally like this

ZabrInterpolation<ZabrLocalVolatility> myInterpolation(/*...*/);
/*...*/

But how is the implementation itself done ? For example let’s look at the volatility implementation of the ZABR smile section

    Volatility volatilityImpl(Rate strike) const {
        return volatilityImpl(strike, Evaluation());
    }

It invokes a method with the same name, but with a second parameter which is just an instance of our tag class. This parameter does not carry any information, it is an instance of an empty class declaration. It is just used to make the compiler choose the overloaded version of volatilityImpl appropriate for the kind of evaluation specified. Thus we have several private overloaded methods

    Volatility volatilityImpl(Rate strike, ZabrShortMaturityLognormal) const;
    Volatility volatilityImpl(Rate strike, ZabrShortMaturityNormal) const;
    Volatility volatilityImpl(Rate strike, ZabrLocalVolatility) const;
    Volatility volatilityImpl(Rate strike, ZabrFullFd) const;

one for each evaluation tag. This produces more efficient code than switch statements dispatching the evaluation mode at runtime. Also the code is more readable because each evaluation is implemented in its own overloaded method – still retaining the possibility of shared code between the different kinds of evaluation of course. No redundant code. And new evaluation modes can be added in a most transparent way.

If you are interested in the full code for the ZABR implementation you can have a look at https://github.com/lballabio/quantlib/pull/168/files. If you are interested in the model itself, there are some slides about that in my presentation at http://quantlib.org/qlws14.shtml. If you find it hard to read and understand Jesper’s and Brian’s original paper, I can send you a rewritten version I created for the masses (and in particular for myself), with all the intermediate steps that are not crystal clear unless you happen to be risk quant of the year.

As a final and totally unrelated remark. Even if you are not into Jazz music, here is a gem everyone should give a chance (and soon get addicted to): Wonderful Brad Mehldau’s album Where Do You Start http://en.wikipedia.org/wiki/Where_Do_You_Start. The fragile-pessimistic-yet-easy-funky Got Me Wrong, the probably coolest (sorry Jimi) version of Hey Joe ever heard (though I did not listen to all of the apparently more than 1700 versions out there https://heyjoeversions.wordpress.com/hey-joe-versions/), the simply beautiful Samba e Amor evolving into Brad’s own Jam, the so sensitive and consoling Time Has Told Me. Pure art. Give it a try.


Adjoint Greeks II

Welcome back. I hope you all had some nice Christmas days. I surely had. Still some time to quantlib a bit. In my last post (https://quantlib.wordpress.com/2014/12/20/adjoint-greeks/) I wrote about automatic differentiation (AD) and applied it to delta and vega computation in the Black76 model. Since the results were promising I started to work more seriously on my adjoint branch (https://github.com/pcaspers/quantlib/tree/adjoint). It is still in a proof of concept state. The first milestone would be an engine that computes adjoint interest rate deltas for QuantLib::Swap instruments. And guess what, I am almost there. It is a hell of a lot of work though and if someone is interested in helping, please contact me. You will get intimate with large parts of the QuantLib code and get your hands dirty at the same time, with a specific goal to reach.

The essential work is to replace all quantities w.r.t. which we potentially want sensitivities by a generic type T. Then we can plug in a double if we want to work in classic mode or the active type of an AD framework like the one we already know from the CppAD library

CppAD::AD<double>

in order to leverage automatic differentiation in QuantLib. By the way, CppAD’s author Brad Bell has replaced the error function with the one from the std:: namespace (it has become part of the C++11 standard). He also helped me with some cmake problems, very nice. At the moment I only work with CppAD, but the changes in the library are not specific to this framework, so other tools (at least those relying on operator overloading) could be used as well. An important point to keep in mind: don’t make yourself dependent on a single third-party library. It is enough that we are all dependent on QuantLib, isn’t it. Anyway, all the CppAD specific things I added are currently collected in a single, small header file ql/qlcppad.hpp.

To get a flavour of what it means to template’ize QuantLib, look at the example of a simple quote. The public part now looks as follows

template <class T = Real> class SimpleQuote_t :
 public Quote_t<T> {
  public:
    SimpleQuote_t(T value = Null<Real>());
    T value() const;
    bool isValid() const;
    T setValue(T value = Null<Real>());
    void reset();
...

The templated version of the class is marked by an underscore t. Actually the old class does not exist any more, but is rather retrieved by a typedef

typedef SimpleQuote_t<Real> SimpleQuote;
...

to keep existing code compiling without any changes (well, not quite in all cases, unfortunately, but see below for this). On the other hand you can now instantiate a simple quote using an AD type with another typedef

typedef SimpleQuote_t<CppAD::AD<double>> SimpleQuoteAD;
...

We can then put this T – simple quote into a T – rate helper and create a T – piecewise yield curve, which will bootstrap the curve using a T – brent solver so that you can finally retrieve zero rates as AD doubles (using a T – linear interpolation, not to forget) and at the same time the sensitivity of zero rates to the input market quotes.

This is what I am trying to demonstrate today. It is not exactly the milestone example from above, but technically very close. The many T’s from above already suggest that along the way you have to adapt a lot of code. And this is not always funny. If you have an error in your templated code, the compiler will often just vomit a big flood of long, unreadable error messages over your compiler window. If you are lucky, at some secret point in that ugly thick soup there is a tender hint of what might actually be the true source of the problem. Then once the code compiles with double‘s this does not at all mean that it does so with AD<double>‘s.

What is more, we have a lot of special things in QuantLib, like the Null type which for double‘s is implicitly converted to a certain value (namely std::numeric_limits<float>::max()) to mark invalid values. This has to be made consistent with an AD double null type. Taking this example, we have to introduce a Null<CppAD::AD<double>> which starts not too difficult with

template <class Base> class Null<CppAD::AD<Base> > {
  public:
    Null() {}
    operator CppAD::AD<Base>() const {
        return CppAD::AD<Base>(static_cast<Base>(Null<Base>()));
    }
...

but since inside CppAD conversion from any type T to CppAD::AD<Base> (replace Base by double if you want) is done like this

template <class Base>
template <class T>
inline AD<Base>& AD<Base>::operator=(const T &t)
{	return *this = Base(t); }

i.e. via the Base type, we need also a conversion from our new Null type above to Base, which can be done as follows

..
    // this is needed, because in ad_assign.hpp line 124ff
    // assignment from T to AD<Base> is done via conversion from T
    // to Base and then to AD<Base>. If for example
    // T = Null<CppAD::AD<double>> we need to be able to convert
    // to double so that then conversion to AD<double> from this
    // works.
    operator Base() const { return static_cast<Base>(Null<Base>()); }
...

and which does the trick in the end.

Some structural problems arise, because when templating you effectively turn parts of the library into a header-only version, that is, the implementation moves from the cpp to the hpp files. This is because the templated code can only be instantiated when compiling actual client code. Or you instantiate a list of specific types in the library itself, but then you tie yourself to exactly these implementations (e.g. support for CppAD in our context).
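
For illustration, the second route (instantiating a fixed list of types in the library itself) would mean putting explicit instantiations into the cpp files, roughly like this. This is a hedged sketch only; the file layout is my assumption, only the header path and the qlcppad.hpp header mentioned above are taken from the library and the post:

// simplequote.cpp (hypothetical sketch) -- explicit instantiations for a fixed list of types
#include <ql/quotes/simplequote.hpp>
#include <ql/qlcppad.hpp>   // CppAD support header mentioned above

namespace QuantLib {
    template class SimpleQuote_t<double>;              // classic mode
    template class SimpleQuote_t<CppAD::AD<double> >;  // AD mode -- this ties the library to CppAD
}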

At least for iborcoupon.hpp, cmscoupon.hpp and couponpricer.hpp I arrived at circular include dependencies, which were not solvable by forward declarations. So I had to split each of these files into two parts: a base part, which is included internally in some places in the library instead of the full header, and a second part with the original file name, which in turn includes the base part, to stay backward compatible.

Another source of many headaches was the already highly generic territory around the piecewise yield curve. I really admire this part of the library, it is beautiful and works incredibly robust. But it is challenging to adapt this piece of art to enable AD. In the end I still wanted to write something like

 
PiecewiseYieldCurve<ZeroYield, 
                    Linear,
                    IterativeBootstrap, 
                    CppAD::AD<double>> curve(referenceDate, 
                                             instruments, 
                                             Actual365Fixed());
...

to initialize a curve. The only change is the last template parameter, which can be set optionally to an AD double type (if not given, it is just assumed double, so replicating the old behaviour).

One real backward-incompatible change I introduced here is that factory classes for interpolations are now templated. The expression Linear therefore has a different meaning than before, namely taking a template type T that should be used for the calculations. The pro is that in generic classes like the piecewise yield curve we can still just write Linear and behind the scenes the type T (taken from the last template parameter of PiecewiseYieldCurve) is plugged into the factory template so that what has to be done can be done. The con is that if the factory is explicitly used to create an interpolation object, we have to write

Linear<Real>

now instead of Linear alone. But assuming that the factory classes are mostly used as template parameters and not to actually instantiate classes in client code, this is hopefully not a huge change. Well, let’s see what Luigi has to say about this … 😉
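
For client code that actually instantiates an interpolation via the factory, the change looks roughly like this (the interpolate call is the one from the unconverted library; that the converted version still returns something called Interpolation is my assumption):

std::vector<Real> xs(3), ys(3);
xs[0] = 0.0;  xs[1] = 1.0;  xs[2] = 2.0;
ys[0] = 0.01; ys[1] = 0.02; ys[2] = 0.015;

// before: Interpolation interp = Linear().interpolate(xs.begin(), xs.end(), ys.begin());
// now the factory carries the computation type explicitly:
Interpolation interp = Linear<Real>().interpolate(xs.begin(), xs.end(), ys.begin());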

Here is the new piecewise yield curve declaration, compared to the original version now an even nicer example of what you can do with templates in C++ …

template <template <class> class Traits, template <class> class Interpolator,
          template <class, class> class Bootstrap = IterativeBootstrap,
          class T = Real>
class PiecewiseYieldCurve
    : public Traits<T>::template curve<Interpolator>::type,
      public LazyObject {
...

Not only the Interpolator, but also the Traits (which are the Discount, Forward, ZeroYield kind of specifications) and the Bootstrap class (IterativeBootstrap or LocalBootstrap) now have a new template parameter.

But to summarize, from outside everything looks and is used as before. If you want to use an AD double type you just have to add a new template parameter to the PiecewiseYieldCurve construction, that’s all.

Another limitation I came across is that there are no templated typedefs in C++. So for example the RateHelper typedef changes to

template <class T> struct RateHelper_t {
    typedef BootstrapHelper<YieldTermStructure_t<T>, T> Type;
};

and the final rate helper types (classic and AD’ized versions respectively) are retrieved via

typedef RateHelper_t<Real>::Type RateHelper;
typedef RateHelper_t<CppAD::AD<Real>>::Type RateHelperAD;
...

i.e. with an additional ::Type suffix.

Regarding templated functions, the type can generally be deduced from the function parameters, so that we can still write code like

Real z = 1.0;
if (close(z, 0.0)) {}

(an exceptionally dumb code example), although the declaration of close has changed to

template<class T> bool close(T x, T y);

However this is not totally backward compatible, because the expression close(z, 0) was legal code before and now the compiler goes “hey, I need to deduce a type T here and you are giving me an int and a double, I am confused, I can not work like this, please focus a bit on what you are doing, will you”. This was silently resolved before by an implicit conversion of 0 to a double. There was actually one line in the test suite (or the examples, I don’t remember) which had exactly this problem. Easy to solve (change 0 to 0.), but code that compiled before now doesn’t, which is always a major thing. We could say that the code was illegal ever since and only compiled by accident in the past.
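
A minimal standalone reproduction of the deduction issue (generic code, nothing QuantLib-specific; the simplified body of close is made up):

#include <cmath>
#include <iostream>

// simplified stand-in for the new templated declaration of close()
template <class T> bool close(T x, T y) { return std::fabs(x - y) < 1e-12; }

int main() {
    double z = 1.0;
    // bool a = close(z, 0);      // error: T deduced as double from z but as int from 0
    bool b = close(z, 0.0);       // fine: both arguments are double
    bool c = close<double>(z, 0); // also fine: T fixed explicitly, 0 converts to double
    std::cout << b << " " << c << std::endl;
    return 0;
}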

A deeper and more interesting issue in the same direction is this one here:

template <class T = Real>
void setCouponPricer(
    const typename Leg_t<T>::Type &leg,
    const boost::shared_ptr<FloatingRateCouponPricer_t<T> > &pricer) {
...

A utility function which sets coupon pricers. You are already not scared any more by the first argument’s typename and ::Type stuff, are you ? Now look in the Gaussian1dModel.cpp example file, lines 549ff:

...
        const Leg &leg0 = underlying4->leg(0);
        const Leg &leg1 = underlying4->leg(1);
        boost::shared_ptr<CmsCouponPricer> cmsPricer =
            boost::make_shared<LinearTsrPricer>(swaptionVol, reversionQuote);
        boost::shared_ptr<IborCouponPricer> iborPricer =
            boost::make_shared<BlackIborCouponPricer>();

        setCouponPricer<Real>(leg0, cmsPricer);
        setCouponPricer<Real>(leg1, iborPricer);
...

Everything looks usual here, although Leg, CmsCouponPricer, IborCouponPricer, BlackIborCouponPricer are again all merely typedef‘s for the new template versions with T = Real as above. Maybe interesting to mention, LinearTsrPricer is not converted yet, so old code can be mixed with new one, even if both worlds are connected by an inheritance relationship. Which is good.

But why does setCouponPricer need the explicit specification of the double type ? This is because the compiler does not recognize that a boost::shared_ptr<U> is a specialization of a boost::shared_ptr<V>, even if U (CmsCouponPricer) specializes V (FloatingRateCouponPricer). He (or she ? … C++ compilers are more likely female creatures, aren’t they ?) doesn’t know that boost::shared_ptr is some kind of a pointer, eh, it is just some user defined class. You may wonder (at least I did) why this kind of code worked until now anyway in other, non-templated-function contexts. The reason is that implicit conversions are defined for these cases in the boost::shared_ptr class. But these are not even noticed by the compiler when inferring template types. Bad luck, she is a bit picky here. Actually this problem can be worked around by using a generic boost::shared_ptr<G> second argument in setCouponPricer (which accepts everything that is a boost shared pointer) and using some very fancy generic programming stuff to check at compile time that G is actually a FloatingRateCouponPricer_t. This is something for later thoughts and at least one blog post alone.
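
The deduction problem is easy to reproduce outside QuantLib. A small self-contained analogue of the setCouponPricer situation (class names made up):

#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>

template <class T> struct FloatingPricer_t { virtual ~FloatingPricer_t() {} };
struct CmsPricer : FloatingPricer_t<double> {}; // specializes the base, like CmsCouponPricer does

// analogue of setCouponPricer: the parameter type depends on the template parameter T
template <class T>
void setPricer(const boost::shared_ptr<FloatingPricer_t<T> > &) {}

int main() {
    boost::shared_ptr<CmsPricer> p = boost::make_shared<CmsPricer>();
    boost::shared_ptr<FloatingPricer_t<double> > q = p; // implicit conversion works here ...
    // setPricer(p);      // ... but not during deduction: T cannot be inferred from CmsPricer
    setPricer(q);         // fine: exact match, T deduced as double
    setPricer<double>(p); // also fine: with T given, the usual shared_ptr conversion applies
    return 0;
}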

Time for the full code example. I bootstrap a toy curve from 3 deposit quotes with maturities 1m, 2m, 3m and compute the sensitivity of the 3m zero rate to the underlying deposits’ market quotes. Depending on the self documenting macro YES_I_WANT_TO_USE_AD either AD or a simple finite difference approximation will be used in our example.

#include <ql/quantlib.hpp>
#include <boost/assign/std/vector.hpp>

using namespace QuantLib;
using namespace boost::assign;

// comment or uncomment this macro
//#define YES_I_WANT_TO_USE_AD

// cppad utilities
#ifdef YES_I_WANT_TO_USE_AD
#include <ql/qlcppad.hpp>
#endif

int main() {

// define the double type to be used

#ifdef YES_I_WANT_TO_USE_AD
    std::cout << "Example with AD enabled" << std::endl;
    typedef CppAD::AD<double> dbltype;
#else
    std::cout << "Example with AD disabled, use finite differences"
              << std::endl;
    typedef double dbltype;
#endif

    // some typedefs to keep notation simple

    typedef RateHelper_t<dbltype>::Type RateHelperAD;
    typedef DepositRateHelper_t<dbltype> DepositRateHelperAD;
    typedef SimpleQuote_t<dbltype> SimpleQuoteAD;
    typedef Quote_t<dbltype> QuoteAD;

    // the reference date

    Date referenceDate(2, January, 2015);
    Settings::instance().evaluationDate() = referenceDate;

    // declare the independent variables (sample deposit quotes)
    std::vector<dbltype> x(3);
    x[0] = 0.0035;
    x[1] = 0.0090;
    x[2] = 0.0121;

#ifdef YES_I_WANT_TO_USE_AD
    CppAD::Independent(x);
#endif

    auto rate1m = boost::make_shared<SimpleQuoteAD>(x[0]);
    auto rate2m = boost::make_shared<SimpleQuoteAD>(x[1]);
    auto rate3m = boost::make_shared<SimpleQuoteAD>(x[2]);

#ifndef YES_I_WANT_TO_USE_AD
    Real h = 1e-4;
    auto rate1mp = boost::make_shared<SimpleQuoteAD>(x[0] + h);
    auto rate2mp = boost::make_shared<SimpleQuoteAD>(x[1] + h);
    auto rate3mp = boost::make_shared<SimpleQuoteAD>(x[2] + h);
#endif

    // build a piecewise curve

    RelinkableHandle<QuoteAD> quote1m(rate1m), quote2m(rate2m), quote3m(rate3m);

    auto dp1m = boost::make_shared<DepositRateHelperAD>(
        quote1m, 1 * Months, 2, TARGET(), ModifiedFollowing, false,
        Actual360());

    auto dp2m = boost::make_shared<DepositRateHelperAD>(
        quote2m, 2 * Months, 2, TARGET(), ModifiedFollowing, false,
        Actual360());

    auto dp3m = boost::make_shared<DepositRateHelperAD>(
        quote3m, 3 * Months, 2, TARGET(), ModifiedFollowing, false,
        Actual360());

    std::vector<boost::shared_ptr<RateHelperAD> > instruments;
    instruments += dp1m, dp2m, dp3m;

    PiecewiseYieldCurve<ZeroYield, Linear, IterativeBootstrap, dbltype> curve(
        referenceDate, instruments, Actual365Fixed());

    std::vector<dbltype> y(1);

    Real t3 = curve.timeFromReference(dp3m->latestDate());
    //Real t3 = 2.5 / 12.0;

    y[0] = curve.zeroRate(t3, Continuous).rate();

    std::cout << std::setprecision(16);
    std::cout << "zero rate = " << y[0] << std::endl;

#ifdef YES_I_WANT_TO_USE_AD
    // define the operation sequence
    CppAD::ADFun<Real> f(x, y);
    std::vector<Real> dw(3), w(1, 1.0);
    // gradient computation
    dw = f.Reverse(1, w);
    std::cout << "gradient = (" << dw[0] << "," << dw[1] << "," << dw[2] << ")"
              << std::endl;
#else
    // finite difference values
    quote1m.linkTo(rate1mp);
    Real y0_1 = curve.zeroRate(t3, Continuous).rate();
    quote1m.linkTo(rate1m);
    quote2m.linkTo(rate2mp);
    Real y0_2 = curve.zeroRate(t3, Continuous).rate();
    quote2m.linkTo(rate2m);
    quote3m.linkTo(rate3mp);
    Real y0_3 = curve.zeroRate(t3, Continuous).rate();
    quote3m.linkTo(rate3m);
    std::cout << "gradient = (" << (y0_1 - y[0]) / h << "," << (y0_2 - y[0]) / h
              << "," << (y0_3 - y[0]) / h << ")" << std::endl;
#endif

    return 0;
}

Note that the bootstrap process itself (which is a zero search using the Brent solver) is differentiated when using AD !

The output of this code with and without AD enabled is


Example with AD disabled, use finite differences
zero rate = 0.01188296345959088
gradient = (0.04267701227033543,-7.65533469948565e-11,0.9682251688560017)

Example with AD enabled
zero rate = 0.01188296345959088
gradient = (0.04267719604141054,1.036619161164663e-11,0.9682373688299127)

The interpretation is that the 3m (continuously compounded, Actual/365Fixed) zero rate grows by 0.04, 0.0, 0.97 basis points if the 1m, 2m, 3m deposit market quotes (simply compounded, Actual/360) grow by 1 basis point respectively. Note that the sensitivity on the 1m quote is coming from the two fixing days involved in the deposits.

Let’s check if the linear interpolation is correctly taken into account in the derivatives, too. This is also something that really requires AD: while you could do it for linear interpolations by hand, you wouldn’t really want to differentiate the value of, say, a cubic spline at some arbitrary point with respect to the spline’s values on its pillars, even if this is still realistic to write down. For our test, we replace t3 by two and a half months (see the commented line in the code). We get


Example with AD disabled, use finite differences
zero rate = 0.01003550346419732
gradient = (0.05551963176645552,0.5617050580845363,0.3946021819303639)

Example with AD enabled
zero rate = 0.01003550346419732
gradient = (0.05551987078094003,0.5617096566300429,0.3946071568278159)

CppAD is doing a very good job here, it seems. See you again in two weeks. Think about contributing to my adjoint branch !
