Adjoint Greeks VI – Source Code Transformation Pt. II

I will be talking about "Automatic differentiation beyond typedef and operator overloading" at this year's QuantLib User Conference. Registration is free, but seats are limited. Today's blog post is sort of a teaser, so do not miss the chance to be in Düsseldorf on November 30th / December 1st.

To recall what has been done so far to make QuantLib AD-ready: there are two approaches. One is by CompatibL and in essence changes the typedef for Real in the library from double to some active type of an operator overloading tool, like CppAD::AD<double>. Of course it does not work just by changing the typedef, but in effect that describes well what is going on.
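In heavily simplified terms the mechanism looks like this (a sketch of the idea, not CompatibL's actual code):

#include <cppad/cppad.hpp>

// classic QuantLib:
// typedef double Real;

// AD-enabled build: every Real in the library becomes an active type
// that records the operations applied to it on a tape
typedef CppAD::AD<double> Real;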

The other approach is to transform the library's code and introduce template parameters wherever we might potentially be interested in calculating derivatives. You can read more about this under the tag Automatic Differentiation in the sidebar (which can be expanded by clicking on the three horizontal lines in the upper right corner when you scroll to the top of this page).

Backward compatibility in the template approach is achieved by typedef'ing the original class names to template instantiations with T = Real, so existing code compiles without any changes. Well, let's say 99% of it: a few modifications have to be applied, but really not many (judging from my experience with the original examples and the test-suite).
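Schematically the pattern looks like this (a simplified sketch, using the matrix class as an example):

// each class gets a template parameter T ...
template <class T> class Matrix_t {
    // ... implementation written in terms of T instead of Real ...
};

// ... and the original name is kept as the instantiation with T = Real,
// so existing code compiles unchanged
typedef Matrix_t<Real> Matrix;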

The transformation has not been applied to the full library (yet), but the vanilla IRD stuff works: coupons, yield term structures, volatility term structures, the GSR model. You can get it (together with some working examples) from my adjoint branch at https://github.com/pcaspers/quantlib/tree/adjoint.

Both approaches have clear advantages and disadvantages. One potential problem with the typedef approach is that it might kill your performance in parts of the code where you do not even want to use AD. For example,

    Matrix_t<T> A(1024, 1024);
    Matrix_t<T> B(1024, 1024);
    ...
    Matrix_t<T> C = A * B;

will take 764 ms on my machine with T = double, but 8960 ms with T = CppAD::AD<double>. This seems to be due to compiler optimizations (like the use of SIMD instructions) that are no longer applicable when switching from the native double to the custom AD type. That is a penalty factor of 11.7x, and you do not get anything in return: note that I do not even record a tape, I just use the double functionality wrapped in the AD type.

Of course there might be other operator overloading tools that are more efficient. To get a lower bound for the penalty factor I replaced T by a minimal wrapper that just stores a double plus an additional native pointer and forwards the operators +, * and += in question here to the respective double operations. With this wrapper the penalty factor is 2.0x, which is much better. The perfect operator overloading tool will probably lie somewhere between 2.0x and 11.7x. But even 2.0x is not good if you can avoid it, is it?
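A minimal sketch of such a wrapper (my actual test code may have differed in details, but this is the idea):

// a double plus a native pointer, with the operators needed in the
// matrix multiplication forwarded to the plain double operations;
// the pointer stands in for the per-node bookkeeping of a real AD
// type and is never used
struct W {
    double v;
    void* p;
    W(double x = 0.0) : v(x), p(0) {}
    W& operator+=(const W& o) {
        v += o.v;
        return *this;
    }
};

inline W operator+(const W& a, const W& b) { return W(a.v + b.v); }
inline W operator*(const W& a, const W& b) { return W(a.v * b.v); }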

In any case the template approach allows for the selective activation of variables: you use AD where it is needed and native instances of the library classes everywhere else. You can also mix both. For example you could do a model calibration with the double instance of a class, then extract the calibrated parameters and plug them into the AD version of the same model, which then maybe generates Monte Carlo paths for some CVA calculation in which you want to compute an adjoint gradient.
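In code the mixing could look roughly like this (Model_t is a made-up toy class here, just to show the mechanics with CppAD as the overloading tool):

#include <cppad/cppad.hpp>
#include <vector>

// toy stand-in for a templated model class
template <class T> struct Model_t {
    std::vector<T> params_;
    void setParams(const std::vector<T>& p) { params_ = p; }
    T npv() const { return params_[0] * params_[1]; } // dummy "pricing"
};

int main() {
    // "calibrate" the fast native instance (here we just set values)
    Model_t<double> model;
    model.setParams(std::vector<double>(2, 0.01));

    // plug the calibrated parameters into an active instance
    std::vector<CppAD::AD<double> > x(model.params_.begin(),
                                      model.params_.end());
    CppAD::Independent(x); // start taping
    Model_t<CppAD::AD<double> > activeModel;
    activeModel.setParams(x);
    std::vector<CppAD::AD<double> > y(1, activeModel.npv());
    CppAD::ADFun<double> tape(x, y); // stop taping

    // reverse sweep gives the adjoint gradient of the npv
    std::vector<double> w(1, 1.0);
    std::vector<double> gradient = tape.Reverse(1, w);
    return 0;
}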

Still there are use cases left where even the template approach does not help, namely when you want to apply AD to code sections like the one above, where a lot of dense double calculations take place and the compiler can apply optimizations that are not possible with custom types.

We saw an example in https://quantlib.wordpress.com/2015/04/14/adjoint-greeks-iv-exotics: the time equivalent of around 80 NPV calculations (holy cow!) is consumed for one adjoint gradient calculation of a Bermudan swaption in the Hull White model. From theory we would expect a cost factor of around 4x for the adjoint sweep, and in practice we would probably be happy with any factor up to 10x. But with 80 NPV calculations you can do a lot of things: since a naive bump and revalue approach costs about one NPV calculation per gradient entry, even for a gradient vector with 100 – 200 entries AD would merely result in a speed up of around 2x. This would not really justify the whole AD machinery.

One possible way out of this is to write the adjoint code by hand.

Another approach is to use source code transformation. I started with a toy example in one of my last posts.

In the meantime I implemented a Bermudan swaption engine that produces an AD gradient w.r.t. the H and ζ vectors describing an LGM model and the relevant discount factors of an interest rate curve. For this I extracted the computational core of the Gaussian1dSwaptionEngine (which relies on numerical integration) and rewrote it in Fortran. Then I applied OpenAD/F to generate adjoint code and compiled everything into a library. You can have a look at the project here: https://github.com/pcaspers/quantlib/tree/master/QuantLibOAD/lgm.

On the QuantLib side I wrote a pricing engine acting as a wrapper for the Fortran core: it can be used just like any other pricing engine, but provides the adjoint gradient vector as an additional result. The gradient is computed w.r.t. the model parameters, but it is easy to transform it into the usual market variables' sensitivities; this is done exactly the same way as in the post mentioned earlier.
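Usage is then as with any other engine, schematically like this (the engine class and the result label below are placeholders, see the linked example code for the actual names):

// schematic sketch, names are placeholders
boost::shared_ptr<PricingEngine> engine(
    new LgmAdSwaptionEngine(model)); // wrapper around the Fortran core
swaption.setPricingEngine(engine);

Real npv = swaption.NPV();
// the adjoint gradient is retrieved as an additional result
std::vector<Real> gradient =
    swaption.result<std::vector<Real> >("gradient");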

Testing the engine on a 10y Bermudan swaption with yearly calls (using 49 grid points per expiry for the numerical integration) yields the following results (you can find the example code here):

  • single pricing on an Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz, single threaded, non-transformed code: 4.2 ms
  • pricing + gradient (which is 105 dimensional in this example): 25.6 ms
  • transformation of gradient w.r.t. model parameters: 6.2 ms
  • adjoint calculation multiple: 6.1x (7.6x including gradient transformation)

Looks good, no? In particular because no extra optimization of the input or transformed code was necessary to achieve the good multiple of 6.1x; it just came out of OpenAD/F this way.

As a side note, when implementing things as described here you can use the power of AD even from the original / "classic" QuantLib, just through specialized pricing engines (and by linking against the Fortran support library, of course). Or you can mix operator overloading in the template QuantLib with source code transformation applied to those critical sections of the code that slow down too much under operator overloading. Maybe this hybrid solution is the way to go.

More on this and other topics in Düsseldorf.


Adjoint Greeks V – Source Code Transformation

In my last post on automatic differentiation we saw that the operator overloading approach in some situations (quite naturally) comes with a dramatic performance breakdown.

An alternative approach that might do better is called source code transformation: here the code is differentiated at compile time. While there are quite a few operator overloading tools that work on existing code bases more or less out of the box, this does not seem to be the case for source code transformation tools.
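To illustrate what such a tool does, take a toy function together with the forward-mode code a transformation would emit for it (written out by hand here, so purely illustrative):

#include <cmath>

// original function
double f(double x, double a) { return std::exp(-x * a); }

// what a source code transformation emits, schematically: a companion
// function that propagates a derivative dx alongside the value, the
// chain rule having been applied once and for all at compile time
void f_d(double x, double dx, double a, double& y, double& dy) {
    y = std::exp(-x * a);
    dy = -a * y * dx; // d/dx exp(-x*a) = -a * exp(-x*a)
}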

I looked around for quite a while but did not really find a mature framework for C++. There is one under development (ADIC 2.0) that should be mentioned, and there is also a quite old version of the same project for C (ADIC 1.x).

There is a Fortran framework though, called OpenAD/F. Quite interestingly, this framework separates the abstract AD engine ("OpenAD") from the language-specific parts ("/F"), using an XML representation (XAIF) of the computational core to be differentiated. So in principle you can implement a front end for C++ to replace the Fortran front end and reuse the algorithms of OpenAD. This is actually what ADIC is doing.

Still I have this faint idea of using the intermediate representation (IR) of the LLVM compiler infrastructure to interface to OpenAD. There are already backends that can translate IR back to C, so the "only" thing left to do would roughly be the transformation of IR to XAIF and back, plus the implementation of a runtime support library. Very roughly. I certainly do not have enough experience in both AD and the IR to really assess this, nor the time to delve into it. Anyway.

I played around with OpenAD/F some months ago and solved a toy PDE for American option pricing with Fortran code written from scratch. The results were promising in terms of performance. So here is the plan for how to benefit from this in a production setting:

The idea is that a typical core pricing algorithm is actually not too big and can effectively be excavated from the rest of the code.

If this is the case we could reimplement this computational core in Fortran, apply OpenAD/F to differentiate it and use the differentiated code via a plain interface from QuantLib pricing engines to retrieve adjoint greeks.

This post aims to present a minimal example of that procedure to prove its technical feasibility. Later I will try to apply the same recipe to the Bermudan swaption example mentioned above.

The libraries are organized as the usual QuantLib shared object library plus a Fortran shared object library containing the differentiated code for the numerical cores. The minimal example library is called simplelibad (simple lib ad). An application will then typically be a usual C++ program linking against QuantLib and the new AD library.

You can find the example code here. It comes with a makefile that by default builds a differentiated version of the library functions. You can also run the target plain to build only the original functions without AD, which is useful for testing the ported code against the original one before doing anything fancy.
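That is (target names as in the makefile):

make        # default: build the differentiated version of the library
make plain  # build only the original functions, without AD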

Actually only one function is provided in our simplelib, which computes a discount factor from a given continuously compounded zero yield and a day count fraction. The code looks as follows (sorry, no syntax highlighting):

subroutine discount(zeroyield, dcf, df)
  implicit none
  double precision:: zeroyield, dcf, df
  !$openad INDEPENDENT(zeroyield)
  df = exp(-zeroyield * dcf)
  !$openad DEPENDENT(df)
end subroutine discount

To keep the interface simple no return parameters are present; the result is written to the last parameter instead.

There are two OpenAD-specific lines starting with !$openad which declare the active variables. The makefile invokes a wrapper script openad that is shipped with OpenAD as a simple driver for the actual tools:

openad -c -m f simplelib.f90

This produces the differentiated version of the above code; the option -m f sets the mode to forward. The resulting code is then compiled into an object file:

gfortran -g -O3 -o simplelib.pre.xb.x2w.w2f.post.o -c simplelib.pre.xb.x2w.w2f.post.f90 -fpic

The same is done with some additional runtime support files provided by OpenAD:

gfortran -g -O3 -o w2f__types.o -c w2f__types.f90 -fpic
gfortran -g -O3 -o OAD_active.o -c OAD_active.f90 -fpic

Finally we need a driver that acts as the interface to QuantLib later on:

subroutine discount_ad(zeroyield, dcf, df, ddf)
  use OAD_active
  implicit none
  external discount
  double precision:: zeroyield, dcf, df, ddf
  type(active):: zeroyield_ad, res
  zeroyield_ad%v = zeroyield
  zeroyield_ad%d = 1.0
  call discount(zeroyield_ad, dcf, res)
  df = res%v
  ddf = res%d
end subroutine discount_ad

This returns the derivative together with the original result of the computation and does nothing more than invoke the differentiated code. We compile the driver as well:

gfortran -g -O3 -o driver_simplelib.o -c driver_simplelib.f90 -fpic

and then everything is linked into the shared object library libsimplelibad.so:

gfortran -shared -g -O3 -o libsimplelibad.so w2f__types.o OAD_active.o driver_simplelib.o simplelib.pre.xb.x2w.w2f.post.o

In the minimal example here we actually do not use any QuantLib classes, but talk directly to the AD library from our example application, like this:

#include <iostream>
#include <cmath>

extern "C" void discount_ad_(double *zeroyield, double *dcf, double *df,
                             double *ddf);

int main() {

    double zeroyield = 0.02;
    double dcf = 10.0;
    double df, ddf;

    discount_ad_(&zeroyield, &dcf, &df, &ddf);

    std::cout << "result1 = " << df
              << " should be: " << std::exp(-zeroyield * dcf) << std::endl;

    std::cout << "result2 = " << ddf
              << " should be: " << -dcf * std::exp(-zeroyield * dcf)
              << std::endl;
}

The application can be compiled like any other C++ code; it just needs to be linked against simplelibad.
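Assuming the example source sits in example.cpp (a made-up file name) and libsimplelibad.so is in the current directory, a possible build line is

g++ -o example example.cpp -L. -lsimplelibad

(depending on your setup you may have to add the Fortran runtime explicitly via -lgfortran). Running the code yields the expected output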

result1 = 0.818731 should be: 0.818731
result2 = -8.18731 should be: -8.18731

Now of course the real work starts … I hope to be back with a meaningful example soon.
