Test Coverage

Since release 1.6 a code coverage report for QuantLib’s unit test suite is provided. While for 1.6 and 1.7 the Intel compiler suite was used to generate it, the report for the current release 1.8 was generated using lcov / gcov for the first time, following these steps:

  • configure with the coverage flag, switching optimizations off, debug symbols on and disabled inlining: ./configure CXXFLAGS="--coverage -O0 -g -fno-inline -fno-inline-small-functions -fno-default-inline" LDFLAGS="--coverage"

  • build the library and test-suite, first by cleaning up (make clean) followed by something like make -k -j8 -w

  • generate a baseline info file: lcov --no-external -c -i -d . -o ~/ql_base.info

  • run the test-suite (this takes a bit due to the missing optimizations): test-suite/quantlib-test-suite

  • generate the test run’s info file: lcov --no-external -c -d . -o ~/ql_test.info

  • merge the baseline and test run’s info files: lcov -a ~/ql_base.info -a ~/ql_test.info -o ~/ql_total.info

  • remove the test suite code itself from the coverage report: lcov --remove ~/ql_total.info "/test-suite/*" -o ~/ql_total_purged.info

  • generate the html site: genhtml ~/ql_total_purged.info --demangle-cpp --output-directory ~/ql_coverage_report

Test Coverage

QuantLib User Meeting 2015

I gave two talks at this year’s QuantLib User Meeting. They were on

  • Automatic differentiation beyond typedef and operator overloading and
  • Negative rates in QuantLib.

The slides can be downloaded from the event’s site.

You can read Luigi’s full report on the conference here.

luigis_hat2

QuantLib User Meeting 2015

Laplace Interpolation

Consider a (two dimensional) data matrix with some values missing and you want to fill the holes by interpolated values. One particularly simple but at the same time powerful method is called Laplace interpolation.

The easy explanation goes as follows: for each missing value take the average over the 4 surrounding values, i.e. of the entries above, below, left and right. This holds for the inner points of the matrix, special rules apply to the borders, see below. Of course, maybe some or even all of the surrounding values are missing, too. In general this leads to a system of linear equations one has to solve.

The more involved explanation: search for a harmonic function interpolating the non-missing data. A harmonic function f(x,y) satisfies

\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right) f = 0

from which the mean value property follows. This says that each function value is the average of the function values on or in a surrounding sphere S

f(x,y) = \frac{1}{2\pi r} \int_{\partial S} f(x,y) ds = \frac{1}{\pi r^2} \int_{S} f(x,y) d(x,y)

where S = \{ (x',y') | (x'-x)^2+(y'-y)^2 < r \}. That's deeper math of course. To find the harmonic function we have to solve its defining partial differential equation, which we discretize on the grid given by the data matrix itself assuming equidistant nodes (with distance say h). For an inner point we have

\frac{f(x+h,y)-2f(x,y)+f(x-h,y)}{h^2} + \frac{f(x,y+h)-2f(x,y)+f(x,y-h)}{h^2} = 0

which is the same as

f(x,y) - \frac{f(x+h,y)+f(x-h,y)+f(x,y-h)+f(x,y+h)}{4} = 0

matching the easy explanation from the beginning. For each inner point of the matrix we either get this averaging equation, or if the data point is known to be z=z(x,y) simply

f(x,y) = z(x,y)

This way we collect nm equations, if the original data matrix has m rows and n columns. The borders have modified equations, which I do not list here explicitly, but which are based on the conditions f_{xx} = 0 for the upper and lower border, f_{yy}=0 for the left and right border and f_x+f_y=0 for the four corners.

Here is a tentative implementation in QuantLib using the BiCGStab solver at its heart.

Let’s do some examples (the code can be found here, together with a gnuplot script generating the pictures.

The original data is generated by filling a 50\times 50 matrix sample like this

    for (Size i = 0; i < N; ++i) {
        for (Size j = 0; j < N; ++j) {
            sample(i, j) =
                sin((double)(i + 1) / N * 4.0) * cos((double)(j + 1) / N * 4.0);
        }
    }

By the way, the function generating the test data is not harmonic. Plotted as a heatmap it looks like this:

laplace_orig

Now we delete 20\% of the points, randomly.

laplace_des_random_20

We reconstruct the missing values by applying Laplace interpolation, which is done as easy as this

    laplaceInterpolation(sample2);

where sample2 is the destructed matrix with missing values marked as Null. The function laplaceInterpolation replaces these values by interpolated ones and writes them back to the input matrix sample2.

The reconstructed matrix looks is this:

laplace_rec_random_20

As beautiful as the original, isn’t it. Let’s delete more points and look and the reconstruction.

For 50\% we get

laplace_des_random_50

laplace_rec_random_50

for 80\% it is

laplace_des_random_80

laplace_rec_random_80

for 90\%

laplace_des_random_90

laplace_rec_random_90

and for 95\%

laplace_des_random_95

laplace_rec_random_95

Quite amazing. Let’s look at how well the method works if we do not delete points randomly, but erase a whole stripe at the center of the matrix.

If the stripe is 10\% of the height

laplace_des_stripecenter_10

laplace_rec_stripecenter_10

for 20\%

laplace_des_stripecenter_10

laplace_rec_stripecenter_10

and for 50\%

laplace_des_stripecenter_50

laplace_rec_stripecenter_50

What about extrapolation? So let’s delete a stripe at the top of the matrix.

For 10\% we get

laplace_des_striptop_10

laplace_rec_striptop_10

for 20\%

laplace_des_striptop_20

laplace_rec_striptop_20

and for 50\%

laplace_des_striptop_50

laplace_rec_striptop_50

Of course the method can not see what is no longer there. Still it is working surprisingly well if you ask me.

Laplace Interpolation

Adjoint Greeks VI – Source Code Transformation Pt. II

I will be talking about Automatic differentiation beyond typedef and operator overloading at this year’s QuantLib User Conference. Registration is free, but seats are limited. The blog post today is sort of a teaser, so do not miss to be in Düsseldorf on November 30th / December 1st.

To recall what has been done to make QuantLib AD-ready: We have two approaches. One is by CompatibL which in essence changes the typedef for Real in the library from double to some active type of an operator overloading tool, like CppAD::AD. Of course this doesn’t work just by relinking the typedef, but in effect it describes well what is going on.

The other approach is to transform the library’s code and introduce template parameters whenever we might potentially be interested in calculating derivatives. You can read more about this by selecting the tag Automatic Differentiation in the sidebar that can be expanded by clicking on the three horizontal lines in the upper right corner when you scroll to the top of this page.

Backward compatibility in the template approach is achieved by typedef’ing the original class names to template instantiations with T = Real, so existing code compiles without any changes. Well let’s say 99% of it, some few modifications have to be applied, but really not many (from my experience with the original examples and the test-suite).

The transformation is not applied to the full library (yet), but the vanilla IRD stuff works, so coupons, yield term structures, volatility termstructures, the GSR model. You can get it (together with some working examples) from my adjoint branch at https://github.com/pcaspers/quantlib/tree/adjoint.

Both approaches have clear advantages and disadvantages. One potential problem with the typedef approach is, that this might kill your performance in parts of the code where you do not even want to use AD. For example,

    Matrix_t<T> A(1024, 1024);
    Matrix_t<T> B(1024, 1024);
    ...
    Matrix_t<T> C = A * B;

will take 764 ms on my machine with T=double, but 8960 ms with T=CppAD::AD. This seems to be due to compiler optimizations (like using SIMD instructions) that are no longer applicable when switching from native double to the custom AD type. This is a penalty of 11.7x and you do not get anything for that. Note that I do not record any tape or something, I just use the double functionality wrapped in the AD type.

Of course there might be other operator overloading tools that are more efficient. To get a lower bound for the penalty factor I replaced T by a minimal wrapper, just storing a double and an additional native pointer and forwarding the operators +,*,+= in question here to the respective double operations. With this wrapper the penalty is 2.0x. This is much better. The perfect operator overloading tool will probably lie somewhere between 2.0 and 11.7. But even 2.0 is not good, if you can avoid it, is it?

In any case the template approach allows for selective activation of variables where AD is needed and use native instances of the library classes elsewhere. You can also mix both. For example you could do a model calibration with the double-instance of a class, then extract the calibrated parameters and plug them into the AD – version of the same model, which then maybe generates Monte Carlo paths for some CVA calculations in which you want to calculate an adjoint gradient.

Still there are use cases left where even the template approach will not help and this is when you want to apply AD to code sections like the one above, where a lot of dense double calculations take place, enabling the compiler to apply optimizations that are not possible with custom types.

We saw an example in https://quantlib.wordpress.com/2015/04/14/adjoint-greeks-iv-exotics. The time equivalent of around 80 NPV calculations (holy cow!) is consumed for one adjoint gradient calculation of a Bermudan swaption in the Hull White model. From theory we would expect a complexity of 4x for the adjoint sweep, in practice we are probably happy with any factor up to 10x. But with 80 NPV calculations you can do a lot of things and even if your gradient vector has 100 – 200 entries, the application of AD would merely result in a speed up of around 2x compared to a naive bump and revalue approach. This would not really justify the whole AD-thing.

One possible way out of this is to write the adjoint code by hand.

Another approach is to use source code transformation. I started with a toy example in one of my last posts.

In the meantime I implemented a Bermudan swaption engine that produces an AD gradient w.r.t. the H and \zeta vectors describing an LGM model and the relevant discount factors of an interest rate curve. For this I extracted the computational core of the Gaussian1dSwaptionEngine (which relies on numerical integration) and rewrote it in Fortran. Then I applied OpenAD/F to the code to generate adjoint code and compiled everything into a library. You can have a look at the project here: https://github.com/pcaspers/quantlib/tree/master/QuantLibOAD/lgm.

On the QuantLib side I wrote a pricing engine acting as a wrapper for the Fortran core: It can be used just as any other pricing engine, but provides the adjoint gradient vector as an additional result. The gradient is computed w.r.t. the model parameters, but it is easy to tranform this into the usual market variables’ sensitivities. This is done exactly the same way as in the post I mentioned earlier.

Testing the engine on a 10y Bermudan swaption with yearly calls (using 49 grid points per expiry for the numerical integration) yields the following results (you can find the example code here)

  • single pricing on a Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz, single threaded, non-transformed code: 4.2 ms
  • pricing + gradient (which is 105 dimensional in this example): 25.6 ms
  • transformation of gradient w.r.t. model parameters: 6.2 ms
  • adjoint calculation multiple: 6.1x (7.6x including gradient transformation)

Looks good, no? In particular, because no extra optimization of the input or transformed code was necessary to achieve the good multiple of 6.1, it just came out of OpenAD/F this way.

As a side note when implementing it as described here you can use the power of AD even from the original / “classic” QuantLib just through specialized pricing engines (and by linking against the Fortran support library of course). Or you can mix operator overloading in the template – QuantLib with source code transformation applied to critical sections of the code which slow down too much with operator overloading. Maybe this hybrid solution is the way to go.

More on this and other topics in Düsseldorf.

Adjoint Greeks VI – Source Code Transformation Pt. II

Dynamics II

This is a sequel to the recent post on smile dynamics, generalizing things a bit. It’s just a stream of thoughts, nothing that is actually backed by own experience or anything. Hopefully it makes sense nevertheless.

The PL of a interest rate derivative portfolio reacts on movements in market data like broker quotes for FRAs, Euribor futures, swaps, implied volatilities for caps / floors and swaptions. These quotes are processed into rate curves and volatility surfaces, which are then fed into pricing models which value the portfolio and thereby produce its PL.

A trader is interested in how his portfolio reacts on market movements, i.e. in its greeks, which are by definition partial derivatives of the portfolio value by the single market data inputs. So a portfolio may for example be particularly sensitive on the 10y and 15y Euribor 6M swap quotes and on the 5y/15y and 10y/10y swaption volatilities. The trader would see a large interest rate delta on these two buckets of the 6M curve and a large vega in these two cells of the vega matrix for swaptions.

The input variables are by far not independent. For example if the 10y swap quote rises by 10 basis points, it is likely that the 15y swap quote rises too, and often even roughly by the same amount. Similarly, movements of the cells of implied volatility surfaces are correlated, in particular if their distance is not too big. Even more, an interest rate move will possibly affect the implied volatility, too, in a systematic way: If rates rise, typically lognormal volatilities will decrease, so to roughly keep the product of the forward level and the lognormal volatility (which is by definition the so called (approximate) basispoint volatility) constant. The dependency is obviously dependent on the nature of the volatility, so it will be quite different for normal implied volatilities.

Furthermore we have technical dependencies. Say you work with sensitivites on zero rates instead of market quotes. Then a movement in the EONIA zero rates will affect the Euribor 6M forward curve, because this curve is bootstrapped using the EONIA curve as a discounting curve. So a sensitivity on the EONIA curve might come from a direct dependency of a deal on the EONIA curve, or also from an indirect dependency via forward curves.

So the abstract picture is that you have tons of correlated data points as an input, a very long gradient vector describing the sensitivity of the PL to the input, and maybe also a big Hessian matrix describing second order effects.

Wouldn’t it be reasonable to apply a principal component analysis first, select the most important risk drivers and only manage the sensitivities on these drivers ? Conceptually this is nothing more that a suitable rotation of coordinates such that the new coordinates are independent. Once this is done you can sort the dimensions by the ratio of the total variance of the input data they carry and focus on the most important ones. Which may be only 5 to 10 maybe instead of hundreds on input data points.

Hedging would be be more efficient, because you could hedge the same amount of PL variance with fewer hedge positions. The main risks drivers could be summarized in terms of only a few numbers for management reporting. Sensitivity limits can be put on these main drivers. PL could be explained accurately by movements and sensitivites of only a few quantities. Stress tests could be formulated directly in the new coordinates, yielding more realistic total scenarios, since natural co-movements of other variables not directly stressed are automatically accounted for (e.g. huge rate shifts would imply appropriate adjustments to volatilities and keep them realisitic).

If you think about it, the kind of interest rate delta you see for swaptions and caps / floors this way would include a historically calibrated smile dynamics. Which you could compare to the dynamics we discussed last time. For example.

Also, typically the new coordinates allow for a natural interpretation like parallel movements of curves, rotations and so on, so the view wouldn’t necessarily be more abstract than the original one.

What is needed is the coordinate transformation layer, ideally directly in the front office system, so that it can be used in real-time. Plus a data history has to be collected and maintained, so that the principal components can be updated on a regular basis.

Dynamics II

Adjoint Greeks V – Source Code Transformation

In my last post on automatic differentiation we saw that the operator overloading approach in some situations (quite naturally) comes with a dramatic performance breakdown.

An alternative approach that might do better is called source code transformation. Here the code is differentiated at compile time. While there are quite a few operator overloading tools that work on existing code bases more or less out of the box, this does not seem to be the case for source code transformation tools.

I looked around quite a while but did not really find a mature framework for C++. There is one under development (ADIC 2.0) that should be mentioned. There is also quite an old version of this same project for C (ADIC 1.x).

There is a Fortran framework though called OpenAD/F. Quite interestingly this framework separates the abstract ad engine (“OpenAD”) from language specific parts (“/F”), using a XML representation for the computational core to be differentiated. So in princinple you can implement a front end for C++ replacing the Fortran front end and reuse the algorithms of OpenAD. This is actually what ADIC is doing.

Still I have this faint idea of using the intermediate representation (IR) of the LLVM compiler infrastructure to interface to OpenAD. There are already backends that can translate back IR to C, so the “only” thing left to do would roughly be the transformation of IR to XAIF and back and the implementation of a run time support library. Very roughly. I certainly do not have enough experience in both AD and the IR to really overlook this and also not the time to delve into this. Anyway.

I played around with OpenAD/F some months ago. I solved a toy pde for american option pricing with Fortran code written from scratch. The results were promising in terms of performance. So here is the plan how to benefit from that in a productive setting:

The idea is that a typical core pricing algorithm is actually not too big and can effectively be excavated from the rest of the code.

If this is the case we could reimplement this computational core in Fortran, apply OpenAD/F to differentiate it and use the differentiated code via a plain interface from QuantLib pricing engines to retrieve adjoint greeks.

This post aims to present a minimal example for that procedure to prove the technical feasibility. Later I will try to apply this to the Bermudan Swaption example I mentioned above.

The libraries are organized as the usual QuantLib shared object library and a Fortran shared object library that contains the differentiated code for the numerical cores. The minimal example library is called simplelibad (simple lib ad). An application will then typcially be an usual C++ program linking against QuantLib and the new ad library.

You can find the example code here. It comes with a make file that by default builds a differentiated version of the library functions. You can also run the target plain to build only the original functions without AD, which seems useful for testing the ported code against the original one before doing anything fancy.

Actually only one function is provided in our simplelib wich computes a discount factor from a given continuously compounded zero yield and a day count fraction. The coding looks as follows (sorry, no syntax highlighting)

subroutine discount(zeroyield, dcf, df)
  implicit none
  double precision:: zeroyield, dcf, df
  !$openad INDEPENDENT(zeroyield)
  df = exp(-zeroyield * dcf)
  !$openad DEPENDENT(df)
end subroutine discount

To keep the interface simple no return parameters are present, the result is written to the last input parameter instead.

There are two OpenAD – specific lines starting with !$openad which declare the active variables. The make file invokes a wrapper script openad that is shipped with OpenAD as a simple driver for the actual tools

openad -c -m f simplelib.f90

This produces the differntiated version of the above code. The option -m f sets the mode to forward. The resulting code is then compiled into an object file

gfortran -g -O3 -o simplelib.pre.xb.x2w.w2f.post.o -c simplelib.pre.xb.x2w.w2f.post.f90 -fpic

The same is done with some additional runtime support files provided by OpenAD

gfortran -g -O3 -o w2f__types.o -c w2f__types.f90 -fpic
gfortran -g -O3 -o OAD_active.o -c OAD_active.f90 -fpic

Finally we need a driver that acts as the interface to QuantLib later on

subroutine discount_ad(zeroyield, dcf, df, ddf)
  use OAD_active
  implicit none
  external discount
  double precision:: zeroyield, dcf, df, ddf
  type(active):: zeroyield_ad, res
  zeroyield_ad%v = zeroyield
  zeroyield_ad%d = 1.0
  call discount(zeroyield_ad, dcf, res)
  df = res%v
  ddf = res%d
end subroutine discount_ad

This returns the derivative together with the original result of the computation and does nothing more than invoking the differentiated code. We compile the driver as well

gfortran -g -O3 -o driver_simplelib.o -c driver_simplelib.f90 -fpic

and then everything is linked into a shared object library simplelibad

gfortran -shared -g -O3 -o libsimplelibad.so w2f__types.o OAD_active.o driver_simplelib.o simplelib.pre.xb.x2w.w2f.post.o

In the minimal example here we actually do not use any QuantLib classes, but directly talk to the ad library from our example application, like this

#include <iostream>
#include <cmath>

extern "C" void discount_ad_(double *zeroyield, double *dcf, double *df,
                             double *ddf);

int main() {

    double zeroyield = 0.02;
    double dcf = 10.0;
    double df, ddf;

    discount_ad_(&zeroyield, &dcf, &df, &ddf);

    std::cout << "result1 = " << df
              << " should be: " << std::exp(-zeroyield * dcf) << std::endl;

    std::cout << "result2 = " << ddf
              << " should be: " << -dcf * std::exp(-zeroyield * dcf)
              << std::endl;
}

The application can be compiled as any other C++ code, it just needs to be linked against simplelibad. Running the code yields the expected output

result1 = 0.818731 should be: 0.818731
result2 = -8.18731 should be: -8.18731

Now of course the real work starts … hope to be back with a meaningful example soon.

Adjoint Greeks V – Source Code Transformation

Smile Dynamics by Densities

The term smile dynamics refers to a rule how an implied volatility smile behaves when the underlying moves. This rule can be estimated from actual historical data, implied by a pricing model or set up from scratch.

Today I am looking at some common specifications, but from a slightly different angle. I am not looking at the effect on the implied volatility smile, but rather on the underlying’s density in its natural pricing measure.

This can be done using the Breeden-Litzenberger Theorem which links the density \phi of the underlying to non deflated call prices c for options struck at strike k by

\phi(k) = \frac{\partial^2 c}{\partial k^2} (k)

The idea is to have a more natural look at such dynamics – while implied volatility may be something we think we have an intuition for, a density seems to be a somewhat more grounded description of what is actually going on.

I am looking at an underlying from the interest rate world with forward level 3\% and options on this underlying expiring in 5 years. Think of these as european swaptions for example. I am assuming an implied volatility smile generated by the SABR model, in terms of lognormal volatilities. The observations easily translate to shifted lognormal volatility smiles, as they are currently used for some currencies due to the low rates.

The question is now, what happens to a smile if the underlying’s forward level rises to say 3.5\% or 4\% ? Nobody really “knows” this of course. But you have to make an assumption whenever you look at effects resulting from rate shifts. This assumption can be crucial for the effectiveness of your delta hedges. It is also of paramount importance for the specification of stress tests, where bigger rate shifts are applied. This will become clear shortly.

I am looking at five different rules for the smile dynamics:

  • Sticky strike dynamics, meaning that after the rate shift the volatilities remain the same for each fixed strike
  • Sticky absolute moneyness assigning the original volatility at strike k-s to the strike k if the rate shift is s
  • Sticky model keeping the (SABR) model parameters constant, i.e. the volatilities after the shift are computed with the same SABR parameters as before, except for the forward, which is now f+s instead of f before
  • Sticky approximate basispoint volatility keeping the approximate basispoint volatility, defined as the product f\cdot\sigma of the forward f and the implied lognormal volatility \sigma, constant. At the same time the volatility smile is moved in absolute moneyness space, meaning that the (lognormal) volatility at strike k is f\cdot\sigma_{k-s} / (f+s) after the rate shift.
  • Sticky basispoint volatility keeping the exact basispoint volatility constant, defined as the volatility parameter of the normal Black model, in our case matching a price produced by the lognormal Black model. Again, the price at k-s is relevant for strike k after a shift by s.

Obviously, one would need some extrapolation rule for absolute monenyness and approximate basispoint volatility dynamics, for strikes below the shift size. We just ignore this here and draw the pictures starting at s instead, this is interesting enough for the moment. Also note that for the sticky basispoint volatility dynamics for strikes below s the implied basispoint volatility is zero.

Let’s start with a SABR smile given by parameters \alpha=0.2, \beta=1.0, \nu=0.0 (, \rho=0.0). This is simply a Black76 model, so we have a flat, lognormal smile. It shouldn’t come as a surprise that in this case three of the five dynamics produce the same result. For sticky strike, sticky moneyness and sticky model we have the following underlying densities for the original rate level, the shift to 3.50\% and the shift to 4.00\%.

dyn_1_1

The densities keep their lognormal nature and since the log volatility remains the same the effective absolute variance of the underlying rates get bigger when rates rise. Under big upshifts coming from a low initial rate this can lead to much too fat, unrealistic densities.

The sticky approximate basispoint variant produces a different result

dyn_1_2

keeping the lognormal nature, but adjusting down the log volatility for higher rates such that the absolute variance of the rates stays more (but not totally) constant than in the other alternatives, which is probably more realistic under big upshifts.

I do not have a picture for the last dynamics (sticky exact basispoint), but it is clear, that this dynamics actually just translates the density by s, preserving the shape exactly.

The second exapmle is again without stochastic volatlility (i.e. a pure CEV example, \nu=0), but now with \alpha=0.015, \beta=0.3 producing a skew

dyn_2_0

Sticky strike,

dyn_2_1

and sticky model

dyn_2_2

lead to densities again with slightly increasing absolute variance, while sticky approximate basispoint

dyn_2_3

roughly keeps the absolute variances constant. However there are some artefacts arising for low strikes. This is because formerly rather high volatilities belonging to low strikes are now applied to higher strikes. This is an obvious problem of this dynamics.

Sticky aboslute moneyness has similar problems

dyn_2_4

at the same time leading to higher absolute variances for higher rates.

Again, sticky exact basispoint volatility will just translate the density.

Now let’s add stochastic volatility and look at \alpha=0.015, \beta=0.30, \nu=0.50, \rho=0.2. The smile now looks like this

dyn_3_0

Sticky strike,

dyn_3_1

sticky model,

dyn_3_2

sticky approximate basispoint

dyn_3_3

show a similar qualitative behaviour as for the CEV case, the approximate basispoint dynamics again with the artefacts for low strikes. Similarly for sticky absolute moneyness

dyn_3_4

elsewhere keeping the absolute variance of rates quite the same across the shifts.

Sticky exact basispoint will again translate the original density by s.

What kind of dynamics matches reality best, is in the end subject to a statistical check on historical data (with all the difficulties in that procedure). Or what traders consider to be the most appropriate dynamics. If I had to choose from the alteratives above, sticky (exact) basispoint or sticky model seem most reasonable.

Smile Dynamics by Densities

Supernatural Libor Coupons

Natural Libor coupons in swaps are fixed at (or usually some few days before) the start of their accrual period while the payment is made at the end of that period. Sometimes the fixing is delayed until the end of the period, so that the payment is done only a few days after the fixing. Such coupons are said to be fixed in arrears. What seems to be a slight modification of the terms distinguishes a simple case of model independent reading-the-price-off-the-forward curve from a highly exotic and model dependent endeavor in pricing and hedging.

The exact criterion for being a natural coupon boils down to the equality of the index estimation end date and the payment date. Everything else (accrual start and end date, fixing date) does not matter. Note that all these dates are in general different. In particular the index estimation end date does not need to coincide with the accrual end date (the latter being usually identical to the payment date). This is due to the swap schedule construction where an intermediate accrual start date may be moved a few days forward due to non-business days adjustments while the corresponding accrual end date falls into the regular schedule, which then causes the index estimation end date lying a few days after the accrual end date.

While in principle all cases with index estimation end date not equal to the payment date require a so called timing adjustment for pricing, it is only significant in cases where there is a bigger lag between the two dates.

Either case is relevant: While there are swaps with in arrears fixings, i.e. a payment long before the index end date, or average rate swaps (corresponding to a basket of coupons fixed somewhere between in advance and in arrears) there are also delayed payments occurring in some adjustable rate mortgages loans (I seem to remember that it is perfectly usual for Spanish mortgages, but I am not totally sure about that at the moment …).

Pricing models range from simple adjustment formulas coming from simple Black style models over replication approaches taking into account the whole relevant caplet smile using some terminal rate model (similar to the well known replication models for CMS coupons) up to full term structure model implied adjustments.

Where does QuantLib stand. Currently quite at the baseline, we only have a simple Black76 adjustment for the case of in arrears fixings, which is implemented in BlackIborCouponPricer::adjustedFixing(). Nothing in place for delayed payments or for fixings somewhere between in advance and in arrears.

Some time ago I wrote down the formula which generalizes the simple Black76 in arrears adjustment to the case of an arbitrary configuration of index end and payment date. You can find the paper here. I am not going into the derivation here, it’s better to read that for oneself in a lonely hour. Instead I will visualize an example.

The implementation is QuantLib of the formulas is straight forward, you can see the relevant code here.

The following graph shows the adjustments for an Euribor 6m coupon on evaluation date 24-08-2015 with fixing on 24-08-2025, 2\% forward rate and associated caplet volatility of 40\% (lognormal, no shift). The payment date slides between 26-08-2025 (t \approx 10) and 26-08-2026 (t \approx 11) and for each we compute the corresponding timing adjustment. The index estimation period end date is 26-02-2026 (t \approx 10.5). Remember that this is also the natural payment time with zero timing adjustment.

For payments before the natural payment time 26-02-2026 the adjustment is positive, for delayed payments negative, ranging between plus/mins three basis points approximately. We can introduce a correlation parameter \rho (see the paper for the model behind). Normally the parameter will be close to one, but for very large delays it may also be sensible to choose a correlation below one. However for consistency is must be ensured that \rho \rightarrow 1 if the payment time approaches the natural payment time from the left (again see the paper for the rationale behind). Therefore we set

\rho(\tau) = \rho_0 + (1-\rho_0) e^{-10\tau}

with \tau defined to be the year fraction between the payment time and the natural payment time. For payment times after the natural payment time we just assume a constant \rho  = \rho_0.

timing

Supernatural Libor Coupons

Ultra Strong Typing

Picture a library. QuantLib. Imagine it has an overloaded function

void f(unsigned int a) {}
void f(unsigned int a, unsigned int b) {}

which is used like this in client code

f(0);
f(0,0);

Now you want to extend the function adding an additional parameter. For backward compatibility you give it a default value.

void f(unsigned int a, double t = 0.0) {}
void f(unsigned int a, unsigned int b, double t = 0.0) {}

Compiling the client code above results in a compiler complaint, g++ for example says

testoverload.cpp:6:10: error: 
   call of overloaded ‘f(int, int)’ is ambiguous
     f(0,0);
          ^
testoverload.cpp:1:6: note: 
   candidate: void f(unsigned int, double)
 void f(unsigned int a, double t = 0.0) {}
      ^
testoverload.cpp:2:6: note: 
   candidate: void f(unsigned int, unsigned int, double)
 void f(unsigned int a, unsigned int b, double t = 0.0) {}
      ^

This is because the arguments 0 are int‘s, so not directly matching any signature exactly. Therefore the compiler tries to convert the arguments to match one of the signatures. However int can both be converted to unsigned int and double, which causes the ambiguity. One can resolve the error by changing the call to

f(0,0.0);

or

f(0u,0u);

or even

f(0.0,0.0)

or

f(0.0,0u)

all of which have one signature strictly better matching than the other, since only requiring conversion of at most one (instead of both) of the arguments while the other is already exactly matching.

Does that help ? No, since the new code should work with existing client code out of the box. That is the whole point of backward compatibility.

What we can do is to apply the SFINAE technique I wrote about in an earlier post. We make the second parameter a template parameter

template<class T>
void f(unsigned int a, T t = T(0.0)) {}

template<class T>
void f(unsigned int a, T b, double t = 0.0) {}

and then use meta programming to switch the function definition on and off depending on the type T

#include <boost/utility/enable_if.hpp>
#include <boost/type_traits/is_float.hpp>
#include <boost/type_traits/is_integral.hpp>

template<class T>
void f(unsigned int a, T t = T(0.0), 
 typename boost::enable_if<
                 boost::is_float<T> >::type* = 0 ) {}

template<class T>
void f(unsigned int a, T b, double t = 0.0, 
 typename boost::enable_if<
                 boost::is_integral<T> >::type* = 0) {}

Remember that boost::enable_if<b>::type is either void (when b is true, i.e. when T is a float or integral type respectively) or nothing. If it is nothing it causes a potential compiling error.

However since we are in the process of trying template substitutions here, the whole function definition is just silently discarded from the set of possible template incarnations (“substitution failure is not an error”, SFINAE). What is left is the one function definition we actually want depending on T.

The code then safely chooses the upper definition if we pass a float type like double or float as the second parameter and the lower one for an integral type like int, unsigned int, std::size_t and so on.

Quite heavy machinery to solve an easy problem.

Ultra Strong Typing

Backward Automatic Differentiation Explained

reversegear

Today I will try to explain how the forward and backward mode in automatic differentiation work. I will only cover the principle, not actual algorithms and the optimizations they apply. While the so called forward mode is quite intuitive, it is not so easy to wrap your head around the backward mode. I will try to go through all steps and not leave out anything seemingly trivial.

We consider the computation of a function f: \mathbb{R}^n \rightarrow \mathbb{R}^m with independent variables x_1, \dots , x_n and dependent variables y_1, \dots , y_m. The ultimate goal is to compute the Jacobian

J = \begin{pmatrix}  \frac{\partial y_1}{\partial x_1} &  & \dots &  & \frac{\partial y_1}{\partial x_n} \\   & \ddots & & & \\  \vdots &  & \frac{\partial y_i}{\partial x_j} &  & \vdots \\   &  &                                   & \ddots & \\  \frac{\partial y_m}{\partial x_1} & & \dots & & \frac{\partial y_m}{\partial x_n}  \end{pmatrix}

We view the function f as a composite of elementary operations

u_k = \Phi_k( \{u_\kappa\}_{(\kappa,k) \in \Lambda})

for k > n where we set u_k = x_k for k=1,\dots,n (i.e. we reserve these indices for the start values of the computation) and u_k = y_k for k=K-m+1, \dots, K (i.e. these are the final results of the computation). The notation should suggest that u_k depends on prior results u_\kappa with (\kappa,k) in some index set \Lambda. Note that if (k,l)\in\Lambda this refers to a direct dependency of u_l on u_k, i.e. if u_k depends on u_j, but u_j does not enter the calculation of u_k directly then (j,l) \notin \Lambda.

As an example consider the function

f(x_1, x_2) = x_1 + x_2^2

for which we would have u_1 = x_1, u_2 = x_2, u_3 = u_2^2, u_4 = y_1 = u_1+u_3. The direct dependencies are (1,4), (2,3) and (3,4), but not (2,4), because x_2=u_2 does not enter the expression for u_4 directly.

We can view the computation chain as a directed graph with vertices u_k and edges (k,l) if (k,l)\in\Lambda. There are no circles allowed in this graph (it is a acyclic graph) and it consists of K vertices.

We write |i,j| for the length of the longest path from u_i to u_j and call that number the distance from i to j. Note that this is not the usual definition of distance normally being the length of the shortest path.

If u_j is not reachable from u_i we set |i,j| = \infty. If u_j is reachable from u_i the distance is finite, since the graph is acyclic.

We can compute a partial derivative \partial u_m / \partial u_k using the chain rule

\frac{\partial u_m}{\partial u_k} = \sum_{l|(l,m)\in\Lambda} \frac{\partial u_m}{\partial u_l} \frac{\partial u_l}{\partial u_k}

This suggest a forward propagation scheme: We start at the initial nodes u_1, ... , u_n. For all nodes u_l with maximum distance 1 from all of these nodes we compute

c_l = \sum_{i=1,\dots,n} \frac{\partial u_l}{\partial u_i} c_{i}

where we can choose c_i for i=1,\dots,n freely at this stage. This assigns the dot product of the gradient of u_l w.r.t. x_1, \dots, x_n and (c_1,\dots,c_n) to the node u_l.

If we choose c_k=1 for one specific k\in\{1,\dots,n\} and zero otherwise, we get the partial derivative of u_l by u_k, but we can compute any other directional derivatives using other vectors (c_1,\dots,c_n). (Remember that the directional derivative is the gradient times the direction w.r.t. which the derivative shall be computed.)

Next we consider nodes with maximum distance 2 from all nodes u_1,\dots,u_n. For such a node u_l

c_l = \sum_{i=1,\dots,n} \frac{\partial u_l}{\partial u_i} c_i = \sum_{i=1,\dots,n} \sum_{k|(k,l)\in\Lambda} \frac{\partial u_l}{\partial u_k} \frac{\partial u_k}{\partial u_i} c_i = \sum_{k|(k,l)\in\Lambda} \frac{\partial u_l}{\partial u_k} c_k

where we can assume that the c_k were computed in the previous step, because their maximum distance to all initial nodes u_1,\dots,u_n muss be less than 2, hence 1.

Also note that if k \in \{1,\dots,n\}, which may be the case, \partial u_l / \partial u_k = 1 if k=l and zero otherwise, so \sum_{i=1,\dots,n} \partial u_k / \partial u_i c_i = c_k trivially. Or seemingly trivial.

The same argument can be iterated for nodes with maximum distance 3, 4, \dots until we reach the final nodes u_{K-m+1},\dots,u_K. This way we can work forward through the computational graph and compute the directional derivative we seek.

In the backward mode we do very similar things, but in a dual way: We start at the final nodes and compute for all nodes u_l with maximum distance 1 from all of these nodes

\overline{c_l} = \sum_{i=K-m+1,\dots,K} \frac{\partial u_i}{\partial u_l} \overline{c_i}

Note that we compute a weighted sum in the dependent variables now. By setting a specific c_k to 1 and the rest to zero again we can compute the partial derivatives of a single final variable. Again using the chain rule we can compute

\overline{c_l} = \sum_{i=K-m+1,\dots,K} \frac{\partial u_i}{\partial u_l} \overline{c_i} = \sum_{i=K-m+1,\dots,K}\sum_{k|(l,k)\in\Lambda} \frac{\partial u_i}{\partial u_k}\frac{\partial u_k}{\partial u_l} \overline{c_i} = \sum_{k|(l,k)\in\Lambda} \frac{\partial u_k}{\partial u_l} \overline{c_k}

for all nodes u_l with maximum distance of 2 from all the final nodes.

Note that the chain rule formally requires to include all variables u_k on which u_i depends. Howvever if u_k does not depend on u_l the whole term will effectively be zero, so we can drop these summands from the beginning. Also we may include indices k on which u_i does not depend in the first place, which is not harmful for the same reason.

As above we can assume all \overline{c_k} to be computed in the previous step, so that we can iterate backwards to the inital nodes to get all partial derivatives of the weighted sum of the final nodes w.r.t. the initial nodes.

Backward Automatic Differentiation Explained