# Supernatural Libor Coupons

Natural Libor coupons in swaps are fixed at (or, usually, a few days before) the start of their accrual period, while the payment is made at the end of that period. Sometimes the fixing is delayed until the end of the period, so that the payment occurs only a few days after the fixing. Such coupons are said to be fixed in arrears. What looks like a slight modification of the terms separates a simple, model independent case of reading the price off the forward curve from a highly exotic and model dependent endeavor in pricing and hedging.

The exact criterion for being a natural coupon boils down to the equality of the index estimation end date and the payment date. Everything else (accrual start and end date, fixing date) does not matter. Note that all these dates are in general different. In particular the index estimation end date does not need to coincide with the accrual end date (the latter being usually identical to the payment date). This is due to the swap schedule construction, where an intermediate accrual start date may be moved a few days forward by non-business-day adjustments while the corresponding accrual end date falls into the regular schedule, which then causes the index estimation end date to lie a few days after the accrual end date.

While in principle every case with an index estimation end date different from the payment date requires a so called timing adjustment for pricing, the adjustment is only significant when there is a larger lag between the two dates.

Both directions are relevant: there are swaps with in arrears fixings, i.e. a payment long before the index end date, and average rate swaps (corresponding to a basket of coupons fixed somewhere between in advance and in arrears), but there are also delayed payments occurring in some adjustable rate mortgage loans (I seem to remember that this is perfectly usual for Spanish mortgages, but I am not totally sure about that at the moment …).

Pricing models range from simple adjustment formulas derived from simple Black style models, over replication approaches that take the whole relevant caplet smile into account using some terminal rate model (similar to the well known replication models for CMS coupons), up to adjustments implied by full term structure models.

Where does QuantLib stand? Currently quite at the baseline: we only have a simple Black76 adjustment for the case of in arrears fixings, which is implemented in BlackIborCouponPricer::adjustedFixing(). Nothing is in place for delayed payments or for fixings somewhere between in advance and in arrears.
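As a reminder of what this baseline computes, the textbook lognormal in arrears adjustment (of which the implementation is a variant; the notation here is mine, not taken from the code) has the form

$F_{adj} \approx F \left( 1 + \frac{F \tau \left( e^{\sigma^2 t_f} - 1 \right)}{1 + F \tau} \right)$

where $F$ is the forward rate, $\tau$ the index accrual fraction, $\sigma$ the caplet volatility and $t_f$ the fixing time.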

Some time ago I wrote down the formula which generalizes the simple Black76 in arrears adjustment to the case of an arbitrary configuration of index end and payment date. You can find the paper here. I am not going into the derivation here; it's better to read that for oneself in a lonely hour. Instead I will visualize an example.

The implementation of the formulas in QuantLib is straightforward; you can see the relevant code here.

The following graph shows the adjustments for an Euribor 6m coupon on evaluation date 24-08-2015 with fixing on 24-08-2025, $2\%$ forward rate and associated caplet volatility of $40\%$ (lognormal, no shift). The payment date slides between 26-08-2025 ($t \approx 10$) and 26-08-2026 ($t \approx 11$) and for each we compute the corresponding timing adjustment. The index estimation period end date is 26-02-2026 ($t \approx 10.5$). Remember that this is also the natural payment time with zero timing adjustment.

For payments before the natural payment time 26-02-2026 the adjustment is positive, for delayed payments it is negative, ranging between plus/minus three basis points approximately. We can introduce a correlation parameter $\rho$ (see the paper for the model behind it). Normally the parameter will be close to one, but for very large delays it may also be sensible to choose a correlation below one. However, for consistency it must be ensured that $\rho \rightarrow 1$ as the payment time approaches the natural payment time from the left (again see the paper for the rationale). Therefore we set

$\rho(\tau) = \rho_0 + (1-\rho_0) e^{-10\tau}$

with $\tau$ defined to be the year fraction between the payment time and the natural payment time. For payment times after the natural payment time we just assume a constant $\rho = \rho_0$.
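As a minimal sketch (the function name and the sign convention for $\tau$ are my own, not from the paper), this correlation choice could be coded as:

```cpp
#include <cmath>

// Correlation used in the timing adjustment. Here tau is the year fraction
// from the payment time to the natural payment time, i.e. tau > 0 means the
// payment occurs before the natural payment time.
double rho(double tau, double rho0) {
    if (tau <= 0.0)
        return rho0; // payment at or after the natural time: constant rho0
    // rho -> 1 as the payment time approaches the natural time from the left
    return rho0 + (1.0 - rho0) * std::exp(-10.0 * tau);
}
```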

# Ultra Strong Typing

Picture a library. QuantLib. Imagine it has an overloaded function

```cpp
void f(unsigned int a) {}
void f(unsigned int a, unsigned int b) {}
```


which is used like this in client code

```cpp
f(0);
f(0,0);
```


Now you want to extend the functions by an additional parameter. For backward compatibility you give it a default value.

```cpp
void f(unsigned int a, double t = 0.0) {}
void f(unsigned int a, unsigned int b, double t = 0.0) {}
```


Compiling the client code above results in a compiler complaint, g++ for example says

```
testoverload.cpp:6:10: error:
call of overloaded ‘f(int, int)’ is ambiguous
     f(0,0);
          ^
candidate: void f(unsigned int, double)
 void f(unsigned int a, double t = 0.0) {}
      ^
candidate: void f(unsigned int, unsigned int, double)
 void f(unsigned int a, unsigned int b, double t = 0.0) {}
      ^
```


This is because the arguments 0 are ints, so they do not match any signature exactly. Therefore the compiler tries to convert the arguments to match one of the signatures. However, int can be converted both to unsigned int and to double, which causes the ambiguity. One can resolve the error by changing the call to

```cpp
f(0,0.0);
```


or

```cpp
f(0u,0u);
```


or even

```cpp
f(0.0,0.0);
```


or

```cpp
f(0.0,0u);
```


all of which make one signature a strictly better match than the other, since at most one argument (instead of both) requires a conversion while the other argument already matches exactly.

Does that help? No, since the new code should work with existing client code out of the box. That is the whole point of backward compatibility.

What we can do is apply the SFINAE technique I wrote about in an earlier post. We make the second parameter a template parameter (note the default template argument double on the first overload: T cannot be deduced from a default function argument, so without it the one-argument call f(0) would no longer compile)

```cpp
template <class T = double>
void f(unsigned int a, T t = T(0.0)) {}

template <class T>
void f(unsigned int a, T b, double t = 0.0) {}
```


and then use meta programming to switch the function definition on and off depending on the type T

```cpp
#include <boost/utility/enable_if.hpp>
#include <boost/type_traits/is_float.hpp>
#include <boost/type_traits/is_integral.hpp>

template <class T = double>
void f(unsigned int a, T t = T(0.0),
       typename boost::enable_if<boost::is_float<T> >::type* = 0) {}

template <class T>
void f(unsigned int a, T b, double t = 0.0,
       typename boost::enable_if<boost::is_integral<T> >::type* = 0) {}
```


Remember that boost::enable_if&lt;B&gt;::type is either void (when the condition B is true, i.e. when T is a floating point or integral type respectively) or nothing at all. If it is nothing, it causes a potential compile error.

However, since we are in the process of trying template substitutions here, the whole function definition is just silently discarded from the set of possible template instantiations (“substitution failure is not an error”, SFINAE). What is left is the one function definition we actually want, depending on T.

The code then safely chooses the first definition if we pass a floating point type like double or float as the second parameter, and the second one for an integral type like int, unsigned int, std::size_t and so on.
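For completeness, here is a self-contained sketch of the same dispatch using std::enable_if from C++11 instead of Boost; the int return values are only there to make visible which overload was chosen and are not part of the original interface:

```cpp
#include <type_traits>

// First overload: chosen when the second argument is a floating point type
// (or absent: T then defaults to double, since it cannot be deduced from
// the default function argument).
template <class T = double>
int f(unsigned int a, T t = T(0.0),
      typename std::enable_if<std::is_floating_point<T>::value>::type* = 0) {
    return 1;
}

// Second overload: chosen when the second argument is an integral type.
template <class T>
int f(unsigned int a, T b, double t = 0.0,
      typename std::enable_if<std::is_integral<T>::value>::type* = 0) {
    return 2;
}
```

The old calls f(0) and f(0,0) now compile unambiguously and pick the one- and two-parameter versions respectively, while f(0, 0.5) reaches the new double parameter.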

Quite heavy machinery to solve an easy problem.

# Backward Automatic Differentiation Explained

Today I will try to explain how the forward and backward mode in automatic differentiation work. I will only cover the principle, not actual algorithms and the optimizations they apply. While the so called forward mode is quite intuitive, it is not so easy to wrap your head around the backward mode. I will try to go through all steps and not leave out anything seemingly trivial.

We consider the computation of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ with independent variables $x_1, \dots , x_n$ and dependent variables $y_1, \dots , y_m$. The ultimate goal is to compute the Jacobian

$J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & & \dots & & \frac{\partial y_1}{\partial x_n} \\ & \ddots & & & \\ \vdots & & \frac{\partial y_i}{\partial x_j} & & \vdots \\ & & & \ddots & \\ \frac{\partial y_m}{\partial x_1} & & \dots & & \frac{\partial y_m}{\partial x_n} \end{pmatrix}$

We view the function $f$ as a composite of elementary operations

$u_k = \Phi_k( \{u_\kappa\}_{(\kappa,k) \in \Lambda})$

for $k > n$, where we set $u_k = x_k$ for $k=1,\dots,n$ (i.e. we reserve these indices for the start values of the computation) and $u_{K-m+i} = y_i$ for $i=1, \dots, m$ (i.e. these are the final results of the computation). The notation should suggest that $u_k$ depends on prior results $u_\kappa$ with $(\kappa,k)$ in some index set $\Lambda$. Note that if $(k,l)\in\Lambda$ this refers to a direct dependency of $u_l$ on $u_k$; if $u_l$ depends on $u_j$ only indirectly, i.e. $u_j$ does not enter the calculation of $u_l$ directly, then $(j,l) \notin \Lambda$.

As an example consider the function

$f(x_1, x_2) = x_1 + x_2^2$

for which we would have $u_1 = x_1$, $u_2 = x_2$, $u_3 = u_2^2$, $u_4 = y_1 = u_1+u_3$. The direct dependencies are $(1,4)$, $(2,3)$ and $(3,4)$, but not $(2,4)$, because $x_2=u_2$ does not enter the expression for $u_4$ directly.

We can view the computation chain as a directed graph with vertices $u_k$ and edges $(k,l)$ if $(k,l)\in\Lambda$. No cycles are allowed in this graph (it is an acyclic graph) and it consists of $K$ vertices.

We write $|i,j|$ for the length of the longest path from $u_i$ to $u_j$ and call that number the distance from $i$ to $j$. Note that this is not the usual definition of distance normally being the length of the shortest path.

If $u_j$ is not reachable from $u_i$ we set $|i,j| = \infty$. If $u_j$ is reachable from $u_i$ the distance is finite, since the graph is acyclic.

We can compute a partial derivative $\partial u_m / \partial u_k$ using the chain rule

$\frac{\partial u_m}{\partial u_k} = \sum_{l|(l,m)\in\Lambda} \frac{\partial u_m}{\partial u_l} \frac{\partial u_l}{\partial u_k}$

This suggests a forward propagation scheme: we start at the initial nodes $u_1, \dots, u_n$. For all nodes $u_l$ with maximum distance $1$ from all of these nodes we compute

$c_l = \sum_{i=1,\dots,n} \frac{\partial u_l}{\partial u_i} c_{i}$

where we can choose $c_i$ for $i=1,\dots,n$ freely at this stage. This assigns the dot product of the gradient of $u_l$ w.r.t. $x_1, \dots, x_n$ and $(c_1,\dots,c_n)$ to the node $u_l$.

If we choose $c_k=1$ for one specific $k\in\{1,\dots,n\}$ and zero otherwise, we get the partial derivative of $u_l$ by $u_k$, but we can compute any other directional derivatives using other vectors $(c_1,\dots,c_n)$. (Remember that the directional derivative is the gradient times the direction w.r.t. which the derivative shall be computed.)

Next we consider nodes with maximum distance $2$ from all nodes $u_1,\dots,u_n$. For such a node $u_l$

$c_l = \sum_{i=1,\dots,n} \frac{\partial u_l}{\partial u_i} c_i = \sum_{i=1,\dots,n} \sum_{k|(k,l)\in\Lambda} \frac{\partial u_l}{\partial u_k} \frac{\partial u_k}{\partial u_i} c_i = \sum_{k|(k,l)\in\Lambda} \frac{\partial u_l}{\partial u_k} c_k$

where we can assume that the $c_k$ were computed in the previous step, because their maximum distance to all initial nodes $u_1,\dots,u_n$ must be less than $2$, hence $1$.

Also note that $k \in \{1,\dots,n\}$ may occur; in that case $\partial u_k / \partial u_i = 1$ if $i=k$ and zero otherwise, so $\sum_{i=1,\dots,n} \partial u_k / \partial u_i \, c_i = c_k$ trivially. Or seemingly trivially.

The same argument can be iterated for nodes with maximum distance $3, 4, \dots$ until we reach the final nodes $u_{K-m+1},\dots,u_K$. This way we can work forward through the computational graph and compute the directional derivative we seek.
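To make the forward sweep concrete, here is a sketch for the example $f(x_1,x_2) = x_1 + x_2^2$ from above (the Node struct and the function name are mine): each node carries its value $u$ and its dot value $c$, and both are propagated in one pass.

```cpp
// Forward mode sweep for f(x1, x2) = x1 + x2^2 in the u_k notation above.
struct Node {
    double u; // value of the node
    double c; // directional derivative (dot value) carried along
};

// Returns y = f(x1, x2) together with the directional derivative of f
// in the direction (c1, c2).
Node forwardSweep(double x1, double x2, double c1, double c2) {
    Node u1 = {x1, c1};                         // u1 = x1
    Node u2 = {x2, c2};                         // u2 = x2
    Node u3 = {u2.u * u2.u, 2.0 * u2.u * u2.c}; // u3 = u2^2
    Node u4 = {u1.u + u3.u, u1.c + u3.c};       // u4 = y1 = u1 + u3
    return u4;
}
```

Choosing the direction $(c_1,c_2) = (0,1)$ at $x = (1,3)$, for instance, yields the partial derivative $\partial f / \partial x_2 = 2 x_2 = 6$.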

In the backward mode we do very similar things, but in a dual way: We start at the final nodes and compute for all nodes $u_l$ with maximum distance $1$ from all of these nodes

$\overline{c_l} = \sum_{i=K-m+1,\dots,K} \frac{\partial u_i}{\partial u_l} \overline{c_i}$

Note that we now compute a weighted sum over the dependent variables. By setting a specific $\overline{c_i}$ to $1$ and the rest to zero we can again compute the partial derivatives of a single final variable. Again using the chain rule we can compute

$\overline{c_l} = \sum_{i=K-m+1,\dots,K} \frac{\partial u_i}{\partial u_l} \overline{c_i} = \sum_{i=K-m+1,\dots,K}\sum_{k|(l,k)\in\Lambda} \frac{\partial u_i}{\partial u_k}\frac{\partial u_k}{\partial u_l} \overline{c_i} = \sum_{k|(l,k)\in\Lambda} \frac{\partial u_k}{\partial u_l} \overline{c_k}$

for all nodes $u_l$ with maximum distance of $2$ from all the final nodes.

Note that the chain rule formally requires including all variables $u_k$ on which $u_i$ depends. However, if $u_k$ does not depend on $u_l$, the whole term is effectively zero, so we can drop these summands from the beginning. Also, we may include indices $k$ on which $u_i$ does not depend in the first place, which is harmless for the same reason.

As above we can assume all $\overline{c_k}$ to have been computed in the previous step, so that we can iterate backwards to the initial nodes to obtain all partial derivatives of the weighted sum of the final nodes w.r.t. the initial nodes.
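The dual sketch for the backward sweep on the same example $f(x_1,x_2) = x_1 + x_2^2$ (again with hypothetical names): after a forward pass has recorded the intermediate values, one seeds $\overline{c_4} = 1$ at the output and propagates the adjoints back along the edges $(3,4)$, $(1,4)$ and $(2,3)$.

```cpp
// Backward mode sweep for f(x1, x2) = x1 + x2^2, yielding the full
// gradient (df/dx1, df/dx2) in a single backward pass.
struct Gradient {
    double x1bar; // adjoint of x1, i.e. df/dx1
    double x2bar; // adjoint of x2, i.e. df/dx2
};

Gradient backwardSweep(double x1, double x2) {
    (void)x1;                        // x1 enters no local derivative here
    double u2 = x2;                  // forward pass: record needed values
    double cbar4 = 1.0;              // seed at the output node u4 = y1
    double cbar3 = 1.0 * cbar4;      // du4/du3 = 1
    double cbar1 = 1.0 * cbar4;      // du4/du1 = 1
    double cbar2 = 2.0 * u2 * cbar3; // du3/du2 = 2 u2
    return {cbar1, cbar2};
}
```

Note that one backward sweep produces the whole gradient, while one forward sweep yields a single directional derivative, which is why backward mode pays off when $m \ll n$.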