Supernatural Libor Coupons

Natural Libor coupons in swaps are fixed at (or usually some few days before) the start of their accrual period while the payment is made at the end of that period. Sometimes the fixing is delayed until the end of the period, so that the payment is done only a few days after the fixing. Such coupons are said to be fixed in arrears. What seems to be a slight modification of the terms distinguishes a simple case of model independent reading-the-price-off-the-forward curve from a highly exotic and model dependent endeavor in pricing and hedging.

The exact criterion for being a natural coupon boils down to the equality of the index estimation end date and the payment date. Everything else (accrual start and end date, fixing date) does not matter. Note that all these dates are in general different. In particular the index estimation end date does not need to coincide with the accrual end date (the latter being usually identical to the payment date). This is due to the swap schedule construction where an intermediate accrual start date may be moved a few days forward due to non-business days adjustments while the corresponding accrual end date falls into the regular schedule, which then causes the index estimation end date lying a few days after the accrual end date.

While in principle all cases with index estimation end date not equal to the payment date require a so called timing adjustment for pricing, it is only significant in cases where there is a bigger lag between the two dates.

Either case is relevant: While there are swaps with in arrears fixings, i.e. a payment long before the index end date, or average rate swaps (corresponding to a basket of coupons fixed somewhere between in advance and in arrears) there are also delayed payments occurring in some adjustable rate mortgages loans (I seem to remember that it is perfectly usual for Spanish mortgages, but I am not totally sure about that at the moment …).

Pricing models range from simple adjustment formulas coming from simple Black style models over replication approaches taking into account the whole relevant caplet smile using some terminal rate model (similar to the well known replication models for CMS coupons) up to full term structure model implied adjustments.

Where does QuantLib stand. Currently quite at the baseline, we only have a simple Black76 adjustment for the case of in arrears fixings, which is implemented in BlackIborCouponPricer::adjustedFixing(). Nothing in place for delayed payments or for fixings somewhere between in advance and in arrears.

Some time ago I wrote down the formula which generalizes the simple Black76 in arrears adjustment to the case of an arbitrary configuration of index end and payment date. You can find the paper here. I am not going into the derivation here, it’s better to read that for oneself in a lonely hour. Instead I will visualize an example.

The implementation is QuantLib of the formulas is straight forward, you can see the relevant code here.

The following graph shows the adjustments for an Euribor 6m coupon on evaluation date 24-08-2015 with fixing on 24-08-2025, 2\% forward rate and associated caplet volatility of 40\% (lognormal, no shift). The payment date slides between 26-08-2025 (t \approx 10) and 26-08-2026 (t \approx 11) and for each we compute the corresponding timing adjustment. The index estimation period end date is 26-02-2026 (t \approx 10.5). Remember that this is also the natural payment time with zero timing adjustment.

For payments before the natural payment time 26-02-2026 the adjustment is positive, for delayed payments negative, ranging between plus/mins three basis points approximately. We can introduce a correlation parameter \rho (see the paper for the model behind). Normally the parameter will be close to one, but for very large delays it may also be sensible to choose a correlation below one. However for consistency is must be ensured that \rho \rightarrow 1 if the payment time approaches the natural payment time from the left (again see the paper for the rationale behind). Therefore we set

\rho(\tau) = \rho_0 + (1-\rho_0) e^{-10\tau}

with \tau defined to be the year fraction between the payment time and the natural payment time. For payment times after the natural payment time we just assume a constant \rho  = \rho_0.


Supernatural Libor Coupons

Ultra Strong Typing

Picture a library. QuantLib. Imagine it has an overloaded function

void f(unsigned int a) {}
void f(unsigned int a, unsigned int b) {}

which is used like this in client code


Now you want to extend the function adding an additional parameter. For backward compatibility you give it a default value.

void f(unsigned int a, double t = 0.0) {}
void f(unsigned int a, unsigned int b, double t = 0.0) {}

Compiling the client code above results in a compiler complaint, g++ for example says

testoverload.cpp:6:10: error: 
   call of overloaded ‘f(int, int)’ is ambiguous
testoverload.cpp:1:6: note: 
   candidate: void f(unsigned int, double)
 void f(unsigned int a, double t = 0.0) {}
testoverload.cpp:2:6: note: 
   candidate: void f(unsigned int, unsigned int, double)
 void f(unsigned int a, unsigned int b, double t = 0.0) {}

This is because the arguments 0 are int‘s, so not directly matching any signature exactly. Therefore the compiler tries to convert the arguments to match one of the signatures. However int can both be converted to unsigned int and double, which causes the ambiguity. One can resolve the error by changing the call to




or even




all of which have one signature strictly better matching than the other, since only requiring conversion of at most one (instead of both) of the arguments while the other is already exactly matching.

Does that help ? No, since the new code should work with existing client code out of the box. That is the whole point of backward compatibility.

What we can do is to apply the SFINAE technique I wrote about in an earlier post. We make the second parameter a template parameter

template<class T>
void f(unsigned int a, T t = T(0.0)) {}

template<class T>
void f(unsigned int a, T b, double t = 0.0) {}

and then use meta programming to switch the function definition on and off depending on the type T

#include <boost/utility/enable_if.hpp>
#include <boost/type_traits/is_float.hpp>
#include <boost/type_traits/is_integral.hpp>

template<class T>
void f(unsigned int a, T t = T(0.0), 
 typename boost::enable_if<
                 boost::is_float<T> >::type* = 0 ) {}

template<class T>
void f(unsigned int a, T b, double t = 0.0, 
 typename boost::enable_if<
                 boost::is_integral<T> >::type* = 0) {}

Remember that boost::enable_if<b>::type is either void (when b is true, i.e. when T is a float or integral type respectively) or nothing. If it is nothing it causes a potential compiling error.

However since we are in the process of trying template substitutions here, the whole function definition is just silently discarded from the set of possible template incarnations (“substitution failure is not an error”, SFINAE). What is left is the one function definition we actually want depending on T.

The code then safely chooses the upper definition if we pass a float type like double or float as the second parameter and the lower one for an integral type like int, unsigned int, std::size_t and so on.

Quite heavy machinery to solve an easy problem.

Ultra Strong Typing

Backward Automatic Differentiation Explained


Today I will try to explain how the forward and backward mode in automatic differentiation work. I will only cover the principle, not actual algorithms and the optimizations they apply. While the so called forward mode is quite intuitive, it is not so easy to wrap your head around the backward mode. I will try to go through all steps and not leave out anything seemingly trivial.

We consider the computation of a function f: \mathbb{R}^n \rightarrow \mathbb{R}^m with independent variables x_1, \dots , x_n and dependent variables y_1, \dots , y_m. The ultimate goal is to compute the Jacobian

J = \begin{pmatrix}  \frac{\partial y_1}{\partial x_1} &  & \dots &  & \frac{\partial y_1}{\partial x_n} \\   & \ddots & & & \\  \vdots &  & \frac{\partial y_i}{\partial x_j} &  & \vdots \\   &  &                                   & \ddots & \\  \frac{\partial y_m}{\partial x_1} & & \dots & & \frac{\partial y_m}{\partial x_n}  \end{pmatrix}

We view the function f as a composite of elementary operations

u_k = \Phi_k( \{u_\kappa\}_{(\kappa,k) \in \Lambda})

for k > n where we set u_k = x_k for k=1,\dots,n (i.e. we reserve these indices for the start values of the computation) and u_k = y_k for k=K-m+1, \dots, K (i.e. these are the final results of the computation). The notation should suggest that u_k depends on prior results u_\kappa with (\kappa,k) in some index set \Lambda. Note that if (k,l)\in\Lambda this refers to a direct dependency of u_l on u_k, i.e. if u_k depends on u_j, but u_j does not enter the calculation of u_k directly then (j,l) \notin \Lambda.

As an example consider the function

f(x_1, x_2) = x_1 + x_2^2

for which we would have u_1 = x_1, u_2 = x_2, u_3 = u_2^2, u_4 = y_1 = u_1+u_3. The direct dependencies are (1,4), (2,3) and (3,4), but not (2,4), because x_2=u_2 does not enter the expression for u_4 directly.

We can view the computation chain as a directed graph with vertices u_k and edges (k,l) if (k,l)\in\Lambda. There are no circles allowed in this graph (it is a acyclic graph) and it consists of K vertices.

We write |i,j| for the length of the longest path from u_i to u_j and call that number the distance from i to j. Note that this is not the usual definition of distance normally being the length of the shortest path.

If u_j is not reachable from u_i we set |i,j| = \infty. If u_j is reachable from u_i the distance is finite, since the graph is acyclic.

We can compute a partial derivative \partial u_m / \partial u_k using the chain rule

\frac{\partial u_m}{\partial u_k} = \sum_{l|(l,m)\in\Lambda} \frac{\partial u_m}{\partial u_l} \frac{\partial u_l}{\partial u_k}

This suggest a forward propagation scheme: We start at the initial nodes u_1, ... , u_n. For all nodes u_l with maximum distance 1 from all of these nodes we compute

c_l = \sum_{i=1,\dots,n} \frac{\partial u_l}{\partial u_i} c_{i}

where we can choose c_i for i=1,\dots,n freely at this stage. This assigns the dot product of the gradient of u_l w.r.t. x_1, \dots, x_n and (c_1,\dots,c_n) to the node u_l.

If we choose c_k=1 for one specific k\in\{1,\dots,n\} and zero otherwise, we get the partial derivative of u_l by u_k, but we can compute any other directional derivatives using other vectors (c_1,\dots,c_n). (Remember that the directional derivative is the gradient times the direction w.r.t. which the derivative shall be computed.)

Next we consider nodes with maximum distance 2 from all nodes u_1,\dots,u_n. For such a node u_l

c_l = \sum_{i=1,\dots,n} \frac{\partial u_l}{\partial u_i} c_i = \sum_{i=1,\dots,n} \sum_{k|(k,l)\in\Lambda} \frac{\partial u_l}{\partial u_k} \frac{\partial u_k}{\partial u_i} c_i = \sum_{k|(k,l)\in\Lambda} \frac{\partial u_l}{\partial u_k} c_k

where we can assume that the c_k were computed in the previous step, because their maximum distance to all initial nodes u_1,\dots,u_n muss be less than 2, hence 1.

Also note that if k \in \{1,\dots,n\}, which may be the case, \partial u_l / \partial u_k = 1 if k=l and zero otherwise, so \sum_{i=1,\dots,n} \partial u_k / \partial u_i c_i = c_k trivially. Or seemingly trivial.

The same argument can be iterated for nodes with maximum distance 3, 4, \dots until we reach the final nodes u_{K-m+1},\dots,u_K. This way we can work forward through the computational graph and compute the directional derivative we seek.

In the backward mode we do very similar things, but in a dual way: We start at the final nodes and compute for all nodes u_l with maximum distance 1 from all of these nodes

\overline{c_l} = \sum_{i=K-m+1,\dots,K} \frac{\partial u_i}{\partial u_l} \overline{c_i}

Note that we compute a weighted sum in the dependent variables now. By setting a specific c_k to 1 and the rest to zero again we can compute the partial derivatives of a single final variable. Again using the chain rule we can compute

\overline{c_l} = \sum_{i=K-m+1,\dots,K} \frac{\partial u_i}{\partial u_l} \overline{c_i} = \sum_{i=K-m+1,\dots,K}\sum_{k|(l,k)\in\Lambda} \frac{\partial u_i}{\partial u_k}\frac{\partial u_k}{\partial u_l} \overline{c_i} = \sum_{k|(l,k)\in\Lambda} \frac{\partial u_k}{\partial u_l} \overline{c_k}

for all nodes u_l with maximum distance of 2 from all the final nodes.

Note that the chain rule formally requires to include all variables u_k on which u_i depends. Howvever if u_k does not depend on u_l the whole term will effectively be zero, so we can drop these summands from the beginning. Also we may include indices k on which u_i does not depend in the first place, which is not harmful for the same reason.

As above we can assume all \overline{c_k} to be computed in the previous step, so that we can iterate backwards to the inital nodes to get all partial derivatives of the weighted sum of the final nodes w.r.t. the initial nodes.

Backward Automatic Differentiation Explained

Local Observers

Hello again. Today I am looking at the observable pattern. The pattern is simple at first sight. Its responsibility is to ensure that changes in values are propagated to all dependent objects.

Like when a swap quote ticks the Quote object that holds it should notify a YieldTermStructure that is built using this quote. Which means that the zero rates should be re-bootstrapped using the new market data.

Here another pattern comes into play that is closely coupled with the observable pattern, which is the lazy object. It has an enjoyable name already and it is very useful. A yield term structure is a lazy object actually and this means it won’t re-bootstrap unless someone is really interested in the updated zero rates. So a lot of ticks might happen in the market quotes, but they only mark the dependent structures “dirty”, so that they know they have to recalculate when asked for some result later on (maybe this will never happen, then it will never recalculate). Without the lazy object pattern and frequent changes in the input data on the other hand you are likely to freeze your cpu with unnecessary computations.

In Germany we have this silly rule that when you have a little crack in your windscreen due to a stone flying around on the Autobahn and when this crack is in a defined area of your windscreen (which is almost surely the case, I can tell), you have to replace the whole thing which is around 500 EUR. However you wouldn’t do that until you have to go to the official biyearly routine checks for your car (which are supervised by our dreaded Technischer Überwachungs Verein). So you just collect the cracks, as many as they occur, and do the repair once, just before the check. Lazy object pattern applied, saves you a lot of money.

But today it is about observables, so let’s have a look at how they work. A dependent object usually derives from Observer which has a registerWith() method that can be invoked on the observed objects, which in turn derive from Observable. An observer will use the registerWith() method to listen to all objects which it is interested in.

When an observable changes its state it fires a notification to all observers by invoking their update() method. This notification is triggered by the notifyObservers() method, that can be invoked from an observable instance whenever its creator feels that a change should send a notification to the interested observers. The update() method on the other hand is virtual and implemented in Observer instances to react appropriately to notifications. For example the LazyObject implements the update() method with the mark-dirty logic described above.

This is nice, however can build complex and huge structures during runtime. For example a quote can be observed by a swap rate helper, which is observed by a yield term structure which in turn is observed by another swap rate helper (using the yield term structure as an exogenous discounting curve) which is observed by a second yield term structure. This is observed by an Euribor index using the yield term structure as its estimation curve. The index is observed by an Euribor floating coupon and this again is observed by a swap instrument. This swap instrument is observed by a swaption instrument as its underlying. And so on … what we get is a directed acyclic (hopefully) graph describing the dependencies between the objects.

Sometimes the mechanics of the observable pattern is too coarse. Let’s take a specific example I recently ran into. The SwaptionVolCube1 class represents a SABR swaption cube. It registers with a lot of observables, which I just copy paste here


All except the last statement are done in the base class SwaptionVolatilityCube. The last statement registers with the initial values for the SABR model which are user input.

Now imagine we build a SABR cube with some parameter guesses, say we set them all to \alpha=0.05, \beta=0.8, \nu=0.5, \rho=0.0 and let the class calibrate \alpha, \nu, \rho to the market smiles starting from these values. \beta is held fixed for the moment. The next step could be that we do a calibration to the cms market using the respective helper class’s CmsMarket and CmsMarketCalibration. They adjust the \beta parameters such that cms swap margins are matched.

The result of all this is a calibrated cube which we can use as a market data object for other calculations.

Now suppose the forward curves embedded in the swap indexes swapIndexBase and shortSwapIndexBase change due to a market tick. What happens is that the SwaptionVolCube1 object is market dirty (one little crack, don’t panic – the lazy object trick) and once a volatility is queried from the cube, perfomCalculations() is invoked to update the cube. This update triggers a recalibration of the cube to the new atm level which ultimately means that \alpha,\nu,\rho are recalibrated.

It is reasonable that the calibration of \beta is not part of this update, because it takes rather long. Also it seems ok to do the calibration to the cms market only once a day. Actually it may be good enough to recalibrate \alpha only to the real-time ticking atm volatility quotes and changing atm levels and update the whole set of SABR parameters only once a day using the quoted smiles and the cms market information.

Anyway what happens (or happened before the fix of that issue) during the update of the cube is that the SABR parameters are initialized with the values in the associated parameter guess quotes, because the notification may have arisen from a change in one of these quotes. We just don’t know. I did not write the original logic, but I assume the rationale for letting the initial parameter guess be quotes is that traders might want to feed the \beta directly into the cube, as an alternative to the calibration on the cms market. In this use case it is perfectly reasonable to observe these \beta values and recalibrate the cube when they change and set the \beta‘s to the prescribed values.

However in our use case – when using the cms market calibration – this destroys the calibrated \beta‘s, even when the parameter guess quotes remain unchanged and the notification actually came from quite another context (the yield term structure our scneario here): The \beta‘s would be overwritten wit the initial parameter guesses, which were 0.8.

The problem is that the observability pattern is just asking “did something change ?” and reacting with “I will update everything“. It does not distinguish the sources of notification and does not narrow down the update process to the parts that are required by the specific source of notification.

A workaround for this are local observers, which are separated observer instances stored in the main class as member variables. In our case we can define

        class PrivateObserver : public Observer {
            PrivateObserver(SwaptionVolCube1x<Model> *v)
                : v_(v) {}
            void update() {
            SwaptionVolCube1x<Model> *v_;

       boost::shared_ptr<PrivateObserver> privateObserver_;

The member is initialized with

        privateObserver_ = boost::make_shared<PrivateObserver>(this);

in the cube’s constructor and the registration with the parameter guess quotes take place from this local observer like this

        for (Size i=0; i<4; i++)
            for (Size j=0; j<nOptionTenors_; j++)
                for (Size k=0; k<nSwapTenors_; k++)

This way, the setParameterGuess() method (which reinitializes the parameter guesses) is only called from the local observer in case one of the input guesses actually changes. In addition the main (swaption cube 1) class’s update method is triggered afterwards as well, so that other updates depending on changes of the parameter guesses can take place. After all the swaption cube still observes the parameter guess quotes via this indirection, but the reaction to the change can be made more specific.

What remains to mention is that the call to setParameterGuess() from the local observer breaks the lazy object strategy of doing nothing but marking the object dirty until a request triggers a re-computation. This seems ok in this case, since the setting of the parameter guesses is very lightweight in terms of computational resources.

Under other circumstances one can also replace the local observer by a full local lazy object. However then one must implement the update() method in the local lazy object such that the main class is notified of the change immediately, too (otherwise it would remain in a “green” status despite the change in the – indirect – observable):

void update() {

In addition the performCalculations() method in the main class must ensure a valid status of the local lazy object by doing


If a notification marked the local lazy object dirty, this call will trigger its performCalculations() method. If the local lazy object is still “green” during the main class’s performCalculations(), it will do nothing. Just as it should be.

Local Observers

XVA for Bermudan Swaptions

Welcome back. Sorry for the hiatus which was longer than planned. Anyway and finally here is the (presumably) last episode in the series of posts on proxy pricing of exotics for XVA simulations. In my last posts I wrote a bit on the general design principles I came up with and how I used it to implement the FX TaRF. Today I will describe how to do similar things for bermudan swaptions using the same design principles.

Some general thoughts beforehand though. Reflections on what we are doing here at all. What we have is some model M in which we want to compute some XVA or potential exposure figure \Xi for our exotic deal I. Possibly and probably together with a whole portfolio of other deals which interact in a non linear way, i.e.

\Xi( \sum_i I_i ) \neq \sum_i \Xi( I_i)

To do this we need NPVs \nu of each I_i at some time t conditional on some model state m(t) in our model M:

\nu = \nu(I_i,t,m(t))

M might be operated in a real world measure (e.g. for exposure measurement) or a risk neutral measure (e.g. for CVA calculation), dependent on the nature of the figure to compute. The model state m(t) implies a market data scenario like certain discount and forwarding term structures and swaption or cap volatility surfaces and the like.

Ideally the NPV \nu is computed in the same pricing model we use for our regular PL computation or trading activities, calibrated to the market data scenario created by M in the way we would do it for usual pricing. That means in general we have an “outer” model M and an “inner” model N_i for each instrument I_i. Even if the instruments all allow for the same model for the inner evaluation, they will probably be different in their calibrated parameters:

For example if we have a portfolio composed of single currency swaps, callable swaps and swaptions we could choose the Hull White model to evaluate the callable swaps and swaptions and price the swaps directly on the yield curves. But each callable swap / swaption would in general require a different set of calibration instruments, so we would have a set of local, inconsistent models for each individual swaption valuation. We could try to use a global model in which all instruments can be priced consistently and maybe use this model as the outer model for XVA calculation at the same time, provided that the XVA figure \Xi allows for risk neutral valuation at all.

This could be a Libor market model which is one of the most flexible models in terms of global calibration. The global calibration alone would be a delicate task however. If more than one currency is involved and maybe even other asset classes the task gets even more demanding. And what do you do if you want a real world exposure figure? This approach seems quite heavy, not impossible, but heavy.

So it seems reasonable to stick with the concept of separate outer and inner models in general. The question is then how to efficiently compute the inner single NPVs \nu(I_i,t,m(t)) conditional on market scenarios generated by the outer model. Full pricing is not an option even for relatively simple inner models like Hull White one factor models.

Another idea could be to compute some reference NPVs in time direction t assuming some arbitrary market data scenarios (e.g. simply using today’s market data rolled forward), together with a “rich” set of sensitivities which allow to estimate NPVs under arbitrary market scenarios by Taylor approximation and interpolating in time direction (or use a theta sensitivity). Maybe automatic differentiation comes into play here.

The idea I followed in the previous posts is different, more simple, and goes like this. During an initial single deal pricing performed by a monte carlo simulation we collect path data and build a regression model of conditional NPVs on the model state, pricing time and possibly more ancillary variables (like the accumulated amount in the case of the FX TaRF) just as it is done in the Longstaff Schwartz method for early exercise pricing (or almost like this, see below). Then, when asked for an NPV at a pricing time t under a given market scenario m(t) (and possibly more conditions like the already accumulated amount of a target structure) we imply the model state belonging to the market scenario and use our regression model to produce the desired NPV very efficiently.

The limitation of this method is that the given market data can not be replicated in full beauty in general. In case of the FX TaRF we could replicate any given FX spot rate, but not its volatility which is fixed and implied by the original pricing model for example. In case of bermudan swaptions in the Hull White model we will see in this post that we can replicate one given reference rate, i.e. the general level of the yield curve, but not the shape of the curve, the spread to other yield curves involved and in particular not the volatility structure, which are all fixed again and implied by the model used for initial pricing.

Actually what can be replicated is exactly corresponding to what is explicitly modelled as a stochastic quantity in the model: Since for the TaRF we used a Black model, possibly with local volatility but not with a stochastic factor for the volatility, the volatility is fixed. The FX spot on the other hand as the explicitly modelled quantity can be replicated to be any desired value.

In case of the Hull White model we can imply the level of the yield curve, since this is modelled through the one factor in the model, but we can not match a given shape, since the model does not have the flexibility change the shape of the curve (the shape does change in fact, but in a fixed manner, not driven by stochastic factors in the model). The same with the volatility structure, it is implied by the initial pricing model again.

If one accepts these limitations the method offers a very fast and handy way to approximate the desired scenario NPVs however.

Let’s come to an example how this works for bermudan swaptions. I’d like to explain this going through an example I wrote. We start like this

// Original evaluation date and rate level

Real rateLevelOrig = 0.02;
Date refDateOrig(12, January, 2015);
Settings::instance().evaluationDate() = refDateOrig;

// the yield term structure for the original pricing
// this must _not_ be floating, see the warning in
// the proxy engine's header.

Handle<YieldTermStructure> ytsOrig(boost::make_shared<FlatForward>(
         refDateOrig, rateLevelOrig, Actual365Fixed()));

The original pricing date is 12-Jan-2015 and the yield term structure on this date is at 2\% flat. We set QuantLib’s original pricing date to this date and construct the yield term structure.

One important point is that the yield term structure has a fixed reference date, i.e. it does not change when the evaluation date changes. The reason is that this yield term structure will be linked to the initial pricing model which in turn will be used in the proxy pricing engine later on, which finally relies on the fact that the model does not change when shifting the evaluation date for scenario pricing.

It is like having a fixed frame, as seen from the outer model described above, within which we do local pricings at different times and under different market scenarios, but relying on the frame surronding it to not change. Likewise the rates describing the yield term structure should not change, nor should the initial pricing model be recalibrated or change its parameters during the scenario pricings. The pricing model will be a Hull White model

// the gsr model in T-forward measure, T=50 chosen arbitrary here
// the first model uses the fixed yts, used for the mc pricing
// generating the proxy

boost::shared_ptr<Gsr> gsrFixed = boost::make_shared<Gsr>(
            ytsOrig, stepDates, sigmas, reversion, 50.0);

The monte carlo pricing engine will be set up like this

boost::shared_ptr<PricingEngine> mcEngine =
          .withSteps(1) // the gsr model allows for large steps

We do not need a fine time grid since the stochastic process belonging to the GSR model allows for exact evolution over arbitrary large time intervals. We use 10000 steps to calibrate the Longstaff-Schwartz regression models and also for the final pricing. The seed is 42, why not. The last attribute says that we want not only an usual pricing of the swaption, but also generate a proxy information object which can be used for scenario pricing later on. Since the generation consumes some additional time, it is optional to do it.

Now we can do a good old pricing on our instrument swaption2 (I skipped how this was created).

  Real npvOrigMc = swaption2->NPV();
  Real errorOrigMc = swaption2->errorEstimate();

What I also did not show is that I created a reference integral engine to verify the prices of the mc engine and later on of the proxy engine also. The reference engine relies on a floating yield term structure, both with respect to the reference date and the rate level. The same holds for the proxy engine, so we can move forward in time, change the rate level and compare the pricings in the two engines.

The result of the inital pricing is

Bermudan swaption proxy pricing example
Pricing results on the original reference date (January 12th, 2015):
Integral engine npv = 0.0467257 (timing: 97723mus)
MC       engine npv = 0.0459135 error estimate 0.000809767 (timing: 5.55552e+06mus)

The reference NPV from the integral engine is 467bp, slightly higher than the monte carlo price 459bp. This is perfectly expected since the regression model based approximate exercise decisions are sub-optimal, so the option NPV will be underestimated systematically. Together will the error estimate (one standard deviation) of 8bp this all looks quite ok.

One word about the computation times. The monte carlo simulation (10000 paths both for calibration and pricing) takes 5.5 seconds. This is somewhat in the expected region and I am completely relying on the standard monte carlo framework of the library here. The integral engine on the other hand takes 97 milli seconds, which is quite slow. This can easily be brought down to 5 milliseconds just by using less integration points and covered standard deviations (here I used 129 points and 7 standard deviations), without loosing too much accuracy. This seems quite ok, since neither the GSR model nor the integral engine are much optimized for speed currently.

To create the proxy engine we can say

  boost::shared_ptr<PricingEngine> proxyEngine =
              swaption2->proxy(), rateLevelRef, maturityRef, 64, 7.0, true);

where we just pass the proxy result from the pricing above together with quotes representing the rate level of the scenario pricing and the maturity to which the rate level belongs. Remember that we can only prescribe the level, but not the shape of the yield curve in the scenario pricing. So I allowed to chose a maturity, e.g. 10 years, and prescribe the (continuously compounded) zero yield w.r.t. this maturity. The proxy engine will then imply the Hull White’s model state such that this rate level is matched for the given maturity. The shape of the yield curve is implied by the initial model though and can not be changed. I am repeating myself, ain’t I ? Do I ? Repeat ? God.

The next two parameters refer to the way we do the scnenario pricing between two of the original structure’s exexercise dates: We determine the next available exercise date (if any) and build a grid on the exercise date covering n standard deviations (here 5.0) of the state variable around its mean, all this conditional on the pricing time and the model state (which is implied by the rate level as explained above). We then compute the NPV on each of these grid points using the regression model from the initial monte carlo pricing. Finally we interpolate between the points using cubic splines and integrate the resulting function against the state variable’s density (which is possibly in closed form by the way).

Let’s see what we get under some scenarios for the evaluation date and rates. One scenario is simply created by e.g. saying

 Settings::instance().evaluationDate() = Date(12, June, 2015);

which moves the evaluation date half a year ahead to 12-Jun-2015 and requires the 10.5y zero rate (which corresponds to the maturity of the swaption as the reference point) to be 2%. For the scenarios in the example we get

Pricing results on June 12th, 2015, reference rate 0.02 with maturity 10.5
Integral engine npv = 0.0444243
Proxy    engine npv = 0.0430997 (timing: 127mus)

Pricing results on June 11th, 2019, reference rate 0.025 with maturity 6.5
Integral engine npv = 0.0372685
Proxy    engine npv = 0.0361514 (timing: 170mus)

Pricing results on June 11th, 2019, reference rate 0.02 with maturity 6.5
Integral engine npv = 0.0223442
Proxy    engine npv = 0.0223565 (timing: 159mus)

Pricing results on June 11th, 2019, reference rate 0.015 with maturity 6.5
Integral engine npv = 0.0128876
Proxy    engine npv = 0.0141377 (timing: 169mus)

Pricing results on January 11th, 2020, reference rate 0.02 with maturity 6
Integral engine npv = 0.0193142
Proxy    engine npv = 0.0194446 (timing: 201mus)

Pricing results on January 11th, 2021, reference rate 0.02 with maturity 5
Integral engine npv = 0.0145542
Proxy    engine npv = 0.0137726 (timing: 208mus)

Pricing results on June 11th, 2024, reference rate 0.02 with maturity 1.5
Integral engine npv = 0.00222622
Proxy    engine npv = 0.00292998 (timing: 282mus)

which looks not too bad. The proxy engine’s computation time is also not bad, around 100-300 microseconds. Again, this can be made faster by using less integration points, but already seems competitive enough for XVA simulations.

The proxy regression model deserves some comments. It is not just the Longstaff-Schwartz regression model, since this only looks at states where the exercise value is positive. This is essential for a good fit of the regression function (which I chose to be simply a quadratic function), since left from the exercise point the option value (which equals the continuation value in this area) flattens and would destroy the global fit.

What I did until now is just calibrate two separate models, one in the region corresponding to positive exercise values and another in the complementary region. This is similar to the regression model for the FX TaRF, where I also used two quadratic polynomials, if you remember (if not, you can read about it here or more detailled in the paper).

This solution is not perfect and can be improved probably. The following picture shows the regression models from the example. The different exercise dates are distinguished by color (0 denotes the first exercise date, in dark color, 1 the second, and so on until 9 which denotes the 10th exercise date, in bright color).


What you can see is the discontinuity at the cutoff point between exercise and non-exercise area. It is exactly this jump that one should work on I guess. But not now.

In technical terms I relied on Klaus’ implementation of the Longstaff Schwartz algorithm in longstaffschwarzpathpricer.hpp. This needed some amendments to allow for non-american call rights which it was designed for I believe. Apart of this I mainly inserted a hook in the pricer

virtual void post_processing(const Size i,
                             const std::vector<StateType> &state,
                             const std::vector<Real> &price,
                             const std::vector<Real> &exercise) {}

with an empty default implementation. This function is called during the calibration phase, passing the collected data to a potential consumer, which can do useful things with it like building a regression model for proxy pricing like in our use case here. To do this I derived from Klaus’ class as follows

class LongstaffSchwartzProxyPathPricer
    : public LongstaffSchwartzPathPricer<typename SingleVariate<>::path_type> {
        const TimeGrid &times,
        const boost::shared_ptr<EarlyExercisePathPricer<PathType> > &pricer,
        const boost::shared_ptr<YieldTermStructure> &termStructure);

    const std::vector<boost::function1<Real, StateType> > basisSystem() const {
        return v_;
    const std::vector<Array> coefficientsItm() const { return coeffItm_; }
    const std::vector<Array> coefficientsOtm() const { return coeffOtm_; }
    const StateType cutoff() const { return cutoff_; }

which offers inspectors for the two regression models coefficients, the cutoff point separating them and the underlying function system. The implementation of the hook function is simple

void LongstaffSchwartzProxyPathPricer::post_processing(
    const Size i, const std::vector<StateType> &state,
    const std::vector<Real> &price, const std::vector<Real> &exercise) {

    std::vector<StateType> x_itm, x_otm;
    std::vector<Real> y_itm, y_otm;

    cutoff_ = -QL_MAX_REAL;

    for (Size j = 0; j < state.size(); ++j) {
        if (exercise[j] > 0.0) {
        } else {
                cutoff_ = state[j];

    if (v_.size() <= x_itm.size()) {
        coeffItm_[i - 1] =
            GeneralLinearLeastSquares(x_itm, y_itm, v_).coefficients();
    } else {
        // see longstaffschwartzpricer.hpp
        coeffItm_[i - 1] = Array(v_.size(), 0.0);

    if (v_.size() <= x_otm.size()) {
        coeffOtm_[i - 1] =
            GeneralLinearLeastSquares(x_otm, y_otm, v_).coefficients();
    } else {
        // see longstaffschwartzpricer.hpp
        coeffOtm_[i - 1] = Array(v_.size(), 0.0);

and does nothing more than setting up the two data sets on which we estimate the regression models.

The coefficients are read in the monte carlo engine and serve as the essential part in the proxy object which can then be passed to the proxy engine. And we are done. You can find the full example code here.

XVA for Bermudan Swaptions

Pricing Engine Design for exotic XVA simulations (using the example of FX TARFs)

In my previous post I wrote about some ideas to efficiently approximate the value of a fx exotic (a fx tarf in fact). One main motivation is to use such a fast pricing in a XVA simulation.


This post is dedicated to the design I came up with to fit the idea as accurately as possible into the existing QuantLib architecture.

The next post will then present details about the approximation scheme itself, some numerical examples comparing the approximation with a full pricing under several market and time decay scenarios, and performance tests.

Good design is of utmost importance. Even in a prosperous neighbourhood a broken window, not swiftly repaired, will soon lead to a second and after a while a degenerate area. Luckily Luigi would not allow this to happen in QuantLib city of course.

These are my thoughts on the design in general:

  • we should have two pricing engines, one MC and one Proxy engine. We should not have only one engine with some additional methods to extract the approximate NPVs in a proprietary way
  • the proxy engine should behave just like any other engine, e.g. the time decay should consistently be deduced from the global evaluation date, not through some special parameter, and the relevant market data should be given by standard structures
  • the implementation of new instruments and pricing engines following the same idea should be easy and be based on a common interface
  • the end user interface used in client code should be easy to use and foolproof
  • XVA simulation is an application of the proxy engine, but there is no strict connection to this use case – you can also use the proxy engine “just” to compute npvs faster if high accuracy is not required

Or in short: There should be nothing special about the whole thing. Curb your enthusiasm, just do a plain solid job, don’t try to impress anyone. Okay. Let’s start with the instrument class.

FxTarf(const Schedule schedule, 
       const boost::shared_ptr<FxIndex> &index,
       const Real sourceNominal,
       const boost::shared_ptr<StrikedTypePayoff> &shortPositionPayoff,
       const boost::shared_ptr<StrikedTypePayoff> &longPositionPayoff,
       const Real target, 
       const CouponType couponType = capped,
       const Real shortPositionGearing = 1.0,
       const Real longPositionGearing = 1.0,
       const Handle<Quote> accumulatedAmount = Handle<Quote>(),
       const Handle<Quote> lastAmount = Handle<Quote>());

The constructor takes

  • the structure’s schedule (meaning the value dates),
  • the underlying fx index representing for example an ECB fx fixing (this is a new class too, because there doesn’t seem to be a fx index in QuantLib yet, but I do not go into details about that here),
  • the nominal of the structure (in foreign, asset or source currency, there are many names for that)
  • the counterparties’ payoff profiles, which are normally short puts and long calls, all sharing the same strike
  • the target level where the structure knocks out
  • the coupon type (see the previous post I linked above)
  • the gearing of the two sides of the deal

The last two parameters accumulatedAmount and lastAmount are optional. If not given, the FxTarf computes the already accumulated amount reading the index’s historic fixings.

If specified on the other hand, historic fixings are ignored and the given accumulated amount is used. The lastAmount in this context is needed only in case the last fixing already occured, but the associated payment is still in the future. The reason to introduce these somewhat redundant parameters is as follows. On one hand it may be handy not to set all historic fixings for the fx index, but directly set the accumulated amount. Maybe you get the deal information from some source system and it already provides the current accumulated amount along with the deal data. More importantly, during a XVA simulation you might not want to set all fixings in the IndexManager. You can do that, if you like, but it may not be handy, because after each path you have to erase all fixings again or maybe you do not simulate each fixing date even and just want to interpolate the accumulated fixings. In any case it is just a convenience parameter. Use it or just ignore it.

For a full pricing of the tarf we can use a monte carlo engine which is constructed in the usual way, for example

boost::shared_ptr<PricingEngine> mcEngine 
    = MakeMcFxTarfEngine<>(gkProcess)

The parameters here are the (generalized) Black Scholes process, the time steps to simulate per year, the seed for the RNG.

The last modifier .withProxy() is the only special thing here: By default the engine will just compute an npv (and error estimate) as any other mc engine does. If the engine is set up with the proxy flag on the other hand, additional information during the simulation is collected and analyzed to produce some proxy information object that can be used later for approximate pricings. We will see in a minute how.

We need this modifier because the simulation gets slower when creating the proxy, so we should be able to switch it off.

Now we set the engine and can compute the (full) npv:

std::cout << "MCEngine NPV = " << tarf.NPV() << " error estimate = "
          << tarf.errorEstimate() << std::endl;

If we just want the proxy pricing, we can of course skip the full npv calculation, but it doesn’t hurt since the npv is always produced, even if only calculating the information for the proxy engine.

To use the proxy engine we have to start with an engine which can produce this piece of information. The proxy engine is then feeded with the proxy object:

boost::shared_ptr<PricingEngine> proxyEngine =
std::cout << "ProxyEngine NPV = " << tarf.NPV() << std::endl;

The proxy engine is constructed with

  • the proxy description produced by the mc engine, which can be retrieved from the instrument by .proxy() (this is a special result, which seems important enough not to bury it in the additional results heap — this is nothing innovative though, it’s like the bps or fair rate for swaps as an example)
  • the fx quote used for the pricing – i.e. the essential market data needed for the proxy pricing
  • the discount curve for the pricing (which is taken as the domestic rate curve from our black scholes process in our example client code)

In addtion the global evaluation date will determine the reference date for the valuation, just as usual.

If the instrument does not provide a proxy object (e.g. because the mc engine was told so, see above), or if the proxy object is not suitable for the proxy engine to be constructed, an exception will be thrown.

What is going on behind the scenes: I added an interface for instruments that are capable of proxy pricing:

class ProxyInstrument {
    //! Base class proxy descriptions for approximate pricing engines
    struct ProxyDescription {
        // check if proxy description is valid
        virtual void validate() const = 0;

    virtual boost::shared_ptr<ProxyDescription> proxy() const = 0;

The only method to implement is proxy which should return an object containing the information necessary for a compatible proxy engine to compute approximate npvs (see below for what compatible means). The information itself should be an object derived from ProxyInstrument::ProxyDescription. It has to provide a validate method that should check the provided data for consistency.

How is the fx tarf instrument implemented w.r.t. this interface:

class FxTarf : public Instrument, public ProxyInstrument {
    //! proxy description
    struct Proxy : ProxyDescription {
        struct ProxyFunction {
            virtual Real operator()(const Real spot) const = 0;
        // maximum number of open fixings
        Size maxNumberOpenFixings;
        // last payment date, the npvs are forward npvs w.r.t. this date
        Date lastPaymentDate;
        // buckets for accumulated amonut, e.g.
        // 0.0, 0.1, 0.2, 0.3, 0.4 means
        // [0.0,0.1) has index 0
        // [0.1,0.2) has index 1
        // ...
        // [0.4,target] has index 4
        std::vector<Real> accBucketLimits;
        // proxy functions
        // first index is openFixings-1
        // second index is accAmountIndex
        // A function F should implement
        // operator()(Real spot) = npv
        std::vector<std::vector<boost::shared_ptr<ProxyFunction> > > functions;
        void validate() const {
            QL_REQUIRE(functions.size() == maxNumberOpenFixings,
                       "maximum number of open fixings ("
                           << maxNumberOpenFixings
                           << ") must be equal to function rows ("
                           << functions.size() << ")");
            for (Size i = 0; i < functions.size(); ++i) {
                QL_REQUIRE(functions[i].size() == accBucketLimits.size(),
                           "number of acc amount buckets ("
                               << accBucketLimits.size()
                               << ") must be equal to function columns ("
                               << functions[i].size() << ") in row " << i);
/* ... */

This says that the specific (or one possible) proxy information of a fx tarf consists of some descriptive data, which is

  • the maximum number of open (future) fixings
  • the last payment date of the structure – see below
  • a bucketing of the accumulated amount

which together define a segmentation for the approximating function. On each segment the approximating function is then given by a Proxy::ProxyFunction which is at this level of abstraction just an arbitrary function Real to Real, returning the npv for a given fx spot. The npv is expressed on forward basis as of the last payment date of the structure, so that the proxy engine can discount back from this latest possible pricing date to the evaluation date. Remember that this is in general (and typically) different from the one used for the mc pricing.

The validate method checks if the function matrix is filled consistently with the segmentation information.

The proxy object is part of the results class for the instrument, which is again just using the standard formalism:

class FxTarf::results : public Instrument::results {
    void reset();
    boost::shared_ptr<FxTarf::Proxy> proxy;

The proxy engine expects a proxy object, which is checked to be one that is useful for engine is in the constructor, using a dynamic downcast.

    boost::shared_ptr<ProxyInstrument::ProxyDescription> proxy,
    Handle<Quote> exchangeRate, Handle<YieldTermStructure> discount)
    : FxTarfEngine(discount), exchangeRate_(exchangeRate) {
    proxy_ = boost::dynamic_pointer_cast<FxTarf::Proxy>(proxy);

    QL_REQUIRE(proxy_, "no FxTarf::Proxy given");


The third level of specialization is in the monte carlo engine, where the specific proxy object’s function type is implemented, derived from the definitions in the FxTarf class:

template <class RNG = PseudoRandom, class S = Statistics>
class McFxTarfEngine : public FxTarfEngine,
                       public McSimulation<SingleVariate, RNG, S> {
    /*! proxy function giving a function spot => npv for one segment
        (bucket accumulated amount, number of open fixings)
        the function is given by two quadratic polynomials on intervals
        (-\infty,cutoff] and (cutoff,\infty).
        Only the ascending (long calls) or descending (long puts) branch
        is used and then extrapolated flat.
    class QuadraticProxyFunction : public FxTarf::Proxy::ProxyFunction {
        QuadraticProxyFunction(Option::Type, const Real cutoff, const Real a1,
                               const Real b1, const Real c1, const Real a2,
                               const Real b2, const Real c2);
        Real operator()(const Real spot) const;

        Option::Type type_;
        const Real a1_, b1_, c1_, a2_, b2_, c2_;
        const Real cutoff_;
        int flatExtrapolationType1_,
            flatExtrapolationType2_; // +1 = right, -1 = left
        Real extrapolationPoint1_, extrapolationPoint2_;

At this place we finally fix the specific form of the approximating functions, which are in essence given by two quadratic polynomials. I will give more details on and a motiviation for this in the next post.

Finally it appeared useful to derive the mc and proxy engines from a common base engine, which handles some trivial boundary cases (like all fixings are done or the determination of the npv of a fixed amount that is not yet settled), so we have the hierachy

class FxTarfEngine : public FxTarf::engine {}

template <class RNG = PseudoRandom, class S = Statistics>
class McFxTarfEngine : public FxTarfEngine,
                       public McSimulation<SingleVariate, RNG, S> {}

class ProxyFxTarfEngine : public FxTarfEngine {}

If you are interested in the code you can look into my repository. It may already work, but I did not test everything yet. More on this next week.

Pricing Engine Design for exotic XVA simulations (using the example of FX TARFs)