# XVA for Bermudan Swaptions

Welcome back. Sorry for the hiatus which was longer than planned. Anyway and finally here is the (presumably) last episode in the series of posts on proxy pricing of exotics for XVA simulations. In my last posts I wrote a bit on the general design principles I came up with and how I used it to implement the FX TaRF. Today I will describe how to do similar things for bermudan swaptions using the same design principles.

Some general thoughts beforehand though. Reflections on what we are doing here at all. What we have is some model $M$ in which we want to compute some XVA or potential exposure figure $\Xi$ for our exotic deal $I$. Possibly and probably together with a whole portfolio of other deals which interact in a non linear way, i.e.

$\Xi( \sum_i I_i ) \neq \sum_i \Xi( I_i)$

To do this we need NPVs $\nu$ of each $I_i$ at some time $t$ conditional on some model state $m(t)$ in our model $M$:

$\nu = \nu(I_i,t,m(t))$

$M$ might be operated in a real world measure (e.g. for exposure measurement) or a risk neutral measure (e.g. for CVA calculation), dependent on the nature of the figure to compute. The model state $m(t)$ implies a market data scenario like certain discount and forwarding term structures and swaption or cap volatility surfaces and the like.

Ideally the NPV $\nu$ is computed in the same pricing model we use for our regular PL computation or trading activities, calibrated to the market data scenario created by $M$ in the way we would do it for usual pricing. That means in general we have an “outer” model $M$ and an “inner” model $N_i$ for each instrument $I_i$. Even if the instruments all allow for the same model for the inner evaluation, they will probably be different in their calibrated parameters:

For example if we have a portfolio composed of single currency swaps, callable swaps and swaptions we could choose the Hull White model to evaluate the callable swaps and swaptions and price the swaps directly on the yield curves. But each callable swap / swaption would in general require a different set of calibration instruments, so we would have a set of local, inconsistent models for each individual swaption valuation. We could try to use a global model in which all instruments can be priced consistently and maybe use this model as the outer model for XVA calculation at the same time, provided that the XVA figure $\Xi$ allows for risk neutral valuation at all.

This could be a Libor market model which is one of the most flexible models in terms of global calibration. The global calibration alone would be a delicate task however. If more than one currency is involved and maybe even other asset classes the task gets even more demanding. And what do you do if you want a real world exposure figure? This approach seems quite heavy, not impossible, but heavy.

So it seems reasonable to stick with the concept of separate outer and inner models in general. The question is then how to efficiently compute the inner single NPVs $\nu(I_i,t,m(t))$ conditional on market scenarios generated by the outer model. Full pricing is not an option even for relatively simple inner models like Hull White one factor models.

Another idea could be to compute some reference NPVs in time direction $t$ assuming some arbitrary market data scenarios (e.g. simply using today’s market data rolled forward), together with a “rich” set of sensitivities which allow to estimate NPVs under arbitrary market scenarios by Taylor approximation and interpolating in time direction (or use a theta sensitivity). Maybe automatic differentiation comes into play here.

The idea I followed in the previous posts is different, more simple, and goes like this. During an initial single deal pricing performed by a monte carlo simulation we collect path data and build a regression model of conditional NPVs on the model state, pricing time and possibly more ancillary variables (like the accumulated amount in the case of the FX TaRF) just as it is done in the Longstaff Schwartz method for early exercise pricing (or almost like this, see below). Then, when asked for an NPV at a pricing time $t$ under a given market scenario $m(t)$ (and possibly more conditions like the already accumulated amount of a target structure) we imply the model state belonging to the market scenario and use our regression model to produce the desired NPV very efficiently.

The limitation of this method is that the given market data can not be replicated in full beauty in general. In case of the FX TaRF we could replicate any given FX spot rate, but not its volatility which is fixed and implied by the original pricing model for example. In case of bermudan swaptions in the Hull White model we will see in this post that we can replicate one given reference rate, i.e. the general level of the yield curve, but not the shape of the curve, the spread to other yield curves involved and in particular not the volatility structure, which are all fixed again and implied by the model used for initial pricing.

Actually what can be replicated is exactly corresponding to what is explicitly modelled as a stochastic quantity in the model: Since for the TaRF we used a Black model, possibly with local volatility but not with a stochastic factor for the volatility, the volatility is fixed. The FX spot on the other hand as the explicitly modelled quantity can be replicated to be any desired value.

In case of the Hull White model we can imply the level of the yield curve, since this is modelled through the one factor in the model, but we can not match a given shape, since the model does not have the flexibility change the shape of the curve (the shape does change in fact, but in a fixed manner, not driven by stochastic factors in the model). The same with the volatility structure, it is implied by the initial pricing model again.

If one accepts these limitations the method offers a very fast and handy way to approximate the desired scenario NPVs however.

Let’s come to an example how this works for bermudan swaptions. I’d like to explain this going through an example I wrote. We start like this

// Original evaluation date and rate level

Real rateLevelOrig = 0.02;
Date refDateOrig(12, January, 2015);
Settings::instance().evaluationDate() = refDateOrig;

// the yield term structure for the original pricing
// this must _not_ be floating, see the warning in

Handle<YieldTermStructure> ytsOrig(boost::make_shared<FlatForward>(
refDateOrig, rateLevelOrig, Actual365Fixed()));


The original pricing date is 12-Jan-2015 and the yield term structure on this date is at $2\%$ flat. We set QuantLib’s original pricing date to this date and construct the yield term structure.

One important point is that the yield term structure has a fixed reference date, i.e. it does not change when the evaluation date changes. The reason is that this yield term structure will be linked to the initial pricing model which in turn will be used in the proxy pricing engine later on, which finally relies on the fact that the model does not change when shifting the evaluation date for scenario pricing.

It is like having a fixed frame, as seen from the outer model described above, within which we do local pricings at different times and under different market scenarios, but relying on the frame surronding it to not change. Likewise the rates describing the yield term structure should not change, nor should the initial pricing model be recalibrated or change its parameters during the scenario pricings. The pricing model will be a Hull White model

// the gsr model in T-forward measure, T=50 chosen arbitrary here
// the first model uses the fixed yts, used for the mc pricing
// generating the proxy

boost::shared_ptr<Gsr> gsrFixed = boost::make_shared<Gsr>(
ytsOrig, stepDates, sigmas, reversion, 50.0);


The monte carlo pricing engine will be set up like this

boost::shared_ptr<PricingEngine> mcEngine =
MakeMcGaussian1dNonstandardSwaptionEngine<>(gsrFixed)
.withSteps(1) // the gsr model allows for large steps
.withSamples(10000)
.withSeed(42)
.withCalibrationSamples(10000)
.withProxy(true);


We do not need a fine time grid since the stochastic process belonging to the GSR model allows for exact evolution over arbitrary large time intervals. We use $10000$ steps to calibrate the Longstaff-Schwartz regression models and also for the final pricing. The seed is $42$, why not. The last attribute says that we want not only an usual pricing of the swaption, but also generate a proxy information object which can be used for scenario pricing later on. Since the generation consumes some additional time, it is optional to do it.

Now we can do a good old pricing on our instrument swaption2 (I skipped how this was created).

  swaption2->setPricingEngine(mcEngine);
Real npvOrigMc = swaption2->NPV();
Real errorOrigMc = swaption2->errorEstimate();


What I also did not show is that I created a reference integral engine to verify the prices of the mc engine and later on of the proxy engine also. The reference engine relies on a floating yield term structure, both with respect to the reference date and the rate level. The same holds for the proxy engine, so we can move forward in time, change the rate level and compare the pricings in the two engines.

The result of the inital pricing is

Bermudan swaption proxy pricing example
Pricing results on the original reference date (January 12th, 2015):
Integral engine npv = 0.0467257 (timing: 97723mus)
MC       engine npv = 0.0459135 error estimate 0.000809767 (timing: 5.55552e+06mus)


The reference NPV from the integral engine is 467bp, slightly higher than the monte carlo price 459bp. This is perfectly expected since the regression model based approximate exercise decisions are sub-optimal, so the option NPV will be underestimated systematically. Together will the error estimate (one standard deviation) of 8bp this all looks quite ok.

One word about the computation times. The monte carlo simulation (10000 paths both for calibration and pricing) takes 5.5 seconds. This is somewhat in the expected region and I am completely relying on the standard monte carlo framework of the library here. The integral engine on the other hand takes 97 milli seconds, which is quite slow. This can easily be brought down to 5 milliseconds just by using less integration points and covered standard deviations (here I used 129 points and 7 standard deviations), without loosing too much accuracy. This seems quite ok, since neither the GSR model nor the integral engine are much optimized for speed currently.

To create the proxy engine we can say

  boost::shared_ptr<PricingEngine> proxyEngine =
boost::make_shared<ProxyNonstandardSwaptionEngine>(
swaption2->proxy(), rateLevelRef, maturityRef, 64, 7.0, true);


where we just pass the proxy result from the pricing above together with quotes representing the rate level of the scenario pricing and the maturity to which the rate level belongs. Remember that we can only prescribe the level, but not the shape of the yield curve in the scenario pricing. So I allowed to chose a maturity, e.g. 10 years, and prescribe the (continuously compounded) zero yield w.r.t. this maturity. The proxy engine will then imply the Hull White’s model state such that this rate level is matched for the given maturity. The shape of the yield curve is implied by the initial model though and can not be changed. I am repeating myself, ain’t I ? Do I ? Repeat ? God.

The next two parameters refer to the way we do the scnenario pricing between two of the original structure’s exexercise dates: We determine the next available exercise date (if any) and build a grid on the exercise date covering $n$ standard deviations (here $5.0$) of the state variable around its mean, all this conditional on the pricing time and the model state (which is implied by the rate level as explained above). We then compute the NPV on each of these grid points using the regression model from the initial monte carlo pricing. Finally we interpolate between the points using cubic splines and integrate the resulting function against the state variable’s density (which is possibly in closed form by the way).

Let’s see what we get under some scenarios for the evaluation date and rates. One scenario is simply created by e.g. saying

 Settings::instance().evaluationDate() = Date(12, June, 2015);
rateLevelRefQuote->setValue(0.02);
maturityRefQuote->setValue(10.5);


which moves the evaluation date half a year ahead to 12-Jun-2015 and requires the 10.5y zero rate (which corresponds to the maturity of the swaption as the reference point) to be 2%. For the scenarios in the example we get

Pricing results on June 12th, 2015, reference rate 0.02 with maturity 10.5
Integral engine npv = 0.0444243
Proxy    engine npv = 0.0430997 (timing: 127mus)

Pricing results on June 11th, 2019, reference rate 0.025 with maturity 6.5
Integral engine npv = 0.0372685
Proxy    engine npv = 0.0361514 (timing: 170mus)

Pricing results on June 11th, 2019, reference rate 0.02 with maturity 6.5
Integral engine npv = 0.0223442
Proxy    engine npv = 0.0223565 (timing: 159mus)

Pricing results on June 11th, 2019, reference rate 0.015 with maturity 6.5
Integral engine npv = 0.0128876
Proxy    engine npv = 0.0141377 (timing: 169mus)

Pricing results on January 11th, 2020, reference rate 0.02 with maturity 6
Integral engine npv = 0.0193142
Proxy    engine npv = 0.0194446 (timing: 201mus)

Pricing results on January 11th, 2021, reference rate 0.02 with maturity 5
Integral engine npv = 0.0145542
Proxy    engine npv = 0.0137726 (timing: 208mus)

Pricing results on June 11th, 2024, reference rate 0.02 with maturity 1.5
Integral engine npv = 0.00222622
Proxy    engine npv = 0.00292998 (timing: 282mus)


which looks not too bad. The proxy engine’s computation time is also not bad, around 100-300 microseconds. Again, this can be made faster by using less integration points, but already seems competitive enough for XVA simulations.

The proxy regression model deserves some comments. It is not just the Longstaff-Schwartz regression model, since this only looks at states where the exercise value is positive. This is essential for a good fit of the regression function (which I chose to be simply a quadratic function), since left from the exercise point the option value (which equals the continuation value in this area) flattens and would destroy the global fit.

What I did until now is just calibrate two separate models, one in the region corresponding to positive exercise values and another in the complementary region. This is similar to the regression model for the FX TaRF, where I also used two quadratic polynomials, if you remember (if not, you can read about it here or more detailled in the paper).

This solution is not perfect and can be improved probably. The following picture shows the regression models from the example. The different exercise dates are distinguished by color (0 denotes the first exercise date, in dark color, 1 the second, and so on until 9 which denotes the 10th exercise date, in bright color).

What you can see is the discontinuity at the cutoff point between exercise and non-exercise area. It is exactly this jump that one should work on I guess. But not now.

In technical terms I relied on Klaus’ implementation of the Longstaff Schwartz algorithm in longstaffschwarzpathpricer.hpp. This needed some amendments to allow for non-american call rights which it was designed for I believe. Apart of this I mainly inserted a hook in the pricer

virtual void post_processing(const Size i,
const std::vector<StateType> &state,
const std::vector<Real> &price,
const std::vector<Real> &exercise) {}


with an empty default implementation. This function is called during the calibration phase, passing the collected data to a potential consumer, which can do useful things with it like building a regression model for proxy pricing like in our use case here. To do this I derived from Klaus’ class as follows

class LongstaffSchwartzProxyPathPricer
: public LongstaffSchwartzPathPricer<typename SingleVariate<>::path_type> {
public:
LongstaffSchwartzProxyPathPricer(
const TimeGrid &times,
const boost::shared_ptr<EarlyExercisePathPricer<PathType> > &pricer,
const boost::shared_ptr<YieldTermStructure> &termStructure);

const std::vector<boost::function1<Real, StateType> > basisSystem() const {
return v_;
}
const std::vector<Array> coefficientsItm() const { return coeffItm_; }
const std::vector<Array> coefficientsOtm() const { return coeffOtm_; }
const StateType cutoff() const { return cutoff_; }


which offers inspectors for the two regression models coefficients, the cutoff point separating them and the underlying function system. The implementation of the hook function is simple

void LongstaffSchwartzProxyPathPricer::post_processing(
const Size i, const std::vector<StateType> &state,
const std::vector<Real> &price, const std::vector<Real> &exercise) {

std::vector<StateType> x_itm, x_otm;
std::vector<Real> y_itm, y_otm;

cutoff_ = -QL_MAX_REAL;

for (Size j = 0; j < state.size(); ++j) {
if (exercise[j] > 0.0) {
x_itm.push_back(state[j]);
y_itm.push_back(price[j]);
} else {
x_otm.push_back(state[j]);
y_otm.push_back(price[j]);
if(state[j]>cutoff_)
cutoff_ = state[j];
}
}

if (v_.size() <= x_itm.size()) {
coeffItm_[i - 1] =
GeneralLinearLeastSquares(x_itm, y_itm, v_).coefficients();
} else {
// see longstaffschwartzpricer.hpp
coeffItm_[i - 1] = Array(v_.size(), 0.0);
}

if (v_.size() <= x_otm.size()) {
coeffOtm_[i - 1] =
GeneralLinearLeastSquares(x_otm, y_otm, v_).coefficients();
} else {
// see longstaffschwartzpricer.hpp
coeffOtm_[i - 1] = Array(v_.size(), 0.0);
}
}


and does nothing more than setting up the two data sets on which we estimate the regression models.

The coefficients are read in the monte carlo engine and serve as the essential part in the proxy object which can then be passed to the proxy engine. And we are done. You can find the full example code here.

# Adjoint Greeks IV – Exotics

Today I feature a sequel to the adjoint greeks drama (see my prior posts on this).

Before I start I would like to point you to a new and excellent blog authored by my colleague Matthias https://ipythonquant.wordpress.com/. You will want to follow his posts, I am certain about that.

I am still on my way to convert the essential parts of the library to a template version. Since this is boring work I created a small friend who helps me. No I did not go mad. It is a semi-intelligent emacs-lisp script that does some regular expression based search-and-replacements which speeds up the conversion a lot. Emacs is so cool.

Today I am going to calculate some derivatives of a bermudan swaption in the Gsr / Hull White model. This is the first post-vanilla application and admittedly I am glad that it works at last.

One point I have to make today is that AD is slow. Or to put it differently, doubles can be tremendously fast. Lookit here:

void multiply(T *a, T *b, T *s) {
for (int i = 0; i < N; ++i) {
for (int k = 0; k < N; ++k) {
for (int j = 0; j < N; ++j) {
s[i * N + j] += a[i * N + k] * b[k * N + j];
}
}
}
}


This code multiplies two matrices a and b and stores the result in s. It is the same algorithm as implemented in QuantLib for the Matrix class.

When feeding two 1000 x 1000 double matrices the code runs 1.3s on my laptop if compiled with gcc 4.8.2 using -O1. With -O3 I get 950ms. With clang 3.7.0 (fresh from the trunk) I get 960ms (-O1) and 630ms (-O3). With T=CppAD::AD on the other hand the running time goes up to at least (-O3) 10.9s (gcc) and 14.3s (clang). Code optimization seems to be a delicate business.

These timings refer to the situation where we use the AD type without taping, i.e. only as a wrapper for the double in it. This seems to indicate that at least for specific procedures it is not advisable to globally replace doubles by their active counterpart if one is not really interested in taping their operations and calculating the derivatives. Performance may break down dramatically.

What is the reason behind the huge difference ? I am not really the right person to analyze this in great detail. The only thing I spotted when looking into the generated assembler code is that with double there are SIMD (single instruction multiple data) instructions for adding and multiplying in the nested loops (like addpd and mulpd, it’s a long time since I programmed in assembler and it was on a 6510 so I am not really able to read a modern x86 assembler file).

With CppAD::AD there doesn’t seem to be such instructions around. So part of the perfomance loss may be due to the inability of streaming AD calculations. For sure this is not the only point here.

The second point to make today is that AD has pitfalls that may in the end lead to wrong results, if one uses the AD framework blindly. Let’s come back to our specific example. The underlying source code can be found here if you are interested or want to run it by yourself.

It is a bermudan swaption, ten years with yearly exercise dates. The model for pricing will be the Gsr or Hull White model. We just want to compute the bucket vegas of the bermudan, i.e. the change in its NPV when the implied market volatility of the canonical european swaptions used for the model calibration is increased by one percent.

The rate level is set to $3\%$ and the strike is out of the money at $5\%$. The volatilities are produced by SABR parameters $\alpha=3\%$, $\beta=60\%$, $\nu=12\%$, $\rho=30\%$. All of this is arbitrary, unrealistic and only for explanatory purposes …

The naive AD way of doing this would be to declare the input implied volatilities as our quantities of interest

   CppAD::Independent(inputVolAD);


and then go through the whole model calibration

   gsrAD->calibrateVolatilitiesIterative(


and pricing

   yAD[0] = swaptionAD->NPV();


to get the derivative d bermudan / d impliedVol

    CppAD::ADFun<Real> f(inputVolAD, yAD);
vega = f.Reverse(1, w);


When I first wrote about AD I found it extremely attractive and magical that you could compute your sensitivities like this. That you can actually differentiate a zero search (like in the case of yield curve bootstrapping) or an optimization (like the Levenberg-Marquardt algorithm which is used here).

However there are dark sides to this simplicity, too. Performance is one thing. The calibration step and the pricing including the gradient calculation takes

AD model calibration = 1.32s


We can also do it differently, namely calibrate the model in an ordinary way (using ordinary doubles), then compute the sensitivity of the bermudan to the model’s sigma and additionally compute the sensitivity of the calibration instruments to the model’s sigma. Putting everything together yields the bermudan’s bucketed vega again. I will demonstrate how below. First I report the computation time for this approach:

model calibration = 0.40s


This leaves us with a performance gain of around 15 percent (7.32s vs 8.43s). This is not really dramatic, still significant. And there is another good reason to separate the calibration from the greek calculation which I will come to below.

Note also, that the pricing which takes 5.95s with AD (including derivatives) is much faster without AD where it only consumes 0.073s. This is a factor of 80 which is much worse than the theoretical factor of 4 to 5 mentioned in earlier posts (remember, we saw 4.5 for plain vanilla interest rate swap npv and delta computation). This is again due to optimization issues obviously.

The background here is that the swaption pricing engine uses cubic spline interpolation and closed-form integration of the resulting cubic polynominals against the normal density for the roll back. Again a lot of elementary calculations, not separated by OO-things that would hinder the compiler from low level optimizations. You would surely need quite a number of sensitivities to still get a performance gain.

But lets follow the path to compute the bucket vegas further down. First I print the bucket vegas coming from the naive AD way above – direct differentiation by the input implied volatilities with the operation sequence going through the whole model calibration and the pricing.

vega #0: 0.0153%
vega #1: 0.0238%
vega #2: 0.0263%
vega #3: 0.0207%
vega #4: 0.0209%
vega #5: 0.0185%
vega #6: 0.0140%
vega #7: 0.0124%
vega #8: 0.0087%


This looks plausible. We started the calibration from a constant sigma function at $1\%$. This is actually far away from the target value (which is around $0.40\%$), so the optimizer can walk around before he settles at the minimum. But we could have started with a sigma very close to the optimal solution. What would happen then ? With the target value as an initial guess (which is unlikely to have in reality, sure) we get

vega #0: 0.0000%
vega #1: 0.0238%
vega #2: 0.0263%
vega #3: 0.0207%
vega #4: 0.0209%
vega #5: 0.0448%
vega #6: 0.0000%
vega #7: 0.0000%
vega #8: 0.0000%


Some vegas are zero now. vega #5 is even completely different from the value before. This is a no go, because in a productive application you wouldn’t notice if some deals are contributing a zero sensitivity or a false value.

What is happening here is that the function we differentiate depends on more input variables than only the primary variables of interest (the implied vol), like the initial guess for the optimization. These might alter the derivative drastically even if the function value (which is the model’s sigma function on an intermediate level, or ultimately the bermudan’s npv) stays the same.

For example if the initial guess is so good that the optimizers tolerance is already satisfied with it, the output will stay the same, no matter if the input is pertubed by an infinitesimal shift $dx$. Here pertubed does not really mean bumped and $dx$ really is infinitesimal small. It is only a concept to get a better intuition of what is going on during the process of automatic differentiation.

Another example is the bisection method for zero searching. This method will always yield zero AD derivatives in the sense if $x$ is an input (e.g. a vector of swap quotes) and $y$ is the output (e.g. a vector of zero rates) linked by a relation

$f(x,y) = 0$

the iff $x$ is pertubed by $dx$ then the checks in the bisection algorithm will yield exactly the same results for $x+dx$ as for $x$. Therefore the computed $y$ will be exactly the same, thus the derivative zero.

I don’t know if this explanation is good, but this is how I picture it for myself. It seems to be like this: If it feels too magical what you do with AD, you better don’t do it.

What is the way out ? We just avoid all optimization and zero searches, if possible. And it is for interest rate deltas and vegas: Just calibrate your curve in the usual way, then apply AD to compute sensitivities to zero rates (which then does not involve a zero search any more).

If you want market rate deltas, compute the Jacobian matrix of the market instruments in the curve by the zero rates as well, invert it and multiply it with the zero delta vector. You do not even have to invert it, but just only have to solve one linear equation system. We make this procedure explicit with our example above.

The first step would be to compute d bermudan NPV / d model’s sigma. This is done on the already calibrated model. Technically we separate the calibration step (done with the usual doubles) and the AD step (done on a copy of the model feeded with the model parameters from the first model, but not itself calibrated again). This is what we get for d bermudan / d sigma

vega #0: 1.6991
vega #1: 1.2209
vega #2: 0.8478
vega #3: 0.5636
vega #4: 0.4005
vega #5: 0.2645
vega #6: 0.1573
vega #7: 0.0885
vega #8: 0.0347


Next we compute the Jacobian d helperNpv / d sigma

297.4%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
179.7%	183.3%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
126.9%	129.4%	132.2%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
90.8%	92.6%	94.6%	96.7%	0.0%	0.0%	0.0%	0.0%	0.0%
64.9%	66.2%	67.6%	69.1%	71.2%	0.0%	0.0%	0.0%	0.0%
47.5%	48.4%	49.5%	50.6%	52.1%	53.3%	0.0%	0.0%	0.0%
31.9%	32.6%	33.3%	34.0%	35.0%	35.9%	36.3%	0.0%	0.0%
19.2%	19.6%	20.0%	20.4%	21.0%	21.5%	21.8%	22.3%	0.0%
8.7%	8.9%	9.0%	9.2%	9.5%	9.8%	9.9%	10.1%	10.5%


helperVega = f.Jacobian(sigmas);


This is a lower triangular matrix, because the ith calibration helper depends only on the sigma function up to its expiry time. The inverse of this matrix is also interesting, although we wouldn’t need it in its full beauty for our vega calculation, representing d sigma / d helperNpv

33.6%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
-33.0%	54.6%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
0.0%	-53.4%	75.6%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
0.0%	0.0%	-74.0%	103.5%	0.0%	0.0%	0.0%	0.0%	0.0%
0.0%	0.0%	0.0%	-100.4%	140.4%	0.0%	0.0%	0.0%	0.0%
0.0%	0.0%	0.0%	0.0%	-137.2%	187.5%	0.0%	0.0%	0.0%
0.0%	0.0%	0.0%	0.0%	0.0%	-185.1%	275.3%	0.0%	0.0%
0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	-269.1%	448.1%	0.0%
0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	-431.6%	953.1%


This says how the different sigma steps go up when the helper belonging to the step goes up (these are the positive diagonal elements) and how the next sigma step would need to go down when the same helper as before goes up, but the next helper stays the same. The contra-movement is roughly by the same amount.

Next we need d bermudan / d sigma. AD delivers

vega #1: 1.6991
vega #2: 1.2209
vega #3: 0.8478
vega #4: 0.5636
vega #5: 0.4005
vega #6: 0.2645
vega #7: 0.1573
vega #8: 0.0885
vega #9: 0.0347


And the last ingredient would be the calibration helpers’ vegas, which can be computed analytically (this is the npv change when shifting the input market volatility by one percent up), d helperNpv / d marketVol

vega #1: 0.0904%
vega #2: 0.1117%
vega #2: 0.1175%
vega #3: 0.1143%
vega #4: 0.1047%
vega #5: 0.0903%
vega #6: 0.0720%
vega #7: 0.0504%
vega #8: 0.0263%


Now multiplying everything together gives

d bermudan / d impliedVol = d bermudan / d sigma x d sigma / d helperNpv x d helperNpv / d impliedVol

which is

vega #0: 0.0153%
vega #1: 0.0238%
vega #2: 0.0263%
vega #3: 0.0207%
vega #4: 0.0209%
vega #5: 0.0185%
vega #6: 0.0140%
vega #7: 0.0124%
vega #8: 0.0087%


same as above. Partly because I just copy-pasted it here. But it actually comes out of the code also, just try it.