Copulas and insurance law reform

Storm models are crucial to law reform. One needs them to get a sense of whether premiums are reasonable. And, as I want to show in a series of blog posts, they can also help figure out the effect of legally mandated changes to the insurance contract. To do that, you need to tie behavior at the level of the individual policyholder to the long-term finances of the insurer. How would, for example, changing the required deductible on windstorm policies issued by the Texas Windstorm Insurance Association affect the precautions taken by policyholders to avoid storm damage? That’s important for many reasons, among them that it affects the sustainability of TWIA. Might the introduction of coinsurance into the insurance contract do a better job of making TWIA sustainable? These are the kinds of questions for which a decent storm model is useful.

So, over the past few weeks I’ve been thinking again about ways in which one could, without access (yet) to gigabytes of needed data, develop approximations of the windstorm damage events likely to be suffered by policyholders.  And I’ve been thinking about ways in which one could parameterize those individual damages as a function of the level of precautions taken by policyholders to avoid damage.

What I’m going to present here is a model of storm damage that attempts to strike a reasonable balance between simplicity and fidelity. I’m afraid there’s a good bit of math involved, but I’m going to do my best to clarify the underlying ideas and keep your eyes from glazing over. The reward, if you’ll stick with me, is that at the end of the day we’ll have a model that in some ways is better than what the professionals use. It not only explains what is currently going on but can make predictions about the effect of legal change.

Let’s begin with two concepts: (1) “claim prevalence” and (2) “mean scaled claim size.” By “claim prevalence,” which I’m going to signify with the Greek letter [latex]\nu[/latex] (nu), I mean the likelihood that, in any given year, a policyholder will file a claim based on an insured event. Thus, if in a given year 10,000 of TWIA’s 250,000 policyholders file a storm damage claim, that year’s prevalence is 0.04. “Mean scaled claim size,” which I’m going to signify with the Greek letter [latex]\zeta[/latex] (zeta), is a little more complicated. It refers to the mean, taken over all properties on which claims are filed during a year, of the claim size divided by the insured value of the property. To take a simple example, if TWIA were to insure 10 houses and, in a particular year, 2 of them filed claims ([latex]\nu =0.2[/latex]) for $50,000 and $280,000, and the insured values of those properties were $150,000 and $600,000 respectively, the mean scaled claim size [latex]\zeta[/latex] would be 0.4. That’s because: [latex]0.4=\frac{50000}{2\times 150000}+\frac{280000}{2\times 600000}[/latex].

Notice, by the way, that [latex]\zeta \times \nu[/latex] is equal (at least to a good approximation, when insured values do not vary too much from policy to policy) to aggregate claims in a year as a fraction of total insured value. Thus, if [latex]\zeta \times \nu = 0.005[/latex] and the total insured value is, say, $71 billion, one would expect $355 million in claims in a year. I’ll abbreviate this ratio of aggregate claims in a year to total insured value as [latex]\psi[/latex] (psi). In this example, then, [latex]\psi=0.005[/latex].[1]

The central idea underlying my model is that claim prevalence and mean scaled claim size are positively correlated. That’s because both are likely to correlate positively with the destructive power of the storms that occur during the year. The correlation won’t be perfect. A tornado, for example, may cause very high mean scaled claim sizes (total destruction of the homes it hits) but have a narrow path and hit just a few insured properties. And a low-grade tropical storm may cause modest levels of wind damage among a large number of insureds. Still, most of the time, I suspect, bigger storms not only cause more claims, they also increase the mean scaled claim size.

A copula distribution provides a relatively simple way of blending correlated random variables together. There are lots of explanations: Wikipedia, a nice paper on the Social Science Research Network, and the Mathematica documentation on the function that creates copula distributions. There are lots of ways of doing this blending, each with a different name. I’m going to stick with a simple copula, however: the so-called “Binormal Copula” (a/k/a the “Gaussian Copula”), with a correlation coefficient of 0.5.[2]

To simulate the underlying distributions, I’m going to use a two-parameter beta distribution for both claim prevalence and mean scaled claim size. My experimentation suggests that, although there are probably many alternatives, both these distributions fit the limited data available to me on these variables reasonably well. They also benefit from modest analytic tractability. For people trying to recreate the math here, the distribution function of the beta distribution is [latex]I_x\left(\left(\frac{1}{\kappa ^2}-1\right) \mu ,\frac{\left(\kappa ^2-1\right) (\mu -1)}{\kappa ^2}\right)[/latex], where [latex]\mu[/latex] is the mean of the distribution and [latex]\kappa \in (0,1)[/latex] is the standard deviation expressed as a fraction of the maximum standard deviation possible given the value of [latex]\mu[/latex]. What I have found works well is to set [latex]\mu _{\nu }=0.0244[/latex], [latex]\kappa _{\nu }=0.274[/latex] for the claim prevalence distribution and [latex]\mu _{\zeta }=0.097[/latex], [latex]\kappa _{\zeta }=0.229[/latex] for the mean scaled claim size distribution. This means that the typical policyholder will file a claim about once every 41 years and that a claim, when filed, will on average be for about 9.7% of the insured value of the property.[3]
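For readers who want to follow along in Mathematica, here is a minimal sketch of how the pieces described so far fit together. The helper betaMK is my own name for a function converting the (mean, fraction-of-maximum-standard-deviation) parameterization above into Mathematica’s standard BetaDistribution parameters; the numbers are the ones given in the text.

(* Beta distribution re-parameterized by its mean mu and by kappa, the standard
   deviation expressed as a fraction of the maximum possible given that mean.
   This matches the distribution function in the text:
   alpha = (1/kappa^2 - 1) mu,  beta = (1/kappa^2 - 1) (1 - mu). *)
betaMK[mu_, kappa_] :=
  BetaDistribution[(1/kappa^2 - 1) mu, (1/kappa^2 - 1) (1 - mu)];

(* Marginals for claim prevalence (nu) and mean scaled claim size (zeta) *)
prevalenceDist = betaMK[0.0244, 0.274];
claimSizeDist = betaMK[0.097, 0.229];

(* Join the marginals with a Binormal (Gaussian) copula, correlation 0.5 *)
stormCopula = CopulaDistribution[{"Binormal", 0.5}, {prevalenceDist, claimSizeDist}];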

We can visualize this distribution in a couple of ways.  The first is to show a probability density function of the distribution but to scale the probability logarithmically.  This is shown below.

PDF of sample copula distribution


The second is to simulate 10,000 years’ worth of experience and to place a dot for each year showing claim prevalence and mean scaled claim size. That is done below. I’ve annotated the graphic with labels showing what might represent a year in which there was a tornado outbreak, a catastrophic hurricane, or a tropical storm, as well as the large cluster of points representing years in which there was minimal storm damage.
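For those following along, simulated years like the ones plotted below can be generated directly from the stormCopula object defined in the earlier sketch; the annotations were, of course, added by hand.

(* 10,000 simulated years of {claim prevalence, mean scaled claim size} pairs *)
years = RandomVariate[stormCopula, 10000];

(* One dot per simulated year *)
ListPlot[years, AxesLabel -> {"claim prevalence \[Nu]", "mean scaled claim size \[Zeta]"}, PlotRange -> All]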

Claim prevalence and mean scaled claim size for 10,000 year simulation


Equipped with our copula, we can now generate losses at the individual policyholder level for any given year.  The idea is to create a “parameter mixture distribution” using the copula. As it turns out, one component of this parameter mixture distribution is itself a mixture distribution.

Dear reader, you now have a choice.  If you like details, have a little bit of a mathematical background and want to understand better how this model works, just keep reading at “A Mini-Course on Mixture and Parameter Mixture Distributions.”  If you just want the big picture, skip to “Simulating at the Policyholder Level” below.

A Mini-Course on Mixture and Parameter Mixture Distributions

To fully understand this model, we need some understanding of a mixture distribution and a parameter mixture distribution. Let’s start with the mixture distribution, since that is easier. Imagine a distribution in which you first randomly determine which underlying component distribution you are going to use and then take a draw from the selected component. You might, for example, roll a conventional six-sided die, which is a physical representation of what statisticians call a “discrete uniform distribution.” If the die comes up 5 or 6, you draw from a beta distribution with a mean of 0.7 and a standard deviation of 0.3 times the maximum. But if the die comes up 1 through 4, you draw from a uniform distribution on the interval [0,0.1]. The diagram below shows the probability density function of the resulting mixture distribution (in red) and the underlying components in blue.

Mixture Distribution with beta and uniform components
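For the curious, the die-roll example above can be written directly as a MixtureDistribution in Mathematica; betaMK is the same mean/fraction-of-maximum-standard-deviation helper from the earlier sketch, repeated here so the snippet stands on its own.

betaMK[mu_, kappa_] :=
  BetaDistribution[(1/kappa^2 - 1) mu, (1/kappa^2 - 1) (1 - mu)];

(* With probability 2/6 (die shows 5 or 6) draw from the beta component;
   with probability 4/6 (die shows 1 through 4) draw from a uniform on [0, 0.1] *)
dieMixture = MixtureDistribution[{2/6, 4/6}, {betaMK[0.7, 0.3], UniformDistribution[{0, 0.1}]}];

Plot[PDF[dieMixture, x], {x, 0, 1}, PlotRange -> All]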


A mixture distribution has a finite number of underlying component distributions and discrete weights that you select. A parameter mixture distribution can handle an infinite family of underlying component distributions, with weights that are themselves drawn from a statistical distribution. Suppose we create a continuous function [latex]f[/latex] that takes a parameter [latex]x[/latex] and creates a triangular distribution which has a mean of [latex]x[/latex] and extends 1/4 in each direction from that mean. We will call this triangular distribution the underlying distribution of the parameter mixture distribution. The particular member of the triangular distribution family used is determined by the value of the parameter. And, now, we want to create a “meta distribution” — a parameter mixture distribution — in which the probability of drawing a particular parameter [latex]x[/latex], and in turn getting that kind of triangular distribution with mean [latex]x[/latex], is itself determined by another distribution, which I will call [latex]w[/latex]. The distribution [latex]w[/latex] is the weighting distribution of the parameter mixture distribution. To make this concrete, suppose [latex]w[/latex] is a uniform distribution on the interval [0,1].

The diagram below shows the result.  The blue triangular underlying distributions represent a sample of the probability density functions of triangular distributions.  There are actually an infinite number of these triangular distributions, but obviously I can’t draw them all here. Notice that some of the density functions are more opaque than others. The opacity of each probability density function is based on the probability that such a distribution would be drawn from [latex]w[/latex].  The red line shows the probability density function of the resulting parameter mixture distribution.  It is kind of an envelope of these triangular distributions.

Parameter mixture distribution for triangular distributions where mean of triangular distributions is drawn from a uniform distribution
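The construction pictured above is nearly a one-liner in Mathematica: the underlying distribution is a triangular distribution whose location depends on the parameter x, and the weighting distribution makes x uniform on [0,1].

(* Triangular underlying distribution with mean x, extending 1/4 on either side;
   the parameter x is drawn from a uniform weighting distribution on [0, 1] *)
triangleEnvelope = ParameterMixtureDistribution[
   TriangularDistribution[{x - 1/4, x + 1/4}],
   x \[Distributed] UniformDistribution[{0, 1}]];

Plot[PDF[triangleEnvelope, t], {t, -1/4, 5/4}, PlotRange -> All]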


We can combine mixture distributions and parameter mixture distributions. We can have a mixture distribution in which one or more of the underlying components is a parameter mixture distribution. And we can have a parameter mixture distribution in which the underlying distribution and/or the weighting distribution is a mixture distribution.

It’s that combination — a parameter mixture distribution in which the underlying distribution is a mixture distribution — that we’re going to need to get a good simulation of the damages caused by storms. The weighting distribution of this parameter mixture distribution is our copula. It throws off two parameters: (1) [latex]\nu[/latex], the likelihood that in any given year the policyholder has a non-zero claim, and (2) [latex]\zeta[/latex], the mean scaled claim size assuming that the policyholder has a non-zero claim. Those two parameters weight members of the underlying distribution, which is a mixture distribution. The weights of that mixture distribution are the likelihood that the policyholder has no claim and the likelihood that the policyholder has a non-zero claim (claim prevalence). The component distributions of the mixture distribution are (1) a distribution that always produces zero and (2) any distribution satisfying the constraint that its mean is equal to the mean scaled claim size. I’m going to use another beta distribution for this latter purpose, with a standard deviation equal to 0.2 of the maximum standard deviation, and I’ll denote this distribution as B. Some examination of data from Hurricane Ike is not inconsistent with the use of this distribution, and it has the virtue of being analytically tractable and relatively easy to compute.
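In code, one does not even need to build the mixture as a formal distribution object. A minimal sketch (the function name policyholderLosses and the betaMK helper are mine): multiply a Bernoulli claim indicator, which is 1 with probability [latex]\nu[/latex], by a draw from the beta distribution B with mean [latex]\zeta[/latex] and [latex]\kappa = 0.2[/latex]; the product is zero for policyholders without a claim and a B-distributed scaled loss for the rest.

betaMK[mu_, kappa_] :=
  BetaDistribution[(1/kappa^2 - 1) mu, (1/kappa^2 - 1) (1 - mu)];

(* Scaled losses for n policyholders in a year with claim prevalence nu and
   mean scaled claim size zeta: a Bernoulli claim indicator times a draw from B *)
policyholderLosses[nu_, zeta_, n_] :=
  RandomVariate[BernoulliDistribution[nu], n]*RandomVariate[betaMK[zeta, 0.2], n];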

This diagram may help clarify what is going on.

The idea behind the parameter mixture distribution


Simulating at the Policyholder Level

So, we can now simulate a large insurance pool over the course of many years by making, say, 10,000 draws from our copula. And from each draw of the copula, we can determine the claim size for each of the policyholders insured in that sample year. Here’s an example. Suppose our copula produces a year with some serious damage: a claim prevalence value of 0.03 and a mean scaled claim size of 0.1. If we simulate the fate of 250,000 policyholders, we find that about 242,500 have no claim. The graphic below shows the distribution of scaled claim sizes among those who did have a non-zero claim.

Scaled claim sizes for sample year
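A year like the one shown above can be produced with the policyholderLosses sampler sketched earlier, using the values from the text ([latex]\nu =0.03[/latex], [latex]\zeta =0.1[/latex]) and drawing the histogram over the non-zero claims only.

(* Uses policyholderLosses (and betaMK) from the sketch above *)
yearLosses = policyholderLosses[0.03, 0.1, 250000];

Count[yearLosses, x_ /; x == 0]  (* roughly 242,500 policyholders with no claim *)
Histogram[Select[yearLosses, # > 0 &], 50, AxesLabel -> {"scaled claim size", "policyholders"}]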


Fortunately, however, we don’t need to sample 250,000 policyholders each year for 10,000 years to get a good picture of what is going on. We can simulate things quite nicely by looking at the condition of just 2,500 policyholders and then multiplying their aggregate losses by 100. The graphic below shows a logarithmic plot of aggregate losses assuming a total insured value in the pool of $71 billion (which is about what TWIA has had recently).

Aggregate losses (simulated) on $71 billion of insured property
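A rough version of that aggregate-loss simulation might look like the following. It assumes, as a simplification, that the $71 billion of total insured value is spread evenly across 250,000 policies, so that simulating 2,500 policyholders and scaling by 100 works out; stormCopula and policyholderLosses are the objects defined in the earlier sketches.

(* Assumes stormCopula, betaMK and policyholderLosses from the earlier sketches *)
totalInsuredValue = 71*10^9;
policies = 250000;
sample = 2500;  (* simulate 2,500 policyholders per year and scale up by 100 *)
valuePerPolicy = totalInsuredValue/policies;

aggregateLoss[{nu_, zeta_}] :=
  (policies/sample)*valuePerPolicy*Total[policyholderLosses[nu, zeta, sample]];

aggLosses = aggregateLoss /@ RandomVariate[stormCopula, 10000];

ListLogPlot[aggLosses, AxesLabel -> {"simulated year", "aggregate loss ($)"}]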


We can also show a classical “exceedance curve” for our model. The graphic below varies the aggregate losses on $71 billion of insured property and shows, for each value, the probability (on a logarithmic scale) that losses would exceed that amount. One can thus get a sense of the damage caused by the 100-year storm and the 1,000-year storm. The figures don’t perfectly match TWIA’s internal models, but that’s simply because our parameters have not been tweaked at this point to accomplish that goal.

Exceedance curve (logarithmic) for sample 10,000 year run
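An empirical exceedance curve can be read straight off the aggLosses list from the previous sketch: for each loss level, count the fraction of simulated years in which losses exceed it.

(* Empirical exceedance probabilities from the 10,000 simulated years above *)
exceedance[x_] := N[Count[aggLosses, l_ /; l > x]/Length[aggLosses]];

ListLogPlot[Table[{x, exceedance[x]}, {x, 0, 10*10^9, 2.5*10^8}],
  Joined -> True, AxesLabel -> {"aggregate loss ($)", "P(losses exceed)"}]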


The final step is to model how extra precautions by a policyholder might alter these losses. Presumably, precautions are like most economic things: there is a diminishing marginal return on investment. So, I can roughly model matters by saying that a precaution level of x results in the insured drawing from a new beta distribution with a mean equal to [latex]\ell \times 2^{-x}[/latex], where [latex]\ell[/latex] is the scaled damage the insured would, on average, have suffered had they taken no extra precautions. (I’ll keep the standard deviation of this beta distribution equal to 0.2 of its maximum possible value.) I have thus calibrated extra precautions such that each unit of extra precautions cuts mean losses in half. That doesn’t mean precautions can’t sometimes produce greater or lesser savings; it just means that, on average, each unit of precautions cuts the losses in half.
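In code, that adjustment might be sketched as follows (the function name lossWithPrecautions is mine; betaMK is the helper from earlier, repeated so the snippet stands alone).

betaMK[mu_, kappa_] :=
  BetaDistribution[(1/kappa^2 - 1) mu, (1/kappa^2 - 1) (1 - mu)];

(* Scaled loss for an insured who would, on average, have lost ell with no extra
   precautions but has taken x units of them: the mean is halved per unit *)
lossWithPrecautions[ell_, x_] := RandomVariate[betaMK[ell*2^-x, 0.2]];

Mean[betaMK[0.1*2^-1, 0.2]]  (* 0.05, half the no-precaution mean of 0.1 *)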

And, we’re done! We’ve now got a storm model that, when combined with the model of policyholder behavior I will present in a future blog entry, should give us respectable predictions on the ability of insurance contract features such as deductibles and coinsurance to alter aggregate storm losses. Stay tuned!

Footnotes

[1] As I recognized a bit belatedly in this project, if one makes multiple draws from a copula distribution, the mean of the product of the two values [latex]\nu[/latex] and [latex]\zeta[/latex] drawn from the copula is not equal to the product of their individual means. You can see why this might be by imagining a copula distribution in which the two values were perfectly correlated, in which case the product would be the square of a single draw, and the mean of the square of a draw from a distribution is not generally equal to the square of the mean of that distribution.

[2] Copulas got a bad name over the past 10 years for bearing some responsibility for the financial crisis. This infamy, however, has nothing to do with the mathematics of copulas, which remains quite brilliant, but rather with their abuse and the fact that incorrect distributions were inserted into them.

[3] We thus end up with a copula distribution whose probability density function takes on this rather ghastly closed form.  (It won’t be on the exam.)

[latex]\frac{(1-\zeta )^{\frac{1-\mu _{\zeta }}{\kappa _{\zeta }^2}+\mu _{\zeta }-2} \zeta ^{\left(\frac{1}{\kappa _{\zeta }^2}-1\right) \mu _{\zeta }-1} (1-\nu )^{\frac{1-\mu _{\nu }}{\kappa _{\nu }^2}+\mu _{\nu }-2} \nu ^{\left(\frac{1}{\kappa _{\nu }^2}-1\right) \mu_{\nu }-1} \exp \left(\frac{\left(\text{erfc}^{-1}\left(2 I_{\zeta }\left(\left(\frac{1}{\kappa _{\zeta }^2}-1\right) \mu _{\zeta },\frac{\left(\kappa _{\zeta }^2-1\right) \left(\mu _{\zeta }-1\right)}{\kappa _{\zeta }^2}\right)\right)-\rho

\text{erfc}^{-1}\left(2 I_{\nu }\left(\left(\frac{1}{\kappa _{\nu }^2}-1\right) \mu _{\nu },\frac{\left(\kappa _{\nu }^2-1\right) \left(\mu _{\nu }-1\right)}{\kappa _{\nu }^2}\right)\right)\right){}^2}{\rho ^2-1}+\text{erfc}^{-1}\left(2 I_{\zeta
}\left(\left(\frac{1}{\kappa _{\zeta }^2}-1\right) \mu _{\zeta },\frac{\left(\kappa _{\zeta }^2-1\right) \left(\mu _{\zeta }-1\right)}{\kappa _{\zeta }^2}\right)\right){}^2\right)}{\sqrt{1-\rho ^2} B\left(\left(\frac{1}{\kappa _{\zeta }^2}-1\right) \mu _{\zeta
},\frac{\left(\kappa _{\zeta }^2-1\right) \left(\mu _{\zeta }-1\right)}{\kappa _{\zeta }^2}\right) B\left(\left(\frac{1}{\kappa _{\nu }^2}-1\right) \mu _{\nu },\frac{\left(\kappa _{\nu }^2-1\right) \left(\mu _{\nu }-1\right)}{\kappa _{\nu }^2}\right)}[/latex]

The still-mysterious variation in private market penetration along the Texas coast

There’s this fact that stares you in the face as you try to figure out whether, as I have hoped, private insurers might significantly displace the system of coastal windstorm insurance in Texas currently dominated by TWIA. It’s that the private market appears to be alive and well in parts of the Texas coast. In Cameron County, for example, TWIA has only 31% of the residential market. And in Kleberg County, the figure is 27%. On the other hand in Galveston County, TWIA owns 77% of the residential market and 81% in Aransas County. What accounts for this variation? Maybe if we could figure it out, we could engineer some policies likely to induce the private market to re-enter to a greater extent throughout the coast.

I will save you the trouble of reading ahead. I didn’t find much. The variation remains pretty much a mystery. I look forward to suggestions for further experimentation or to someone who will just reveal the obvious answer.

If anyone, by the way, has data on the proportion of property or the population that is located within some distance of the actual coastline within each county, I’d be very interested in seeing that.  Maybe the reason the geographic data isn’t showing anything is that the county divisions are too coarse.  If, for example, Galveston County has a higher proportion of its population living close to the ocean than does, say, Kleberg County and if insurers don’t feel, for some reason, that they can underwrite within counties, that might provide some better explanation for the variation in private insurer participation in the Texas coastal windstorm market.

For those who care how I came by my “negative result” — just the sort of thing many academic journals tend to disdain — I offer the following brief synopsis.

If you just look at a map, no particular pattern appears.

What if we look at some data? I grabbed data on the TWIA counties I thought might possibly be relevant from the United States census. Maybe population density is important, on the thought that the more dense the county, the more correlated the risk and the less private insurers would want to write there. Or maybe private insurers have greater (or lesser) marketing power in densely populated counties. I grabbed median income data on the thought that private insurers might prefer to write policies in wealthier counties. I grabbed ethnicity data on the thought that race and ethnicity often matter in modern America — not necessarily causally but because race and ethnicity end up correlating with things that matter. We end up with 14 data points and 3 explanatory variables. There’s not a huge amount one can do with data sets this small, but I thought I’d give it a try.

If one does a simple-minded logit regression, one ends up with the following somewhat unusual result. With these three variables, we end up accounting for about 72% of the variation in the data, but no single variable is statistically significant, or even close.

Logit Model Fit of TWIA Market Share
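I won’t reproduce the county data here, but for readers who want to experiment, a fit of this general shape can be set up along the following lines. This is only a sketch: the countyData rows below are synthetic stand-ins for the 14 rows of census-derived data (density, median income in thousands, ethnicity share, TWIA market share), and I am fitting an explicit logistic curve with NonlinearModelFit rather than reproducing the exact call behind the table above.

(* Synthetic placeholder rows so the sketch runs end to end; substitute the real
   {density, median income (in $1000s), ethnicity share, TWIA share} data here *)
countyData = Table[{RandomReal[{10, 1000}], RandomReal[{30, 70}],
    RandomReal[{0, 1}], RandomReal[{0.2, 0.9}]}, {14}];

logitFit = NonlinearModelFit[countyData,
   1/(1 + Exp[-(b0 + b1 density + b2 income + b3 ethnicity)]),
   {{b0, 0}, {b1, 0}, {b2, 0}, {b3, 0}},
   {density, income, ethnicity}];

logitFit["AdjustedRSquared"]
logitFit["ParameterTable"]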

We can also try something more sophisticated. Instead of just assuming a logistic linear relationship between the independent variables and the dependent one (TWIA penetration), we can ask the computer to explore a huge space of potential models and see if anything turns up. Such statistical work used to be impracticable without supercomputers due to the amount of computation involved and the custom programming required. It’s now eminently possible on an average desktop with software such as DataModeler from Evolved Analytics. Although this process often yields remarkable gains in understanding a system, such is not always the case. And, for this small dataset, exploring a much larger model space leaves us with a number of models that have somewhat higher R-squared values than our logistic regression, but nothing to truly brag about and none that clearly points towards one or another of the variables in our model as being critical.

Sample results of Data Modeler predicting TWIA penetration

I thus end up saying that, for now, the mystery of varying market penetration remains unsolved.

The curious case of Corpus Christi

Today’s Corpus Christi Caller has an interesting article that purports to show a special immunity of the Corpus Christi area to hurricane risk, which is said to be no greater than that facing New York City. The article is based on a report from NOAA published since 2010 and recently brought to the attention of Todd Hunter, Corpus Christi’s state representative. The report is based on data from 1887 forward and attempts to calibrate the comparative risk of landfall both within Texas and throughout the Gulf and Eastern Seaboard.

Here’s the key picture which, though not shown in the article, appears to underlie its conclusions and quotations.


Return periods of Atlantic hurricanes by county

See the blue 19 next to Corpus Christi and the blue 20 next to New York City. This is supposed to show that the risk of hurricanes in those two regions is similar: once every 19 or 20 years, a hurricane will strike within 50 miles. And see the orange 9s next to Galveston and Brazoria counties. That is supposed to show that the risk of hurricanes in those two regions is greater: a strike once every 9 years.

The evidence gets a bit more complicated, however, if one looks at the next picture in the NOAA document, one not mentioned in the Caller article. It shows the history of major hurricanes based on historic evidence from 1887 to 2010. Although the coastal bend (33-40 years) still comes out better than the east Texas coast (25-26 years), the ratio isn’t as great as for all hurricanes. Moreover, the comparison with New York City now fails. The Big Apple gets hit only once every 68 years.


Return period for major Atlantic hurricanes by county

So, what are we to make of all this? I would say not too much. What the NOAA report lacks is any notion of statistical significance that would make it particularly useful in drawing fine-grained distinctions between areas of the Texas coast. It might just be that what the pictures show is largely good and bad luck. With a sample of just 130 years or so, one might expect to see distributions of return periods that vary from county to county. Perhaps some trends might be observable, such as greater strike frequency in Florida than Texas, but what the report lacks is a “p-value,” the probability that one would see variations in the data as large as those exhibited in the graphics simply as a matter of chance. I’m not faulting NOAA for this; it would be very hard to develop such a statistic, and the report was purporting to capture historic evidence only. Moreover, our climate is dynamic. Storm tracks and storm frequency can change as a result of global weather phenomena. Thus, while one should not ignore historic data, one has to be very careful about projecting it into the future or using it to make highly specific projections.

So, should the report be ignored? No. Perhaps curious atmospheric features (jet stream placement) and geographic features such as the placement of Cuba indeed give Corpus Christi a little shield. And if Corpus Christi wants to argue on that basis for lower rates for southwest coastal Texas and higher rates for the eastern Texas coast, I wouldn’t be mightily opposed. Somehow, however, I don’t think that’s where coastal Texas wants to go in the upcoming legislative session. Recognition of large differences based on geography in catastrophe risk isn’t the best basis on which to plead risk socialization and rate uniformity. (More on that point soon!)

A 1.5-2% risk per year of losing your home with inadequate insurance is a serious risk

For the 2011-12 hurricane season, the Texas Windstorm Insurance Association managed to purchase $636 million in reinsurance coverage for a net cost of about $83 million. Between this purchase, about $150 million in a piggybank, and TWIA’s legal ability to borrow about $2.5 billion following a serious loss, TWIA had roughly $3.2 billion available to pay claims. TWIA admits, and my own computations based on the Weibull distribution confirm, that this leaves a 1.5-2% chance that TWIA, even with a lot of borrowing, will not have enough money to honor its obligations in full.

Here’s a picture of the TWIA funding stack.

TWIA Funding Stack

One and a half to two percent may not sound that awful.  That’s what some coastal Texas Representatives such as Todd Hunter are asserting. But their reassurances should not bring much comfort nor deflect attention from the serious problem facing Texas.

First, one and a half to two percent risks do in fact occur. The fact that the risk is relatively small does not mean you should not insure against it. Would you, for example, tell a 65-year-old with a family to support not to worry about life insurance for the next year because there was only a 1.5% chance or so of dying during that time period? Would you tell a homeowner not to worry that their homeowners insurance policy did not cover them for losses during three months of each year because only about 1.5% of homeowners make claims during any three-month period? (http://research3.bus.wisc.edu/file.php/129/Papers/PredModelHomeowners21July2010.pdf)  Or, let’s play a game. Flip a coin six times in a row. If it comes up all heads, you lose your house. That’s about a 1.5% risk. Want to play? Perhaps you are all more courageous than I am, but I would worry.

Second, although a 1.5-2% risk may be unlikely to occur in any given year, just looking at a one-year period is a strange way of thinking about it. Say you own your home for 10 years or are thinking of investing in a business in Galveston. If TWIA does not mend its ways, the risk of TWIA suffering a bankrupting loss during your period of investment jumps to 14-18%. That’s calculated using something called the binomial distribution. Would you worry that, if you rolled a single die and it came up 6, you would lose your house? Again, maybe some politicians are particularly courageous, but I would be concerned.
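The arithmetic behind that 14-18% figure is easy to check; here is the calculation using the 1.5% and 2% annual figures from above over a ten-year horizon.

(* Probability of at least one TWIA-breaking year in a decade, for annual risks of
   1.5% and 2%: one minus the probability of zero such years in ten tries *)
Table[1 - (1 - p)^10, {p, {0.015, 0.02}}]  (* roughly {0.14, 0.18} *)

(* The same thing phrased with the binomial distribution mentioned in the text *)
Probability[k >= 1, k \[Distributed] BinomialDistribution[10, 0.015]]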

Third, the 1.5-2% risk of TWIA going insolvent in any year is not the only risk created by the current funding structure. There is something like a 15% chance that the next year will bring a storm large enough to force TWIA to borrow. And the first way TWIA pays that money back is by raising premiums on TWIA policyholders, pure and simple. The statute I’ve set forth below (section 2210.612 of the Texas Insurance Code) makes that clear. If we expand to our 10-year time horizon, the probability that TWIA will have to borrow goes into the 70% range.

Section 2210.612 of the Texas Insurance Code

And it gets worse. If TWIA raises premiums substantially to pay off these “Class 1 Public Securities,” some people will drop out of TWIA and find alternatives. This means the rates TWIA needs to charge go up even more. And more people drop out. This reverse funding of insurance creates a risk that TWIA will unravel — a risk lenders will surely take into account in figuring out what interest rate to charge TWIA in the event it has to borrow.

Actually, it gets yet worse, but I will save that and one other matter for other posts.

It’s a Weibull

To understand the premiums charged by the Texas Windstorm Insurance Association and the current legal and financial issues being debated in Austin, you have to get your hands a little dirty with the actuarial science. You need to have some ability to model the damages likely to be caused by a tropical cyclone on the Texas coast. Now, to do this really well, it might be thought that you need an awful lot of very fine data. In fact, however, you can do a pretty good job of understanding TWIA’s perspective just by reverse engineering publicly available information.

What I want to show is that the perceived annual exposure of the Texas Windstorm Insurance Association can be modeled really well by something known in statistics as a Weibull distribution. To be fancy, it’s a zero-censored, three-parameter Weibull distribution:

CensoredDistribution[{0, ∞},
 WeibullDistribution[0.418001, 1.26765*10^8, -4.81157*10^8]]
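As a quick check, and to tie this back to the funding-stack discussion in the earlier post, one can ask the fitted distribution how likely annual losses are to blow through roughly $3.2 billion of claims-paying resources. (The censoring at zero only lumps probability at zero, so it does not affect this upper-tail calculation.)

SurvivalFunction[WeibullDistribution[0.418001, 1.26765*10^8, -4.81157*10^8], 3.2*10^9]
(* roughly 0.017, i.e. the 1.5-2% annual shortfall risk discussed earlier *)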

We can plot the results of this distribution against the predictions made by TWIA’s two consultants: AIR and RMS. The x-axis of the graph shows annual losses to TWIA. The y-axis of the graph is the probability that the losses will be less than or equal to the corresponding amount on the x-axis. As one can see, it is almost a perfect fit. For statisticians, the adjusted R-squared value is 0.995.

Fitted Weibull distribution plotted against the AIR and RMS estimates

How did I find this function? Part of it is some intuition and some expertise about loss functions. But a lot of it comes from running a “non-linear regression” on data in the public domain. Here’s a chart (an “exceedance table”) provided by reinsurance broker Guy Carpenter to TWIA. It shows the estimates of two consultants, AIR and RMS, of the losses likely to be suffered by TWIA. Basically, you can use statistics software (I used Mathematica) to run a non-linear regression on this data, assuming the underlying model is a censored Weibull distribution of some sort. And, in less than a second, out pop the parameters of the Weibull distribution that best fit the data. As shown above, it fits the AIR and RMS data points really well. Moreover, it calculates the “AAL” (the mean annual loss to TWIA) pretty well too.
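Schematically, that reverse-engineering step looks like the following sketch. The exceedanceData argument is a stand-in for the {annual loss, cumulative probability} pairs read off the AIR and RMS columns of the Guy Carpenter table (not reproduced here), the starting values are rough guesses, and because all the tabulated losses are non-negative, fitting the plain three-parameter Weibull distribution function is equivalent to fitting the zero-censored version.

(* Fit a three-parameter Weibull CDF to {loss, cumulative probability} pairs
   taken from an exceedance table; starting values are rough guesses *)
fitStormWeibull[exceedanceData_] := NonlinearModelFit[exceedanceData,
   CDF[WeibullDistribution[k, \[Lambda], \[Theta]], x],
   {{k, 0.5}, {\[Lambda], 1.*10^8}, {\[Theta], -1.*10^8}}, x]

(* fitStormWeibull[data]["BestFitParameters"] and fitStormWeibull[data]["AdjustedRSquared"]
   then return the fitted parameters and the goodness of fit *)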

 

Guy Carpenter exceedance table showing AIR and RMS estimates of TWIA annual losses

In some forthcoming posts, I’m going to show what the importance of this finding is, but suffice it to say, it explains a lot about the current controversy and suggests some matters to be examined with care.