# It’s (close to) a Weibull — again!

You recall that in my last post, I went through an involved process of showing how one could generate storm losses for individuals over years.  That process, which underlies a project to examine the effect of legal change on the sustainability of a catastrophe insurer, involved the copulas of beta distributions and a parameter mixture distribution in which the underlying distribution was also a beta distribution. It was not for the faint of heart.

One purpose of this effort was to generate a histogram like the one below, which shows the distribution of scaled claim sizes for non-negligible claims. This histogram was obtained by taking one draw from the copula distribution for each of the $y$ years in the simulation and using it to constrain the distribution of losses suffered by each of the $n$ policyholders in each of those $y$ years.  Thus, although the underlying process created a $y \times n$ matrix, the histogram below is drawn from that matrix flattened into a single vector of $y \cdot n$ values.
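For concreteness, the shape of that simulation can be sketched in a few lines of Python. Everything here is illustrative: the sizes, the beta parameters, and the single per-year "level" standing in for the copula step are assumptions for the sketch, not the actual model from the earlier post.

```python
import numpy as np

rng = np.random.default_rng(0)
years, policyholders = 100, 500  # y and n, illustrative sizes only

# Hypothetical stand-in for the copula step: one severity level per year,
# which then constrains every policyholder's loss in that year.
yearly_level = rng.beta(2.0, 5.0, size=years)

# The y x n matrix: each row's losses are scaled by that year's level.
losses = yearly_level[:, None] * rng.beta(2.0, 8.0, size=(years, policyholders))

# Flatten to the single vector of y * n values the histogram is drawn from.
flat = losses.reshape(-1)
```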

Histogram of individual scaled non-negligible claim sizes

But if we stare at that histogram for a while, we recognize the possibility that it might be approximated by a simple statistical distribution.  If that were the case, we could simply use the simple statistical distribution rather than the elaborate process for generating individual storm loss distributions. In other words, there might be a computational shortcut that approximates the elaborate process.  To capture the experience of all $n$ policyholders, including those who had no claim at all, we could just upsample random variates drawn from our hypothesized simple distribution and pad with zeros; alternatively, we could create a mixture distribution in which, most of the time, one draws from a distribution that is always zero and, when there is a positive claim, one draws from the hypothesized simple distribution.
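That zero-inflated mixture can be sketched directly. In this hypothetical Python sketch, the 10% claim frequency and the Weibull-shaped severity are placeholders for whatever simple distribution one ends up fitting, not fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_losses(n, p_claim, claim_dist):
    """Zero-inflated mixture: with probability 1 - p_claim the loss is
    zero; otherwise draw from the hypothesized claim-size distribution."""
    has_claim = rng.random(n) < p_claim
    losses = np.zeros(n)
    losses[has_claim] = claim_dist(int(has_claim.sum()))
    return losses

# Illustrative choice only: 10% claim frequency, Weibull-shaped severities.
sample = draw_losses(10_000, 0.10, lambda k: rng.weibull(0.5, k))
```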

# The curious case of Corpus Christi

Today’s Corpus Christi Caller has an interesting article that purports to show a special immunity of the Corpus Christi area to hurricane risk, which is said to be no greater than that facing New York City. The article is based on a NOAA report published in 2010 and apparently brought recently to the attention of Todd Hunter, Corpus Christi’s state representative. The report uses data from 1887 onward in an attempt to calibrate the comparative risk of landfall both within Texas and along the Gulf and Eastern Seaboard.

Here’s the key picture which, though not shown in the report, appears to underlie the article’s conclusions and quotations.

Return periods of Atlantic hurricanes by county

See the blue 19 next to Corpus Christi and the blue 20 next to New York City. This is supposed to show that the hurricane risk in those two regions is similar: once every 19 or 20 years, a hurricane will strike within 50 miles. And see the orange 9s next to Galveston and Brazoria counties. Those are supposed to show that the hurricane risk in those regions is greater: one strike every 9 years.

The evidence gets a bit more complicated, however, if one looks at the next picture in the NOAA document, one not mentioned in the Caller article. It shows the history of major hurricanes based on historic evidence from 1887 to 2010. Although the Coastal Bend (33-40 years) still comes out better than the east Texas coast (25-26 years), the ratio isn’t as great as for all hurricanes. Moreover, the comparison with New York City now fails: the Big Apple gets hit by a major hurricane only once every 68 years.

Return period for major Atlantic hurricanes by county

So, what are we to make of all this? I would say not too much. What the NOAA report lacks is any notion of statistical significance that would make it particularly useful in drawing fine-grained distinctions between areas of the Texas coast. It might just be that what the pictures show is significantly good and bad luck. In a sample of just 130 years or so, one might expect to see return periods that vary from county to county. Perhaps some trends might be observable, such as greater strike frequency in Florida than in Texas, but what the report lacks is a “p-value”: the probability that one would see variations in the data as large as those exhibited in the graphics simply as a matter of chance. I’m not faulting NOAA for this; it would be very hard to develop such a statistic, and the report purports to capture historic evidence only.

Moreover, our climate is dynamic. Storm tracks and storm frequency can change as a result of global weather phenomena. Thus, while one should not ignore historic data, one has to be very careful about projecting it into the future or using it to make highly specific projections.

So, should the report be ignored? No. Perhaps curious atmospheric features (jet stream placement) and geographic features such as the placement of Cuba indeed give Corpus Christi a little shield. And if Corpus Christi wants to argue on that basis for lower rates for southwest coastal Texas and higher rates for the eastern Texas coast, I wouldn’t be mightily opposed. Somehow, however, I don’t think that’s where coastal Texas wants to go in the upcoming legislative session. Recognition of large differences based on geography in catastrophe risk isn’t the best basis on which to plead risk socialization and rate uniformity. (More on that point soon!)

# It’s a Weibull

To understand the premiums charged by the Texas Windstorm Insurance Association and the current legal and financial issues being debated in Austin, you have to get your hands a little dirty with the actuarial science.  You need to have some ability to model the damages likely to be caused by a tropical cyclone on the Texas coast.  Now, to do this really well, it might be thought you need an awful lot of very fine data.  In fact, however, you can do a pretty good job of understanding TWIA’s perspective by just reverse engineering publicly available information.

What I want to show is that the perceived annual exposure of the Texas Windstorm Insurance Association can be modeled really well by something known in statistics as a Weibull Distribution. To be fancy, it’s a zero-censored three-parameter Weibull Distribution:

```mathematica
CensoredDistribution[{0, ∞},
 WeibullDistribution[0.418001, 1.26765*10^8, -4.81157*10^8]]
```
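The same object can be reproduced outside Mathematica. Here is a sketch in Python using SciPy’s `weibull_min`, under the assumption that Mathematica’s `WeibullDistribution[α, β, μ]` maps to shape `c = α`, scale `β`, and location `μ`, and that the censoring at `{0, ∞}` amounts to clamping negative draws to a point mass at zero.

```python
import numpy as np
from scipy.stats import weibull_min

# Parameters from the fitted model above:
# shape 0.418001, scale 1.26765e8, location -4.81157e8
shape, scale, loc = 0.418001, 1.26765e8, -4.81157e8
dist = weibull_min(c=shape, loc=loc, scale=scale)

# Censoring at {0, inf}: any negative draw is recorded as a zero-loss year.
rng = np.random.default_rng(0)
annual_losses = np.maximum(0.0, dist.rvs(size=100_000, random_state=rng))

# Under this model, the probability of a zero-loss year is F(0).
p_zero = dist.cdf(0.0)
```

One consequence of the censoring worth noticing: a large share of simulated years come out as exactly zero, which is the point mass the censoring creates.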

We can plot the results of this distribution against the predictions made by TWIA’s two consultants, AIR and RMS. The x-axis of the graph shows the annual losses to TWIA; the y-axis shows the probability that the losses will be less than or equal to the corresponding amount on the x-axis. As one can see, it is almost a perfect fit.  For statisticians, the “adjusted R squared” value is 0.995.

How did I find this function? Part of it is some intuition and some expertise about loss functions.  But a lot of it comes from running a “non-linear regression” on data in the public domain.  Here’s a chart (an “exceedance table”) provided to TWIA by reinsurance broker Guy Carpenter.  It shows the estimates of two consultants, AIR and RMS, of the losses likely to be suffered by TWIA.  Basically, you can use statistics software (I used Mathematica) to run a non-linear regression on this data, assuming the underlying model is a censored Weibull distribution of some sort.  And, in less than a second, out pop the parameters of the Weibull distribution that best fits the data. As shown above, it fits the AIR and RMS data points really well.  Moreover, it reproduces the “AAL” (the mean annual loss to TWIA) pretty well too.
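As a sketch of that fitting step (in Python rather than Mathematica, and with made-up points standing in for the actual AIR/RMS exceedance table, which I am not reproducing here), the non-linear regression amounts to a `curve_fit` call against a three-parameter Weibull CDF:

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull_cdf(x, shape, scale, loc):
    """Three-parameter Weibull CDF; zero for x below the location."""
    z = np.clip((x - loc) / scale, 0.0, None)
    return 1.0 - np.exp(-z**shape)

# Stand-in data, in units of $100M: (annual loss, non-exceedance probability).
# These points are generated from known parameters purely to exercise the
# fit; the real inputs would be the AIR and RMS exceedance-table points.
losses = np.array([0.1, 0.5, 1.0, 2.5, 5.0, 10.0, 25.0])
probs = weibull_cdf(losses, 0.42, 1.3, -4.8)

# A rough starting guess; the optimizer then recovers a Weibull whose CDF
# passes through the tabulated points almost exactly.
params, _ = curve_fit(weibull_cdf, losses, probs,
                      p0=[0.5, 1.0, -4.0], maxfev=20_000)
fitted = weibull_cdf(losses, *params)
```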

In some forthcoming posts, I’m going to show why this finding is important, but suffice it to say, it explains a lot about the current controversy and suggests some matters to be examined with care.