# It’s (close to) a Weibull — again!

You may recall that in my last post, I went through an involved process of showing how one could generate storm losses for individuals over years.  That process, which underlies a project to examine the effect of legal change on the sustainability of a catastrophe insurer, involved copulas of beta distributions and a parameter mixture distribution in which the underlying distribution was also a beta distribution. It was not for the faint of heart.

One purpose of this effort was to generate a histogram like the one below, which shows the distribution of scaled claim sizes for non-negligible claims. This histogram was obtained by taking one draw from the copula distribution for each of the $y$ years in the simulation and using it to constrain the distribution of losses suffered by each of the $n$ policyholders in each of those $y$ years.  Thus, although the underlying process created a $y \times n$ matrix, the histogram below is for a single “flattened” vector of those $y \times n$ values.

Histogram of individual scaled non-negligible claim sizes

But, if we stare at that histogram for a while, we recognize the possibility that it might be approximated by a simple statistical distribution.  If that were the case, we could use that simple distribution rather than the elaborate process for generating individual storm loss distributions. In other words, there might be a computational shortcut that approximates the elaborate process.  To get the experience of all $n$ policyholders (including those who did not have a claim at all), we could then just draw random variates from our hypothesized simple distribution and add zeros; alternatively, we could create a mixture distribution in which most of the time one drew from a distribution that was always zero and, when there was a positive claim, one drew from this hypothesized simple distribution.
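The mixture idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `simulate_year` is a hypothetical helper, the claim probability of 0.05 is a made-up placeholder, and the shape and scale are borrowed from the Weibull fit reported further down in this post. Note that Python's `random.weibullvariate` takes the scale first and the shape second.

```python
import random

def simulate_year(n_policyholders, claim_prob, shape, scale, rng):
    """Return one year's scaled losses: zero for most policyholders, a Weibull draw otherwise."""
    losses = []
    for _ in range(n_policyholders):
        if rng.random() < claim_prob:
            # random.weibullvariate takes (scale, shape), in that order
            losses.append(rng.weibullvariate(scale, shape))
        else:
            losses.append(0.0)
    return losses

rng = random.Random(2024)
# claim_prob is an invented placeholder, not a figure from the post;
# shape and scale are the fitted Weibull parameters quoted later in the post
losses = simulate_year(10_000, claim_prob=0.05, shape=1.30023, scale=0.165684, rng=rng)
claims = [x for x in losses if x > 0]
print(len(claims), sum(claims) / len(claims))
```

Drawing the zero/non-zero indicator first and the claim size second is the same thing as sampling from the mixture distribution described above; it just avoids having to define the mixture explicitly.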

I tried for a while to find this mysterious hypothesized simple distribution.  To do so, I threw a number of possible distributions at the data reflected in the histogram above: a beta distribution, a lognormal distribution, a gamma distribution, an inverse gamma distribution, an inverse Gaussian distribution, a Rayleigh distribution, and a Maxwell distribution.  For each family, I found the member that fit the data best.[1]  Unfortunately, just because a particular distribution is the best member of its family at fitting some data does not mean it is plausible to believe that the data actually came from that distribution. The estimation process may just be doing the best it can with a fundamentally flawed hypothesis.  And, when it came time to see whether the data actually fit those best family members, it turned out they did not.[2] Alas.
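To see how a "best" family member can still be a bad fit, here is a small Python sketch. It is not the analysis I ran: the data are synthetic stand-ins rather than the simulated claims, the parameters 0.17 and 1.3 are arbitrary, and the Kolmogorov-Smirnov distance is swapped in as a simple yardstick in place of Mathematica's DistributionFitTest. The sketch fits a Rayleigh distribution by maximum likelihood to data that did not come from a Rayleigh and measures how far the fitted curve sits from the data.

```python
import math
import random

def ks_distance(data, cdf):
    """Largest gap between the empirical CDF of the data and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the candidate CDF with the empirical CDF just after and just before x
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap

rng = random.Random(7)
# Synthetic stand-in data (scale 0.17, shape 1.3); NOT the post's simulated claims
data = [rng.weibullvariate(0.17, 1.3) for _ in range(5000)]

# Maximum-likelihood Rayleigh parameter: sigma^2 = sum(x^2) / (2 n)
sigma2 = sum(x * x for x in data) / (2 * len(data))

def rayleigh_cdf(x):
    return 1.0 - math.exp(-x * x / (2.0 * sigma2))

def generating_cdf(x):
    return 1.0 - math.exp(-((x / 0.17) ** 1.3))

d_rayleigh = ks_distance(data, rayleigh_cdf)
d_true = ks_distance(data, generating_cdf)
print(d_rayleigh, d_true)
```

The best Rayleigh is still the best Rayleigh, but its distance from the data is far larger than that of the distribution the data actually came from, which is the situation I kept running into.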

But then I tried a Weibull Distribution.  I had some optimism that it might work because Weibull Distributions are frequently used in hurricane modeling.  Indeed, as I noted in my very first post on this blog, the aggregate distribution of losses suffered by the Texas Windstorm Insurance Association approximates a member of the Weibull family pretty well. So, if aggregate losses are Weibully, maybe the individual losses are Weibully too.

Success! WeibullDistribution[1.30023,0.165684] is the member of the two-parameter Weibull family that fits the data best. Moreover, the null hypothesis that the data are distributed according to that distribution is not rejected at the 5 percent level based on the Cramér-von Mises test.[3] In English, it’s a pretty good fit.
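For readers without Mathematica, the two steps (fit, then test) can be roughly sketched in plain Python. Everything here is an assumption made for illustration: `fit_weibull` and `cramer_von_mises` are hypothetical helpers, not ports of EstimatedDistribution or DistributionFitTest, and the data are synthetic draws rather than the simulated claims. The sketch stops at the Cramér-von Mises statistic; turning it into a p-value requires a reference distribution, which changes when the parameters were estimated from the same data.

```python
import math
import random

def fit_weibull(data, lo=0.05, hi=50.0, iters=100):
    """Maximum-likelihood (shape, scale) for positive data, by bisection on the shape."""
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / len(data)

    def score(k):
        # Likelihood equation in the shape k; it is increasing in k and its root is the MLE
        powers = [x ** k for x in data]
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(x ** shape for x in data) / len(data)) ** (1.0 / shape)
    return shape, scale

def cramer_von_mises(data, shape, scale):
    """W^2 = 1/(12n) + sum_i (F(x_(i)) - (2i - 1)/(2n))^2 for a Weibull CDF F."""
    xs = sorted(data)
    n = len(xs)
    total = 1.0 / (12 * n)
    for i, x in enumerate(xs, start=1):
        f = 1.0 - math.exp(-((x / scale) ** shape))
        total += (f - (2 * i - 1) / (2 * n)) ** 2
    return total

rng = random.Random(11)
# Synthetic stand-in for the simulated claims, drawn from the fitted distribution quoted above
data = [rng.weibullvariate(0.165684, 1.30023) for _ in range(5000)]
shape, scale = fit_weibull(data)
w2 = cramer_von_mises(data, shape, scale)
print(shape, scale, w2)
```

A small statistic means the fitted CDF tracks the empirical CDF closely across the whole range, which is what "not rejected" is picking up on.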

A picture confirms the finding. The blue histogram shows the distribution of scaled (non-negligible) claim sizes generated using the copula algorithm described in the prior post. The red line shows the best Weibull Distribution.  I’d say they match up really, really well.

Comparison of data histogram with best Weibull Distribution
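For reference, the red curve is the density of a two-parameter Weibull. Assuming Mathematica's convention that the first argument of WeibullDistribution is the shape $\alpha$ and the second is the scale $\beta$, the density and cumulative distribution function are:

$$
f(x) = \frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1} e^{-(x/\beta)^{\alpha}}, \qquad
F(x) = 1 - e^{-(x/\beta)^{\alpha}}, \qquad x > 0,
$$

with $\alpha = 1.30023$ and $\beta = 0.165684$ here. Because $\alpha > 1$, the density starts at zero, rises to a single peak, and then tails off, which is the shape the histogram shows. Inverting $F$ gives a one-line recipe for generating draws from a uniform variate $u$ on $(0,1)$:

$$
x = \beta\,\bigl(-\ln(1-u)\bigr)^{1/\alpha}.
$$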

Now, I can’t tell you exactly why the elaborate procedure described in my last post generates something that looks Weibully. The two distributions are at least related: the Weibull distribution is a limiting case of something called the Generalized Beta Distribution of the Second Kind (a/k/a the Beta Prime Distribution), and a beta-distributed variable $X$ is carried into that family by the transformation $X/(1-X)$.[4]  Weibull is basically an uncle of beta. But exactly why we get this result here, I’m not well versed enough in statistical theory to say. It may just be a coincidence.  Regardless, we now have a computationally simple way of generating storm losses on the individual level with some fidelity to reality, and that’s worth a mention.

Footnotes

[1] I used the Mathematica command EstimatedDistribution to perform this analysis.

[2] I used the Mathematica command DistributionFitTest to perform this analysis.

[3] With the proper syntax, Mathematica’s DistributionFitTest command can provide not only a p-value but also an English language sentence explaining precisely what that p-value means.

[4] The R language has support for this distribution. http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/VGAM/html/genbetaII.html