# Reparameterizing the beta distribution

Just as one can reparameterize the lognormal distribution, which is frequently used in the economic analysis of insurance and, derivatively, insurance regulation, so too with the beta distribution, which is also frequently used.  If $\mu$ is the mean of the beta distribution and $\kappa$ is the fraction from 0 to 1 of the maximum possible standard deviation of the beta distribution given its mean, then one can write a modified beta distribution as follows:

Note: corrects sign error in earlier version.

Reparameterized beta distribution
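
As a minimal sketch of this reparameterization, assume the usual shape parameters $\alpha$ and $\beta$ of the beta distribution. Setting $\alpha = \left(\frac{1}{\kappa^2}-1\right)\mu$ and $\beta = \left(\frac{1}{\kappa^2}-1\right)(1-\mu)$ yields a mean of $\mu$ and a standard deviation of $\kappa\sqrt{\mu(1-\mu)}$, where $\sqrt{\mu(1-\mu)}$ is the largest standard deviation possible for that mean. The function names below are hypothetical and chosen only for illustration.

```python
import math
import random


def beta_shape_params(mu, kappa):
    """Convert (mean, fraction of maximum standard deviation) to (alpha, beta)."""
    if not (0.0 < mu < 1.0 and 0.0 < kappa < 1.0):
        raise ValueError("mu and kappa must both lie strictly between 0 and 1")
    total = 1.0 / kappa**2 - 1.0  # this is alpha + beta
    return total * mu, total * (1.0 - mu)


def beta_mean_and_sd(alpha, beta):
    """Mean and standard deviation of a standard Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total**2 * (total + 1.0))
    return mean, math.sqrt(variance)


def draw_reparameterized_beta(mu, kappa, rng=random):
    """Draw one variate from the beta distribution with mean mu and spread kappa."""
    alpha, beta = beta_shape_params(mu, kappa)
    return rng.betavariate(alpha, beta)
```

Converting back with `beta_mean_and_sd` is a quick way to check that a chosen pair of $\mu$ and $\kappa$ produces the intended mean and spread.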

# The scary Austin American Statesman article

The Austin-American Statesman has published an article based on a July email written by Texas Insurance Commissioner Eleanor Kitzman that the newspaper obtained through a public records request.  I’ve requested a copy of the email in question but I have not received a response yet.  Here’s a link to the article.

There are at least two assertions in the article that will create concerns for many.

1. According to the Statesman, Ms. Kitzman emailed members of her senior staff that she wanted “to schedule some time next week to discuss options for funding a 3-5 year transition to market rates” for coastal counties.  The Statesman then offers two interpretations of this remark: (1) the commissioner might consider trying to force private companies to write risky coastal policies and allowing them to raise rates over the next three to five years; and (2) TWIA might need to substantially raise its own rates on customers, possibly by 45 percent or more.

2. The article purports to illustrate some perils of moving to a private market system.

“… according to a tool on the Texas Department of Insurance’s website, a TWIA policy for a $200,000 house in Nueces County costs about $1,700, while a policy on the same value house would be as much as $8,400 with a private insurer.”

# Two quick ideas on the mathematics of risk aversion

En route to my forthcoming post on the effects of coinsurance and deductible regulation on optimal precautions against catastrophic risk, I’ve confronted some fundamental problems in comparing the value insureds ascribe to statistical distributions. I’m going to post about two of those confrontations here.

# Exciting stuff coming

I know it’s been two weeks since I’ve posted, but I’ve been cooking up a good model to help understand the likely effects of changing deductible and coinsurance requirements on catastrophe insurance policies. I’ve made big progress and should have some genuinely interesting posts coming in the week ahead. Catrisk fans, hang in there!

Note from November 12, 2012. You can now find the model here. With yet more to come.

# It’s (close to) a Weibull, again!

You recall that in my last post, I went through an involved process of showing how one could generate storm losses for individuals over years. That process, which underlies a project to examine the effect of legal change on the sustainability of a catastrophe insurer, involved the copulas of beta distributions and a parameter mixture distribution in which the underlying distribution was also a beta distribution. It was not for the faint of heart.

One purpose of this effort was to generate a histogram that looks like the one below that shows the distribution of scaled claim sizes for non-negligible claims. This histogram was obtained by taking one draw from the copula distribution for each of the $y$ years in the simulation and using it to constrain the distribution of losses suffered by each of the $n$ policyholders in each of those $y$ years.
Thus, although the underlying process created a $y \times n$ matrix, the histogram below is for a single “flattened” vector of $y \times n$ values.

Histogram of individual scaled non-negligible claim sizes

But, if we stare at that histogram for a while, we recognize the possibility that it might be approximated by a simple statistical distribution. If that were the case, we could simply use the simple statistical distribution rather than the elaborate process for generating individual storm loss distributions. In other words, there might be a computational shortcut that could approximate the elaborate process. If that were the case, to get the experience of all $n$ policyholders, including those who did not have a claim at all, we could just upsample random variates drawn from our hypothesized simple distribution and add zeros; alternatively, we could create a mixture distribution in which most of the time one drew from a distribution that was always zero and, when there was a positive claim, one drew from this hypothesized simple distribution.

# Copulas and insurance law reform

Storm models are crucial to law reform. One needs them to get a sense of whether premiums are reasonable. And, as I want to show in a series of blog posts, they can also help figure out the effect of legally mandated changes to the insurance contract. You need to tie behavior at the level of the individual policyholder to the long term finances of the insurer. How would, for example, changing the required deductible on windstorm policies issued by the Texas Windstorm Insurance Association affect the precautions taken by policyholders to avoid storm damage? That’s important for many reasons, among them that it affects the sustainability of TWIA. Might the imposition of coinsurance into the insurance contract do a better job of making TWIA sustainable? These are the kinds of questions for which a decent storm model is useful.
So, over the past few weeks I’ve been thinking again about ways in which one could, without access (yet) to gigabytes of needed data, develop approximations of the windstorm damage events likely to be suffered by policyholders. And I’ve been thinking about ways in which one could parameterize those individual damages as a function of the level of precautions taken by policyholders to avoid damage. What I’m going to present here is a model of storm damage that attempts to strike a reasonable balance of simplicity and fidelity. I’m afraid there’s a good bit of math involved, but I’m going to do my best here to clarify the underlying ideas and prevent your eyes from glazing over. So, if you’ll stick with me, I’ll do my best to explain. The reward is that, at the end of the day, we’re going to have a model that in some ways is better than what the professionals use. It not only explains what is currently going on but can make predictions about the effect of legal change.

Let’s begin with two concepts: (1) “claim prevalence” and (2) “mean scaled claim size.” By “claim prevalence,” which I’m going to signify with the Greek letter $\nu$ (nu), I mean the likelihood that, in any given year, a policyholder will file a claim based on an insured event. Thus, if in a given year 10,000 of TWIA’s 250,000 policyholders file a storm damage claim, that year’s prevalence is 0.04.

“Mean scaled claim size,” which I’m going to signify with the Greek letter $\zeta$ (zeta), is a little more complicated. It refers to the mean, taken over all properties on which claims are filed during a year, of the claim size divided by the insured value of the property. To take a simple example, if TWIA were to insure 10 houses and, in a particular year, 2 of them filed claims ($\nu =0.2$) for $50,000 and for $280,000, and the insured values of the properties were $150,000 and $600,000 respectively, the mean scaled claim size $\zeta$ would be 0.4.
That’s because: $0.4=\frac{50000}{2 \times 150000}+\frac{280000}{2 \times 600000}$.

Notice, by the way, that $\zeta \times \nu$ is equal to aggregate claims in a year as a fraction of total insured value. Thus, if $\zeta \times \nu = 0.005$ and the total insured value is, say, $71 billion, one would expect $355 million in claims in a year. I’ll abbreviate this ratio of aggregate claims in a year to total insured value as $\psi$ (psi). In this example, then, $\psi=0.005$.[1]

The central idea underlying my model is that claim prevalence and mean scaled claim size are positively correlated. That’s because both are likely to correlate positively with the destructive power of the storms that occurred during that year. The correlation won’t be perfect. A tornado, for example, may cause very high mean scaled claim sizes (total destruction of the homes it hits) but have a narrow path and hit just a few insured properties. And a low grade tropical storm may cause modest levels of wind damage among a large number of insureds. Still, most of the time, I suspect, bigger storms not only cause more claims but also increase the mean scaled claim size.

A copula distribution provides a relatively simple way of blending correlated random variables together. There are lots of explanations: Wikipedia, a nice paper on the Social Science Research Network, and the Mathematica documentation on the function that creates copula distributions. There are lots of ways of doing this blending, each with a different name. I’m going to stick with a simple copula, however, the so-called “Binormal Copula” (a/k/a the “Gaussian Copula”) with a correlation coefficient of 0.5.[2]

To simulate the underlying distributions, I’m going to use a two-parameter beta distribution for both claim prevalence and mean scaled claim size. My experimentation suggests that, although there are probably many alternatives, both these distributions perform well in predicting the limited data available to me on these variables.
They also benefit from modest analytic tractability. For people trying to recreate the math here, the distribution function of the beta distribution is $I_x\left(\left(\frac{1}{\kappa ^2}-1\right) \mu ,\frac{\left(\kappa ^2-1\right) (\mu -1)}{\kappa ^2}\right)$, where $\mu$ is the mean of the distribution and $\kappa$ is the fraction (0,1) of the maximum standard deviation of the distribution possible given the value of $\mu$. What I have found works well is to set $\mu _{\nu }=0.0244$, $\kappa _{\nu }=0.274$ for the claim prevalence distribution and $\mu _{\zeta }=0.097$, $\kappa _{\zeta }=0.229$ for the mean scaled claim size distribution. This means that a policyholder will file a claim about once every 41 years and that claims, when filed, will on average be 9.7% of the insured value of the property.[3]

We can visualize this distribution in a couple of ways. The first is to show a probability density function of the distribution but to scale the probability logarithmically. This is shown below.

PDF of sample copula distribution

The second is to simulate 10,000 years’ worth of experience and to place a dot for each year showing claim prevalence and mean scaled claim size. That is done below. I’ve annotated the graphic with labels showing what might represent a year in which there was a tornado outbreak, a catastrophic hurricane, a tropical storm as well as the large cluster of points representing years in which there was minimal storm damage.

Claim prevalence and mean scaled claim size for 10,000 year simulation

Equipped with our copula, we can now generate losses at the individual policyholder level for any given year. The idea is to create a “parameter mixture distribution” using the copula. As it turns out, one component of this parameter mixture distribution is itself a mixture distribution.

Dear reader, you now have a choice.
If you like details, have a little bit of a mathematical background and want to understand better how this model works, just keep reading at “A Mini-Course on Mixture and Parameter Mixture Distributions.” If you just want the big picture, skip to “Simulating at the Policyholder Level” below.

A Mini-Course on Mixture and Parameter Mixture Distributions

To fully understand this model, we need some understanding of a mixture distribution and a parameter mixture distribution. Let’s start with the mixture distribution, since that is easier. Imagine a distribution in which you first randomly determine which underlying component distribution you are going to use and then you take a draw from the selected underlying component distribution. You might, for example, roll a conventional six-sided die, which is a physical representation of what statisticians call a “discrete uniform distribution.” If the die came up 5 or 6, you would then draw from a beta distribution with a mean of 0.7 and a standard deviation of 0.3 times the maximum. But if the die came up 1 through 4, you would draw from a uniform distribution on the interval [0,0.1]. The diagram below shows the probability density function of the resulting mixture distribution (in red) and the underlying components in blue.

Mixture Distribution with beta and uniform components

The mixture distribution has a finite number of underlying component distributions and has discrete weights that you select. The parameter mixture distribution can handle an infinite number of underlying component distributions and weights that are themselves draws from a statistical distribution. Suppose we create a continuous function $f$ that takes a parameter $x$ and creates a triangular distribution which has a mean of $x$ and extends 1/4 in each direction from the mean. We will call this triangular distribution the underlying distribution of the parameter mixture distribution.
The particular member of the triangular distribution family used is determined by the value of the parameter. And, now, we want to create a “meta distribution,” a parameter mixture distribution, in which the probability of drawing a particular parameter $x$ and in turn getting that kind of triangular distribution with mean $x$ is itself determined by another distribution, which I will call $w$. The distribution $w$ is the weighting distribution of the parameter mixture distribution. To make this concrete, suppose $w$ is a uniform distribution on the interval [0,1].

The diagram below shows the result. The blue triangular underlying distributions represent a sample of the probability density functions of triangular distributions. There are actually an infinite number of these triangular distributions, but obviously I can’t draw them all here. Notice that some of the density functions are more opaque than others. The opacity of each probability density function is based on the probability that such a distribution would be drawn from $w$. The red line shows the probability density function of the resulting parameter mixture distribution. It is kind of an envelope of these triangular distributions.

Parameter mixture distribution for triangular distributions where mean of triangular distributions is drawn from a uniform distribution

We can combine mixture distributions and parameter mixture distributions. We can have a mixture distribution in which one or more of the underlying functions is a parameter mixture distribution. And, we can have a parameter mixture distribution in which the underlying function, the weighting function, or both are mixture distributions.

It’s that combination, a parameter mixture distribution in which the underlying function is a mixture distribution, that we’re going to need to get a good simulation of the damages caused by storms. The weighting distribution of this parameter mixture distribution is our copula.
It throws out two parameters: (1) $\nu$, the likelihood that in any given year the policyholder has a non-zero claim and (2) $\zeta$, the mean scaled claim size assuming that the policyholder has a non-zero claim. Those two parameters are going to weight members of the underlying distribution, which is a mixture distribution. The weights of the mixture distribution are the likelihood that the policyholder has no claim and the likelihood that the policyholder has a non-zero claim (claim prevalence). The component distributions of the mixture distribution are (1) a distribution that always produces zero and (2) any distribution satisfying the constraint that its mean is equal to the mean scaled claim size. I’m going to use another beta distribution for this latter purpose with a standard deviation equal to 0.2 of the maximum standard deviation. I’ll denote this distribution as B. Some examination of data from Hurricane Ike is not inconsistent with the use of this distribution and the distribution has the virtue of being analytically tractable and relatively easy to compute. This diagram may help understand what is going on.

The idea behind the parameter mixture distribution

Simulating at the Policyholder Level

So, we can now simulate a large insurance pool over the course of years by making, say, 10,000 draws from our copula. And from each draw of the copula, we can determine the claim size for each of the policyholders insured in that sample year. Here’s an example. Suppose our copula produces a year with some serious damage: claim prevalence value of 0.03 and a mean scaled claim size of 0.1 for the year. If we simulate the fate of 250,000 policyholders, we find that 242,500 have no claim. The graphic below shows the distribution of scaled claim sizes among those who did have a non-zero claim.
Scaled claim sizes for sample year

Fortunately, however, we don’t need to sample 250,000 policyholders each year for 10,000 years to get a good picture of what is going on. We can simulate things quite nicely by looking at the condition of just 2,500 policyholders and then just multiplying aggregate losses by 100. The graphic below shows a logarithmic plot of aggregate losses assuming a total insured value in the pool of $71 billion (which is about what TWIA has had recently).
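
The one-year, policyholder-level step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the year's claim prevalence and mean scaled claim size are passed in directly rather than drawn from the copula, every policy is assumed to carry the same insured value, and the function names are hypothetical.

```python
import random


def beta_shape_params(mu, kappa):
    """Shape parameters for a beta distribution with mean mu and spread kappa."""
    total = 1.0 / kappa**2 - 1.0
    return total * mu, total * (1.0 - mu)


def simulate_year(prevalence, mean_scaled_claim, n_policyholders, kappa=0.2, rng=random):
    """Return one scaled claim size per policyholder (0.0 means no claim)."""
    alpha, beta = beta_shape_params(mean_scaled_claim, kappa)
    claims = []
    for _ in range(n_policyholders):
        if rng.random() < prevalence:
            # Non-zero claim: draw its scaled size from the beta component.
            claims.append(rng.betavariate(alpha, beta))
        else:
            # No claim: the degenerate component that always produces zero.
            claims.append(0.0)
    return claims


def aggregate_loss(claims, total_insured_value):
    """Scale the sample up to the pool, assuming equal insured values (an assumption)."""
    return total_insured_value * sum(claims) / len(claims)


rng = random.Random(2012)  # fixed seed so the sketch is repeatable
claims = simulate_year(0.03, 0.1, 2500, rng=rng)
nonzero = [c for c in claims if c > 0.0]
loss = aggregate_loss(claims, 71e9)
```

Repeating `simulate_year` once per simulated year, with that year's pair of parameters, would produce the list of annual aggregate losses that the plots summarize.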

Aggregate losses (simulated) on $71 billion of insured property

We can also show a classical “exceedance curve” for our model. The graphic below varies the aggregate losses on $71 billion of insured property and shows, for each value, the probability (on a logarithmic scale) that losses would exceed that amount.  One can thus get a sense of the damage caused by the 100-year storm and the 1000-year storm.  The figures don’t perfectly match TWIA’s internal models, but that’s simply because our parameters have not been tweaked at this point to accomplish that goal.

Exceedance curve (logarithmic) for sample 10,000 year run
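
An empirical exceedance curve of this kind can be computed directly from a list of simulated annual losses. The sketch below is one simple way to do it, not necessarily how the figure above was produced; the function names are hypothetical.

```python
def exceedance_probability(annual_losses, threshold):
    """Fraction of simulated years whose aggregate loss exceeds the threshold."""
    exceeding = sum(1 for loss in annual_losses if loss > threshold)
    return exceeding / len(annual_losses)


def return_period_loss(annual_losses, years):
    """Smallest simulated loss whose exceedance probability is at most 1/years."""
    ordered = sorted(annual_losses)
    n = len(ordered)
    allowed = n // years  # how many simulated years may exceed the answer
    if allowed >= n:
        return ordered[0]
    return ordered[n - allowed - 1]
```

With 10,000 simulated years, `return_period_loss(losses, 100)` gives an estimate of the 100-year loss, and evaluating `exceedance_probability` over a grid of thresholds traces out the curve.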

The final step is to model how extra precautions by a policyholder might alter these losses.  Presumably, precautions are like most economic things: there is a diminishing marginal return on investment.  So, I can roughly model matters by saying that a precaution level of $x$ results in the insured drawing from a new beta distribution with a mean equal to $\ell \times 2^{-x}$, where $\ell$ is the amount of damage they would have suffered had they taken no extra precautions. (I’ll keep the standard deviation of this beta distribution equal to 0.2 of its maximum possible value.) I have thus calibrated extra precautions such that each unit of extra precautions cuts the mean losses in half. It doesn’t mean that sometimes precautions won’t result in greater savings or that sometimes precautions won’t result in lesser savings; it just means that on average, each unit of precautions cuts the losses in half.
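
The halving rule can be sketched in a few lines. This assumes $\ell$ is expressed as a scaled loss strictly between 0 and 1 and reuses the mean-and-spread form of the beta distribution; the function names are hypothetical.

```python
import random


def mean_loss_with_precautions(baseline_loss, precautions):
    """Mean scaled loss after `precautions` units; each unit halves the mean."""
    return baseline_loss * 2.0 ** (-precautions)


def draw_loss_with_precautions(baseline_loss, precautions, kappa=0.2, rng=random):
    """Draw a scaled loss from the beta distribution centred on the reduced mean."""
    mu = mean_loss_with_precautions(baseline_loss, precautions)
    total = 1.0 / kappa**2 - 1.0  # alpha + beta for spread kappa
    return rng.betavariate(total * mu, total * (1.0 - mu))
```

For a baseline scaled loss of 0.4, one unit of precaution gives a mean of 0.2 and two units give 0.1, while individual draws scatter around those means.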

And, we’re done!  We’ve now got a storm model that when combined with the model of policyholder behavior that I will present in a future blog entry, should give us respectable predictions on the ability of insurance contract features such as deductibles and coinsurance to alter aggregate storm losses. Stay tuned!

Footnotes

[1] As I recognized a bit belatedly in this project, if one makes multiple draws from a copula distribution, it is not the case that the mean of the product of the two values $\nu$ and $\zeta$ drawn from the copula is equal to the product of the mean of $\nu$ and the mean of $\zeta$. You can see why this might be by imagining a copula distribution in which the two values were perfectly correlated, in which case one would be drawing from a distribution transformed by squaring.  It is not the case that the mean of such a transformed distribution is equal to the square of the mean of the underlying distribution.

[2] Copulas got a bad name over the past 10 years for bearing some responsibility for the financial crisis.  This infamy, however, has nothing to do with the mathematics of copulas, which remains quite brilliant, but with their abuse and the fact that incorrect distributions were inserted into the copula.

[3] We thus end up with a copula distribution whose probability density function takes on this rather ghastly closed form.  (It won’t be on the exam.)

$\frac{(1-\zeta )^{\frac{1-\mu _{\zeta }}{\kappa _{\zeta }^2}+\mu _{\zeta }-2} \zeta ^{\left(\frac{1}{\kappa _{\zeta }^2}-1\right) \mu _{\zeta }-1} (1-\nu )^{\frac{1-\mu _{\nu }}{\kappa _{\nu }^2}+\mu _{\nu }-2} \nu ^{\left(\frac{1}{\kappa _{\nu }^2}-1\right) \mu_{\nu }-1} \exp \left(\frac{\left(\text{erfc}^{-1}\left(2 I_{\zeta }\left(\left(\frac{1}{\kappa _{\zeta }^2}-1\right) \mu _{\zeta },\frac{\left(\kappa _{\zeta }^2-1\right) \left(\mu _{\zeta }-1\right)}{\kappa _{\zeta }^2}\right)\right)-\rho \text{erfc}^{-1}\left(2 I_{\nu }\left(\left(\frac{1}{\kappa _{\nu }^2}-1\right) \mu _{\nu },\frac{\left(\kappa _{\nu }^2-1\right) \left(\mu _{\nu }-1\right)}{\kappa _{\nu }^2}\right)\right)\right){}^2}{\rho ^2-1}+\text{erfc}^{-1}\left(2 I_{\zeta }\left(\left(\frac{1}{\kappa _{\zeta }^2}-1\right) \mu _{\zeta },\frac{\left(\kappa _{\zeta }^2-1\right) \left(\mu _{\zeta }-1\right)}{\kappa _{\zeta }^2}\right)\right){}^2\right)}{\sqrt{1-\rho ^2} B\left(\left(\frac{1}{\kappa _{\zeta }^2}-1\right) \mu _{\zeta },\frac{\left(\kappa _{\zeta }^2-1\right) \left(\mu _{\zeta }-1\right)}{\kappa _{\zeta }^2}\right) B\left(\left(\frac{1}{\kappa _{\nu }^2}-1\right) \mu _{\nu },\frac{\left(\kappa _{\nu }^2-1\right) \left(\mu _{\nu }-1\right)}{\kappa _{\nu }^2}\right)}$
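
The point of footnote [1] can be stated with a standard identity. Writing $E$ for the mean and $\operatorname{Cov}$ for the covariance (notation assumed here, not used elsewhere in this post):

$$E[\nu \zeta] = E[\nu]\,E[\zeta] + \operatorname{Cov}(\nu, \zeta)$$

With a positive correlation in the copula, the covariance term is positive, so the mean of $\psi = \nu \times \zeta$ exceeds the product of the two means. In the perfectly correlated, identically distributed case the identity reduces to $E[\nu^2] = E[\nu]^2 + \operatorname{Var}(\nu)$, which is the squaring example in the footnote.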

# One way not to promote sanity

An opinion column in the Corpus Christi Caller by Nick Jiminez uses the issue of whether the Insurance Commissioner should be an elected position in Texas as a vehicle for repeating bogus arguments about hurricane insurance in Texas.  Now, I don’t really have a stance on how that position should be filled (we seem to vote for an awful lot of offices here in Texas), but I do know that the points advanced in support of Mr. Jiminez’s position don’t make much sense. And I do believe that repetition of bogus arguments and this form of “messaging” is not a constructive way of addressing the serious problems facing the Texas coast.

I will list several of Mr. Jiminez’s arguments in turn and attempt to debunk them.

1. The Insurance Commissioner is unaccountable as evidenced by her inability to answer a question posed by State Rep. Todd Hunter at a hearing last month.  The question was how much the 14 coastal counties contribute to the Texas economy.  The correct answer was apparently 30% according to a study. But the question asked does not have a single “correct” answer.  To be sure, the coastal counties contribute immensely to the Texas economy, but there is not a single number that reflects this point. Moreover, I am willing to wager with Mr. Jiminez that if one were to have used the methodology employed by this study to come up with the 30% number, one would have found that the area around Dallas contributes a large percent, and similarly the area around Houston, and Austin and the Panhandle, etc. such that the total “contribution” would add up to well over 100%.  The problem, I suspect, is not with Commissioner Kitzman (or the other officials stumped by Rep. Hunter at the hearing) but with a question that, unless made far more precise, is objectionable.  Moreover, even if this Insurance Commissioner fumbled on this occasion and didn’t seek clarification of an ambiguous question posed by a good lawyer, this is hardly an argument for changing the political system.  Do you think that many of our elected officials would be able to respond on the spot to similar vague “statistical” questions?  I don’t.  Do you think that Commissioner Kitzman is unaware of the large contribution made by the coast to the Texas economy?  I don’t think so either.

Note.  I don’t begrudge Rep. Hunter making a thinly disguised argument in a legislative hearing.  I do begrudge those who would use the failure to answer an objectionable question “correctly” as a good reason to change our political system or to criticize the incumbent.

2.  Tropical cyclone insurance rates should be lower in Corpus Christi because it has not had a hurricane in 40 years.  This argument is wrong in so many ways.  First, hurricane risk is not tropical cyclone risk.  The area within 60 miles of Corpus Christi has been hit or brushed by tropical cyclones 34 times in the 140 years since records have been kept.  It gets hit by hurricanes on average once every 15 years.  TWIA and other cyclone insurers pay for high winds and named storms, not just hurricanes. As anyone who remembers Allison can say with confidence, tropical storms can be incredibly expensive events.  You can’t just ignore them. Second, the fact that Corpus Christi has been fortunate in recent years is little more likely to predict future performance well than the fact that the Astros had won four in a row on May 25, 2012, and were almost at .500.   Although it may be legitimate to claim that Corpus Christi appears to be less at risk for hurricanes than other parts of the Texas coast, it is not legitimate to cherry pick time periods and measure risk on that basis.

3. “If South Texas were a person buying car insurance, we would be getting a price break, not a huge bill as we are now.”  I won’t dispute that the bill is large, but the real issue is whether the bill is large relative to the risk.  If it were, Mr. Jiminez must explain why it is that private insurers are not beating down the door to write windstorm insurance in Nueces County.  Some vast conspiracy to not make money?  Moreover, if coastal politicians truly bought this argument, they must explain why they oppose TWIA basing its rates on geography rather than the essentially uniform rates that currently exist.

4. “Electing a commissioner would allow the poor and low-income voters, who often can’t afford steep windstorm rates, to have a say in who sets insurance rates.”  This point has some merit, but I have serious doubts it would help the Texas coast.  A lot of the poor and low-income voters about whom Mr. Jiminez appears concerned do not live on the coast.  They are currently subsidizing coastal residents — many of whom have houses far more valuable than theirs and owned by people who are considerably more wealthy — by letting  rich and poor on the coast alike purchase coverage at rates that do not reflect actuarial reality.  And the more expensive the house, the greater that subsidy. It is those poor about whom Mr. Jiminez claims concern who will end up paying parts of the assessments and surcharges to pay for claims suffered by rich and poor TWIA policyholders.  So, I’m not so sure the poor of El Paso and Dallas and, yes, Amarillo, will be eager in an election to vote for the candidate who pledges to continue the sort of subsidies for the coast that now exist.

All of that might explain why, in the end, Mr. Jiminez kills his own straw man  — is there a serious push to make the position elective?  He concedes that “[t]he real focus of an effort to bring some sanity to coastal insurance rates ought to be the next Texas Legislature, not fighting to get the insurance commissioner on the ballot.”  On this point, I probably agree, although I guess I wonder why one would then embark on a long rhetorical journey so hostile to the current Commissioner.  But sanity will not be made more likely by use of coastal newspapers to advance arguments that, no matter how frequently repeated, just do not hold water.