The Poisson Gamma Mixture Pattern

Suppose a random variable N has a frequency distribution that is Poisson with parameter \lambda. Suppose the parameter \lambda is also a random variable and it has a gamma distribution with parameters \alpha and \theta. Then N is equivalent to a negative binomial with parameters r = \alpha and \beta = \theta.

Note that

  1. When \alpha =1, the gamma distribution is equivalent to an exponential distribution.
  2. This also means the negative binomial has parameter r=1, making it equivalent to a geometric distribution.
Pop Quiz!
You own a space mining company and have sent several exploration bots to scout potentially mineral-rich asteroids.  Each bot discovers pockets of valuable resources on different asteroids at a rate of \lambda per year.  The parameter \lambda varies by bot according to an exponential distribution with parameter \theta = 3.
  1. What is the expected number of discoveries per year for a bot chosen at random?
    Answer:  3
  2. What is the variance?
    Answer:  12
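The mixture can be checked by simulation. Below is a minimal sketch using the quiz's parameters (\alpha = 1, \theta = 3); the Poisson sampler uses Knuth's product-of-uniforms method, which is adequate for the small \lambda values drawn here:

```python
import math
import random
import statistics

random.seed(42)

def poisson_sample(lam):
    # Knuth's method: count uniforms until their running product
    # falls below e^{-lambda}
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# lambda ~ exponential with mean theta = 3 (i.e., gamma with alpha = 1,
# theta = 3), so N should be negative binomial with r = 1, beta = 3:
# mean r*beta = 3 and variance r*beta*(1 + beta) = 12.
samples = [poisson_sample(random.gammavariate(1.0, 3.0)) for _ in range(200_000)]
mean = statistics.fmean(samples)
variance = statistics.pvariance(samples)
```

With 200,000 draws the sample mean and variance land close to the theoretical 3 and 12.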

Filed under Frequency Models

Frequency Models

Frequency models count the number of times an event occurs.

  1. The number of customers to arrive each hour.
  2. The number of coins lucky Tom finds on his way home from school.
  3. How many scientists a Tyrannosaur eats on a certain day.
  4. Etc.
This is in contrast to a severity model which measures the magnitude of an event.
  1. How much a customer spends.
  2. The value of a coin that lucky Tom finds.
  3. The number of calories each scientist provides.
  4. Etc.
The following distributions are used to model event frequency.  For notation, p_n means Pr(N=n).


Poisson:

\begin{array}{lr}\displaystyle p_n = e^{-\lambda} \frac{\lambda^n}{n!} & \lambda > 0 \end{array}
  1. Parameter is \lambda.
  2. Mean is \lambda.
  3. Variance is \lambda.
  4. If N_1, N_2, ..., N_i are Poisson with parameters \lambda_1, \lambda_2, ..., \lambda_i, then N = N_1 + N_2 + ... + N_i is Poisson with parameter \lambda = \lambda_1 + \lambda_2 + ... + \lambda_i.

Negative Binomial:

\begin{array}{lr} \displaystyle p_n = {{n+r-1}\choose{n}}\left(\frac{1}{1+\beta}\right)^r\left(\frac{\beta}{1+\beta}\right)^n & \beta>0, r>0 \end{array}
  1. Parameters are r and \beta.
  2. Mean is r\beta.
  3. Variance is r\beta\left(1+\beta\right).
  4. Variance is always greater than the mean.
  5. Is equal to a Geometric distribution when r=1.
  6. If N_1, N_2, ..., N_i are negative binomial with parameters \beta_1 = \beta_2 = ... = \beta_i and r_1, r_2, ..., r_i, then the sum N = N_1 + N_2 + ... + N_i is negative binomial and has parameters \beta = \beta_1 and r = r_1+r_2+...+r_i.  Note: the \beta's must all be equal.


Geometric:

\begin{array}{lr} \displaystyle p_n = \frac{\beta^n}{\left(1+\beta\right)^{n+1}} & \beta>0 \end{array}
  1. Parameter is \beta.
  2. Mean is \beta.
  3. Variance is \beta\left(1+\beta\right).
  4. If N_1, N_2, ..., N_i are geometric with parameter \beta, then the sum N = N_1+N_2+...+N_i is negative binomial with parameters \beta and r = i.


Binomial:

\displaystyle p_n = {{m} \choose {n}}q^n\left(1-q\right)^{m-n}
where m is a positive integer and 0<q<1.
  1. Parameters are m and q.
  2. Mean is mq.
  3. Variance is mq\left(1-q\right).
  4. Variance is always less than the mean.
  5. If N_1, N_2, ..., N_i is binomial with parameters q and m_1, m_2, ..., m_i, then the sum N=N_1+N_2+...+N_i is binomial with parameters q and m = m_1+m_2+...+m_i.

The (a,b,0) recursion:

These distributions can be reparameterized into a recursive formula with parameters a and b.  When reparameterized, they all have the same recursive format.
\displaystyle p_k = \left(a+ \frac{b}{k}\right)p_{k-1}
It is more common to write
\displaystyle \frac{p_k}{p_{k-1}} = a+\frac{b}{k}
The parameters a and b are different for each distribution.
  1. Poisson:
    a = 0 and b =\lambda.
  2. Negative Binomial:
    \displaystyle a = \frac{\beta}{1+\beta} and \displaystyle b = \left(r-1\right)\frac{\beta}{1+\beta}.
  3. Geometric:
    \displaystyle a = \frac{\beta}{1+\beta} and \displaystyle b = 0.
  4. Binomial:
    \displaystyle a = -\frac{q}{1-q} and \displaystyle b = \left(m+1\right)\frac{q}{1-q}.
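The (a,b,0) recursion can be verified numerically against the closed-form pmfs. A quick Python sketch, with illustrative parameter values of my own choosing:

```python
from math import comb, exp, factorial

# Illustrative parameters (not from the text):
lam = 2.5       # Poisson lambda
m, q = 10, 0.3  # binomial m and q

def poisson_pmf(n):
    return exp(-lam) * lam**n / factorial(n)

def binomial_pmf(n):
    return comb(m, n) * q**n * (1 - q)**(m - n)

# (a, b) values from the table above
a_pois, b_pois = 0.0, lam
a_bin, b_bin = -q / (1 - q), (m + 1) * q / (1 - q)

# Check p_k = (a + b/k) p_{k-1} for several k
for k in range(1, 8):
    assert abs(poisson_pmf(k) - (a_pois + b_pois / k) * poisson_pmf(k - 1)) < 1e-12
    assert abs(binomial_pmf(k) - (a_bin + b_bin / k) * binomial_pmf(k - 1)) < 1e-12
```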
Pop Quiz!
  1. A frequency distribution has a = 0.8 and b = 1.2.  What distribution is this?
    Answer: Negative binomial, because a > 0 rules out the Poisson (a = 0) and binomial (a < 0), and b > 0 rules out the geometric (b = 0).
  2. A frequency distribution has mean 1 and variance 0.5.  What distribution is this?
    Answer: Binomial because the variance is less than the mean. 

Filed under Frequency Models, Probability

Bonuses, Dividends, and Refunds

If a policy pays a reward to the participants when losses are below a certain level, this is a particular type of problem which Weishaus calls a “bonus” problem.  The bonus, dividend, or refund amount is expressed as a maximum between 0 and the refunded amount.  For example, a 15% refund is paid on the difference between the $100 premium and the loss L.  No refund is paid if losses exceed $100.  The refund amount R can be expressed as

R = 0.15 \max (0, 100-L)

The key to finding the expected refund is knowing how to rewrite the max function as a min.  We can rewrite R as

\begin{array}{rll} R &=& 0.15 \max (100-100,100-L) \\ &=& 0.15(100-\min (100,L)) \end{array}

So the expected value is given by

E[R] = 0.15(100 - E[L \wedge 100])
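The max-to-min identity is easy to sanity-check numerically. A quick Python sketch over a few sample loss values:

```python
# Check 0.15*max(0, 100 - L) == 0.15*(100 - min(100, L)) for sample losses,
# on both sides of the 100 threshold
for loss in [0, 30, 99.5, 100, 150, 1000]:
    via_max = 0.15 * max(0, 100 - loss)
    via_min = 0.15 * (100 - min(100, loss))
    assert abs(via_max - via_min) < 1e-12
```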

Filed under Coverage Modifications

Other Coverage Modifications

Coinsurance \alpha is the fraction of losses covered by the policy.  For example, \alpha = 0.8 means if a loss is incurred, 80% will be paid by the insurance company.  A claims limit u is the maximum amount that will be paid.  The order in which coinsurance, claims limits, and deductibles are applied to a loss is important and will be specified by the problem.  The expected payment per loss when all three are present in a policy is given by

E\left[Y\right] = \alpha \left[E\left[X\wedge u\right] - E\left[X \wedge d\right]\right]

where Y is the payment variable and X is the original loss variable.  The second moment is given by

E\left[Y^2\right] = \alpha^2\left(E\left[(X\wedge u)^2\right] - E\left[(X \wedge d)^2\right]-2d\left(E\left[X \wedge u\right]-E\left[X \wedge d\right]\right)\right)

The second moment can be used to find the variance of payment per loss.  If inflation r is present, multiply the second moment by (1+r)^2 and divide u and d by (1+r).   For payment per payment, divide the expected values by P(X>d) or 1-F(d).
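As a concrete check of the per-loss formula, here is a sketch that compares it against simulation, assuming (my choice, not from the text) exponential losses with mean 1000, \alpha = 0.8, d = 200, and u = 5000:

```python
import math
import random

random.seed(0)

theta, alpha, d, u = 1000.0, 0.8, 200.0, 5000.0  # illustrative values

def limited_ev(x):
    # E[X ^ x] for an exponential with mean theta
    return theta * (1 - math.exp(-x / theta))

# Expected payment per loss from the formula above
formula = alpha * (limited_ev(u) - limited_ev(d))

# Monte Carlo estimate of the same quantity
n = 500_000
total = 0.0
for _ in range(n):
    loss = random.expovariate(1 / theta)
    total += alpha * (min(loss, u) - min(loss, d))
simulated = total / n
```

The simulated average payment agrees with the closed form (about 649.59 here) to within sampling error.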

Filed under Coinsurance, Coverage Modifications, Deductibles, Limits

The Loss Elimination Ratio

If you impose a deductible d on an insurance policy that you’ve written, what fraction of expected losses do you eliminate from your expected liability?  This is measured by the Loss Elimination Ratio LER(d).

\displaystyle LER(d) = \frac{E\left[X \wedge d\right]}{E\left[X\right]}


  1. Ordinary deductible d— The payment made by the writer of the policy is the loss X minus the deductible d.  If the loss is less than d, then nothing is paid.
  2. Franchise deductible d_f—  The payment made by the writer of the policy is the complete amount of the loss X if X is greater than d_f.
A common type of question considers what happens to LER if an inflation rate r increases the amount of all losses, but the deductible remains unadjusted.  Let X be the loss variable.  Then Y=(1+r)X is the inflation adjusted loss variable.  If losses Y are subject to deductible d, then
\begin{array}{rll} \displaystyle LER_Y(d) &=& \frac{E\left[(1+r)X\wedge d\right]}{E\left[(1+r)X\right]} \\ \\ \displaystyle &=&\frac{(1+r)E\left[X\wedge \frac{d}{1+r}\right]}{(1+r)E\left[X\right]} \\ \\ &=& \frac{E\left[X \wedge \frac{d}{1+r}\right]}{E\left[X\right]}\end{array}
Recall that the limited expected value can be computed as
\displaystyle E\left[X \wedge d\right] = \int_0^d{x f(x) dx} + d\left(1-F(d)\right)
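A short sketch of the inflation effect, assuming (for illustration only) exponential losses with mean 1000, d = 500, and 10% inflation. For an exponential, E[X \wedge d] = \theta(1 - e^{-d/\theta}) and E[X] = \theta, so LER(d) = 1 - e^{-d/\theta}:

```python
import math

theta, d, r = 1000.0, 500.0, 0.10   # assumed values for illustration

def ler(deductible):
    # LER for an exponential loss: E[X ^ d]/E[X] = 1 - e^{-d/theta}
    return 1 - math.exp(-deductible / theta)

ler_original = ler(d)            # ~0.3935
ler_inflated = ler(d / (1 + r))  # fixed deductible is d/(1+r) in real terms, ~0.3653
```

With a fixed deductible, inflation lowers the LER: the deductible eliminates a smaller fraction of losses.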


Filed under Coverage Modifications, Deductibles

The Lognormal Distribution

Review: If X is normal with mean \mu and standard deviation \sigma, then

Z = \displaystyle \frac{X-\mu}{\sigma}

is the Standard Normal Distribution with mean 0 and standard deviation 1.  To find the probability Pr(X \le x), you would convert X to the standard normal distribution and look up the values in the standard normal table.

\begin{array}{rll} Pr(X \le x) &=& Pr\left(\displaystyle \frac{X-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right) \\ \\ &=& \displaystyle Pr\left(Z \le \frac{x-\mu}{\sigma}\right) \\ \\ &=& \displaystyle \mathcal{N}\left(\frac{x-\mu}{\sigma}\right) \end{array}

If V is a weighted sum of n normal random variables X_i, i = 1, ..., n, with means \mu_i, variances \sigma^2_i, and weights w_i, then

\displaystyle E\left[\sum_{i=1}^n w_iX_i\right] = \sum_{i=1}^n w_i\mu_i

and variance

\displaystyle Var\left(\sum_{i=1}^n w_iX_i\right) = \sum_{i=1}^n \sum_{j=1}^n w_iw_j\sigma_{ij}

where \sigma_{ij} is the covariance between X_i and X_j.  Note when i=j, \sigma_{ij} = \sigma_i^2 = \sigma_j^2.

Remember: A sum of random variables is not the same as a mixture distribution!  The expected value is the same, but the variance is not.  A sum of normal random variables is also normal.  So V is normal with the above mean and variance.

Actuary Speak: This is called a stable distribution.  The sum of random variables from the same distribution family produces a random variable that is also from the same distribution family.

The fun stuff:
If X is normal, then Y = e^X is lognormal.  If X has mean \mu and standard deviation \sigma, then

\begin{array}{rll} \displaystyle E\left[Y\right] &=& E\left[e^X\right] \\ \\ \displaystyle &=& e^{\mu + \frac{1}{2}\sigma^2} \\ \\ Var\left(e^X\right) &=& e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)\end{array}
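These moment formulas can be checked by simulation. A minimal sketch with arbitrary \mu and \sigma of my own choosing:

```python
import math
import random
import statistics

random.seed(1)

mu, sigma = 0.05, 0.2   # arbitrary illustrative parameters
n = 400_000
y = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]

mean_theory = math.exp(mu + 0.5 * sigma**2)                        # e^{mu + sigma^2/2}
var_theory = math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)

mean_sim = statistics.fmean(y)
var_sim = statistics.pvariance(y)
```

Both the sample mean and sample variance of e^X match the lognormal formulas to within sampling error.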

Recall FV = e^\delta where FV is the future value of an investment growing at a continuously compounded rate of \delta for one period.  If the rate of growth is a normal distributed random variable, then the future value is lognormal.  The Black-Scholes model for option prices assumes stocks appreciate at a continuously compounded rate that is normally distributed.

S_t = S_0e^{R(0,t)}

where S_t is the stock price at time t, S_0 is the current price, and R(0,t) is the random variable for the rate of return from time 0 to t.  Now consider the situation where R(0,t) is the sum of iid normal random variables R(0,h) + R(h,2h) + ... + R((n-1)h,t) each having mean \mu_h and variance \sigma_h^2.  Then

\begin{array}{rll} E\left[R(0,t)\right] &=& n\mu_h \\ Var\left(R(0,t)\right) &=& n\sigma_h^2 \end{array}

If h represents 1 year, this says that the expected return in 10 years is 10 times the one year return and the standard deviation is \sqrt{10} times the annual standard deviation.  This allows us to formulate a function for the mean and standard deviation with respect to time.  Suppose we write

\begin{array}{rll} \displaystyle \mu(t) &=& \left(\alpha - \delta -\frac{1}{2}\sigma^2\right)t \\ \sigma(t) &=& \sigma \sqrt{t} \end{array}

where \alpha is the growth factor and \delta is the continuous rate of dividend payout.  Since all normal random variables are transformations of the standard normal, we can write R(0,t) =\mu(t)+Z\sigma(t) . The model for the stock price becomes

\displaystyle S_t = S_0e^{\left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t + Z\sigma\sqrt{t}}

In this model, the expected value of the stock price at time t is

E\left[S_t\right] = S_0e^{(\alpha - \delta)t}
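A simulation sketch of this expectation, again with arbitrary illustrative parameter values:

```python
import math
import random
import statistics

random.seed(3)

S0, alpha, delta, sigma, t = 40.0, 0.15, 0.01, 0.3, 0.25  # illustrative values
n = 300_000

mu_t = (alpha - delta - 0.5 * sigma**2) * t   # drift over the period
sig_t = sigma * math.sqrt(t)                  # volatility over the period

# Draw S_t = S_0 exp(mu(t) + Z sigma(t)) and average
prices = [S0 * math.exp(mu_t + random.gauss(0, 1) * sig_t) for _ in range(n)]
expected = S0 * math.exp((alpha - delta) * t)  # = 40 e^{0.14 * 0.25} ~ 41.42
mean_sim = statistics.fmean(prices)
```

The -\frac{1}{2}\sigma^2 t term in the drift is exactly what makes the simulated average come out to S_0e^{(\alpha-\delta)t} rather than something larger.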

Actuary Speak: The standard deviation \sigma of the return rate is called the volatility of the stock.  This term comes from expressing the rate of return as an Ito process. \mu(t) is called the drift term and \sigma(t) is called the volatility term.

Confidence intervals: To find the range of stock prices that corresponds to a particular confidence interval, we need only look at the confidence interval on the standard normal distribution then translate that interval into stock prices using the equation for S_t.

Example: The interval z \in [-1.96, 1.96] contains 95% of the probability under the standard normal distribution \mathcal{N}(z).  Suppose t = \frac{1}{3}, \alpha = 0.15, \delta = 0.01, \sigma = 0.3, and S_0 = 40.  Then the 95% confidence interval for S_t is

\left[40e^{(0.15-0.01-\frac{1}{2}0.3^2)\frac{1}{3} + (-1.96)0.3\sqrt{\frac{1}{3}}},40e^{(0.15-0.01-\frac{1}{2}0.3^2)\frac{1}{3} + (1.96)0.3\sqrt{\frac{1}{3}}}\right]
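Evaluating this numerically, as a quick sketch:

```python
import math

S0, t = 40.0, 1 / 3
alpha, delta, sigma = 0.15, 0.01, 0.30
z = 1.96

mu_t = (alpha - delta - 0.5 * sigma**2) * t   # mu(t), the drift over the period
sig_t = sigma * math.sqrt(t)                  # sigma(t), the volatility over the period

lower = S0 * math.exp(mu_t - z * sig_t)   # ~29.40
upper = S0 * math.exp(mu_t + z * sig_t)   # ~57.98
```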

which corresponds to the price interval of approximately \left[29.40,\ 57.98\right].

Probabilities: Probability calculations on stock prices require a bit more mental gymnastics.

\begin{array}{rll} \displaystyle Pr\left(S_t<K\right) &=& Pr\left(\frac{S_t}{S_0} < \frac{K}{S_0}\right) \\ \\ \displaystyle &=& Pr\left(\ln{\frac{S_t}{S_0}} < \ln{\frac{K}{S_0}}\right) \\ \\ \displaystyle &=& Pr\left(Z< \frac{\ln{\frac{K}{S_0}} - \mu(t)}{\sigma(t)}\right) \\ \\ \displaystyle &=& Pr\left(Z<\frac{\ln{\frac{K}{S_0}} - \left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}}\right) \end{array}

Conditional Expected Value: Define

\begin{array}{rll} \displaystyle d_1 &=& -\frac{\ln{\frac{K}{S_0}} - \left(\alpha - \delta + \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \\ \\ \displaystyle d_2 &=& -\frac{\ln{\frac{K}{S_0}}- \left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \end{array}


\begin{array}{rll} \displaystyle E\left[S_t|S_t<K\right] &=& S_0e^{(\alpha - \delta)t}\frac{\mathcal{N}(-d_1)}{\mathcal{N}(-d_2)} \\ \\ \displaystyle E\left[S_t|S_t>K\right] &=& S_0e^{(\alpha - \delta)t}\frac{\mathcal{N}(d_1)}{\mathcal{N}(d_2)} \end{array}

This gives the expected stock price at time t given that it is less than K or greater than K respectively.

Black-Scholes formula: A call option C_t on stock S_t has value \max\left(0,S_t - K\right) at time t.  The option pays out if S_t > K.  So the value of this option at time 0 is the probability that it pays out at time t, discounted by the risk free interest rate r, and multiplied by the expected value of S_t - K given that S_t > K.  In other words,

\begin{array}{rll} \displaystyle C_0 &=& e^{-rt}Pr\left(S_t>K\right)E\left[S_t-K|S_t>K\right] \\ \\ &=& e^{-rt}\mathcal{N}(d_2)\left(E\left[S_t|S_t>K\right] - E\left[K|S_t>K\right]\right) \\ \\ &=& e^{-rt}\mathcal{N}(d_2)\left(S_0e^{(\alpha - \delta)t}\frac{\mathcal{N}(d_1)}{\mathcal{N}(d_2)} - K\right) \end{array}

Black-Scholes makes the additional assumption that all investors are risk neutral.  This means assets do not pay a risk premium for being more risky.  Long story short, \alpha - r = 0 so \alpha = r.  So in the Black-Scholes formula:

\begin{array}{rll} \displaystyle d_1 &=& -\frac{\ln{\frac{K}{S_0}} - \left(r - \delta + \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \\ \\ \displaystyle d_2 &=& -\frac{\ln{\frac{K}{S_0}}- \left(r- \delta - \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \end{array}

Continuing our derivation of C_0 but replacing \alpha with r,

\begin{array}{rll} \displaystyle C_0 &=& e^{-rt}\mathcal{N}(d_2)\left(S_0e^{(r - \delta)t}\frac{\mathcal{N}(d_1)}{\mathcal{N}(d_2)} - K\right) \\ \\ &=& S_0e^{-\delta t}\mathcal{N}(d_1) - Ke^{-rt}\mathcal{N}(d_2)\end{array}

For a put option P_0 with payout K-S_t for K>S_t and 0 otherwise,

P_0 = Ke^{-rt}\mathcal{N}(-d_2) - S_0e^{-\delta t}\mathcal{N}(-d_1)
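Both formulas are straightforward to implement. The sketch below also checks put-call parity, C - P = S_0e^{-\delta t} - Ke^{-rt}, which follows directly from \mathcal{N}(d) + \mathcal{N}(-d) = 1 (parameter values are arbitrary):

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def black_scholes(S0, K, r, delta, sigma, t):
    # d1 and d2 as defined above, with alpha replaced by r
    d1 = (math.log(S0 / K) + (r - delta + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    call = S0 * math.exp(-delta * t) * norm_cdf(d1) - K * math.exp(-r * t) * norm_cdf(d2)
    put = K * math.exp(-r * t) * norm_cdf(-d2) - S0 * math.exp(-delta * t) * norm_cdf(-d1)
    return call, put

c, p = black_scholes(S0=40.0, K=45.0, r=0.05, delta=0.0, sigma=0.3, t=0.5)
parity = 40.0 * math.exp(-0.0 * 0.5) - 45.0 * math.exp(-0.05 * 0.5)
```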

These are the famous Black-Scholes formulas for option pricing.  When derived on the back of a cocktail napkin, they are indispensable for impressing the ladies at your local bar.  :p

Filed under Parametric Models, Probability

Parametric Distributions

Parametric distributions are functions in several dimensions.  Various parametric distributions are given in the exam tables.  Each input variable or dimension of the distribution function is called a parameter.  While studying, it is important to keep in mind that parameters are simply abstract devices built into a distribution function which allow us, through their manipulation, to tweak the shape of the distribution.  Ultimately, we are still only interested in things like Pr(X\le x) and the distribution function parameters are used to help describe the distribution of X.


  1. Scaling:  If a random variable X has a scalable parametric distribution with parameters (a_1, a_2, ..., a_n, \theta), then one of these parameters can be called a scale parameter and is denoted by \theta.  Having the scalable property implies that cX can be described with the same distribution function as X, except that the parameters of its distribution are (a_1,a_2,..., a_n,c\theta) where c is the scale factor.  In terms of probability, scaling a random variable has the following effect: if Y = cX with c >0, then Pr(Y \le y) = Pr(cX\le y) = Pr(X \le \frac{y}{c}).
    Caveat: The Inverse Gaussian as given in the exam tables has a \theta in its set of parameters; however, this is not a scale distribution.  To scale a Lognormal distribution, adjust the parameters to (\mu + \ln{c}, \sigma) where c is the scale factor and \mu and \sigma are the usual parameters.  All the rest of the distributions given in appendix A are scale distributions.
  2. Raising to a power:  A random variable raised to a positive power is called transformed.  If it is raised to the power -1, it is called inverse.  If it is raised to any other negative power, it is called inverse transformed.  When raising to a power, the scale parameter needs to be readjusted to remain a scale parameter in the new distribution.
  3. Exponentiating:  An example is the lognormal distribution.  If X is normal, then Y = e^X is lognormal.  In terms of probability, F_Y(y) = F_X(\ln{y}).
  4. Splicing:  You can create a new distribution function by defining different probability densities on different intervals of the domain.  As long as the piecewise integral of the spliced density is 1, it is a valid distribution.  Since total probability has to be exactly 1, scaling the component densities is an important tool that allows us to do this.
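The scaling property in item 1 can be checked numerically for the exponential, which is a scale family in \theta. A quick sketch with illustrative values:

```python
import math
import random

random.seed(7)

theta, c, y = 2.0, 3.0, 4.0   # illustrative values

def exp_cdf(x, scale):
    return 1 - math.exp(-x / scale)

# Closed form: Pr(cX <= y) with cX ~ Exp(c*theta) equals Pr(X <= y/c)
closed_form = exp_cdf(y, c * theta)
assert abs(closed_form - exp_cdf(y / c, theta)) < 1e-12

# Simulation check of the same probability
n = 200_000
sim = sum(c * random.expovariate(1 / theta) <= y for _ in range(n)) / n
```
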
Tail Weight
Since a density function must integrate to 1, it must tend to 0 at the extremities of its domain.  If density function A tends towards zero at a slower rate than density function B, then density A is said to have a heavier tail than density B.  Some important measures of tail weight:
  1. The fewer positive raw or central moments that exist, the heavier the tail.
  2. The limit of the ratio of one density or survival function over another may tend to zero or infinity depending on which has the greater tail weight.
  3. An increasing hazard rate function implies a lighter tail and vice versa.
  4. An increasing mean residual life function means a heavier tail and vice versa.

Filed under Parametric Models, Probability