Category Archives: Probability

Ruin Theory

You walk into a casino with a certain amount of surplus money.  Every hour you spend in the casino, there is a certain probability of winning or losing some given amount of money.  A ruin theory question might ask: what is the probability that you go bankrupt within x hours?  The bankruptcy state is an absorbing state, so once you enter it, the probability of leaving it is 0.  The solution to these types of problems usually requires calculating an exhaustive list of probabilities for ruin by the convolution method.  But first, familiarity with the notation should be developed.


  1. \psi(u): Starting with surplus u, this is the probability of ultimate ruin (ruin at some time t \geq 0) when the surplus is monitored continuously, t \in \mathbb{R}.
  2. \bar{\psi}(u): Same as above except ruin is only checked at positive integer times, t \in \mathbb{N}_+.
  3. \psi(u,t): Probability of ruin between time 0 and time t, with continuous monitoring (t \in \mathbb{R}).
  4. \bar\psi(u,t): Same as above except ruin is only checked at integer times, t \in \mathbb{N}_+.
The analogous survival probabilities follow the same conventions except they are denoted by \phi; for example, \phi(u) = 1 - \psi(u) and \bar\phi(u,t) = 1 - \bar\psi(u,t).
The following relationships are useful:
  1. \psi(u) \geq \psi(u,t) \geq \bar\psi(u,t)
  2. \psi(u) \geq \bar\psi(u) \geq \bar\psi(u,t)
  3. \psi(u) \geq \psi(u+k) for k \geq 0
  4. \phi(u) \leq \phi(u,t) \leq \bar\phi(u,t)
  5. \phi(u) \leq \bar\phi(u) \leq \bar\phi(u,t)
  6. \phi(u) \leq \phi(u+k) for k \geq 0
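The convolution approach can be sketched for a simple discrete-time surplus process. This is a minimal sketch under assumptions not in the text above: the surplus changes once per period by an integer amount drawn from a known distribution (the hypothetical dict `changes`), and ruin means the surplus drops below 0 at one of the times 1, ..., t. The function name `discrete_ruin_probability` is illustrative. It carries forward the distribution of surplus on the paths not yet ruined and moves probability mass into the absorbing ruin state as soon as it falls below 0.

```python
from fractions import Fraction


def discrete_ruin_probability(u, changes, t):
    """Return psi-bar(u, t) for a discrete-time surplus process.

    u       -- initial surplus (integer)
    changes -- dict mapping each possible per-period change in surplus to its probability
    t       -- number of periods (ruin is checked at times 1, ..., t)

    Assumption: ruin occurs when the surplus first drops below 0.
    """
    alive = {u: Fraction(1)}  # surplus distribution on paths not yet ruined
    ruined = Fraction(0)      # probability mass in the absorbing ruin state
    for _ in range(t):
        nxt = {}
        for surplus, p in alive.items():
            for change, q in changes.items():
                new = surplus + change
                if new < 0:
                    ruined += p * q  # once ruined, the path never leaves
                else:
                    nxt[new] = nxt.get(new, 0) + p * q
        alive = nxt
    return ruined


# Hypothetical game: each hour you win or lose 1 with equal probability.
coin = {1: Fraction(1, 2), -1: Fraction(1, 2)}
print(discrete_ruin_probability(0, coin, 3))  # 5/8
print(discrete_ruin_probability(1, coin, 3))  # 1/4
```

Consistent with relationship 3 above, starting with more surplus gives a smaller ruin probability (1/4 < 5/8).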


Filed under Probability, Ruin Theory

Recursive Discrete Aggregate Loss

You have 2 six-sided dice.  You roll one die to determine the number of times you will roll the second die.  The sum of the results of each roll of the second die is the amount of aggregate loss.  Since the frequency and severity are discrete and bounded, for any aggregate loss amount the number of combinations of rolls that produce it is finite.  For example, an aggregate loss amount of 3 can be arrived at by rolling a 1 on the first die, then rolling a 3; or rolling a 2, then rolling one of the combinations (1,2),(2,1); or rolling a 3 and then rolling (1,1,1) on the second die.  The probability of experiencing an aggregate loss of 3 is:

\begin{array}{rll} \Pr(S=3) \displaystyle &=& \frac{1}{6^2} + \frac{2}{6^3} + \frac{1}{6^4} \\ \\ \displaystyle &=& \frac{49}{6^4} \end{array}
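The convolution method for the dice example can be checked by brute force. A minimal sketch: it builds the distribution of the sum of n rolls by repeated convolution, then weights each n by the probability 1/6 that the first die shows n. Exact fractions are used so the result can be compared with 49/6^4 directly; `convolve` and `aggregate_pmf` are hypothetical helper names.

```python
from fractions import Fraction

DIE = {face: Fraction(1, 6) for face in range(1, 7)}


def convolve(p, q):
    """Distribution of the sum of two independent discrete random variables."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out


def aggregate_pmf():
    """Pr(S = s) where N is the first die roll and S is the sum of N rolls of the second die."""
    total = {}
    n_fold = {0: Fraction(1)}  # distribution of the sum of 0 rolls
    for n in range(1, 7):
        n_fold = convolve(n_fold, DIE)  # distribution of the sum of n rolls
        for s, p in n_fold.items():
            total[s] = total.get(s, 0) + DIE[n] * p  # weight by Pr(N = n) = 1/6
    return total


pmf = aggregate_pmf()
print(pmf[3])  # 49/1296, i.e. 49/6^4
```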

This method of calculating the probability is called the convolution method.  Now imagine the frequency and severity distributions are discrete but have infinitely many possible values.  Calculating \Pr(S=10) would then require summing the probabilities of many possible combinations.  If the frequency distribution is from the (a,b,0) class and the severity takes nonnegative integer values, there is a recursive formula that can calculate this.  It is given by:

g_k = \displaystyle \frac{1}{1-af_0}\sum_{j=1}^k \left(a+\frac{bj}{k}\right)f_jg_{k-j}

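where k is a positive integer, g_k = \Pr(S=k)=f_S(k), f_j = \Pr(X=j), and a and b are the (a,b,0) parameters of the frequency distribution p_n = \Pr(N=n).  This is called the recursive method.  To start the recursion, you need g_0 = \Pr(S=0) = \sum_{n=0}^\infty p_n f_0^n, which is the probability generating function of N evaluated at f_0; for example, g_0 = e^{-\lambda(1-f_0)} when N is Poisson.  You can then find any g_k.  If a problem asks for F_S(3), this is equal to g_0+g_1+g_2+g_3.  You iterate through the recursion to find each g_k, then add them together.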
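The recursion translates directly into a short loop. This is a sketch under stated assumptions: the severity probabilities are supplied as a list `f` with `f[j] = Pr(X = j)` (values past the end of the list treated as 0), and the caller supplies g_0 (for a Poisson frequency, g_0 = e^{-\lambda(1-f_0)}). The function name `panjer` and the example parameters are hypothetical.

```python
import math


def panjer(a, b, g0, f, kmax):
    """Return [g_0, ..., g_kmax] for a compound distribution with (a,b,0) frequency.

    a, b -- (a,b,0) parameters of the frequency distribution
    g0   -- Pr(S = 0), the starting value of the recursion
    f    -- severity probabilities, f[j] = Pr(X = j)
    """
    def fj(j):
        return f[j] if j < len(f) else 0.0

    denom = 1.0 - a * fj(0)
    g = [g0]
    for k in range(1, kmax + 1):
        total = sum((a + b * j / k) * fj(j) * g[k - j] for j in range(1, k + 1))
        g.append(total / denom)
    return g


# Hypothetical example: N ~ Poisson(lambda = 2), so a = 0 and b = 2.
lam = 2.0
f = [0.2, 0.5, 0.3]                   # Pr(X=0), Pr(X=1), Pr(X=2)
g0 = math.exp(-lam * (1 - f[0]))      # Poisson: g_0 = exp(-lambda (1 - f_0))
g = panjer(0.0, lam, g0, f, kmax=3)
print(sum(g))                         # F_S(3) = g_0 + g_1 + g_2 + g_3
```

A quick sanity check: if every claim is exactly 1, then S = N and the recursion reproduces the Poisson probabilities.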


Filed under Aggregate Models, Frequency Models, Probability, Severity Models

Approximating Aggregate Losses

An aggregate loss S is the sum of all losses in a certain period of time.  The number of losses N that will occur is unknown, and the amount X of each loss is unknown.  N is called the frequency random variable and X is called the severity random variable.  This situation can be modeled using a compound distribution of N and X.  The model is specified by:

\displaystyle S = \sum_{n=1}^N X_n

where N is the random variable for frequency and the X_n's are IID random variables for severity, independent of N.  This type of structure is called a collective risk model.

An alternative way to model aggregate loss is to model each risk using a different distribution appropriate to that risk.  For example, in a portfolio of risks, one may be modeled using a Pareto distribution and another may be modeled with an exponential distribution.  The expected aggregate loss would be the sum of the individual expected losses.  This is called an individual risk model and is given by:

\displaystyle S = \sum_{i=1}^n X_i

where n is the number of individual risks in the portfolio and the X_i's are random variables for the individual losses.  The X_i's are assumed independent but are NOT necessarily identically distributed, and n is fixed and known.

Both of these models are tested in the exam; however, the individual risk model is usually tested in combination with the collective risk model.  An example of a problem structure that combines the two is given below.

Example 1: Your company sells car insurance policies.  The in-force policies are categorized into high-risk and low-risk groups.  In the high-risk group, the number of claims in a year is Poisson with a mean of 30.  The number of claims for the low-risk group is Poisson with a mean of 10.  The amount of each claim is exponentially distributed with mean \theta = 200.
Analysis: Being able to see the structure of the problem is a very important first step in solving it.  In this situation, you would model the aggregate loss as an individual risk model with 2 individual risks: the high-risk group and the low-risk group.  Within each group, you would model the aggregate loss using a collective risk model.  For the high-risk group, the frequency is Poisson with mean 30 and the severity is exponential with \theta = 200.  For the low-risk group, the frequency is Poisson with mean 10 and the severity is exponential with the same parameter.

For these problems, you will need to know how to:

  1. Find the expected aggregate loss.
  2. Find the variance of aggregate loss.
  3. Approximate the probability that the aggregate loss will be above or below a certain amount using a normal distribution.  
    Example: what is the probability that aggregate losses are below $5,000?
  4. Determine how many risks would need to be in a portfolio for the probability of aggregate loss to reach a given level of certainty for a given amount.
    Example: how many policies should you underwrite so that the aggregate loss is less than 110% of the expected aggregate loss with a 95% degree of certainty?
  5. Determine how long your risk exposure should be for the probability of aggregate loss to reach a given level of certainty for a given amount.

Problems that require you to determine probabilities for the aggregate loss will usually state that you should use a normal approximation.  This will require the calculation of the expected aggregate loss and the variance of the aggregate loss.

Expected aggregate loss for a collective risk model is given by:

E[S] = E[N]E[X]

For the individual risk model, it is

\displaystyle E[S] = \sum_{i=1}^n E[X_i]

Variance under the collective risk model is found with the conditional variance formula, conditioning on the number of losses N:

Var(S) = E[Var(S|N)] + Var(E[S|N])

When the X_n's are IID and independent of N, the formula above simplifies to the following shortcut, called the compound variance:

Var(S) = E[N]Var(X) + Var(N)E[X]^2

Under the individual risk model, the X_i's are independent, so variance is additive:

\displaystyle Var(S) = \sum_{i=1}^n Var(X_i)

Example 2: Continuing from Example 1, calculate the mean and variance of the aggregate loss.  Assume frequency and severity are independent.
Answer: This is done by

  1. Calculating the expected aggregate loss and variance in the high-risk group.
  2. Calculating the expected aggregate loss and variance in the low-risk group.
  3. Adding the expected values from both groups to get the total expected aggregate loss.
  4. Adding the variances from both groups to get the total variance.

I will use subscripts H and L to denote the high-risk and low-risk groups respectively.

E[S_H] = E[N_H]E[X_H] = 30\times 200 = 6,000

\begin{array}{rll} Var(S_H) &=& E[N_H]Var(X_H) + Var(N_H)E[X_H]^2 \\ &=& 30 \times 40,000 + 30 \times 200^2 \\ &=& 2,400,000 \end{array}

E[S_L] = E[N_L]E[X_L] = 10 \times 200 = 2,000

\begin{array}{rll} Var(S_L) &=& E[N_L]Var(X_L) + Var(N_L)E[X_L]^2 \\ &=& 10 \times 40,000 + 10 \times 200^2 \\ &=& 800,000 \end{array}

Add expected values to get

E[S] = 6,000 + 2,000 = 8,000

Add variances to get

Var(S) = 2,400,000 + 800,000 = 3,200,000
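The arithmetic in Example 2 can be reproduced in a few lines. A minimal sketch, assuming independent Poisson frequencies and the per-claim moments used in the calculations above, E[X] = 200 and Var(X) = 40,000; `compound_poisson_moments` is a hypothetical helper name.

```python
def compound_poisson_moments(lam, mean_x, var_x):
    """Mean and variance of S = X_1 + ... + X_N with N ~ Poisson(lam).

    Uses E[S] = E[N]E[X] and the compound variance
    Var(S) = E[N]Var(X) + Var(N)E[X]^2, with E[N] = Var(N) = lam.
    """
    mean_s = lam * mean_x
    var_s = lam * var_x + lam * mean_x ** 2
    return mean_s, var_s


mean_x, var_x = 200.0, 40_000.0  # per-claim moments from Example 2
groups = {"high": 30, "low": 10}  # Poisson means for each group

total_mean = total_var = 0.0
for name, lam in groups.items():
    m, v = compound_poisson_moments(lam, mean_x, var_x)
    total_mean += m  # individual risk model: means add
    total_var += v   # independent groups: variances add
    print(name, m, v)

print(total_mean, total_var)  # 8000.0 3200000.0
```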

Once the mean and variance of the aggregate loss have been calculated, you can use them to approximate probabilities for aggregate losses using a normal distribution.

Example 3: Continuing from Example 2, use a normal approximation for aggregate loss to calculate the probability that losses exceed $12,000.
Answer:  To solve this, you will need to calculate a z value for the normal distribution using the expected value and variance found in Example 2.

\begin{array}{rll} \Pr(S > 12,000) &=& 1- \Pr(S< 12,000) \\ \\ &=& \displaystyle 1-\Phi\left(\frac{12,000 - 8,000}{\sqrt{3,200,000}}\right) \\ \\ &=& 1 - \Phi(2.24) \\ \\ &=& 0.0125 \end{array}

Suppose in the above examples the severity X is discrete, for example Poisson, so that S only takes integer values.  Then \Pr(S > 12,000) = \Pr(S \geq 12,001), and we add 0.5 to 12,000 in the normal approximation, calculating \Pr(S > 12,000.5) instead.  This is called a continuity correction and is used whenever S is discrete.  If we were interested in \Pr(S<12,000), we would subtract 0.5 instead and calculate \Pr(S < 11,999.5).  The correction has a greater effect when the standard deviation of S is small relative to the spacing between its possible values.
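Example 3 and the continuity correction can be sketched with Python's standard-library `statistics.NormalDist`. This is a minimal sketch: it plugs in the mean 8,000 and variance 3,200,000 from Example 2 and adds 0.5 to the cutoff only when asked to treat S as integer-valued. The helper name `prob_exceeds` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_exceeds(x, mean, var, discrete=False):
    """Normal approximation to Pr(S > x).

    If S only takes integer values, Pr(S > x) = Pr(S >= x + 1), so the
    continuity correction evaluates the normal tail at x + 0.5 instead.
    """
    cutoff = x + 0.5 if discrete else x
    return 1.0 - NormalDist(mu=mean, sigma=sqrt(var)).cdf(cutoff)


p = prob_exceeds(12_000, 8_000, 3_200_000)
p_cc = prob_exceeds(12_000, 8_000, 3_200_000, discrete=True)
print(round(p, 4))  # about 0.0127; the table answer 0.0125 comes from rounding z to 2.24
print(round(p_cc, 4))
```

With a standard deviation near 1,789, shifting the cutoff by 0.5 barely moves the answer, which illustrates why the correction matters more when S is spread over few values.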

Another type of problem I’ve encountered in the samples is constructed as follows:

Example 4: You drive a 1992 Honda Prelude Si piece-of-crap-mobile (no, that’s my old car and you are driving it because I sold it to you to buy my Mercedes).  The number of breakdowns per year is Poisson with mean 2.  The average cost of repair for each breakdown is $500 with a standard deviation of $1000.  How many years do you have to continue driving the car so that the probability of the total maintenance cost exceeding 120% of the expected total maintenance cost is less than 10%?  (Assume the car is so crappy that it cannot deteriorate any further, so the breakdown rate and average repair costs remain constant every year.)
Answer:  For one year,

E[S_1] = E[N]E[X] = 2 \times 500 = 1,000

\begin{array}{rll} Var(S_1) &=& 2 \times 1,000^2 + 2 \times 500^2 \\ &=& 2,500,000 \end{array}

For n years, we have

E[S] = 1,000n

Var(S) = 2,500,000n

According to the problem, we want the smallest n such that \Pr(S > 1,200n) < 0.1.  At the boundary \Pr(S > 1,200n) = 0.1, and under the normal approximation

\begin{array}{rll} \Pr(S>1,200n) &=& 1-\Pr(S<1,200n) \\ \\ &=& \displaystyle 1- \Phi\left(\frac{1,200n - 1,000n}{\sqrt{2,500,000n}}\right) \end{array}

Which implies

\displaystyle \Phi\left(\frac{200n}{\sqrt{2,500,000n}}\right) = 0.9

The probability 0.9 corresponds to a z value of 1.28.  This implies

\displaystyle \frac{200n}{\sqrt{2,500,000n}} = 1.28

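Solving for n, \sqrt{n} = 1.28 \times \sqrt{2,500,000}/200 \approx 10.12, so n \approx 102.4.  Since the probability only drops below 10% once n exceeds this value, you would have to keep driving the car for 103 years.  LOL!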
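The algebra for n can be checked numerically. A minimal sketch: with E[S] = 1,000n and Var(S) = 2,500,000n, the condition z = 200n/\sqrt{2,500,000n} rearranges to n = (z\sqrt{2,500,000}/200)^2. The code uses the exact 90th percentile of the standard normal rather than the rounded table value 1.28, and rounds up because n must be a whole number of years.

```python
import math
from statistics import NormalDist

mean_1, var_1 = 1_000.0, 2_500_000.0  # E[S_1] and Var(S_1) for one year
excess = 1.2 * mean_1 - mean_1        # 1,200n - 1,000n = 200n, per year of driving
z = NormalDist().inv_cdf(0.9)         # exact 90th percentile, about 1.2816

# z = excess * n / sqrt(var_1 * n)  =>  sqrt(n) = z * sqrt(var_1) / excess
n_exact = (z * math.sqrt(var_1) / excess) ** 2
years = math.ceil(n_exact)            # need a whole number of years, so round up
print(n_exact, years)                 # years is 103
```

Using the rounded table value z = 1.28 gives n \approx 102.4 instead; either way the answer rounds up to 103 years.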


Filed under Aggregate Models, Frequency Models, Probability, Severity Models

Frequency Models

Frequency models count the number of times an event occurs.

  1. The number of customers to arrive each hour.
  2. The number of coins lucky Tom finds on his way home from school.
  3. How many scientists a Tyrannosaur eats on a certain day.
  4. Etc.
This is in contrast to a severity model, which measures the magnitude of an event.
  1. How much a customer spends.
  2. The value of a coin that lucky Tom finds.
  3. The number of calories each scientist provides.
  4. Etc.
The following distributions are used to model event frequency.  For notation, p_n means Pr(N=n).


Poisson:

\begin{array}{lr}\displaystyle p_n = e^{-\lambda} \frac{\lambda^n}{n!} & \lambda > 0 \end{array}
  1. Parameter is \lambda.
  2. Mean is \lambda.
  3. Variance is \lambda.
  4. If N_1, N_2, ..., N_i are independent Poisson with parameters \lambda_1, \lambda_2, ..., \lambda_i, then N = N_1 + N_2 + ... + N_i is Poisson with parameter \lambda = \lambda_1 + \lambda_2 + ... + \lambda_i.

Negative Binomial:

\begin{array}{lr} \displaystyle p_n = {{n+r-1}\choose{n}}\left(\frac{1}{1+\beta}\right)^r\left(\frac{\beta}{1+\beta}\right)^n & \beta>0, r>0 \end{array}
  1. Parameters are r and \beta.
  2. Mean is r\beta.
  3. Variance is r\beta\left(1+\beta\right).
  4. Variance is always greater than the mean.
  5. Reduces to the geometric distribution when r=1.
  6. If N_1, N_2, ..., N_i are independent negative binomial with a common parameter \beta and parameters r_1, r_2, ..., r_i, then the sum N = N_1 + N_2 + ... + N_i is negative binomial with parameters \beta and r = r_1+r_2+...+r_i.  Note: the \beta's must be the same.


Geometric:

\begin{array}{lr} \displaystyle p_n = \frac{\beta^n}{\left(1+\beta\right)^{n+1}} & \beta>0 \end{array}
  1. Parameter is \beta.
  2. Mean is \beta.
  3. Variance is \beta\left(1+\beta\right).
  4. If N_1, N_2, ..., N_i are independent geometric with common parameter \beta, then the sum N = N_1+N_2+...+N_i is negative binomial with parameters \beta and r = i.


Binomial:

\displaystyle p_n = {{m} \choose {n}}q^n\left(1-q\right)^{m-n}
where m is a positive integer, 0<q<1, and n = 0, 1, ..., m.
  1. Parameters are m and q.
  2. Mean is mq.
  3. Variance is mq\left(1-q\right).
  4. Variance is always less than mean.
  5. If N_1, N_2, ..., N_i are independent binomial with common parameter q and parameters m_1, m_2, ..., m_i, then the sum N=N_1+N_2+...+N_i is binomial with parameters q and m = m_1+m_2+...+m_i.

The (a,b,0) recursion:

These distributions can be reparameterized into a recursive formula with parameters a and b.  When reparameterized, they all have the same recursive format.
\displaystyle p_k = \left(a+ \frac{b}{k}\right)p_{k-1}
It is more common to write
\displaystyle \frac{p_k}{p_{k-1}} = a+\frac{b}{k}, \quad k = 1, 2, 3, \ldots
The parameters a and b are different for each distribution.
  1. Poisson:
    a = 0 and b =\lambda.
  2. Negative Binomial:
    \displaystyle a = \frac{\beta}{1+\beta} and \displaystyle b = \left(r-1\right)\frac{\beta}{1+\beta}.
  3. Geometric:
    \displaystyle a = \frac{\beta}{1+\beta} and \displaystyle b = 0.
  4. Binomial:
    \displaystyle a = -\frac{q}{1-q} and \displaystyle b = \left(m+1\right)\frac{q}{1-q}.
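The recursion gives every probability once p_0 is known. A minimal sketch (the helper name `ab0_pmf` is hypothetical): it starts from p_0 and repeatedly applies p_k = (a + b/k)p_{k-1}. For the Poisson, p_0 = e^{-\lambda}; for the binomial, p_0 = (1-q)^m; for the negative binomial, p_0 = (1+\beta)^{-r}.

```python
import math


def ab0_pmf(a, b, p0, kmax):
    """Return [p_0, ..., p_kmax] using p_k = (a + b/k) p_{k-1}."""
    p = [p0]
    for k in range(1, kmax + 1):
        p.append((a + b / k) * p[-1])
    return p


# Poisson with lambda = 3: a = 0, b = lambda, p_0 = e^{-lambda}.
poisson = ab0_pmf(0.0, 3.0, math.exp(-3.0), 10)

# Binomial with m = 3, q = 0.5: a = -q/(1-q) = -1, b = (m+1)q/(1-q) = 4, p_0 = (1-q)^m.
binomial = ab0_pmf(-1.0, 4.0, 0.5 ** 3, 4)
print(binomial)  # approximately 1/8, 3/8, 3/8, 1/8, then 0
```

For the binomial, the factor a + b/k hits 0 at k = m + 1, which is how the recursion stops producing probability beyond m.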
Pop Quiz!
  1. A frequency distribution has a = 0.8 and b = 1.2.  What distribution is this?
    Answer: Negative binomial, because a > 0 (so it is negative binomial or geometric) and b \neq 0 (so it is not geometric).
  2. A frequency distribution has mean 1 and variance 0.5.  What distribution is this?
    Answer: Binomial because the variance is less than the mean. 
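The reasoning in the pop quiz can be turned into a small classifier that also recovers the original parameters by inverting the formulas for a and b listed above. A sketch only: `identify_ab0` is a hypothetical name, and it assumes (a, b) really come from one of the four distributions.

```python
def identify_ab0(a, b):
    """Identify an (a,b,0) distribution from a and b and recover its parameters."""
    if a == 0:
        return "Poisson", {"lambda": b}
    if a < 0:
        # Binomial: a = -q/(1-q) gives q = a/(a-1); b = -(m+1)a gives m = -b/a - 1
        return "binomial", {"q": a / (a - 1), "m": -b / a - 1}
    beta = a / (1 - a)  # from a = beta/(1+beta)
    if b == 0:
        return "geometric", {"beta": beta}
    return "negative binomial", {"beta": beta, "r": 1 + b / a}  # from b = (r-1)a


print(identify_ab0(0.8, 1.2))  # quiz question 1: negative binomial, beta about 4, r about 2.5
```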


Filed under Frequency Models, Probability

The Lognormal Distribution

Review: If X is normal with mean \mu and standard deviation \sigma, then

Z = \displaystyle \frac{X-\mu}{\sigma}

has the standard normal distribution, with mean 0 and standard deviation 1.  To find the probability Pr(X \le x), you would convert X to the standard normal distribution and look up the values in the standard normal table.

\begin{array}{rll} Pr(X \le x) &=& Pr\left(\displaystyle \frac{X-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right) \\ \\ &=& \displaystyle Pr\left(Z \le \frac{x-\mu}{\sigma}\right) \\ \\ &=& \displaystyle \mathcal{N}\left(\frac{x-\mu}{\sigma}\right) \end{array}

If V is a weighted sum of n jointly normal random variables X_i, i = 1, ..., n, with means \mu_i, variances \sigma^2_i, and weights w_i, then its mean is

\displaystyle E\left[\sum_{i=1}^n w_iX_i\right] = \sum_{i=1}^n w_i\mu_i

and variance

\displaystyle Var\left(\sum_{i=1}^n w_iX_i\right) = \sum_{i=1}^n \sum_{j=1}^n w_iw_j\sigma_{ij}

where \sigma_{ij} is the covariance between X_i and X_j.  Note when i=j, \sigma_{ij} = \sigma_i^2 = \sigma_j^2.

Remember: A sum of random variables is not the same as a mixture distribution!  The expected value is the same, but the variance is not.  A sum of jointly normal random variables (for example, independent normal random variables) is also normal.  So V is normal with the above mean and variance.

Actuary Speak: This is called a stable distribution.  The sum of random variables from the same distribution family produces a random variable that is also from the same distribution family.

The fun stuff:
If X is normal, then Y = e^X is lognormal.  If X has mean \mu and standard deviation \sigma, then

\begin{array}{rll} \displaystyle E\left[Y\right] &=& E\left[e^X\right] \\ \\ \displaystyle &=& e^{\mu + \frac{1}{2}\sigma^2} \\ \\ Var\left(e^X\right) &=& e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)\end{array}
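The lognormal moment formulas can be sanity-checked by simulation. A minimal sketch using only the standard library; the parameter values \mu = 0.05 and \sigma = 0.2, the sample size, and the seed are arbitrary choices for illustration, so the simulated moments only approximately match the formulas.

```python
import math
import random
import statistics

mu, sigma = 0.05, 0.2  # arbitrary illustration values for X ~ N(mu, sigma^2)

# Closed-form moments of Y = e^X
mean_y = math.exp(mu + 0.5 * sigma ** 2)
var_y = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)

# Monte Carlo check: simulate X, exponentiate, and compare sample moments
rng = random.Random(2024)
sample = [math.exp(rng.gauss(mu, sigma)) for _ in range(100_000)]
sim_mean = statistics.fmean(sample)
sim_var = statistics.fmean([(y - sim_mean) ** 2 for y in sample])

print(mean_y, sim_mean)  # close, but not identical (sampling error)
print(var_y, sim_var)
```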

Recall FV = e^\delta where FV is the future value of an investment of 1 growing at a continuously compounded rate of \delta for one period.  If the rate of growth is a normally distributed random variable, then the future value is lognormal.  The Black-Scholes model for option prices assumes stocks appreciate at a continuously compounded rate that is normally distributed.

S_t = S_0e^{R(0,t)}

where S_t is the stock price at time t, S_0 is the current price, and R(0,t) is the random variable for the rate of return from time 0 to t.  Now consider the situation where R(0,t) is the sum of iid normal random variables R(0,h) + R(h,2h) + ... + R((n-1)h,t) each having mean \mu_h and variance \sigma_h^2.  Then

\begin{array}{rll} E\left[R(0,t)\right] &=& n\mu_h \\ Var\left(R(0,t)\right) &=& n\sigma_h^2 \end{array}

If h represents 1 year, this says that the expected return over 10 years is 10 times the one-year expected return and the standard deviation is \sqrt{10} times the annual standard deviation.  This allows us to formulate a function for the mean and standard deviation with respect to time.  Suppose we write

\begin{array}{rll} \displaystyle \mu(t) &=& \left(\alpha - \delta -\frac{1}{2}\sigma^2\right)t \\ \sigma(t) &=& \sigma \sqrt{t} \end{array}

where \alpha is the expected continuously compounded rate of return on the stock and \delta is the continuous rate of dividend payout.  Since every normal random variable is a linear transformation of the standard normal, we can write R(0,t) =\mu(t)+Z\sigma(t).  The model for the stock price becomes

\displaystyle S_t = S_0e^{\left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t + Z\sigma\sqrt{t}}

In this model, the expected value of the stock price at time t is

E\left[S_t\right] = S_0e^{(\alpha - \delta)t}

Actuary Speak: The standard deviation \sigma of the return rate is called the volatility of the stock.  This term comes from expressing the rate of return as an Ito process. \mu(t) is called the drift term and \sigma(t) is called the volatility term.

Confidence intervals: To find the range of stock prices that corresponds to a particular confidence interval, we need only look at the confidence interval on the standard normal distribution then translate that interval into stock prices using the equation for S_t.

Example: z=[-1.96, 1.96] represents the 95% confidence interval in the standard normal \mathcal{N}(z).  Suppose t = \frac{1}{3}, \alpha = 0.15, \delta = 0.01, \sigma = 0.3, and S_0 = 40.  Then the 95% confidence interval for S_t is

\left[40e^{(0.15-0.01-\frac{1}{2}0.3^2)\frac{1}{3} + (-1.96)0.3\sqrt{\frac{1}{3}}},40e^{(0.15-0.01-\frac{1}{2}0.3^2)\frac{1}{3} + (1.96)0.3\sqrt{\frac{1}{3}}}\right]

Which corresponds to a price interval of approximately

[29.40, 57.98]

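The interval endpoints are just the stock-price formula evaluated at z = \pm 1.96. A minimal sketch using the example's inputs; `price_at` is a hypothetical helper name.

```python
import math


def price_at(z, s0, alpha, delta, sigma, t):
    """S_t = S_0 exp((alpha - delta - sigma^2/2) t + z sigma sqrt(t))."""
    return s0 * math.exp((alpha - delta - 0.5 * sigma ** 2) * t + z * sigma * math.sqrt(t))


params = dict(s0=40.0, alpha=0.15, delta=0.01, sigma=0.3, t=1 / 3)
low, high = price_at(-1.96, **params), price_at(1.96, **params)
print(round(low, 2), round(high, 2))  # 29.4 57.98
```

Because the exponential is increasing, mapping the endpoints of the z interval gives the endpoints of the price interval directly; note the interval is not symmetric around S_0.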

Probabilities: Probability calculations on stock prices require a bit more mental gymnastics.

\begin{array}{rll} \displaystyle Pr\left(S_t<K\right) &=& Pr\left(\frac{S_t}{S_0} < \frac{K}{S_0}\right) \\ \\ \displaystyle &=& Pr\left(\ln{\frac{S_t}{S_0}} < \ln{\frac{K}{S_0}}\right) \\ \\ \displaystyle &=& Pr\left(Z< \frac{\ln{\frac{K}{S_0}} - \mu(t)}{\sigma(t)}\right) \\ \\ \displaystyle &=& Pr\left(Z<\frac{\ln{\frac{K}{S_0}} - \left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}}\right) \end{array}

Conditional Expected Value: Define

\begin{array}{rll} \displaystyle d_1 &=& -\frac{\ln{\frac{K}{S_0}} - \left(\alpha - \delta + \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \\ \\ \displaystyle d_2 &=& -\frac{\ln{\frac{K}{S_0}}- \left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \end{array}


\begin{array}{rll} \displaystyle E\left[S_t|S_t<K\right] &=& S_0e^{(\alpha - \delta)t}\frac{\mathcal{N}(-d_1)}{\mathcal{N}(-d_2)} \\ \\ \displaystyle E\left[S_t|S_t>K\right] &=& S_0e^{(\alpha - \delta)t}\frac{\mathcal{N}(d_1)}{\mathcal{N}(d_2)} \end{array}

This gives the expected stock price at time t given that it is less than K or greater than K respectively.
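The probability and conditional-expectation formulas can be coded with `statistics.NormalDist` from the standard library. A minimal sketch using the earlier example's inputs and a hypothetical strike K = 45; as a check, the two conditional pieces should recombine into the unconditional mean, \Pr(S_t<K)E[S_t|S_t<K] + \Pr(S_t>K)E[S_t|S_t>K] = E[S_t], up to floating-point error. The helper `d1_d2` is an illustrative name.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF


def d1_d2(s0, k, alpha, delta, sigma, t):
    """d_1 and d_2 as defined above, with alpha as the expected return."""
    denom = sigma * math.sqrt(t)
    d1 = -(math.log(k / s0) - (alpha - delta + 0.5 * sigma ** 2) * t) / denom
    d2 = -(math.log(k / s0) - (alpha - delta - 0.5 * sigma ** 2) * t) / denom
    return d1, d2


s0, k, alpha, delta, sigma, t = 40.0, 45.0, 0.15, 0.01, 0.3, 1 / 3  # K = 45 is hypothetical
d1, d2 = d1_d2(s0, k, alpha, delta, sigma, t)
forward = s0 * math.exp((alpha - delta) * t)  # E[S_t]

prob_below = N(-d2)                           # Pr(S_t < K)
e_below = forward * N(-d1) / N(-d2)           # E[S_t | S_t < K]
e_above = forward * N(d1) / N(d2)             # E[S_t | S_t > K]

# Law of total expectation: the two conditional pieces recombine to E[S_t].
print(prob_below * e_below + (1 - prob_below) * e_above, forward)  # equal up to rounding
```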

Black-Scholes formula: A call option C_t on stock S_t has value \max\left(0,S_t - K\right) at time t.  The option pays out if S_t > K.  So the value of this option at time 0 is the probability that it pays out at time t, discounted by the risk free interest rate r, and multiplied by the expected value of S_t - K given that S_t > K.  In other words,

\begin{array}{rll} \displaystyle C_0 &=& e^{-rt}Pr\left(S_t>K\right)E\left[S_t-K|S_t>K\right] \\ \\ &=& e^{-rt}\mathcal{N}(d_2)\left(E\left[S_t|S_t>K\right] - E\left[K|S_t>K\right]\right) \\ \\ &=& e^{-rt}\mathcal{N}(d_2)\left(S_0e^{(\alpha - \delta)t}\frac{\mathcal{N}(d_1)}{\mathcal{N}(d_2)} - K\right) \end{array}

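Black-Scholes prices options using risk-neutral valuation: because the option's payoff can be replicated with the stock and a risk-free bond, its price is the same as it would be if all investors were risk neutral.  In a risk-neutral world, assets do not pay a risk premium for being more risky.  Long story short, \alpha - r = 0 so \alpha = r.  So in the Black-Scholes formula: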

\begin{array}{rll} \displaystyle d_1 &=& -\frac{\ln{\frac{K}{S_0}} - \left(r - \delta + \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \\ \\ \displaystyle d_2 &=& -\frac{\ln{\frac{K}{S_0}}- \left(r- \delta - \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \end{array}

Continuing our derivation of C_0 but replacing \alpha with r,

\begin{array}{rll} \displaystyle C_0 &=& e^{-rt}\mathcal{N}(d_2)\left(S_0e^{(r - \delta)t}\frac{\mathcal{N}(d_1)}{\mathcal{N}(d_2)} - K\right) \\ \\ &=& S_0e^{-\delta t}\mathcal{N}(d_1) - Ke^{-rt}\mathcal{N}(d_2)\end{array}

For a put option P_0 with payout K-S_t for K>S_t and 0 otherwise,

P_0 = Ke^{-rt}\mathcal{N}(-d_2) - S_0e^{-\delta t}\mathcal{N}(-d_1)
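The two pricing formulas fit in a few lines. A minimal sketch with `statistics.NormalDist` (the function name `black_scholes` and the test inputs are illustrative). It writes d_1 = (\ln(S_0/K) + (r-\delta+\frac{1}{2}\sigma^2)t)/(\sigma\sqrt{t}), which equals the d_1 above since -\ln(K/S_0) = \ln(S_0/K), and d_2 = d_1 - \sigma\sqrt{t}. The check at the end uses put-call parity, C_0 - P_0 = S_0e^{-\delta t} - Ke^{-rt}, which follows from subtracting the two formulas because \mathcal{N}(x) + \mathcal{N}(-x) = 1.

```python
import math
from statistics import NormalDist


def black_scholes(s0, k, r, delta, sigma, t):
    """Return (call, put) prices with continuous dividend yield delta."""
    N = NormalDist().cdf
    denom = sigma * math.sqrt(t)
    d1 = (math.log(s0 / k) + (r - delta + 0.5 * sigma ** 2) * t) / denom
    d2 = d1 - denom  # same as the d_2 definition above
    call = s0 * math.exp(-delta * t) * N(d1) - k * math.exp(-r * t) * N(d2)
    put = k * math.exp(-r * t) * N(-d2) - s0 * math.exp(-delta * t) * N(-d1)
    return call, put


call, put = black_scholes(s0=100.0, k=100.0, r=0.05, delta=0.0, sigma=0.2, t=1.0)
print(round(call, 4), round(put, 4))  # 10.4506 5.5735

# Put-call parity: C_0 - P_0 = S_0 e^{-delta t} - K e^{-rt}
print(call - put, 100.0 - 100.0 * math.exp(-0.05))
```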

These are the famous Black-Scholes formulas for option pricing.  When derived on the back of a cocktail napkin, they are indispensable for impressing the ladies at your local bar.  :p


Filed under Parametric Models, Probability

Parametric Distributions

A parametric distribution is a distribution function that depends on one or more inputs besides x.  Various parametric distributions are given in the exam tables.  Each of these additional inputs is called a parameter.  While studying, it is important to keep in mind that parameters are simply abstract devices built into a distribution function which allow us, through their manipulation, to tweak the shape of the distribution.  Ultimately, we are still only interested in things like Pr(X\le x) and the distribution function parameters are used to help describe the distribution of X.


  1. Scaling:  If a random variable X has a scalable parametric distribution with parameters (a_1, a_2, ..., a_n, \theta), then one of these parameters can be called a scale parameter and is denoted by \theta.  Having the scalable property implies that cX can be described with the same distribution function as X, except that the parameters of its distribution are (a_1,a_2,..., a_n,c\theta) where c is the scale factor.  In terms of probability, scaling a random variable has the following effect: if Y = cX with c >0, then Pr(Y \le y) = Pr(cX\le y) = Pr(X \le \frac{y}{c}).
    Caveat: The Inverse Gaussian as given in the exam tables has a \theta in its set of parameters; however, \theta is not a scale parameter by itself: to scale an Inverse Gaussian by c, multiply both \mu and \theta by c.  To scale a Lognormal distribution, adjust the parameters to (\mu + \ln{c}, \sigma) where c is the scale factor and \mu and \sigma are the usual parameters.  All the rest of the distributions given in Appendix A are scale distributions.
  2. Raising to a power:  A random variable raised to a positive power is called transformed.  If it is raised to -1 it is called inverse.  If it is raised to any other negative power, it is called inverse transformed.  When raising to a power, the scale parameter needs to be readjusted to remain a scale parameter in the new distribution.
  3. Exponentiating:  An example is the lognormal distribution.  If X is normal, then Y = e^X is lognormal.  In terms of probability, F_Y(y) = F_X(\ln{y}).
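The lognormal scaling rule and the identity F_Y(y) = F_X(\ln{y}) can be checked against each other numerically. A minimal sketch: if W is lognormal with parameters (\mu, \sigma), then Pr(cW \le y) = Pr(W \le y/c) should equal the lognormal CDF with parameters (\mu + \ln{c}, \sigma). The parameter values are arbitrary, and `lognormal_cdf` is a hypothetical helper built on the standard-library normal CDF.

```python
import math
from statistics import NormalDist


def lognormal_cdf(y, mu, sigma):
    """Pr(W <= y) for W = e^X with X ~ N(mu, sigma^2), i.e. F_X(ln y)."""
    return NormalDist(mu, sigma).cdf(math.log(y))


mu, sigma, c = 1.0, 0.5, 3.0  # arbitrary illustration values; c is the scale factor

for y in (1.0, 5.0, 20.0):
    via_params = lognormal_cdf(y, mu + math.log(c), sigma)  # cW ~ lognormal(mu + ln c, sigma)
    via_scaling = lognormal_cdf(y / c, mu, sigma)           # Pr(cW <= y) = Pr(W <= y/c)
    print(y, via_params, via_scaling)                       # the two columns agree
```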
You can also create a new distribution by splicing: defining different probability densities on different intervals of the domain.  As long as each piece is nonnegative and the pieces integrate to 1 in total, the spliced function is a valid density.  Since total probability has to be exactly 1, scaling is an important tool here: each piece is multiplied by a constant weight so that the total comes out to exactly 1.
Tail Weight
Since a density function must integrate to 1, it must tend to 0 at the extremities of its domain.  If density function A tends towards zero at a slower rate than density function B, then density A is said to have a heavier tail than density B.  Some important measures of tail weight:
  1. The fewer positive raw or central moments a distribution has, the heavier its tail.
  2. As x goes to infinity, the ratio of one density or survival function to another tends to infinity if the numerator has the heavier tail and to zero if the denominator does.
  3. An increasing hazard rate function implies a lighter tail and vice versa.
  4. An increasing mean residual life function means a heavier tail and vice versa.
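Measure 2 can be illustrated by comparing survival functions. A minimal sketch, assuming a Pareto with \theta = 100, \alpha = 3 and an exponential with \theta = 50 (both have mean 50; the values are arbitrary): the ratio of the Pareto survival function to the exponential one keeps growing, so the Pareto has the heavier tail. This agrees with measure 1, since the Pareto only has moments of order less than \alpha = 3 while the exponential has all positive moments.

```python
import math


def pareto_survival(x, theta=100.0, alpha=3.0):
    return (theta / (x + theta)) ** alpha


def exponential_survival(x, theta=50.0):
    return math.exp(-x / theta)


xs = [100.0, 500.0, 1_000.0, 2_000.0]
ratios = [pareto_survival(x) / exponential_survival(x) for x in xs]
for x, ratio in zip(xs, ratios):
    print(x, ratio)  # the ratio keeps increasing as x grows
```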


Filed under Parametric Models, Probability

Expected Values for Insurance

Before I begin, please note: I hated this chapter.  If there are any errors please let me know asap!

A deductible d is an amount that is subtracted from an insurance claim.  If you have a $500 deductible on your car insurance, your insurance company will only pay damages incurred beyond $500.  We are interested in the following random variables: (X - d)_+ and (X\wedge d).


  1. Payment per Loss: (X-d)_+ = \left\{ \begin{array}{ll} X-d &\mbox{ if } X>d \\ 0 &\mbox{ otherwise} \end{array} \right.
  2. Limited Payment per Loss:  (X\wedge d) = \min(X,d) = \left\{ \begin{array}{ll} X &\mbox{ if } X \le d \\ d &\mbox{ if } X>d \end{array} \right.
Expected Values:
  1. \begin{array}{rll} E[(X-d)_+] &=& \displaystyle \int_{d}^{\infty}{(x-d)f(x)dx} \\ \\ &=& \displaystyle \int_{d}^{\infty}{S(x)dx} \end{array}
  2. \begin{array}{rll} E[(X\wedge d)] &=& \displaystyle \int_{0}^{d}{xf(x)dx} + dS(d) \\ \\ &=& \displaystyle \int_{0}^{d}{S(x)dx} \end{array}
We may also be interested in the payment given that a payment is made (payment per payment), X-d|X>d.
By definition:
E[X-d|X>d] = \displaystyle \frac{E[(X-d)_+]}{P(X>d)}
Since actuaries like to make things more complicated than they really are, we have special names for this expected value.  It is denoted by e_X(d) and is called mean excess loss in P&C insurance and \displaystyle {\mathop{e}\limits^{\circ}}_d is called mean residual life in life insurance.  Weishaus simplifies the notation by using the P&C notation without the random variable subscript.  I’ll use the same.
  1. For an exponential distribution,
    e(d) = \theta
  2. For a Pareto distribution,
    e(d) = \displaystyle \frac{\theta +d}{\alpha - 1}
  3. For a single parameter Pareto distribution,
    e(d) = \displaystyle \frac{d}{\alpha - 1} for d \geq \theta
Useful Relationships:
  1. \begin{array}{rll} E[X] &=& E[X\wedge d] + E[(X-d)_+] \\ &=& E[X\wedge d] + e(d)[1-F(d)] \end{array}
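The relationships above can be checked numerically for a Pareto. A minimal sketch, assuming hypothetical values \theta = 1,000, \alpha = 3 and deductible d = 500: it computes E[X\wedge d] = \int_0^d S(x)dx with the trapezoid rule, gets E[(X-d)_+] from E[X] - E[X\wedge d], and divides by S(d) to get e(d), which should match (\theta + d)/(\alpha - 1) = 750. The helper name `limited_expected_value` is illustrative.

```python
def limited_expected_value(survival, d, steps=10_000):
    """E[X ^ d] = integral of S(x) from 0 to d, by the composite trapezoid rule."""
    h = d / steps
    total = 0.5 * (survival(0.0) + survival(d))
    total += sum(survival(i * h) for i in range(1, steps))
    return total * h


theta, alpha, d = 1_000.0, 3.0, 500.0  # hypothetical Pareto parameters and deductible


def survival(x):
    return (theta / (x + theta)) ** alpha  # Pareto S(x)


mean = theta / (alpha - 1)               # Pareto E[X], valid for alpha > 1
lev = limited_expected_value(survival, d)
stop_loss = mean - lev                   # E[(X - d)_+] = E[X] - E[X ^ d]
mean_excess = stop_loss / survival(d)    # e(d) = E[(X - d)_+] / Pr(X > d)
print(round(lev, 2), round(stop_loss, 2), round(mean_excess, 2))  # 277.78 222.22 750.0
print((theta + d) / (alpha - 1))         # Pareto formula for e(d): 750.0
```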
Actuary Speak (important for problem comprehension):
  1. The random variable (X-d)_+ is said to be left censored and shifted by d.
  2. e(d) is called mean excess loss or mean residual life.
  3. The random variable X\wedge d is called the limited loss variable, and its expected value E[X\wedge d] is called the limited expected value.  X\wedge d can also be described as the payment per loss with a claims limit, or as the amount not paid due to a deductible.  d can be called a claims limit or deductible depending on how it is used in the problem.
  4. If data is given for X as observed values with numbers of observations or probabilities, the data is called the empirical distribution.  Sometimes an empirical distribution may be given for a problem, but you are still asked to assume a parametric distribution for X.


Filed under Coverage Modifications, Deductibles, Limits, Probability, Severity Models