# Tag Archives: Censored

## Kaplan-Meier and Nelson-Aalen Estimators

When the empirical data is incomplete (truncated or censored), raw empirical estimators will not produce good results.  In this scenario, there are two techniques available to determine the distribution function based on the data.  The Kaplan-Meier product limit estimator can be used to generate a survival distribution function.  The Nelson-Aalen estimator can be used to generate a cumulative hazard rate function.  The Kaplan-Meier estimator is given by:

$S_n(t) = \displaystyle \prod_{i=1}^{j-1} \left(1-\frac{s_i}{r_i}\right), \quad y_{j-1} \le t < y_j$

where $r_i$ is the size of the risk set at time $y_i$ and $s_i$ is the number of observed occurrences, at time $y_i$, of the random event whose distribution you are trying to estimate.  For example, if the random event you are interested in is death, then $r_1$ could be the number of life insurance policy holders immediately prior to the first death, and $s_1$ would be the number of deaths observed at the time of the first death (deaths can be simultaneous).  The key to problems that use this estimator is understanding how $r_i$ changes under censoring or truncation.  If a person withdraws from the life insurance policy, this decreases the risk set at later times, but it is not a death, so it does not contribute to $s_i$.  If new members join at time $y_i$, they are not part of the risk set until time $y_{i+1}$.
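To see how withdrawals shrink the risk set without counting as deaths, here is a minimal Python sketch of the product-limit calculation.  The data set is made up for illustration, and the function assumes everyone is observed from time 0 (no late entrants), so it only handles censoring, not truncation.

```python
def kaplan_meier(observations):
    """Product-limit estimate from (time, died) pairs.

    died=False marks a withdrawal (censored observation).
    Assumption: every life is observed from time 0, so there is no truncation.
    Returns a list of (y_i, r_i, s_i, S_n) where S_n applies on [y_i, y_{i+1}).
    """
    death_times = sorted({t for t, died in observations if died})
    survival = 1.0
    curve = []
    for y in death_times:
        # Risk set: everyone still under observation just before time y.
        r = sum(1 for t, _ in observations if t >= y)
        # Deaths observed exactly at time y (withdrawals are not counted).
        s = sum(1 for t, died in observations if died and t == y)
        survival *= 1.0 - s / r
        curve.append((y, r, s, survival))
    return curve


# Hypothetical data: 10 lives, 4 deaths, 6 withdrawals.
observations = [
    (1, True), (2, False), (3, True), (3, True), (4, False),
    (5, True), (6, False), (6, False), (6, False), (6, False),
]

km_curve = kaplan_meier(observations)
for y, r, s, surv in km_curve:
    print(f"y={y}  r={r}  s={s}  S_n={surv:.4f}")
```

The risk sets come out as 10, 8 and 5: the withdrawals at times 2 and 4 reduce $r_i$ at the next death time but never show up in $s_i$.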

If the data is censored past a certain point, you can assume an exponential distribution for the censored portion.  Suppose observations past $c$ are censored.  If you know the value of $S_n(c)$ you can solve for $\theta$ using $S_n(c) = e^{-c/\theta}$.
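Solving $S_n(c) = e^{-c/\theta}$ gives $\theta = -c/\ln S_n(c)$.  A small sketch of the tail extension in Python follows; the values $c = 5$ and $S_n(c) = 0.54$ are hypothetical, and the function name is my own.

```python
import math


def exponential_tail(c, s_c):
    """Fit an exponential tail through the point (c, S_n(c)).

    Returns theta and a function giving the extrapolated S(t) for t >= c.
    Requires 0 < s_c < 1 so that the logarithm is negative and finite.
    """
    theta = -c / math.log(s_c)
    return theta, lambda t: math.exp(-t / theta)


# Hypothetical values: observations are censored past c = 5, where S_n(5) = 0.54.
theta, tail = exponential_tail(5.0, 0.54)
print(f"theta = {theta:.3f}")
print(f"S(5)  = {tail(5.0):.4f}")   # reproduces S_n(c) at the censoring point
print(f"S(10) = {tail(10.0):.4f}")  # equals S_n(c)**2, since S(t) = S_n(c)**(t/c)
```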

The Nelson-Aalen cumulative hazard rate estimator is given by:

$\tilde H(t) = \displaystyle \sum_{i=1}^{j-1} \frac{s_i}{r_i}, \quad y_{j-1} \le t < y_j$

You can use this to get a survival function:

$\tilde S(t) = e^{-\tilde H(t)}$
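A minimal Python sketch of the Nelson-Aalen calculation, taking the $(y_i, r_i, s_i)$ triples as already known.  The numbers are hypothetical and the function name is my own.

```python
import math


def nelson_aalen(events):
    """Cumulative hazard and implied survival from (y_i, r_i, s_i) triples.

    events must be sorted by y_i.  Each output row (y_i, H, S) applies
    on the interval [y_i, y_{i+1}).
    """
    hazard = 0.0
    rows = []
    for y, r, s in events:
        hazard += s / r                      # add s_i / r_i at each event time
        rows.append((y, hazard, math.exp(-hazard)))
    return rows


# Hypothetical risk sets and death counts.
events = [(1, 10, 1), (3, 8, 2), (5, 5, 1)]
na_rows = nelson_aalen(events)
for y, h, surv in na_rows:
    print(f"y={y}  H={h:.4f}  S={surv:.4f}")
```

On the same $(r_i, s_i)$ values, this survival estimate is never below the Kaplan-Meier one, because $e^{-x} \ge 1 - x$ for every term.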

## Expected Values for Insurance

Before I begin, please note: I hated this chapter.  If there are any errors please let me know asap!

A deductible $d$ is an amount that is subtracted from an insurance claim.  If you have a $500 deductible on your car insurance, your insurance company will only pay damages incurred beyond $500.  We are interested in the following random variables: $(X - d)_+$ and $(X\wedge d)$.

Definitions:

1. Payment per Loss: $(X-d)_+ = \left\{ \begin{array}{ll} X-d &\mbox{ if } X>d \\ 0 &\mbox{ otherwise} \end{array} \right.$
2. Limited Payment per Loss:  $(X\wedge d) = \left\{ \begin{array}{ll} d &\mbox{ if } X>d \\ X &\mbox{ if } 0 < X \le d \end{array} \right.$

Expected Values:
1. $\begin{array}{rll} E[(X-d)_+] &=& \displaystyle \int_{d}^{\infty}{(x-d)f(x)dx} \\ \\ &=& \displaystyle \int_{d}^{\infty}{S(x)dx} \end{array}$

2. $\begin{array}{rll} E[(X\wedge d)] &=& \displaystyle \int_{0}^{d}{xf(x)dx} + dS(d) \\ \\ &=& \displaystyle \int_{0}^{d}{S(x)dx} \end{array}$
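As a sanity check, the density form and the survival-function form of each expected value can be compared numerically.  The sketch below assumes an exponential loss with $\theta = 1000$ and a deductible of $d = 500$ (both made up), uses a plain midpoint rule, and cuts the infinite integral off at $50\theta$, where the remaining tail is negligible.  In the density form of $E[X\wedge d]$, the term $d\,S(d)$ accounts for losses above $d$.

```python
import math


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


theta, d = 1000.0, 500.0                          # hypothetical parameters


def S(x):
    """Exponential survival function."""
    return math.exp(-x / theta)


def f(x):
    """Exponential density."""
    return math.exp(-x / theta) / theta


upper = 50 * theta                                # stand-in for infinity

# E[(X-d)_+] two ways.
excess_from_density = integrate(lambda x: (x - d) * f(x), d, upper)
excess_from_survival = integrate(S, d, upper)

# E[X ^ d] two ways.
limited_from_density = integrate(lambda x: x * f(x), 0.0, d) + d * S(d)
limited_from_survival = integrate(S, 0.0, d)

print(f"E[(X-d)+]: {excess_from_density:.2f} vs {excess_from_survival:.2f}")
print(f"E[X ^ d] : {limited_from_density:.2f} vs {limited_from_survival:.2f}")
```

For the exponential, both pairs should agree with the closed forms $\theta e^{-d/\theta}$ and $\theta(1 - e^{-d/\theta})$.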
We may also be interested in the payment per loss, given payment is incurred (payment per payment) $X-d|X>d$.
By definition:
$E[X-d|X>d] = \displaystyle \frac{E[(X-d)_+]}{P(X>d)}$
Since actuaries like to make things more complicated than they really are, we have special names for this expected value.  In P&C insurance it is denoted by $e_X(d)$ and called the mean excess loss; in life insurance it is denoted by $\displaystyle {\mathop{e}\limits^{\circ}}_d$ and called the mean residual life.  Weishaus simplifies the notation by using the P&C notation without the random variable subscript.  I’ll use the same.
Memorize!
1. For an exponential distribution,
$e(d) = \theta$
2. For a Pareto distribution,
$e(d) = \displaystyle \frac{\theta +d}{\alpha - 1}$
3. For a single-parameter Pareto distribution (with $d \ge \theta$),
$e(d) = \displaystyle \frac{d}{\alpha - 1}$
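The three formulas above can be checked against the definition $e(d) = \int_d^\infty S(x)dx / S(d)$.  This is a rough numerical sketch with made-up parameters; it maps $[d, \infty)$ onto $[0, 1)$ with the substitution $x = d + \theta u/(1-u)$ and then applies a midpoint rule.  The formulas need $\alpha > 1$, and the single-parameter Pareto case needs $d \ge \theta$.

```python
import math


def mean_excess(S, d, scale, n=100_000):
    """e(d) = (integral of S from d to infinity) / S(d).

    Uses x = d + scale * u / (1 - u), dx = scale / (1 - u)**2 du,
    and the midpoint rule on u in [0, 1).
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        x = d + scale * u / (1.0 - u)
        total += S(x) * scale / (1.0 - u) ** 2
    return h * total / S(d)


theta, alpha = 1000.0, 3.0                        # hypothetical parameters

# Exponential: e(d) = theta.
exp_e = mean_excess(lambda x: math.exp(-x / theta), 500.0, theta)

# Two-parameter Pareto, S(x) = (theta / (x + theta))**alpha.
pareto_e = mean_excess(lambda x: (theta / (x + theta)) ** alpha, 500.0, theta)

# Single-parameter Pareto, S(x) = (theta / x)**alpha for x >= theta.
sp_e = mean_excess(lambda x: (theta / x) ** alpha, 1500.0, theta)

print(f"exponential      : {exp_e:.3f} vs {theta:.3f}")
print(f"Pareto           : {pareto_e:.3f} vs {(theta + 500.0) / (alpha - 1):.3f}")
print(f"single-par Pareto: {sp_e:.3f} vs {1500.0 / (alpha - 1):.3f}")
```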
Useful Relationships:
1. $\begin{array}{rll} E[X] &=& E[X\wedge d] + E[(X-d)_+] \\ &=& E[X\wedge d] + e(d)[1-F(d)] \end{array}$
Actuary Speak (important for problem comprehension):
1. The random variable $(X-d)_+$ is said to be shifted by $d$ and censored.
2. $e(d)$ is called mean excess loss or mean residual life.
3. The random variable $X\wedge d$ can be called the limited loss, payment per loss with claims limit, or amount not paid due to deductible; its expected value $E[X\wedge d]$ is the limited expected value.  $d$ can be called a claims limit or deductible depending on how it is used in the problem.
4. If data is given for $X$ as observed values with their numbers of observations or probabilities, the data is called the empirical distribution.  Sometimes an empirical distribution is given for a problem, but you are still asked to assume a parametric distribution for $X$.
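For an empirical distribution, all of these quantities reduce to averages over the observed losses.  Here is a short sketch with five made-up losses and $d = 500$, which also checks the relationship $E[X] = E[X\wedge d] + e(d)[1-F(d)]$.

```python
# Hypothetical empirical distribution: five equally likely observed losses.
losses = [200.0, 400.0, 500.0, 900.0, 3000.0]
d = 500.0
n = len(losses)

mean_loss = sum(losses) / n                           # E[X]
limited = sum(min(x, d) for x in losses) / n          # E[X ^ d]
excess = sum(max(x - d, 0.0) for x in losses) / n     # E[(X-d)_+]
survival_d = sum(1 for x in losses if x > d) / n      # 1 - F(d) = P(X > d)
mean_excess_loss = excess / survival_d                # e(d), payment per payment

print(f"E[X]       = {mean_loss:.2f}")
print(f"E[X ^ d]   = {limited:.2f}")
print(f"E[(X-d)+]  = {excess:.2f}")
print(f"1 - F(d)   = {survival_d:.2f}")
print(f"e(d)       = {mean_excess_loss:.2f}")
print(f"check      = {limited + mean_excess_loss * survival_d:.2f}")
```

Only the losses of 900 and 3000 exceed the deductible, so $e(d)$ is the average of 400 and 2500.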