# Category Archives: Empirical Models

## Maximum Likelihood Estimators

With Maximum Likelihood Estimator (MLE) problems, a parametric distribution is named, one or more of its parameter values are unknown, a set of data from this distribution is given, and you are asked to find the parameter values that maximize the probability of observing the data.  You do this by brute force: for each data point, you express the probability of observing it, which is simply the density function of the named distribution evaluated at that point.  The probability of observing the whole set of data is simply the product of the probabilities of the individual data points.  If there are lots of data points, you end up with a massive function.

For example, the density function for $N$ is given by:

$f_N(n;\lambda) = \lambda e^{-\lambda n}$

in which $\lambda$ is the unknown parameter.  You are also given a set of observations $a, b, c$ and you must find the value of $\lambda$ which maximizes the probability of seeing these particular values.  The likelihood of seeing these values is given by $L(\lambda)$:

$L(\lambda) = \left( \lambda e^{-\lambda a}\right)\left( \lambda e^{-\lambda b}\right)\left( \lambda e^{-\lambda c}\right) = \lambda^3 e^{-\lambda (a+b+c)}$

To find the $\lambda$ which maximizes this function, you take its derivative, set it equal to 0, and solve for $\lambda$.  If you look a few steps ahead, you will realize that differentiating the product directly is messy.  To get around this, we can take the logarithm of the likelihood function and maximize that instead; since the logarithm is strictly increasing, this does not change the maximizing value of $\lambda$.  The log-likelihood $l(\lambda)$ is then:

$l(\lambda) = 3\ln(\lambda) -\lambda(a+b+c)$

The derivative with respect to $\lambda$ is

$\displaystyle \frac{d}{d \lambda}l(\lambda) = \frac{3}{\lambda} - (a+b+c)$

Equating to 0 and solving, we have

$\lambda = \displaystyle \frac{3}{a+b+c}$
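As a quick sanity check, the closed-form answer can be verified numerically. This is a minimal sketch; the observations $a, b, c$ below are hypothetical values chosen purely for illustration.

```python
import math

# Hypothetical observations (illustrative values only)
a, b, c = 1.2, 0.7, 2.1

# Closed-form MLE from the derivation: lambda = 3 / (a + b + c)
lam_hat = 3 / (a + b + c)

# Log-likelihood l(lambda) = 3 ln(lambda) - lambda (a + b + c)
def log_likelihood(lam):
    return 3 * math.log(lam) - lam * (a + b + c)

# The closed-form answer should beat nearby candidate values
candidates = [lam_hat * k for k in (0.5, 0.9, 1.1, 2.0)]
assert all(log_likelihood(lam_hat) > log_likelihood(x) for x in candidates)
print(lam_hat)  # 0.75 for these observations
```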


## Kaplan-Meier and Nelson-Aalen Estimators

When the empirical data is incomplete (truncated or censored), raw empirical estimators will not produce good results.  In this scenario, there are two techniques available to determine the distribution function based on the data.  The Kaplan-Meier product limit estimator can be used to generate a survival distribution function.  The Nelson-Aalen estimator can be used to generate a cumulative hazard rate function.  The Kaplan-Meier estimator is given by:

$S_n(t) = \displaystyle \prod_{i=1}^{j-1} \left(1-\frac{s_i}{r_i}\right), \quad y_{j-1} \le t < y_j$

where $r_i$ is the risk set at time $y_i$ and $s_i$ is the number of observations at time $y_i$ of the random event whose distribution you are trying to estimate.  For example, if the random event you are interested in is death, then $r_1$ could be the number of life insurance policyholders immediately prior to the first death, and $s_1$ would be the number of observed deaths at that first death time (you can have simultaneous deaths).  The key to dealing with problems that use this estimator is to understand how $r_i$ changes with respect to censoring or truncation.  If a person withdraws from the life insurance policy, this decreases the risk set, but a withdrawal is not a death, so it does not contribute to any $s_i$.  If new members join at time $y_i$, they are not part of the risk set until time $y_{i+1}$.
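The product-limit computation can be sketched in a few lines. The `events` data below is a hypothetical set of death times, deaths, and risk sets invented for illustration:

```python
# Each entry: (event time y_i, deaths s_i, risk set r_i just before y_i).
# Hypothetical data for illustration only.
events = [(1.0, 1, 10), (2.0, 2, 8), (4.0, 1, 5)]

def km_survival(t, events):
    """Kaplan-Meier: S_n(t) = product over event times y_i <= t of (1 - s_i / r_i)."""
    s = 1.0
    for y, d, r in events:
        if y <= t:
            s *= 1 - d / r
    return s

# For 2 <= t < 4: (1 - 1/10)(1 - 2/8) = 0.675
print(km_survival(2.5, events))
```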

If the data is censored past a certain point, you can assume an exponential distribution for the censored portion.  Suppose observations past $c$ are censored.  If you know the value of $S_n(c)$ you can solve for $\theta$ using $S_n(c) = e^{-c/\theta}$.

The Nelson-Aalen cumulative hazard rate estimator is given by:

$\tilde H(t) = \displaystyle \sum_{i=1}^{j-1} \frac{s_i}{r_i}, \quad y_{j-1} \le t < y_j$

You can use this to get a survival function:

$\tilde S(t) = e^{-\tilde H(t)}$
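A minimal sketch of the Nelson-Aalen estimator and the survival function derived from it, using hypothetical event data (the numbers are invented for illustration):

```python
import math

# Each entry: (event time y_i, deaths s_i, risk set r_i) -- hypothetical data
events = [(1.0, 1, 10), (2.0, 2, 8), (4.0, 1, 5)]

def nelson_aalen(t, events):
    """H~(t) = sum over event times y_i <= t of s_i / r_i."""
    return sum(d / r for y, d, r in events if y <= t)

def na_survival(t, events):
    """S~(t) = exp(-H~(t))."""
    return math.exp(-nelson_aalen(t, events))

# For 2 <= t < 4: H~ = 1/10 + 2/8 = 0.35, so S~ = e^{-0.35}
print(nelson_aalen(2.5, events))
print(na_survival(2.5, events))
```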

## Statistics for Empirical Models

An empirical model makes an assumption about the type of distribution underlying a set of data.  The data is then used to estimate the parameters in the assumed underlying distribution.  For example, if it is assumed that the underlying distribution is a uniform distribution, the data may be used to estimate the maximum value that the random variable can have.  If the assumed underlying distribution is Poisson, the data may be used to estimate the rate parameter $\lambda$.  There are three measures of quality for these estimators: bias, consistency, and mean squared error.

Definitions:

1. Let $\hat\theta$ be an estimator and $\theta$ be the true parameter being estimated.  The bias is defined by: $bias_{\hat\theta}(\theta) = E\left[\hat\theta|\theta\right] - \theta$.  An estimator is said to be unbiased if $bias_{\hat\theta}(\theta) = 0$.
2. An estimator is said to be consistent if for any $\delta > 0$, $\lim_{n \rightarrow \infty} \Pr\left(|\hat\theta_n - \theta| < \delta\right) = 1$, where $\hat\theta_n$ is the estimator based on $n$ observations.
3. The Mean Squared Error is $MSE_{\hat\theta}(\theta) = E\left[(\hat\theta - \theta)^2|\theta\right]$.  A low mean squared error is desirable.

The following relationship is useful:

$MSE_{\hat\theta}(\theta) = Var(\hat\theta) + [bias_{\hat\theta}(\theta)]^2$
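This decomposition can be checked numerically.  The sketch below uses the sample maximum as a deliberately biased estimator of $\theta$ for a Uniform$(0,\theta)$ sample; the parameter values are hypothetical, chosen only for illustration.

```python
import random

random.seed(0)
theta, n, trials = 10.0, 5, 50_000  # illustrative values

# Sample maximum as a (biased) estimator of theta for Uniform(0, theta)
ests = [max(random.uniform(0, theta) for _ in range(n)) for _ in range(trials)]

mean = sum(ests) / trials
var = sum((e - mean) ** 2 for e in ests) / trials
bias = mean - theta
mse = sum((e - theta) ** 2 for e in ests) / trials

# MSE = Var + bias^2 holds exactly for these sample-based definitions
print(mse, var + bias ** 2)
```

Note that the identity holds algebraically for the sample-based quantities, not just in expectation, which is why the two printed values agree to floating-point precision.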

CONFIDENCE INTERVALS
To construct a confidence interval for an estimator, you simply add and subtract $z\hat\sigma$ to the estimate, where $z$ is the appropriate z-value from the standard normal table and $\hat\sigma$ is the square root of the estimated variance of the estimator.  So a 95% confidence interval for an estimator $\hat\theta$ would be

$\left(\hat\theta - 1.96\sqrt{\hat{var}(\hat\theta)},\ \hat\theta + 1.96\sqrt{\hat{var}(\hat\theta)}\right)$

You can usually express the variance of the estimator in terms of the true parameter.  In that case, the estimated variance is obtained by plugging the estimate into that expression, based on your assumption about the underlying distribution.

When the variance of an estimator can be expressed in terms of the true parameter that you are trying to estimate, a more accurate confidence interval can be derived by

$\displaystyle -z \le \frac{\hat\theta - \theta}{\sqrt{var(\hat\theta)}} \le z$

Solving this inequality for $\theta$ (typically by solving a quadratic in $\theta$) gives the interval.
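A sketch comparing the two intervals for a Poisson rate, where $var(\hat\lambda) = \lambda/n$.  The exposure count, claim count, and z-value below are hypothetical:

```python
import math

n, claims, z = 100, 5, 1.96  # hypothetical data; z for a 95% interval
lam_hat = claims / n

# Naive interval: lam_hat +/- z * sqrt(lam_hat / n), plugging the estimate
# into the variance expression
half = z * math.sqrt(lam_hat / n)
naive = (lam_hat - half, lam_hat + half)

# More accurate interval: solve (lam_hat - lam)^2 <= z^2 * lam / n for lam,
# i.e. the quadratic lam^2 - (2*lam_hat + z^2/n)*lam + lam_hat^2 <= 0
b = 2 * lam_hat + z ** 2 / n
disc = math.sqrt(b ** 2 - 4 * lam_hat ** 2)
better = ((b - disc) / 2, (b + disc) / 2)

print(naive)
print(better)
```

Unlike the naive interval, the quadratic-based interval is not symmetric about $\hat\lambda$ and stays strictly positive, which matters for small rates.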

STATISTICS OF ESTIMATORS
An important point to keep in mind is that the results you observe in the data are outcomes of random variables.  Since estimators are calculated from these observations, they are functions of random variables.  If you make some assumptions about the underlying distribution, you can calculate the statistics of an estimator, such as its variance.  For example, suppose that in a population of 100, you observe 5 claims.  You assume the underlying distribution for the number of claims per person is Poisson and you estimate its parameter to be $\hat\lambda = \frac {5}{100}$.  This means your equation for the estimator is:

$\displaystyle \hat\lambda = \frac{1}{100}\sum_{i=1}^{100} X_i$

where $X_i$ is the number of claims for person $i$, following the true underlying distribution.  Thus the variance of your estimator is given by

$\begin{array}{rll} var(\hat\lambda) &=& \displaystyle var\left(\frac{1}{100}\sum_{i=1}^{100} X_i\right) \\ \\ &=& \displaystyle \frac{1}{100^2} var\left(\sum_{i=1}^{100} X_i\right) \\ \\ &=& \displaystyle \frac{1}{100} var(X) \end{array}$

Since you’ve assumed the underlying distribution to be Poisson, $var(X) = \lambda$, so

$var(\hat\lambda) = \displaystyle \frac{\lambda}{100}$

and substituting the estimate $\hat\lambda$ for the unknown $\lambda$ gives the estimated variance $\hat{var}(\hat\lambda) = \hat\lambda/100$.  This is how you can arrive at equations for the statistics of estimators based on an assumed underlying distribution.  The key is to realize that there is a link between the estimator and the true underlying distribution.
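This link can be checked by simulation: repeatedly draw 100 Poisson counts, form $\hat\lambda$, and compare the empirical variance of the estimates to $\lambda/100$.  All parameter values are illustrative, and the Poisson sampler is Knuth's multiplication method, not anything from the text.

```python
import math
import random

random.seed(1)
lam, n, trials = 0.05, 100, 20_000  # hypothetical parameters

def poisson(lam):
    """Knuth's method: multiply uniforms until the product drops below e^-lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# lam_hat = (1/n) * sum of X_i, repeated over many simulated populations
ests = [sum(poisson(lam) for _ in range(n)) / n for _ in range(trials)]
mean = sum(ests) / trials
var = sum((e - mean) ** 2 for e in ests) / trials

# The simulated variance should be close to the analytic value lam / n = 0.0005
print(var, lam / n)
```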