Statistics for Empirical Models

An empirical model makes an assumption about the type of distribution underlying a set of data.  The data are then used to estimate the parameters of the assumed distribution.  For example, if the underlying distribution is assumed to be uniform, the data may be used to estimate the maximum value that the random variable can take.  If the assumed underlying distribution is Poisson, the data may be used to estimate the rate parameter \lambda.  There are three measures of quality for these estimators: bias, consistency, and mean squared error.


  1. Let \hat\theta be an estimator and \theta be the true parameter being estimated.  The bias is defined by: bias_{\hat\theta}(\theta) = E\left[\hat\theta|\theta\right] - \theta.  An estimator is said to be unbiased if bias_{\hat\theta}(\theta) = 0.
  2. An estimator is said to be consistent if for any \delta > 0, \lim_{n \rightarrow \infty} \Pr\left(|\hat\theta_n - \theta| < \delta\right) = 1, where \hat\theta_n is the estimator based on n observations.
  3. The Mean Squared Error is MSE_{\hat\theta}(\theta) = E\left[(\hat\theta - \theta)^2|\theta\right].  A low mean squared error is desirable.
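
To make the first two definitions concrete, here is a minimal simulation sketch for the uniform example mentioned above. It assumes the data come from Uniform(0, \theta) and uses the sample maximum as the estimator; that choice of estimator, the function names, and the values of \theta, n, and \delta are illustrative assumptions, not something fixed by the definitions. For this estimator the theoretical bias is -\theta/(n+1), so it is biased but the bias shrinks as n grows, and the fraction of estimates within \delta of \theta should approach 1.

```python
import random

def max_estimator(sample):
    """Estimate the upper bound theta of Uniform(0, theta) by the sample maximum."""
    return max(sample)

def simulate_estimates(theta, n, trials, seed=0):
    """Draw `trials` independent samples of size n and return one estimate per sample."""
    rng = random.Random(seed)
    return [max_estimator([rng.uniform(0, theta) for _ in range(n)])
            for _ in range(trials)]

def empirical_bias(estimates, theta):
    """Average estimate minus the true parameter (approximates E[theta_hat] - theta)."""
    return sum(estimates) / len(estimates) - theta

def fraction_within(estimates, theta, delta):
    """Fraction of estimates with |theta_hat - theta| < delta (approximates the probability)."""
    return sum(abs(e - theta) < delta for e in estimates) / len(estimates)

theta = 1.0  # assumed true parameter, chosen only for illustration
for n in (5, 50, 500):
    estimates = simulate_estimates(theta, n, trials=2000)
    print(n,
          round(empirical_bias(estimates, theta), 4),   # simulated bias
          round(-theta / (n + 1), 4),                   # theoretical bias for this estimator
          fraction_within(estimates, theta, delta=0.05))
```

The simulated bias stays negative because the sample maximum can never exceed \theta, while the last column climbs toward 1 as n increases, which is the behaviour the consistency definition describes.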

The following relationship is useful:

MSE_{\hat\theta}(\theta) = Var(\hat\theta) + [bias_{\hat\theta}(\theta)]^2
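
The relationship can be checked numerically. The sketch below is only an illustration under assumed inputs: it reuses the sample maximum of Uniform(0, \theta) data as a deliberately biased estimator, and the helper name `mse_decomposition` is hypothetical. When the variance is computed as a population variance (dividing by the number of simulated estimates), the identity holds for the simulated values up to floating-point rounding.

```python
import random
import statistics

def mse_decomposition(estimates, theta):
    """Return (mse, variance, bias) computed from a list of simulated estimates."""
    mean_estimate = statistics.fmean(estimates)
    bias = mean_estimate - theta
    # Population variance (divide by N) so the identity is exact on the sample.
    variance = statistics.pvariance(estimates, mu=mean_estimate)
    mse = statistics.fmean((e - theta) ** 2 for e in estimates)
    return mse, variance, bias

rng = random.Random(0)
theta, n = 1.0, 10  # assumed values, chosen only for illustration
estimates = [max(rng.uniform(0, theta) for _ in range(n)) for _ in range(5000)]

mse, variance, bias = mse_decomposition(estimates, theta)
print(mse, variance + bias ** 2)  # the two numbers agree up to rounding
```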

To construct a confidence interval for an estimator, add and subtract z\hat\sigma to and from the estimate, where z is the appropriate z-value from the standard normal table and \hat\sigma is the square root of the estimated variance of the estimator.  This relies on the estimator being approximately normally distributed.  So a 95% confidence interval for an estimator \hat\theta would be

\left(\hat\theta - 1.96\sqrt{\hat{var}(\hat\theta)},\hat\theta + 1.96\sqrt{\hat{var}(\hat\theta)}\right)
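
As a small sketch of this calculation, the function below (a hypothetical helper named `normal_ci`) takes an estimate and an estimated variance and returns the interval. The inputs borrow the figures from the claims example later in this post, an estimate of 5/100; using 0.05/100 as the estimated variance is an assumption of this sketch.

```python
import math

def normal_ci(estimate, estimated_variance, z=1.96):
    """Normal-approximation interval: estimate +/- z * sqrt(estimated variance)."""
    half_width = z * math.sqrt(estimated_variance)
    return estimate - half_width, estimate + half_width

# Assumed inputs: estimate 0.05 with estimated variance 0.05 / 100 = 0.0005.
low, high = normal_ci(0.05, 0.0005)
print(round(low, 6), round(high, 6))
```

The default z = 1.96 gives a 95% interval; passing a different z-value gives other confidence levels.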

You can usually express the variance of the estimator in terms of the true parameter.  In that case you can replace the estimated variance with the true variance of the estimator implied by your assumed underlying distribution.

When the variance of an estimator can be expressed in terms of the true parameter that you are trying to estimate, a more accurate confidence interval can be derived from the inequality

\displaystyle -z \le \frac{\hat\theta - \theta}{\sqrt{var(\hat\theta)}} \le z

Here var(\hat\theta) is written as a function of \theta.  Solving the inequality for \theta gives the interval.
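
As a worked sketch of this approach, take a Poisson rate estimated by the sample mean of n observations, so that the variance of the estimator is \lambda/n (this is the claims example discussed later in this post). Squaring the inequality gives (\hat\lambda - \lambda)^2 \le z^2\lambda/n, which is a quadratic in \lambda; its two roots are the interval endpoints. The function name `poisson_rate_interval` is hypothetical, and the inputs are the example's figures.

```python
import math

def poisson_rate_interval(lambda_hat, n, z=1.96):
    """Solve (lambda_hat - lam)^2 <= z^2 * lam / n for lam.

    Rearranged: lam^2 - (2*lambda_hat + z^2/n)*lam + lambda_hat^2 <= 0,
    so the interval lies between the two roots of the quadratic.
    """
    b = 2 * lambda_hat + z ** 2 / n
    discriminant = b ** 2 - 4 * lambda_hat ** 2  # non-negative for lambda_hat >= 0
    root = math.sqrt(discriminant)
    return (b - root) / 2, (b + root) / 2

# Figures from the claims example: 5 claims observed among 100 people.
low, high = poisson_rate_interval(lambda_hat=0.05, n=100)
print(round(low, 6), round(high, 6))
```

Unlike the symmetric interval, this one is not centred on the estimate, and its lower endpoint cannot go below zero.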

An important point to keep in mind is that the results you observe in the data are outcomes of a random variable.  Since estimators are calculated from these observations, they are functions of a random variable and are therefore random themselves.  If you make some assumptions about the underlying distribution, you can calculate the statistics of an estimator, such as its variance.  For example, in a population of 100, you observe 5 claims.  You assume the underlying distribution for the number of claims per person is Poisson and you estimate its parameter to be \hat\lambda = \frac {5}{100}.  This means your equation for the estimator is:

\displaystyle \hat\lambda = \frac{1}{100}\sum_{i=1}^{100} X_i

where X_i is the number of claims for person i, and the X_i are assumed to be independent and to share the same underlying distribution.  Thus the variance of your estimator is given by

\begin{array}{rll} var(\hat\lambda) &=& \displaystyle var\left(\frac{1}{100}\sum_{i=1}^{100} X_i\right) \\ \\ &=& \displaystyle \frac{1}{100^2} var\left(\sum_{i=1}^{100} X_i\right) \\ \\ &=& \displaystyle \frac{1}{100} var(X) \end{array}

Since you’ve assumed the underlying distribution to be Poisson,

var(\hat\lambda) = \displaystyle \frac{\lambda}{100}

where \lambda is the variance of X.  Replacing \lambda with the estimate \hat\lambda = 0.05 gives the estimated variance \hat{var}(\hat\lambda) = 0.05/100 = 0.0005.  This is how you can arrive at equations for the statistics of estimators based on an assumed underlying distribution.  The key is to realize that there is a link between the estimator and the true underlying distribution.
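
The variance formula can be sanity-checked by simulation. The sketch below assumes the true rate really is \lambda = 0.05 (in practice only the estimate is known) and repeatedly simulates a population of 100 independent Poisson claim counts, recomputing the estimator each time. The Poisson sampler uses Knuth's multiplication method so that only the standard library is needed; the function names are hypothetical.

```python
import math
import random
import statistics

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_rate_estimates(lam, n, trials, seed=0):
    """Return `trials` values of lambda_hat, each the mean of n simulated claim counts."""
    rng = random.Random(seed)
    return [sum(poisson_draw(lam, rng) for _ in range(n)) / n
            for _ in range(trials)]

# Assumed true rate 0.05 and population size 100, matching the example's figures.
estimates = simulate_rate_estimates(lam=0.05, n=100, trials=5000)
print(statistics.variance(estimates))  # simulated variance of the estimator
print(0.05 / 100)                      # lambda / 100 from the formula
```

The two printed values should be close, and the gap narrows as the number of trials grows.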

