# Tag Archives: Confidence Interval

## Statistics for Empirical Models

An empirical model makes an assumption about the type of distribution underlying a set of data.  The data is then used to estimate the parameters of the assumed underlying distribution.  For example, if it is assumed that the underlying distribution is a uniform distribution, the data may be used to estimate the maximum value that the random variable can have.  If the assumed underlying distribution is Poisson, the data may be used to estimate the rate parameter $\lambda$.  There are three measures of quality for these estimators: bias, consistency, and mean squared error.

Definitions:

1. Let $\hat\theta$ be an estimator and $\theta$ be the true parameter being estimated.  The bias is defined by: $bias_{\hat\theta}(\theta) = E\left[\hat\theta|\theta\right] - \theta$.  An estimator is said to be unbiased if $bias_{\hat\theta}(\theta) = 0$.
2. An estimator is said to be consistent if for any $\delta > 0$, $\lim_{n \rightarrow \infty} \Pr\left(|\hat\theta_n - \theta| < \delta\right) = 1$, where $\hat\theta_n$ is the estimator based on $n$ observations.
3. The Mean Squared Error is $MSE_{\hat\theta}(\theta) = E\left[(\hat\theta - \theta)^2|\theta\right]$.  A low mean squared error is desirable.

The following relationship is useful:

$MSE_{\hat\theta}(\theta) = Var(\hat\theta) + [bias_{\hat\theta}(\theta)]^2$
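The decomposition can be checked numerically.  The following is a minimal sketch, assuming a uniform distribution on $[0, \theta]$ with $\theta = 10$ and using the sample maximum of 5 observations as the estimator; the function name `mse_decomposition` and all numbers are illustrative choices, not part of the original discussion.

```python
import random
from statistics import fmean, pvariance

def mse_decomposition(estimates, theta):
    """Return (mse, variance, bias) computed from a list of simulated estimates."""
    bias = fmean(estimates) - theta
    variance = pvariance(estimates)  # population variance of the estimates
    mse = fmean((e - theta) ** 2 for e in estimates)
    return mse, variance, bias

random.seed(0)
theta = 10.0  # assumed true maximum of the uniform distribution
n = 5         # observations per sample

# Estimator: the sample maximum, which can never exceed theta and so is biased low.
estimates = [max(random.uniform(0, theta) for _ in range(n)) for _ in range(20000)]

mse, variance, bias = mse_decomposition(estimates, theta)
print(f"bias = {bias:.4f}")
print(f"MSE = {mse:.4f}, Var + bias^2 = {variance + bias ** 2:.4f}")
```

The two printed quantities on the last line should agree up to floating-point error, because the identity holds for the empirical moments as well as the theoretical ones.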

CONFIDENCE INTERVALS
To construct a confidence interval for an estimator that is approximately normally distributed, you add and subtract $z\hat\sigma$ from the estimate, where $z$ is the appropriate z-value from the standard normal table and $\hat\sigma$ is the square root of the estimated variance of the estimator.  So a 95% confidence interval for an estimator $\hat\theta$ would be

$\left(\hat\theta - 1.96\sqrt{\hat{var}(\hat\theta)},\hat\theta + 1.96\sqrt{\hat{var}(\hat\theta)}\right)$

You can usually express the variance of the estimator in terms of the true parameter.  In that case you can replace the estimated variance with the true variance of the estimator, written as a function of the parameter under your assumed underlying distribution.

When the variance of an estimator can be expressed in terms of the true parameter that you are trying to estimate, a more accurate confidence interval can be derived by

$\displaystyle -z \le \frac{\hat\theta - \theta}{\sqrt{var(\hat\theta)}} \le z$

Solving for $\theta$ gives the interval.
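As an illustration of "solving for $\theta$", here is a minimal sketch for one specific case: the estimator is the mean of $n$ independent Poisson observations, so its variance is $\lambda / n$.  That setup, the function name `poisson_rate_intervals`, and the input values are assumptions chosen for the example.  The inequality then becomes a quadratic in $\lambda$ whose two roots are the endpoints of the interval.

```python
import math

def poisson_rate_intervals(lam_hat, n, z=1.96):
    """Two approximate confidence intervals for a Poisson rate.

    Assumes lam_hat is the mean of n independent Poisson observations,
    so that var(lam_hat) = lam / n.
    """
    # Simple interval: plug the estimate into the variance formula.
    half_width = z * math.sqrt(lam_hat / n)
    simple = (lam_hat - half_width, lam_hat + half_width)

    # Solved interval: (lam_hat - lam)^2 <= z^2 * lam / n rearranges to
    # lam^2 - (2*lam_hat + z^2/n)*lam + lam_hat^2 <= 0; its roots are the endpoints.
    centre = lam_hat + z ** 2 / (2 * n)
    spread = z * math.sqrt(lam_hat / n + z ** 2 / (4 * n ** 2))
    solved = (centre - spread, centre + spread)
    return simple, solved

simple, solved = poisson_rate_intervals(lam_hat=0.05, n=100)
print("simple:", simple)
print("solved:", solved)
```

With these inputs the solved interval is shifted to the right of the simple one and is not symmetric about the estimate, since the variance grows with $\lambda$.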

STATISTICS OF ESTIMATORS
An important point to keep in mind is that the results you observe in the data are outcomes of a random variable.  Since estimators are calculated from these observations, they are functions of random variables and are therefore random variables themselves.  If you make some assumptions about the underlying distribution, you can calculate the statistics of an estimator, such as its variance.  For example, in a population of 100, you observe 5 claims.  You assume the underlying distribution for the number of claims per person is Poisson and you estimate its parameter to be $\hat\lambda = \frac{5}{100}$.  This means your equation for the estimator is:

$\displaystyle \hat\lambda = \frac{1}{100}\sum_{i=1}^{100} X_i$

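where $X_i$ is the number of claims for person $i$, a random variable that follows the true underlying distribution.  Assuming the $X_i$ are independent and identically distributed, the variance of your estimator is given by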

$\begin{array}{rll} var(\hat\lambda) &=& \displaystyle var\left(\frac{1}{100}\sum_{i=1}^{100} X_i\right) \\ \\ &=& \displaystyle \frac{1}{100^2} var\left(\sum_{i=1}^{100} X_i\right) \\ \\ &=& \displaystyle \frac{1}{100} var(X) \end{array}$

Since you’ve assumed the underlying distribution to be Poisson,

$var(\hat\lambda) = \displaystyle \frac{\lambda}{100}$

because the variance of a Poisson random variable $X$ equals its rate $\lambda$.  Substituting the estimate for the true parameter gives the estimated variance $\hat{var}(\hat\lambda) = \frac{\hat\lambda}{100} = 0.0005$.  This is how you can arrive at equations for the statistics of estimators based on an assumed underlying distribution.  The key is to realize that there is a link between the estimator and the true underlying distribution.
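The formula $\lambda / 100$ can be sanity-checked by simulation.  The sketch below is an illustration only: it assumes a "true" rate of 0.05, draws Poisson values with Knuth's multiplication method (the standard library has no Poisson sampler), and compares the variance of many simulated estimates with $\lambda / n$.  The helper names are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_rate_estimates(lam, n, trials, seed=0):
    """Simulate lam_hat = (1/n) * sum(X_i) for many independent populations."""
    rng = random.Random(seed)
    return [fmean(poisson_draw(lam, rng) for _ in range(n)) for _ in range(trials)]

lam, n = 0.05, 100  # lam is an assumed "true" rate for the simulation
estimates = simulate_rate_estimates(lam, n, trials=5000)
print("simulated var:", pvariance(estimates))
print("lambda / n:   ", lam / n)
```

The two printed values should be close but not identical, since the first is itself an estimate based on a finite number of trials.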


## The Lognormal Distribution

Review: If $X$ is normal with mean $\mu$ and standard deviation $\sigma$, then

$Z = \displaystyle \frac{X-\mu}{\sigma}$

has the standard normal distribution with mean 0 and standard deviation 1.  To find the probability $Pr(X \le x)$, you would standardize $X$ and look up the value in the standard normal table, whose cumulative distribution function is written $\mathcal{N}(\cdot)$ here.

$\begin{array}{rll} Pr(X \le x) &=& Pr\left(\displaystyle \frac{X-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right) \\ \\ &=& \displaystyle Pr\left(Z \le \frac{x-\mu}{\sigma}\right) \\ \\ &=& \displaystyle \mathcal{N}\left(\frac{x-\mu}{\sigma}\right) \end{array}$
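In code, the table lookup is a call to the standard normal CDF.  A minimal sketch using Python's `statistics.NormalDist`; the function name `normal_cdf` and the example numbers (mean 100, standard deviation 15) are made up for illustration.

```python
from statistics import NormalDist

def normal_cdf(x, mu, sigma):
    """Pr(X <= x) for X ~ Normal(mu, sigma), computed by standardizing first."""
    z = (x - mu) / sigma
    return NormalDist().cdf(z)  # standard normal CDF, i.e. the table lookup

# Hypothetical example: X has mean 100 and standard deviation 15.
p = normal_cdf(115, mu=100, sigma=15)
print(round(p, 4))
```

Here $x$ is one standard deviation above the mean, so the result is $\mathcal{N}(1)$.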

If $V$ is a weighted sum of $n$ normal random variables $X_i, i = 1, ..., n$, with means $\mu_i$, variances $\sigma^2_i$, and weights $w_i$, then $V$ has mean

$\displaystyle E\left[\sum_{i=1}^n w_iX_i\right] = \sum_{i=1}^n w_i\mu_i$

and variance

$\displaystyle Var\left(\sum_{i=1}^n w_iX_i\right) = \sum_{i=1}^n \sum_{j=1}^n w_iw_j\sigma_{ij}$

where $\sigma_{ij}$ is the covariance between $X_i$ and $X_j$.  Note when $i=j$, $\sigma_{ij} = \sigma_i^2 = \sigma_j^2$.
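A short sketch of the two formulas, with the covariance matrix passed as a nested list.  The function name `weighted_sum_moments` and the two-variable inputs are hypothetical values chosen only to show the double sum at work.

```python
def weighted_sum_moments(weights, means, cov):
    """Mean and variance of sum_i w_i * X_i given means and a covariance matrix."""
    n = len(weights)
    mean = sum(w * m for w, m in zip(weights, means))
    # Double sum over all pairs (i, j); the diagonal terms are the variances.
    variance = sum(
        weights[i] * weights[j] * cov[i][j]
        for i in range(n)
        for j in range(n)
    )
    return mean, variance

# Hypothetical two-variable example.
weights = [0.6, 0.4]
means = [0.08, 0.03]
cov = [[0.04, 0.006],
       [0.006, 0.01]]

mean, variance = weighted_sum_moments(weights, means, cov)
print(mean, variance)
```

Note that the off-diagonal covariance is counted twice in the double sum, once as $(i, j)$ and once as $(j, i)$.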

Remember: A sum of random variables is not the same as a mixture distribution!  The expected value is the same, but the variance is not.  A sum of jointly normal random variables (independent normals, for example) is also normal.  So $V$ is normal with the above mean and variance.

Actuary Speak: This is called a stable distribution.  The sum of random variables from the same distribution family produces a random variable that is also from the same distribution family.

The fun stuff:
If $X$ is normal, then $Y = e^X$ is lognormal.  If $X$ has mean $\mu$ and standard deviation $\sigma$, then

$\begin{array}{rll} \displaystyle E\left[Y\right] &=& E\left[e^X\right] \\ \\ \displaystyle &=& e^{\mu + \frac{1}{2}\sigma^2} \\ \\ Var\left(e^X\right) &=& e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)\end{array}$
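The moment formulas are easy to evaluate and to compare against simulated draws.  The following is a rough sketch; the parameters $\mu = 0$ and $\sigma = 0.5$, the seed, and the function name `lognormal_mean_var` are arbitrary illustrative choices.

```python
import math
import random
from statistics import fmean

def lognormal_mean_var(mu, sigma):
    """Mean and variance of Y = exp(X) where X ~ Normal(mu, sigma)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    var = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)
    return mean, var

random.seed(1)
mu, sigma = 0.0, 0.5  # assumed parameters of the underlying normal
draws = [math.exp(random.gauss(mu, sigma)) for _ in range(100000)]

mean, var = lognormal_mean_var(mu, sigma)
print("formula mean:", mean, "simulated mean:", fmean(draws))
print("formula var: ", var)
```

Notice that the mean of $Y$ is larger than $e^\mu$; the extra $\frac{1}{2}\sigma^2$ in the exponent is what later shows up as the correction term in the stock price model.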

Recall $FV = e^\delta$ where $FV$ is the future value of one unit invested at a continuously compounded rate of $\delta$ for one period.  If the rate of growth is a normally distributed random variable, then the future value is lognormal.  The Black-Scholes model for option prices assumes stocks appreciate at a continuously compounded rate that is normally distributed.

$S_t = S_0e^{R(0,t)}$

where $S_t$ is the stock price at time $t$, $S_0$ is the current price, and $R(0,t)$ is the random variable for the rate of return from time 0 to t.  Now consider the situation where $R(0,t)$ is the sum of iid normal random variables $R(0,h) + R(h,2h) + ... + R((n-1)h,t)$ each having mean $\mu_h$ and variance $\sigma_h^2$.  Then

$\begin{array}{rll} E\left[R(0,t)\right] &=& n\mu_h \\ Var\left(R(0,t)\right) &=& n\sigma_h^2 \end{array}$

If $h$ represents 1 year, this says that the expected return over 10 years is 10 times the expected one-year return and the standard deviation is $\sqrt{10}$ times the annual standard deviation.  This allows us to write the mean and standard deviation as functions of time.  Suppose we write

$\begin{array}{rll} \displaystyle \mu(t) &=& \left(\alpha - \delta -\frac{1}{2}\sigma^2\right)t \\ \sigma(t) &=& \sigma \sqrt{t} \end{array}$

where $\alpha$ is the continuously compounded expected rate of return on the stock and $\delta$ is the continuous rate of dividend payout.  Since all normal random variables are transformations of the standard normal, we can write $R(0,t) =\mu(t)+Z\sigma(t)$.  The model for the stock price becomes

$\displaystyle S_t = S_0e^{\left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t + Z\sigma\sqrt{t}}$

In this model, the expected value of the stock price at time $t$ is

$E\left[S_t\right] = S_0e^{(\alpha - \delta)t}$
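This follows from the lognormal mean formula $E\left[e^X\right] = e^{\mu + \frac{1}{2}\sigma^2}$ applied to $R(0,t)$, which is normal with mean $\mu(t)$ and standard deviation $\sigma(t)$.  Written out as a short derivation:

$\begin{array}{rll} E\left[S_t\right] &=& S_0E\left[e^{R(0,t)}\right] \\ \\ &=& S_0e^{\mu(t) + \frac{1}{2}\sigma(t)^2} \\ \\ &=& S_0e^{\left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t + \frac{1}{2}\sigma^2t} \\ \\ &=& S_0e^{(\alpha - \delta)t} \end{array}$

The $-\frac{1}{2}\sigma^2$ term in $\mu(t)$ is there precisely so that it cancels the $+\frac{1}{2}\sigma^2$ from the lognormal mean.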

Actuary Speak: The standard deviation $\sigma$ of the return rate is called the volatility of the stock.  This term comes from expressing the rate of return as an Ito process. $\mu(t)$ is called the drift term and $\sigma(t)$ is called the volatility term.

Confidence intervals: To find the range of stock prices that corresponds to a particular confidence level, we need only find the corresponding interval for the standard normal distribution and then translate that interval into stock prices using the equation for $S_t$.

Example: The interval $z \in [-1.96, 1.96]$ is the 95% confidence interval for the standard normal distribution.  Suppose $t = \frac{1}{3}$, $\alpha = 0.15$, $\delta = 0.01$, $\sigma = 0.3$, and $S_0 = 40$.  Then the 95% confidence interval for $S_t$ is

$\left[40e^{(0.15-0.01-\frac{1}{2}0.3^2)\frac{1}{3} + (-1.96)0.3\sqrt{\frac{1}{3}}},40e^{(0.15-0.01-\frac{1}{2}0.3^2)\frac{1}{3} + (1.96)0.3\sqrt{\frac{1}{3}}}\right]$

which corresponds to the price interval

$\left[29.40,57.98\right]$
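The arithmetic in this example can be reproduced with a few lines of Python.  This is a sketch under the lognormal model above; the function name `price_interval` is made up, and the z-value is taken from `NormalDist().inv_cdf` rather than rounded to 1.96, which changes the endpoints only in the fourth decimal place.

```python
import math
from statistics import NormalDist

def price_interval(s0, alpha, delta, sigma, t, level=0.95):
    """Stock price interval implied by a two-sided standard normal interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    drift = (alpha - delta - 0.5 * sigma ** 2) * t
    spread = z * sigma * math.sqrt(t)
    return s0 * math.exp(drift - spread), s0 * math.exp(drift + spread)

low, high = price_interval(s0=40, alpha=0.15, delta=0.01, sigma=0.3, t=1 / 3)
print(f"[{low:.2f}, {high:.2f}]")
```

The interval is not symmetric about $S_0 = 40$ because the symmetric interval in $z$ is passed through the exponential function.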

Probabilities: Probability calculations on stock prices require a bit more mental gymnastics.

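$\begin{array}{rll} \displaystyle Pr\left(S_t < K\right) &=& \displaystyle Pr\left(\ln{\frac{S_t}{S_0}} < \ln{\frac{K}{S_0}}\right) \\ \\ &=& \displaystyle Pr\left(Z < \frac{\ln{\frac{K}{S_0}} - \left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}}\right) \\ \\ &=& \displaystyle \mathcal{N}\left(\frac{\ln{\frac{K}{S_0}} - \left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}}\right) \end{array}$

By the symmetry of the standard normal distribution, $Pr\left(S_t > K\right)$ is the same expression with the sign of the argument reversed.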
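Under the model $S_t = S_0e^{\mu(t) + Z\sigma(t)}$, the event $S_t < K$ is the same as $Z < \frac{\ln(K/S_0) - \mu(t)}{\sigma(t)}$, so the probability is a single standard normal CDF evaluation.  A minimal sketch, with a hypothetical function name and the parameters of the earlier example reused as inputs:

```python
import math
from statistics import NormalDist

def prob_below(k, s0, alpha, delta, sigma, t):
    """Pr(S_t < K) under the lognormal stock price model."""
    drift = (alpha - delta - 0.5 * sigma ** 2) * t
    z = (math.log(k / s0) - drift) / (sigma * math.sqrt(t))
    return NormalDist().cdf(z)

# Same parameters as the confidence interval example; K = 45 is an arbitrary choice.
p = prob_below(k=45, s0=40, alpha=0.15, delta=0.01, sigma=0.3, t=1 / 3)
print("Pr(S_t < 45) =", p)
print("Pr(S_t > 45) =", 1 - p)
```

As a consistency check, plugging the upper endpoint of a 95% interval in for `k` should return a probability of about 0.975.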

Conditional Expected Value: Define

$\begin{array}{rll} \displaystyle d_1 &=& -\frac{\ln{\frac{K}{S_0}} - \left(\alpha - \delta + \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \\ \\ \displaystyle d_2 &=& -\frac{\ln{\frac{K}{S_0}}- \left(\alpha - \delta - \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \end{array}$

Then

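$\begin{array}{rll} \displaystyle E\left[S_t|S_t<K\right] &=& \displaystyle S_0e^{(\alpha - \delta)t}\frac{\mathcal{N}(-d_1)}{\mathcal{N}(-d_2)} \\ \\ \displaystyle E\left[S_t|S_t>K\right] &=& \displaystyle S_0e^{(\alpha - \delta)t}\frac{\mathcal{N}(d_1)}{\mathcal{N}(d_2)} \end{array}$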

This gives the expected stock price at time $t$ given that it is less than $K$ or greater than $K$ respectively.
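The conditional expectations can be computed directly from $d_1$ and $d_2$.  The sketch below assumes the standard lognormal results $E\left[S_t|S_t>K\right] = S_0e^{(\alpha-\delta)t}\mathcal{N}(d_1)/\mathcal{N}(d_2)$ and the same expression with $-d_1$ and $-d_2$ for $S_t<K$; the function names and the strike $K = 45$ are illustrative.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF

def d1_d2(k, s0, alpha, delta, sigma, t):
    """d_1 and d_2 as defined above, written with ln(S_0 / K)."""
    vol = sigma * math.sqrt(t)
    d1 = (math.log(s0 / k) + (alpha - delta + 0.5 * sigma ** 2) * t) / vol
    d2 = (math.log(s0 / k) + (alpha - delta - 0.5 * sigma ** 2) * t) / vol
    return d1, d2

def conditional_means(k, s0, alpha, delta, sigma, t):
    """Return (E[S_t | S_t < K], E[S_t | S_t > K])."""
    d1, d2 = d1_d2(k, s0, alpha, delta, sigma, t)
    forward = s0 * math.exp((alpha - delta) * t)  # unconditional E[S_t]
    return forward * N(-d1) / N(-d2), forward * N(d1) / N(d2)

below, above = conditional_means(k=45, s0=40, alpha=0.15, delta=0.01, sigma=0.3, t=1 / 3)
print("E[S_t | S_t < 45] =", below)
print("E[S_t | S_t > 45] =", above)
```

Weighting the two conditional means by $Pr(S_t<K) = \mathcal{N}(-d_2)$ and $Pr(S_t>K) = \mathcal{N}(d_2)$ recovers the unconditional mean $S_0e^{(\alpha-\delta)t}$, which is a useful check on the signs.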

Black-Scholes formula: A European call option $C_t$ on stock $S_t$ with strike price $K$ has value $\max\left(0,S_t - K\right)$ at time $t$.  The option pays out only if $S_t > K$.  So the value of this option at time 0 is the probability that it pays out at time $t$, discounted at the risk-free interest rate $r$, and multiplied by the expected value of $S_t - K$ given that $S_t > K$.  In other words,

$\begin{array}{rll} \displaystyle C_0 &=& e^{-rt}Pr\left(S_t>K\right)E\left[S_t-K|S_t>K\right] \\ \\ &=& e^{-rt}\mathcal{N}(d_2)\left(E\left[S_t|S_t>K\right] - E\left[K|S_t>K\right]\right) \\ \\ &=& e^{-rt}\mathcal{N}(d_2)\left(S_0e^{(\alpha - \delta)t}\frac{\mathcal{N}(d_1)}{\mathcal{N}(d_2)} - K\right) \end{array}$

Black-Scholes makes the additional assumption that all investors are risk neutral.  This means assets do not pay a risk premium for being more risky.  Long story short, $\alpha - r = 0$ so $\alpha = r$.  So in the Black-Scholes formula:

$\begin{array}{rll} \displaystyle d_1 &=& -\frac{\ln{\frac{K}{S_0}} - \left(r - \delta + \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \\ \\ \displaystyle d_2 &=& -\frac{\ln{\frac{K}{S_0}}- \left(r- \delta - \frac{1}{2}\sigma^2\right)t}{\sigma\sqrt{t}} \end{array}$

Continuing our derivation of $C_0$ but replacing $\alpha$ with $r$,

$\begin{array}{rll} \displaystyle C_0 &=& e^{-rt}\mathcal{N}(d_2)\left(S_0e^{(r - \delta)t}\frac{\mathcal{N}(d_1)}{\mathcal{N}(d_2)} - K\right) \\ \\ &=& S_0e^{-\delta t}\mathcal{N}(d_1) - Ke^{-rt}\mathcal{N}(d_2)\end{array}$

For a put option $P_0$ with payout $K-S_t$ for $K>S_t$ and 0 otherwise,

$P_0 = Ke^{-rt}\mathcal{N}(-d_2) - S_0e^{-\delta t}\mathcal{N}(-d_1)$
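Both formulas fit in a few lines of Python.  This is a minimal sketch rather than a production pricer; the function name `black_scholes` and the inputs ($S_0 = 41$, $K = 40$, $r = 0.08$, $\delta = 0$, $\sigma = 0.3$, $t = 0.25$) are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF

def black_scholes(s0, k, r, delta, sigma, t):
    """Return (call, put) prices for European options under Black-Scholes."""
    vol = sigma * math.sqrt(t)
    d1 = (math.log(s0 / k) + (r - delta + 0.5 * sigma ** 2) * t) / vol
    d2 = d1 - vol  # equivalent to the d_2 definition above
    call = s0 * math.exp(-delta * t) * N(d1) - k * math.exp(-r * t) * N(d2)
    put = k * math.exp(-r * t) * N(-d2) - s0 * math.exp(-delta * t) * N(-d1)
    return call, put

call, put = black_scholes(s0=41, k=40, r=0.08, delta=0.0, sigma=0.3, t=0.25)
print("call:", call)
print("put: ", put)

# Put-call parity: C - P should equal S_0 * exp(-delta*t) - K * exp(-r*t).
print("parity gap:", (call - put) - (41 - 40 * math.exp(-0.08 * 0.25)))
```

The parity gap should be zero up to floating-point error, which is a quick way to catch a sign mistake in either formula.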

These are the famous Black-Scholes formulas for option pricing.  When derived on the back of a cocktail napkin, they are indispensable for impressing the ladies at your local bar.  :p