# Monthly Archives: June 2008

## Conditional Variance

If $X$ is a random variable that depends on another random variable $I$, then

$Var(X) = E_I[Var_X(X|I)] + Var_I(E_X[X|I])$

This is the conditional variance formula, also known as the law of total variance; it is the variance counterpart of the double expectation formula $E[X] = E_I[E_X[X|I]]$.  It is important to keep track of which random variable in a problem is $X$ and which one is $I$.  Weishaus calls $I$ the indicator variable.  In the above equation, $Var_X(X|I)$ and $E_X[X|I]$ are functions of $I$.

Example 1:  Noemi and Harry work at Starbucks.  Noemi’s tip jar contains 30% dollars, 30% quarters, 20% dimes, 10% nickels and 10% pennies.  Harry’s tip jar contains 5% dollars, 10% quarters, 10% dimes, 35% nickels and 40% pennies.  A customer steals a coin from Harry’s jar with 99% probability and from Noemi’s jar with 1% probability.  What is the variance of the stolen amount?

1. Identify the random variables.
• The stolen amount is what we’re interested in so this is $X$.
• The distribution of $X$ depends on which jar the coin came from so the choice of jar is the indicator variable $I$.
2. Find the distribution of $E_X[X|I]$
• $E_X[X|I=H] = 0.1065$ with 99% probability.
• $E_X[X|I=N] = 0.4010$ with 1% probability.
3. $Var_I(E_X[X|I]) = 0.000858629$
4. Find the distribution of $Var_X(X|I)$
• $Var_X(X|I=H) = 0.04682275$ with 99% probability.
• $Var_X(X|I=N) = 0.16020900$ with 1% probability.
5. $E_I[Var_X(X|I)] = 0.04795661$
6. $Var(X) = 0.000858629 + 0.04795661 = 0.0488152$
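The six steps above can be checked numerically. A minimal sketch in Python (coin values in dollars: 1.00, 0.25, 0.10, 0.05, 0.01):

```python
# Numeric check of the conditional variance example.
harry = {1.00: 0.05, 0.25: 0.10, 0.10: 0.10, 0.05: 0.35, 0.01: 0.40}
noemi = {1.00: 0.30, 0.25: 0.30, 0.10: 0.20, 0.05: 0.10, 0.01: 0.10}

def mean(dist):
    return sum(v * p for v, p in dist.items())

def var(dist):
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())

p_h, p_n = 0.99, 0.01
e_h, e_n = mean(harry), mean(noemi)   # 0.1065, 0.4010
v_h, v_n = var(harry), var(noemi)     # 0.04682275, 0.16020900

# Var_I(E_X[X|I]): variance of the two conditional means
m = p_h * e_h + p_n * e_n
var_of_means = p_h * (e_h - m) ** 2 + p_n * (e_n - m) ** 2

# E_I[Var_X(X|I)]: weighted average of the two conditional variances
mean_of_vars = p_h * v_h + p_n * v_n

total = var_of_means + mean_of_vars
print(round(total, 7))  # 0.0488152
```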

Filed under Probability

## The Bernoulli Shortcut

If $X$ has a standard Bernoulli distribution, then it takes the value 1 with probability $q$ and the value 0 with probability $1-q$.  Any random variable that can take only 2 values is a scaled and translated version of the standard Bernoulli distribution.

Expected Value and Variance:

For a standard Bernoulli distribution, $E[X] = q$ and $Var(X) = q(1-q)$.  If $Y$ is a random variable that can only take the values $a$ and $b$ with probabilities $q$ and $(1-q)$ respectively, then

$\begin{array}{rl} Y &= (a-b)X +b \\ E[Y] &= (a-b)E[X] +b \\ Var(Y) &= (a-b)^2Var(X) \\ &= (a-b)^2q(1-q) \end{array}$
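The shortcut can be verified against a direct computation; here is a small sketch with hypothetical values $a = 5$, $b = 2$, $q = 0.3$:

```python
# Bernoulli shortcut check: Var(Y) = (a - b)^2 q (1 - q)
# for a two-valued random variable (values chosen for illustration).
a, b, q = 5.0, 2.0, 0.3

# Shortcut
var_shortcut = (a - b) ** 2 * q * (1 - q)

# Direct computation from the definition Var(Y) = E[Y^2] - E[Y]^2
ey = q * a + (1 - q) * b
ey2 = q * a ** 2 + (1 - q) * b ** 2
var_direct = ey2 - ey ** 2

print(round(var_shortcut, 6), round(var_direct, 6))  # 1.89 1.89
```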

Filed under Probability

## Normal Approximation

If a random variable $Y$ is normal, you can map it to a standard normal distribution $X$ (useful for finding probabilities in the standard normal table) by the following relationship:

$Y = \mu_y + \sigma_yX$

Example 1:  $Y$ is normal with $E[Y] = 100$ and $Var(Y) = 49$.  Then

$\begin{array}{rl} P(Y \leq 111.515) &= P(X \leq \frac{111.515 - 100}{\sqrt{49}}) \\ &= P(X \leq 1.645) \\ &= 0.95 \end{array}$

Example 2:  $Y$ has the same distribution as in Example 1.  Then $P(Y \leq y) = 0.9$ implies

$P(X \leq \frac{y - 100}{\sqrt{49}}) = 0.9$

Which implies:

$\frac{y - 100}{\sqrt{49}} = 1.2816$

since the 90th percentile of the standard normal distribution is 1.2816.  Hence $y = 108.9712$.
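Both examples can be checked with the standard library's `statistics.NormalDist` (Python 3.8+), which handles the standardization internally:

```python
from statistics import NormalDist

# Y is normal with mean 100 and variance 49, so sigma = 7.
y = NormalDist(mu=100, sigma=7)

# Example 1: P(Y <= 111.515)
print(round(y.cdf(111.515), 4))   # 0.95

# Example 2: the 90th percentile of Y
print(round(y.inv_cdf(0.9), 4))   # 108.9709
```

The slight difference from the table-based answer (108.9712) comes from using the exact percentile 1.28155 rather than the rounded table value 1.2816.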

With regard to Central Limit Theorem:

By the Central Limit Theorem, the distribution of a sum of $n$ iid random variables (with finite variance) approaches a normal distribution as $n$ increases.  This means that if $n$ is sufficiently large, we can get approximate probabilities for the sum by using a normal distribution with the same mean and variance.
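As an illustration (the uniform components and the threshold are chosen arbitrarily), a sum of 50 iid uniform(0, 1) variables is already well approximated by a normal with matching mean and variance:

```python
import random
from statistics import NormalDist

random.seed(0)
n, trials = 50, 20_000

# Each uniform(0, 1) has mean 1/2 and variance 1/12,
# so the sum has mean n/2 and variance n/12.
approx = NormalDist(mu=n / 2, sigma=(n / 12) ** 0.5)

sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]
empirical = sum(s <= 27 for s in sums) / trials  # simulated P(S <= 27)

# The simulated probability and the normal approximation nearly agree.
print(round(empirical, 3), round(approx.cdf(27), 3))
```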

## Mixture Distributions

Finite:  A finite mixture distribution is described by the following cumulative distribution function:

$F_X(x) = \displaystyle \sum_{i=1}^n w_iF_{X_i}(x)$

where $X$ is the mixture random variable, $X_i$ are the component random variables that make up the mixture, and $w_i$ is the weight of each component.  The weights sum to 1.

If $X$ is a mixture of 50% $X_1$ and 50% $X_2$, then $F_X(x) = 0.5F_{X_1}(x) + 0.5F_{X_2}(x)$.  This is not the same as $X = 0.5X_1 + 0.5X_2$.  The latter expression is a sum of random variables, NOT a mixture!

Moments and Variance:

$\begin{array}{rl} E(X^t) &= \displaystyle \sum_{i=1}^n w_iE(X_i^t) \\ Var(X) &= E(X^2) - E(X)^2 \\ &= \displaystyle \sum_{i=1}^n w_iE(X_i^2) - \left(\displaystyle\sum_{i=1}^n w_iE(X_i)\right)^2 \end{array}$
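A numeric sketch of these formulas, using a hypothetical 50/50 mixture of two exponential distributions with means 1 and 3 (for an exponential with mean $m$, $E[X^2] = 2m^2$ and $Var(X) = m^2$):

```python
w = [0.5, 0.5]       # mixture weights
means = [1.0, 3.0]   # E[X_i] for the exponential components
second = [2 * m ** 2 for m in means]  # E[X_i^2] = 2 m^2

ex = sum(wi * mi for wi, mi in zip(w, means))    # E[X]   = 2.0
ex2 = sum(wi * si for wi, si in zip(w, second))  # E[X^2] = 10.0
var = ex2 - ex ** 2                              # Var(X) = 6.0
print(ex, ex2, var)

# Contrast: the *sum* 0.5 X_1 + 0.5 X_2 (independent components) has
# variance 0.25 * 1 + 0.25 * 9 = 2.5, much smaller than the
# mixture's 6.0.
```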

Filed under Probability

## Variance and Expected Value Algebra

Linearity of Expected Value: Suppose $X$ and $Y$ are random variables and $a$ and $b$ are scalars.  The following relationships hold:

$E[aX+b] = aE[X]+b$

$E[aX+bY] = aE[X] +bE[Y]$

Variance:

$Var(aX+bY) = a^2Var(X)+2abCov(X,Y)+b^2Var(Y)$

Suppose $X_i$ for $i = 1, \ldots, n$ are $n$ independent identically distributed (iid) random variables.  Then $Cov(X_i,X_j) = 0$ for $i\ne j$ and

$\displaystyle Var\left({\sum_{i=1}^n X_i}\right) = \sum_{i=1}^n Var(X_i)$

Example:

$X$ is the stock price of AAPL at market close.  $Y$ is the sum of closing AAPL stock prices over 5 days, treating the daily closes $X_1, \ldots, X_5$ as iid copies of $X$.  Then

$\begin{array}{rl} Var(Y) &= \displaystyle \sum_{i=1}^5 Var(X_i) \\ &= 5Var(X) \end{array}$.

Contrast this with the variance of $Z = 5X$.  In other words, $Z$ is a random variable that takes a value of 5 times the price of AAPL at the close of any given day.  Then

$\begin{array}{rl} Var(Z) &= Var(5X) \\ &=5^2Var(X) \end{array}$

The distinction between $Y$ and $Z$ is subtle but very important.
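The distinction shows up clearly in simulation. A sketch with a made-up iid "daily close" (normal with mean 150 and standard deviation 10, so $Var(X) = 100$; these parameters are purely illustrative):

```python
import random
from statistics import variance

random.seed(1)

def draw():
    # Hypothetical daily close: mean 150, standard deviation 10.
    return random.gauss(150, 10)

# Y: sum of 5 independent daily closes -> Var(Y) = 5 * Var(X) = 500
ys = [sum(draw() for _ in range(5)) for _ in range(20_000)]

# Z: 5 times a single daily close -> Var(Z) = 25 * Var(X) = 2500
zs = [5 * draw() for _ in range(20_000)]

print(round(variance(ys)))  # about 500
print(round(variance(zs)))  # about 2500
```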

Variance of a Sample Mean:

In situations where the sample mean $\bar{X}$ is a random variable over $n$ iid observations (e.g. the average AAPL closing price over 5 days), the following formula applies:

$\begin{array}{rl} Var(\bar{X}) &= \displaystyle Var\left(\frac{1}{n} \displaystyle \sum_{i=1}^n X_i\right) \\ &= \displaystyle \frac{nVar(X)}{n^2} \\ &= \displaystyle \frac{Var(X)}{n} \end{array}$
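The same simulation idea confirms this formula; with the hypothetical iid normal(150, 10) "prices" from before and $n = 5$, the sample mean's variance should be about $100 / 5 = 20$:

```python
import random
from statistics import variance, fmean

random.seed(2)
n, trials = 5, 20_000

# Average of n iid hypothetical daily closes, repeated many times.
xbars = [fmean(random.gauss(150, 10) for _ in range(n))
         for _ in range(trials)]

print(round(variance(xbars)))  # about Var(X)/n = 100/5 = 20
```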