In a Maximum Likelihood Estimation (MLE) problem, you are told which parametric distribution the data come from, one or more of its parameter values are unknown, and you are given a set of data drawn from that distribution. You are asked to find the parameter values that maximize the probability of observing that data. The approach is direct: write down the likelihood of the data and maximize it. For each data point, express the likelihood of observing it, which is simply the density function of the named distribution evaluated at that point (for a discrete distribution, the probability mass function). If the observations are independent, the likelihood of the whole data set is the product of these individual terms. With many data points, this product becomes a large, unwieldy function.
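The setup above can be sketched numerically. This is a minimal sketch that assumes, purely for illustration, an exponential distribution with unknown rate `lam`. The five-point sample is also made up. Neither comes from the original problem. The sketch builds the likelihood as a product of densities and finds the maximizing rate by checking a grid of candidate values.

```python
import math

# Illustrative assumption: exponential density f(x; lam) = lam * exp(-lam * x).
def density(x, lam):
    return lam * math.exp(-lam * x)

def likelihood(data, lam):
    # Independent observations: the joint likelihood is the product of densities.
    result = 1.0
    for x in data:
        result *= density(x, lam)
    return result

# Hypothetical sample, invented for this sketch.
data = [0.5, 1.2, 0.3, 2.0, 0.8]

# Brute-force search: evaluate the likelihood on a grid of candidate rates.
grid = [i / 1000 for i in range(1, 5001)]  # 0.001, 0.002, ..., 5.0
best_lam = max(grid, key=lambda lam: likelihood(data, lam))
print(best_lam)
```

A grid search like this works for a single parameter, but it is only as precise as the grid spacing. That limitation is one reason the calculus route described next is preferred when a closed form exists.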

For example, the density function for is given by:

in which is the unknown parameter. You are also given a set of observations, and you must find the value of which maximizes the probability of seeing these particular values. The likelihood of seeing these values is given by:

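To find which maximizes this function, you take its derivative, set it equal to 0, and solve for . Looking a few steps ahead, you should see that maximizing this product directly will be difficult, because differentiating it requires applying the product rule across every factor. To get around this, we take the logarithm of the likelihood function and maximize that instead. The logarithm is strictly increasing, so this does not change the resulting value of . It also turns the product into a sum, which is much easier to differentiate. The log likelihood is then: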

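A small sketch of why taking the log helps uses the same illustrative assumption as before: an exponential density λe^(−λx). The functions `likelihood` and `log_likelihood` and the sample are hypothetical. On a small sample, the raw likelihood and the log likelihood peak at the same grid point. On a larger sample, the raw product underflows to 0.0 in floating point, while the log likelihood stays finite. For this assumed density, the derivative of the log likelihood is n/λ − Σx, so setting it to 0 gives the closed form λ = n/Σx.

```python
import math

# Illustrative assumption: exponential density f(x; lam) = lam * exp(-lam * x).
def likelihood(data, lam):
    result = 1.0
    for x in data:
        result *= lam * math.exp(-lam * x)
    return result

def log_likelihood(data, lam):
    # The log of a product is a sum of logs: sum(log(lam) - lam * x).
    return sum(math.log(lam) - lam * x for x in data)

small = [0.5, 1.2, 0.3, 2.0, 0.8]  # hypothetical sample
grid = [i / 1000 for i in range(1, 5001)]

# log is strictly increasing, so both functions peak at the same grid point.
best_raw = max(grid, key=lambda lam: likelihood(small, lam))
best_log = max(grid, key=lambda lam: log_likelihood(small, lam))

# Setting d/dlam log L = n/lam - sum(x) to 0 gives lam = n / sum(x).
closed_form = len(small) / sum(small)

# 2000 observations: the raw product underflows, the log likelihood does not.
large = small * 400
raw_large = likelihood(large, 1.0)
log_large = log_likelihood(large, 1.0)
print(best_raw, best_log, closed_form)
print(raw_large, log_large)
```

Working with the sum of logs is also how numerical libraries typically evaluate likelihoods, for exactly this underflow reason.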
The derivative with respect to is

Equating to 0 and solving, we have