Combinatorics Continued and Maximum Likelihood Estimation

Conditional Probability

An event is an outcome (or set of outcomes) of a random experiment, e.g. observing heads on a coin toss or rolling a 3 on a die. A sample space is the collection of every possible outcome of the experiment; no matter how many trials we run, every achievable outcome lies in the sample space.
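To make these definitions concrete, here is a minimal sketch in Python (the fair-die example is hypothetical, used only for illustration): the sample space and an event are plain sets, and with equally likely outcomes P(A) = |A| / |Ω|.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

# Event: "the roll is even" -- a subset of the sample space
event_even = {2, 4, 6}

# With equally likely outcomes, P(A) = |A| / |sample space|
p_even = Fraction(len(event_even), len(sample_space))
print(p_even)  # 1/2
```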

Conditional probability arises when we study experiments in which the occurrence of one event may influence the probability of another, for example when the result of one trial affects the results of subsequent trials.

Here’s how to derive the conditional probability equation, P(A|B) = P(A and B) / P(B), from the multiplication rule (a small numerical check follows Step 4):

Step 1: Write out the multiplication rule:

  • P(A and B) = P(B)*P(A|B)

Step 2: Divide both sides of the equation by P(B):

  • P(A and B) / P(B) = P(B)*P(A|B) / P(B)

Step 3: Cancel P(B) on the right side of the equation:

  • P(A and B) / P(B) = P(A|B)

Step 4: Rewrite the equation:

  • P(A|B) = P(A and B) / P(B)
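As a quick numerical check of the derivation (again using the hypothetical fair-die example, with A = "roll is even" and B = "roll is at least 4"):

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}    # roll is even
B = {4, 5, 6}    # roll is at least 4

# Equally likely outcomes: P(E) = |E| / |sample space|
p = lambda E: Fraction(len(E), len(sample_space))

p_A_given_B = p(A & B) / p(B)      # Step 4: P(A|B) = P(A and B) / P(B)
p_A_and_B = p(B) * p_A_given_B     # Step 1: multiplication rule recovers P(A and B)

print(p_A_given_B)            # 2/3
print(p_A_and_B == p(A & B))  # True
```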

Theorem 1 - Product Rule

The product rule is useful when the conditional probability is easy to compute but the probability of the intersection of events is not.

The probability of the intersection of events A and B can be given by

P(A∩B) = P(B)P(A|B) = P(A)P(B|A)

Note that if A and B are independent, then conditioning on B changes nothing (and vice versa), so P(A|B) = P(A), and P(A∩B) = P(A)P(B) as we already know.
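A short sketch of the product rule and the independence remark on the same hypothetical die example (here C = "roll is at most 4" happens to be independent of A = "roll is even"):

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}      # roll is even
B = {4, 5, 6}      # roll is at least 4
C = {1, 2, 3, 4}   # roll is at most 4

p = lambda E: Fraction(len(E), len(sample_space))
# With equally likely outcomes, P(A|B) is the fraction of B's outcomes that lie in A
cond = lambda A, B: Fraction(len(A & B), len(B))

# Product rule: P(A ∩ B) = P(B) P(A|B) = P(A) P(B|A)
assert p(A & B) == p(B) * cond(A, B) == p(A) * cond(B, A)

# Independence: conditioning on C changes nothing
assert cond(A, C) == p(A)            # P(A|C) = P(A)
assert p(A & C) == p(A) * p(C)       # P(A ∩ C) = P(A) P(C)
print("product rule and independence checks passed")
```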

Theorem 2 - Chain Rule

The chain rule (also called the general product rule) permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities.

We can rearrange the formula for conditional probability to get the product rule:

P(A,B) = P(A|B)P(B)

We can extend this for three variables:

P(A,B,C) = P(A|B,C)P(B,C) = P(A|B,C)P(B|C)P(C)

and in general to n variables:

P(A1,A2,...,An) = P(A1|A2,...,An)P(A2|A3,...,An)···P(An−1|An)P(An)

In general we refer to this as the chain rule.
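A small sketch of the chain rule for three binary variables, using a hypothetical hand-made joint distribution (the numbers are arbitrary, only the factorization matters):

```python
from itertools import product

# Hypothetical joint distribution P(A, B, C) over three binary variables,
# chosen only for illustration; the values are normalized to sum to 1.
joint = {(a, b, c): 0.1 + 0.05 * a + 0.1 * b + 0.025 * c
         for a, b, c in product([0, 1], repeat=3)}
total = sum(joint.values())
joint = {k: v / total for k, v in joint.items()}

def marginal(**fixed):
    """Sum the joint over all assignments consistent with the fixed values."""
    return sum(p for (a, b, c), p in joint.items()
               if all({'a': a, 'b': b, 'c': c}[name] == val
                      for name, val in fixed.items()))

# Chain rule: P(A=1, B=0, C=1) = P(A=1 | B=0, C=1) P(B=0 | C=1) P(C=1)
lhs = joint[(1, 0, 1)]
rhs = (marginal(a=1, b=0, c=1) / marginal(b=0, c=1)) \
      * (marginal(b=0, c=1) / marginal(c=1)) \
      * marginal(c=1)
print(abs(lhs - rhs) < 1e-12)  # True
```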

On the other hand, if C1, C2, ..., Cm are disjoint events such that C1 ∪ C2 ∪ ··· ∪ Cm = Ω, the probability of an arbitrary event A can be expressed as:

P(A) = P(A|C1)P(C1) + P(A|C2)P(C2) + ··· + P(A|Cm)P(Cm)
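This sum over a disjoint cover of Ω is the law of total probability. A minimal sketch on the hypothetical die example, with the partition C1 = {1, 2}, C2 = {3, 4}, C3 = {5, 6}:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}                           # roll is even
partition = [{1, 2}, {3, 4}, {5, 6}]    # disjoint events whose union is the sample space

p = lambda E: Fraction(len(E), len(sample_space))
cond = lambda A, B: Fraction(len(A & B), len(B))   # P(A|B) with equally likely outcomes

# Law of total probability: P(A) = sum_i P(A|Ci) P(Ci)
total = sum(cond(A, Ci) * p(Ci) for Ci in partition)
print(total == p(A))  # True
```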

Theorem 3 - Bayes Rule

Bayes' theorem is the main result of this section:

\begin{align} P(A|B) = \frac{P(B|A)P(A)}{P(B)} \quad \text{(this follows from Theorem 1)} \end{align}

\begin{align} P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\text{not } A)P(\text{not } A)} \end{align}

Here P(A|B) stands for the probability that A happens given that B has occurred. The two forms are equivalent: you would use the second form when you don't directly have the probability of B on its own, since its denominator expands P(B) using the law of total probability. (not A is the same as the complement of A.)

Prior probability represents what is originally believed before new evidence is introduced, and posterior probability takes this new information into account. For example, the posterior could be the probability of having a liver disease, given that the patient is an alcoholic.
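A hedged numerical sketch of Bayes' rule for the liver-disease example; every number below (the 10% prior and both conditional probabilities) is invented purely for illustration:

```python
# Hypothetical numbers, for illustration only
p_disease = 0.10                  # prior: P(liver disease)
p_alcoholic_given_disease = 0.70  # P(alcoholic | liver disease)
p_alcoholic_given_healthy = 0.20  # P(alcoholic | no liver disease)

# Denominator via the law of total probability: P(alcoholic)
p_alcoholic = (p_alcoholic_given_disease * p_disease
               + p_alcoholic_given_healthy * (1 - p_disease))

# Posterior via Bayes' rule: P(liver disease | alcoholic)
posterior = p_alcoholic_given_disease * p_disease / p_alcoholic
print(round(posterior, 3))  # 0.28
```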

Maximum Likelihood Estimation

MLE deals with determining the values of the model parameters that maximize the likelihood of the observed data.
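A minimal sketch of MLE for a hypothetical coin-flip (Bernoulli) model: the log-likelihood is maximized over a grid of candidate parameters, and the result agrees with the closed-form MLE, which for a Bernoulli model is just the sample mean (the data below is made up for illustration):

```python
import numpy as np

# Hypothetical observations: 1 = heads, 0 = tails
data = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def log_likelihood(p, x):
    """Bernoulli log-likelihood of the data x under parameter p."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Maximize the log-likelihood over a grid of candidate parameter values
grid = np.linspace(0.001, 0.999, 999)
p_hat_grid = grid[np.argmax([log_likelihood(p, data) for p in grid])]

# Closed-form MLE for the Bernoulli parameter: the sample mean
p_hat_closed = data.mean()

print(p_hat_grid, p_hat_closed)  # both approximately 0.7
```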
