Equivalence (Likelihoods)

fieldnotes
3 min read · Jun 13, 2021

There are many equivalence relations you might come across in your own mathematical journeys, but one in particular is central to Bayesian statistics.

Before we take a look at it, let’s do a little thought exercise:

Suppose we have a coin whose probability of flipping heads is some value θ (theta).

Then we can easily find the probability of getting 2 heads out of 2 flips, knowing that the probability of flipping heads is θ.

A harder question to answer, however, is: what is the likelihood that the probability of flipping heads is θ, given only a dataset of 2 flips?

Luckily, we can answer this question using our understanding of probability.

So we want to convince ourselves that

L(θ | X = 2) = Pr(X = 2 | θ),

or more generally,

“The likelihood of a parameter value, given data, EQUALS the probability of the data, given a specific parameter value.”
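In symbols, writing L for the likelihood function and D for the observed data (this notation is mine; the statement is the same):

```latex
% The equivalence relation between likelihood and probability:
% the likelihood of a parameter value theta, given data D, equals
% the probability of the data D, given that value of theta.
\mathcal{L}(\theta \mid D) = \Pr(D \mid \theta)
```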

If we refer back to the table from a previous post, we can better convince ourselves of this helpful notion. That table lists, for each candidate probability of heads (theta), the probability of each possible number of heads in two flips.

If we look at the column headed “X = 2”, we will be focusing our attention on the event of getting 2 heads out of two total flips. This is what we want.

So, if we calculate the probability of getting 2 heads for each of the given probabilities of heads (theta), we will essentially be calculating the right side of the equivalence relation:
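For a few illustrative values of theta (the specific values in the original table may differ, but any will do):

theta = 0.25 → Pr(X = 2 | theta) = 0.0625
theta = 0.50 → Pr(X = 2 | theta) = 0.25
theta = 0.75 → Pr(X = 2 | theta) = 0.5625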

We are getting closer to the equivalence.

Let’s take the statement Pr(X = 2 | θ) (which is the same as the right side of the equivalence relation, but more specific to our case) and keep simplifying it. Because the two flips are independent, the probability of getting two heads, given any θ, equals θ squared:

Pr(X = 2 | θ) = Pr(heads) × Pr(heads) = θ × θ = θ²

At the end we are left with an expression only in terms of θ. This is what brings us the likelihood, or the left side of the equivalence relation:

L(θ | X = 2) = θ²

The tenet that ultimately binds the left and right sides of the equivalence relation is also a central theme of Bayesian statistics: the likelihood is a function of ALL possible values of theta (or of any parameter of interest). Theta is NOT fixed.

Still not convinced? Think of it this way: the data set is fixed.

Each theta is a “different conclusion” that could be drawn from that same fixed data set. So, by varying the thetas we consider, we construct a true likelihood function, which serves the purpose of finding the theta that best explains the data.
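Here is a minimal Python sketch of that idea (the function and variable names are my own, not from the post or the book): the data stay fixed at 2 heads out of 2 flips while theta sweeps across [0, 1].

```python
from math import comb

def likelihood(theta, heads=2, flips=2):
    # L(theta | data) = Pr(data | theta) for independent coin flips
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

thetas = [i / 100 for i in range(101)]    # theta VARIES...
values = [likelihood(t) for t in thetas]  # ...while the data stay fixed

best = max(zip(values, thetas))[1]        # theta that best explains the data
print(best)                               # 1.0, since we saw heads on every flip
print(sum(values) / len(values))          # ~1/3, not 1: not a proper distribution over theta
```

The maximum sits at theta = 1, the conclusion that best explains two heads in two flips, and the average height of the curve is about 1/3 rather than 1, which previews the takeaway below.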

Main takeaway: the equivalence relation between likelihood and probability depends on the fact that the parameter value in question is VARYING, and the resulting likelihood function over theta is NOT a proper probability distribution.
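A one-line check of that last claim for our coin example, integrating the likelihood over all values of theta:

```latex
\int_0^1 \mathcal{L}(\theta \mid X = 2)\, d\theta
  = \int_0^1 \theta^2 \, d\theta
  = \frac{1}{3} \neq 1
```

The total is 1/3 rather than 1, so the likelihood function is not a probability distribution over theta.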

Research done from: Lambert, Ben. “Likelihoods.” A Student’s Guide to Bayesian Statistics. SAGE, 2018.
