Ramblings of a Bayesian (Part I)

“There are three kinds of lies: lies, damned lies and statistics.”

In this first article, I will try to explain the concepts of what we now call probability. You may have heard probabilities used in different contexts; here are three common examples:

1. There is a 30% chance of rain in Stony Brook, NY, tomorrow,

2. There is a 1 in 292 million chance to win the Powerball Jackpot,

3. There is a 1 in 4 lifetime chance for males to die from cancer.

The question that I would like to pose is: how are these probabilities different from one another? Let’s look at example 2. Here we are dealing with a game of chance, and if we assume that the numbers are drawn fairly (i.e., without bias), then this probability can be calculated as the number of ways you can win divided by the total number of possibilities. In Powerball, 5 white balls are drawn out of a drum with 69 balls, and one red ball is drawn out of a drum of 26 balls. So the probability of winning the jackpot is

$$
\frac{5\times4\times3\times2\times1}{69\times68\times67\times66\times65\times26}=\frac{1}{292201338}
$$

Here we calculated the number of possible outcomes as 69 possibilities for drawing the first white ball, 68 possibilities for the second white ball (since there is one ball missing), and so on. There are 26 possibilities for the red ball. Similarly, since the order in which the white balls are drawn does not matter, any of the 5 numbers on your ticket can come up first, any of the remaining 4 second, and so on. All in all, this is a very straightforward way of thinking about probabilities.

This leads us to an interesting question. Let’s assume that the lottery does not give you the rules of how they come up with the numbers; all you can do is play the game every week for one year. Given your experiences over that year, could you give an estimate of the overall likelihood of winning? How long would you have to play in order to find the winning probability? If you’re thinking forever, your instinct is correct: it is probably forever.

But this is exactly the position that we scientists are in. We don’t know the rules of the universe, but we are trying to figure out what they are, given the evidence that we collect by doing experiments (or playing the lottery). That is why we call the rules that we find “theories”, since we can never know exactly how “true” they are. At the same time, we know certain laws of physics pretty accurately, and that is why we rely on them every day. For example, your cellphone contains many technologies that make use of pretty advanced concepts of physics: GPS, radio frequency communication, integrated circuits, and LED displays. Can you imagine if any of those “theories” suddenly turned out to be wrong and our cellphones stopped working?
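Returning to the Powerball numbers for a moment, the counting argument can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the second function assumes that every drawing is independent and that you buy exactly one ticket per drawing.

```python
from math import comb

# Game parameters as described in the text.
WHITE_BALLS = 69   # white balls in the first drum
WHITE_DRAWN = 5    # white balls drawn per game
RED_BALLS = 26     # red balls in the second drum


def jackpot_combinations() -> int:
    """Count the equally likely outcomes: 5 of 69 white balls, then 1 of 26 red."""
    return comb(WHITE_BALLS, WHITE_DRAWN) * RED_BALLS


def chance_of_at_least_one_win(plays: int) -> float:
    """Probability of at least one jackpot in `plays` independent drawings,
    assuming one ticket per drawing (an assumption made for this sketch)."""
    p = 1 / jackpot_combinations()
    return 1 - (1 - p) ** plays


print(jackpot_combinations())          # 292201338
print(chance_of_at_least_one_win(52))  # one ticket a week for a year
```

The first number matches the denominator in the formula above. The second is roughly 52 divided by 292 million, which suggests why a year of weekly play tells you almost nothing about the true odds: you would almost certainly never see a single win.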

Let’s look at example 1. This probability is much harder to understand, because global weather is constantly changing. The variables that describe weather (barometric pressure, humidity, wind speed, and so on) are all continuous, and thus harder to count. Also, the weather depends on conditions that occurred in the past and possibly far away. The probability in example 1 sounds more like a degree of belief that it is going to rain, given the evidence from the past, than a rigorous counting of all possibilities.

Example 3, on the other hand, is in between the two other examples. People can be counted, and if we know exactly how each person died, we can calculate the probability of dying from cancer over a lifetime. At the same time, if you are male, you may want to know what “your” chance of developing and dying from cancer is. This personal probability may be very different from 1 in 4, and could vary depending on your genetic makeup, your habits and lifestyle, the environment, and many other factors. Here we are once again drifting into probability as a measure of belief, given evidence that relates to you only.

This struggle between understanding probabilities as degrees of belief and understanding them as exact measures of frequencies has existed ever since mathematicians started thinking about problems associated with uncertain or random events. Interestingly enough, the first mathematicians who thought about these problems, Bernoulli (1700-1782), Bayes (1701-1761) and Laplace (1749-1827), regarded probabilities as measures of belief based on the available evidence. Only later was the frequentist interpretation of probability developed and promoted by Pearson (1857-1936), Fisher (1890-1962) and Neyman (1894-1981).