Probability vs. Certainty

Jeff's note: this is required reading. Go over it as many times as is necessary to understand that everything - everything - we do deals with probability, and that nothing is black and white.

Picture the following:

Two friends are arguing over the likely outcome of a set of ten coin tosses.

One declares - quite sensibly, since these coins are known to be fair - that he expects said coins to be distributed evenly.

The other, more radical in thought, feels that they'll come up all heads but one.

The coins are flipped. Nine heads, one tails.

'Ah', says the second friend, quite happy, 'You were wrong. That's why we flip the coins!'

Pretty silly, right? Everyone knows that, cheating aside, a coin has a 50-50 chance of coming up heads or tails. This obviously isn't to say that every time 10 coins are flipped, the result is an even split. If we flipped 10 coins 10,000 times, we would instead see a distribution that looked very similar to this curve (except it would be jagged):

This is known as a probability density function, and understanding what these are is vital when assessing the strength of a prediction. A PDF essentially gives the expected probability for a whole range of outcomes. In this case, it's a bell curve, but they can of course look much more complicated, and even bell curves can look very different because of what's called variance (essentially the spread of the data). Obviously, friend #1 was wise to predict 5/5, even though if you look closely a true 50/50 split will only occur 24.6% (252/1024) of the time. 9/1? 1% (10/1024). Using PDFs, you can say things like 'the number of heads is 62% likely to be 5 or fewer', which would be completely accurate even if it didn't turn out that way the next time you ran the experiment.
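
If you'd like to check those numbers yourself, here's a rough sketch in Python. It works out the exact odds of each head count for ten fair coins (strictly speaking a discrete distribution, since you can't flip 4.5 heads) and then runs the 10,000-trial experiment described above. The function names, the fixed seed, and the printed layout are my own choices for illustration, not anything standard.

```python
import random
from math import comb

N_COINS = 10  # ten fair coins, as in the argument between the two friends


def exact_distribution(n=N_COINS):
    """Exact probability of each possible head count for n fair coins."""
    total = 2 ** n  # 1024 equally likely head/tail sequences when n = 10
    return [comb(n, k) / total for k in range(n + 1)]


def simulate(trials=10_000, n=N_COINS, seed=0):
    """Flip n fair coins `trials` times; return the observed share of each head count.

    The seed is fixed only so the run is repeatable (an illustrative choice).
    """
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        counts[heads] += 1
    return [c / trials for c in counts]


exact = exact_distribution()
observed = simulate()

# Side by side: the smooth exact curve and the jagged simulated one.
for k, (p, q) in enumerate(zip(exact, observed)):
    print(f"{k:2d} heads: exact {p:6.1%}   simulated {q:6.1%}")

print(f"P(exactly 5 heads)  = {exact[5]:.1%}")        # 252/1024
print(f"P(exactly 9 heads)  = {exact[9]:.1%}")        # 10/1024
print(f"P(5 or fewer heads) = {sum(exact[:6]):.1%}")  # 638/1024
```

The simulated column won't match the exact one perfectly, and that's the point: the curve describes what to expect over many runs, not what any single run owes you.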

'What does this have to do with anything?', you ask yourselves. 'I hate numbers!' a heckler in the back calls out. 'Please won't someone love me?' a strange young boy cries.

Well, the thing is, every half-competent baseball analyst is in the business of thinking in terms of these PDFs. No, things are never going to be as simple as a coin toss - these are athletes playing a sport, not random numbers dancing around a spreadsheet - but that's not a prohibitive barrier with all the research that goes on these days. PECOTA? It's not giving a number, it's giving a curve. That's where things like 10%, 50%, and 75% levels come from. You've got a team's Pythag predicted accurately? Great, then you can generate a PDF and say that they have something like a 70% chance to be within +/- 4 games of that.

We don't deal with certainties when we look at this game. Sometimes, it comes across that way to people who've never seen stuff like this before and then have PECOTA dropped on them, but it's really not true at all - 'I don't think the Mariners have a high chance of making the playoffs' is NOT the same as 'We're not going to make the playoffs'. And it's not just analysts who do this - everyone who thinks about the future does, albeit subconsciously. What do you think a scout is doing when he's evaluating prospects his team might want to pick up in the draft?

Anyway, here's the crux of the matter.

We work in probabilistic terms. This means that when you tease a single number out of us, it's going to be our best guess and will probably be wrong. This does not mean that the prediction curve itself is complete bollocks (although sometimes this is in fact the case).

'That's why we play the games' is not an acceptable response to an argument about probability any more than 'that's why we flip the coins' is. An argument against a prediction must be conducted against that prediction's assumptions, rather than with a 'Well, we'll just wait and see what actually happens then', because that's just not how probability works. Challenge the mechanism behind the prediction, not the expected outcome. I'm not saying the situation is as black and white as my hypothetical argument between friends, but invoking that statement means that friend #2 doesn't really understand that friend #1 was NOT stating with 100% certainty that 5 coins would come up heads.

He was, after all, perfectly correct even though he was wrong. The illusion of certainty is a ghost that many people would do well to stop chasing.