Ichiro & More Fun With Probability

Long-time readers of this site know that I'm a numbers guy. I like data spreadsheets, and I like using those to make other data spreadsheets, continuing along until I arrive at some groundbreaking new idea like "take a shower." My favorite new tool is how binomial probability can apply to baseball (hitters in particular). It's not a new idea, but I think it's an underused one, and one that can be used to help explain certain points that may otherwise be a little confusing.

Today's point? Sample sizes. Some people are comfortable with the notion that smaller samples are prone to wider variation, but others have a little trouble with it. Allow me, then, to post a chart of some Ichiro probabilities, based on his lifetime batting average of .332:

(680 at bats represents a full season, 340 is a half-season, etc. The lines have been smoothed out for aesthetic purposes, so they're not 100% exact. That's what the data on my spreadsheet is for.)

The point here is that the larger the sample, the smaller the range of probable batting averages. Given 85 at bats, Ichiro stands a reasonable chance (better than 1%) of hitting anywhere between .225 and .450. Bump that sample up to a full season, though, and the range narrows to .300 through .375. What you're seeing is that, as the size of a sample increases, the observed average more closely approximates its "true" value (in this case, Ichiro's lifetime BA).
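For anyone who'd rather see the arithmetic than take my chart's word for it, here's a minimal sketch in Python. It treats every at bat as an independent coin flip with a .332 chance of a hit, then finds the lowest and highest batting averages whose exact probability tops 1%. The function names, the 170 at bat quarter-season, and my reading of "better than 1%" as a cutoff on each exact hit total are all assumptions made for illustration. The endpoints may not line up exactly with the chart, since those lines were smoothed.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k hits in n at bats, given a true average p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def likely_range(n, p, threshold=0.01):
    """Lowest and highest batting averages whose exact probability tops threshold.

    The 1% threshold is an assumed reading of "better than 1%".
    """
    likely = [k for k in range(n + 1) if binom_pmf(k, n, p) > threshold]
    return likely[0] / n, likely[-1] / n

TRUE_AVG = 0.332  # Ichiro's lifetime batting average, per the post

# 680 = full season, 340 = half; 170 and 85 continue the halving (170 is assumed)
for at_bats in (85, 170, 340, 680):
    low, high = likely_range(at_bats, TRUE_AVG)
    print(f"{at_bats:>3} AB: {low:.3f} to {high:.3f}")
```

The thing to watch is the gap between the two printed averages, which should shrink with each doubling of the sample.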

Another way of looking at it is to see the probability of Ichiro hitting below a certain mark, as shown here:

You see the same kind of thing here - the range of possible outcomes gets smaller as the sample increases. Posting that chart wasn't really necessary, I guess, but I like the colors.
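A rough sketch of the numbers behind a chart like that one: add up the binomial probabilities for every hit total that lands below a given mark. The .300 mark, the 170 at bat sample, and the function name are my own picks for illustration, and the mark is passed in thousandths (300 for .300) so the cutoff comparison stays in whole numbers.

```python
from math import comb

def prob_below(mark_thousandths, n, p):
    """Chance of finishing with a batting average strictly below the mark.

    The mark is in thousandths (300 means .300) so the comparison
    hits / n < mark is done with integers, avoiding float rounding.
    """
    total = 0.0
    for k in range(n + 1):
        if k * 1000 < mark_thousandths * n:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

TRUE_AVG = 0.332  # Ichiro's lifetime batting average, per the post

for at_bats in (85, 170, 340, 680):  # 170 is an assumed quarter-season
    chance = prob_below(300, at_bats, TRUE_AVG)
    print(f"{at_bats:>3} AB: {chance:.1%} chance of hitting below .300")
```

With a true average above .300, the chance of finishing below it should fall as the at bats pile up.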

The fun with binomial probability doesn't end there, either. Now let's pretend that Ichiro's "true" batting average is .303 (his 2005 mark), and that he gets 680 at bats next year.

Odds of hitting at least .400: 0.000005%
Odds of hitting at least .300: 58%
Odds of hitting at least .275: 95%

Based on what he did last year, we can be pretty sure that Ichiro will hit at least .275 in 2006. If he drops below that, it will most likely be because something's changed (in other words, a result that low would fall outside the realm of ordinary statistical chance at a 95% confidence level).
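If you want to check figures like these yourself, here's a minimal sketch of the upper-tail calculation, assuming each of the 680 at bats is an independent trial with a .303 chance of a hit. The function name is made up for illustration, and marks are given in thousandths (275 for .275) so the minimum hit total comes from integer math rather than floating point.

```python
from math import comb

def prob_at_least(mark_thousandths, n, p):
    """Chance of finishing at or above the mark over n at bats.

    Marks are in thousandths (300 means .300). The minimum number of
    hits is found with integer ceiling division to avoid float rounding.
    """
    min_hits = -(-mark_thousandths * n // 1000)  # ceil(mark * n / 1000)
    return sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(min_hits, n + 1)
    )

TRUE_AVG = 0.303  # Ichiro's 2005 batting average, per the post
AT_BATS = 680     # a full season, per the post

for mark in (400, 300, 275):
    chance = prob_at_least(mark, AT_BATS, TRUE_AVG)
    print(f"Odds of hitting at least .{mark}: {chance:.6%}")
```

Swap in any other hitter's average and at bat total to run the same test on him.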

So, that's your introduction to binomial probability. Go ahead and screw around with different numbers on that website I linked - done properly, it can be useful stuff. Plug in Miguel Olivo's numbers in Seattle and see what the chances were that he'd flip out in San Diego. You'll puke!