Visualize Small Sample Size

Jennifer Buchanan-USA TODAY Sports

After a few terrible offensive games and a disappointing start to the season, one of the things that we can cling to is that old small sample size cliche. Kyle Seager is going to hit and so is Adam Lind (ok, not positive about him). Six games don't mean anything in terms of player stats, and I've been trying to think of different ways to illustrate that. Here's one way to think about it.

The chart below shows the final basic wOBA [see note] for every qualified batter in 2015, ranked from best (Bryce Harper at the top) to worst (Chris Owings at .255).


The first thing you notice is that Bryce Harper crushed baseballs like Gregor Clegane crushes noggins (Yes, I am looking forward to Game of Thrones, why do you ask?). The second thing you notice is that it's an organized-looking chart. It's organized because I sorted the data and because player abilities come from a normal-ish (or Poisson-ish) distribution. There are lots of people in a small band in the middle and a few at the ends.

Here is the chart where the batters are in the same order top to bottom, but their wOBA is from 1 week into the 2015 season instead of their final wOBA. The scales are different because one week's worth of data is so noisy.


Adrian Gonzalez is dominant in this one and you really can't see a hint of the final distribution. One week means nothing.
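If you want to convince yourself without real data, a quick simulation makes the same point. This is only a sketch, not how the charts here were made. It assumes a made-up pool of 140 hitters whose true talent clusters around .320, and it treats each plate appearance as a simple hit-or-miss coin flip, which is a simplification of how wOBA actually works. The function name and all the numbers are mine, chosen for illustration.

```python
import random
import statistics

def simulate_observed_rates(true_rates, plate_appearances, rng):
    """Return each player's observed success rate over a number of plate appearances."""
    observed = []
    for p in true_rates:
        # Each plate appearance is a coin flip weighted by the player's true rate.
        successes = sum(1 for _ in range(plate_appearances) if rng.random() < p)
        observed.append(successes / plate_appearances)
    return observed

rng = random.Random(2015)  # fixed seed so the run is repeatable

# Hypothetical talent pool: 140 hitters, true rates clustered around .320.
true_rates = [rng.gauss(0.320, 0.030) for _ in range(140)]

one_week = simulate_observed_rates(true_rates, 25, rng)      # roughly one week
full_season = simulate_observed_rates(true_rates, 600, rng)  # roughly a full season

# Spread of the observed numbers across the league at each point in time.
spread_week = statistics.pstdev(one_week)
spread_season = statistics.pstdev(full_season)
print(f"spread after one week:    {spread_week:.3f}")
print(f"spread after full season: {spread_season:.3f}")
```

Under these assumptions the one-week numbers are spread out far more widely than the talent that produced them, which is why the first-week chart needs a different scale.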

If you want to watch the data organize itself, I made this animated gif.


Remember, always use the hard G sound when pronouncing gif, or you'll be subjected to "dad humor" like that. Public service announcement over, here's the gif showing the data converging on its final state, week by week.


Maybe not useful, but I find it mesmerizing for some reason.

Here's another take on the small sample size story. This also looks like a fairly random set of data points.


What is it? Kyle Seager's 2015 wOBA computed in six-game increments (from April on the bottom to September at the top). Repeat after me: you just can't tell much from six games.
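For anyone who wants to slice up a game log the same way, here is a rough sketch of the six-game chunking. It assumes you already have a per-game list of (wOBA numerator, plate appearances) pairs in date order. The function name `rate_by_chunk` is hypothetical, and the game log is invented for the example. These are not Seager's real numbers.

```python
def rate_by_chunk(games, chunk_size=6):
    """Compute a rate stat over consecutive chunks of games.

    games: list of (numerator, plate_appearances) tuples, one per game, in date order.
    Returns one rate per chunk of `chunk_size` games (the last chunk may be shorter).
    """
    rates = []
    for start in range(0, len(games), chunk_size):
        chunk = games[start:start + chunk_size]
        total_num = sum(num for num, _ in chunk)
        total_pa = sum(pa for _, pa in chunk)
        if total_pa > 0:  # skip chunks with no plate appearances
            rates.append(total_num / total_pa)
    return rates

# Made-up game log for illustration only.
games = [(1.8, 4), (0.0, 4), (2.9, 5), (0.9, 4), (0.0, 3), (1.6, 4),
         (0.0, 4), (0.7, 4), (0.0, 4), (0.9, 5), (0.0, 4), (0.9, 4)]

chunk_rates = rate_by_chunk(games)
print([round(r, 3) for r in chunk_rates])
```

Summing numerators and plate appearances within each chunk before dividing keeps a 5-PA game from counting the same as a 3-PA game.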

[note] I used basic wOBA because it's easier to compute and 96% as good as regular wOBA. (That's what an R-squared of .96 means, right?)