
The danger of small sample sizes

At what point can we put our faith in statistics?


There's no doubt that spring fever causes us to make some rash decisions. We see a few home runs clear the fence of the Peoria Sports Complex, an improved grip or stance, a little extra speed on the basepaths, and we take it as incontrovertible evidence that Big Changes are coming to the club when spring ends and the real season begins.

If you think I'm hyperbolizing, look at the quotes that pervaded spring training columns in 2013:

"Here are a few more things to watch for in what I predict will be a fabulous, unpredictable, exhilarating, heartbreaking, heart-stopping 2013 season..."

"Putting the Mariners third was a tough call for me. No, not because I think the Athletics really are the better team. No, more because I almost put the Mariners second and the Rangers third. There is something about this year's team that gives me a good feeling."

"Is what we've seen power-wise from the Mariners this spring legitimate? The Mariners, after that monster home run yesterday over the batter's eye in dead center [...] now have a franchise record 54 home runs in Cactus League play. And yes, I do think we're about to see a carry-over to the regular season."

"We're going to be good. We're going to surprise people. [...] What I see is that we've got more offense. [...] We're always together right here in the clubhouse. We're having a lot of fun. And that's the difference."

After a long, cold offseason, it's easy to hype a new season, especially one with fresh faces and promising talent. This year is no exception, with Robinson Cano at the helm and last year's rookies ready to step into full-time roles. The mistake we make here is not building up our hopes and expectations -- there's no harm in wishing the Mariners a productive and successful year -- but buying into the myth that small sample sizes are useful in predicting an individual player's trajectory over the course of the regular season.

This question has been dissected by some of the finer minds in the sabermetrics community, but it bears repeating as we inch toward Cactus League competition: When does a player garner enough experience for his statistics to stabilize?

Nearly seven years ago, Baseball Prospectus' Russell Carleton explored the nature of small sample sizes. He attempted to define the number of plate appearances or batters faced that a player needed before his stats became stable enough to analyze, both among groups of his peers and on an individual basis.

In order to do this, he drew the line at a correlation of .70: any statistic that correlated at .70 or greater could be considered fairly stable, and any that did not was set aside. While most stats settled around .70 after a relatively low number of plate appearances or batters faced, several failed to stabilize, among them batting average, home runs per nine innings, and home run to fly ball ratio.
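The article doesn't reproduce Carleton's actual methodology, but the general idea behind this kind of stability test is a split-half reliability check: split each player's plate appearances into two halves, compute the stat in each half, and correlate the two across players. The sketch below illustrates that idea with simulated strikeout data — the player pool, true strikeout rates, and sample sizes are all invented for the example.

```python
import random

random.seed(42)

def split_half_correlation(players):
    """Correlate a rate stat computed on even- vs. odd-numbered PAs.

    `players` is a list of outcome sequences (1 = strikeout, 0 = anything
    else). A correlation around .70 or higher would mark the stat as
    fairly stable at this sample size, by Carleton's cutoff.
    """
    xs, ys = [], []
    for outcomes in players:
        evens = outcomes[0::2]
        odds = outcomes[1::2]
        xs.append(sum(evens) / len(evens))  # K rate, half-sample one
        ys.append(sum(odds) / len(odds))    # K rate, half-sample two
    # Pearson correlation between the two half-sample rates
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical league: 200 "players", each with an invented true strikeout
# rate and 150 simulated plate appearances
true_rates = [random.uniform(0.10, 0.30) for _ in range(200)]
players = [[1 if random.random() < k else 0 for _ in range(150)]
           for k in true_rates]
print(round(split_half_correlation(players), 2))
```

The point of the exercise: with only a handful of spring plate appearances per half, the two halves barely agree, and the correlation craters well below .70.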

Of course, the more appearances a player accrued, the more reliable his stats became. In the regular season, most statistics passed the threshold of stability fairly quickly. For instance, by the time an individual batter reached 150 plate appearances, he could record a stable swing percentage, contact rate, strikeout rate, and line drive rate.

Pitchers, on the other hand, were a bit trickier to diagnose. Carleton found the most stability in statistics that measured walks, strikeouts, and the types of batted balls put in play. An individual pitcher needed to face at least 150 batters before establishing a reliable strikeout to plate appearances ratio, groundball rate, and line drive rate.

While this works fairly well over the course of a season, when a batter might notch 600 plate appearances and a pitcher might face 600 batters, it is nearly impossible to apply to spring training numbers.

Take Dustin Ackley, for example. He logged 50 plate appearances last spring, with 19 hits, two walks, and nine strikeouts. According to Carleton's list of benchmarks for individual batters, the only stat with any semblance of reliability is his swing percentage.

In the regular season, we might be able to build a case for Ackley starting with this sample size. Spring training, however, presents a myriad of problems. For one thing, the Mariners acclimate to a different environment in Arizona. Peoria Sports Complex’s main stadium is roomier than Safeco Field, with more space at the corners and in the left field power alley. The atmosphere is warmer and allows the ball to travel unimpeded by Seattle's winter chill and ocean winds.

For another thing, players do not receive the same amount of playing time in spring training that they do during the regular season. Rosters expand to give everyone the opportunity to improve their skills and prove their worth, and rarely if ever do lineups reflect the ones that will be rolled out on Opening Day. Part-time players may receive as many opportunities as starters, lessening their chances of sustaining that level of production throughout the season.

The same concerns hold true for pitching stats. For example, Aaron Harang faced 95 batters in the spring of 2013, on the high end for spring training pitchers. Of those, he struck out 13, walked nine, and gave up three home runs. Unfortunately for Harang, he needed to face at least 55 more batters before any of his statistics could stabilize in a meaningful way.
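The shortfall falls straight out of the 150-batters-faced threshold cited above. A few lines of arithmetic, using only the numbers in the article:

```python
# Harang's 2013 spring line, from the article
batters_faced = 95
strikeouts, walks, home_runs = 13, 9, 3

k_per_bf = strikeouts / batters_faced  # roughly .137

# Carleton's individual-pitcher stabilization point cited above: 150 BF
threshold = 150
shortfall = max(0, threshold - batters_faced)
print(f"K/BF: {k_per_bf:.3f}; {shortfall} more batters needed to stabilize")
```

That is where the "at least 55 more batters" figure comes from: 150 minus 95 — a gap no spring schedule is going to close.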

If you're still tempted to use a player's spring performance to buoy your hopes for the coming season, turn instead to his history. Chances are, the patterns and proclivities you find there will be far more trustworthy than anything he leaves behind in Arizona.

For another, slightly more optimistic take, see FanGraphs' Mike Podhorzer's 2012 series on the relevancy of pitchers' strikeout and walk rates in spring training.

Do you think spring training stats can be useful indicators of regular season performances? If so, why?