By opening day, the baseball world will have tired of meaningless spring training results, minor injury reports and conjecture on which March phenoms should or should not make a roster. What will remain is the inevitable foolhardy stream of individual predictions on how the standings will shake out by the following October. Some in the pundit class will rely entirely on statistical analysis, some on 'gut' and 'what I can see with my own eyes', but in the end, it's telling that no individual has ever built a reputation for excellence at seeing into the future.
Of course, the stats world has already gone through this process using highly refined proprietary models to generate results. For the purpose of this exercise, the focus is on three from last year, plotted against the real 2013 results: the Pythag standings posted at FanGraphs (yes, I know Dave Cameron hates this); PredictionMachine.com ('we play the game 50,000 times before it's actually played'); and PECOTA, conceived by the wondrous Nate Silver.
For anyone, the process is doomed from the outset as there is no way to foresee injuries, roster moves, managerial decisions, clubhouse chemistry, veteran leadership or what exactly causes a Josh Donaldson to occur. So, spoiler alert where one really isn't needed--standings predictions aren't very reliable.
And of course, small sample size alert, as well--looking at only one year's results may prove absolutely nothing. But having said that, let's move forward.
First, let's sort each model's predictions into three groupings based on how far a team's eventual win total varied from the predicted total: by fewer than five wins; by five to ten; and by eleven or more.
2013 Variance Totals (teams per bucket)
PredictionMachine.com: (0-4) 13, (5-10) 7, (11+) 10
Pythag: (0-4) 13, (5-10) 8, (11+) 9
PECOTA: (0-4) 10, (5-10) 7, (11+) 13
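The bucketing above is simple enough to sketch in a few lines of code. This is a minimal illustration of the method, not the models' actual machinery; the team names and win totals below are a small hypothetical sample, not the full 2013 data set.

```python
def bucket_variances(predicted, actual):
    """Count teams per variance bucket: deviations of 0-4, 5-10, and 11+ wins."""
    buckets = {"0-4": 0, "5-10": 0, "11+": 0}
    for team, pred_wins in predicted.items():
        diff = abs(actual[team] - pred_wins)
        if diff < 5:
            buckets["0-4"] += 1
        elif diff <= 10:
            buckets["5-10"] += 1
        else:
            buckets["11+"] += 1
    return buckets

# Hypothetical sample of predicted vs. actual win totals
predicted = {"Red Sox": 81, "Jays": 86, "Pirates": 78, "Tigers": 92}
actual = {"Red Sox": 97, "Jays": 74, "Pirates": 94, "Tigers": 93}
print(bucket_variances(predicted, actual))  # {'0-4': 1, '5-10': 0, '11+': 3}
```

Running the same routine over all 30 teams against each model's predicted standings reproduces the three rows in the table above.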
So, the easy takeaway is that, just like playing baseball, predicting baseball results is really hard.
On to drill-down #1: the six teams whose results most deviated from the collective predictions of the three models above. For three of them, the models were far too pessimistic (Red Sox, Indians and Pirates), while for three others they saw promise that never materialized (Jays, White Sox and Angels).
On the other hand, here are the six teams whose actual win totals most closely matched the crystal ball:
(And I would add that the only reason the Astros didn't make this list is that, while all three accurately pegged them as clearly the worst team in baseball, only Pythag was bold enough to see them showing as poorly as they actually did.)
So, the broad generalization I make is that it's easier to predict results at the periphery--teams with copious talent, or a notable lack thereof. I think this makes sense, since those teams' outcomes are less likely to swing on an injury, a star call-up or a veteran collapse.
On the other hand, the biggest misses came more often from teams slotted into the middle of the pack. While both the Jays and Angels fell precipitously from lofty expectations, the Indians, White Sox, Pirates and Red Sox all averaged between 76 and 81 predicted wins across the three models.
Or hey...why not us?