Pitching Components Year-To-Year

We look for pitching measurements with high year-to-year correlations because those are the more stable measures, and stability is what you need in order to make projections. If a pitcher's win total had a year-to-year correlation of exactly zero (it's close), then the best we could do for projecting 2008 win totals would be to assign the same league-average number of wins to every pitcher. That's not very helpful. In fact, it's not helpful at all. Conversely, if we knew that a pitcher's groundball ratio had a correlation of exactly one (it's close), then we could predict 2008's groundball ratios accurately by simply using 2007's totals. That's incredibly useful.

These disparate examples show the full range of possibilities when it comes to projecting future performance. At one extreme, we can use 100% of the pitcher's own totals (i.e. 2007) and ignore everyone else. At the other, we are forced to use 100% of the average and completely disregard what the individual pitcher accomplished.

It's important to know which category each stat falls into, not only so we can focus on the ones with high correlation, but also so we can remove the taint of the ones with low correlation. This blending toward the average is what we call regression. If our numbers show that a pitcher's strikeout rate has a year-to-year correlation of 0.75 (actual number), then to project Pitcher X we would take 0.75 * his 2007 strikeout rate and add (1 - 0.75) * the league-average 2007 strikeout rate to arrive at our projection for his 2008 strikeout rate.
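To make the arithmetic concrete, here is a minimal sketch of that weighted blend in Python. The function name project_rate and the sample rates are made up for illustration; only the 0.75 weight comes from the text above.

```python
def project_rate(player_rate, league_rate, r):
    """Blend a player's rate with the league average.

    r is the stat's year-to-year correlation, used as the weight on the
    player's own number; (1 - r) goes to the league average.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    return r * player_rate + (1.0 - r) * league_rate


# Hypothetical inputs (not real data): strikeouts per batter faced.
k_rate_2007 = 0.24         # Pitcher X in 2007 (made up)
league_k_rate_2007 = 0.17  # league average in 2007 (made up)

projection_2008 = project_rate(k_rate_2007, league_k_rate_2007, 0.75)
print(round(projection_2008, 4))
```

With r = 1 the function hands back the pitcher's own rate untouched, and with r = 0 it hands back the league average, which are the two extremes described earlier.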

Naturally there are far more complex things that go into most projections, such as aging curves, park factors and the like, but at their base all projection systems anchor themselves in these regressions in some form or another. So, wouldn't you like to know what those regressions are? Of course you would. To find out, I went through my pitch data and pulled out all the pitches from starting pitchers who faced at least 200 batters and threw at least 1,000 pitches in a season; there were 827 such seasons. I then grouped them by player and consecutive year, so that a pitcher's data would be counted only if he appeared in both year x and year x+1. Here are the results:

There are some pretty important factoids up there. At the very bottom we see why ERA is a terrible measurement: its low correlation shows how little control a pitcher has over it. Ditto BABIP (no surprise) and ditto HR/FB, which might come as more of a surprise, especially if you noticed how incredibly stable GB% is. But as the plot below shows, there appears to be little to no correlation between a pitcher's GB rate and his HR/FB ratio.

Since HR/FB isn't stable, it's no surprise that HR isn't either. Walks and strikeouts are fairly predictable, but not as predictable as balls and missed bats, which leads me to say that if we wanted to construct a finer version of the three true outcomes for pitchers, it would be balls, swinging strikes and groundballs. That dovetails both with the data above and with what's logical to me: the three things we want most from a pitcher are to throw strikes, miss bats and keep the ball on the ground. Most everything else is out of his control.
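For anyone who wants to run the same kind of check on their own data, the year x to year x+1 pairing described earlier can be sketched roughly as follows. This is not the exact code behind the numbers in this post. The function names, the data layout and the sample groundball rates are all invented for illustration, and it assumes the 200-batter and 1,000-pitch filters have already been applied.

```python
from math import sqrt


def pair_consecutive_seasons(seasons):
    """Return (year x, year x+1) value pairs for one stat.

    seasons maps (pitcher, year) -> stat value. A season is used only
    if the same pitcher also has a qualifying season the next year.
    """
    pairs = []
    for (pitcher, year), value in seasons.items():
        next_value = seasons.get((pitcher, year + 1))
        if next_value is not None:
            pairs.append((value, next_value))
    return pairs


def pearson(pairs):
    """Pearson correlation between the two columns of pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return cov / sqrt(var_x * var_y)


# Made-up groundball rates, purely to show the mechanics.
gb_rate = {
    ("A", 2005): 0.50, ("A", 2006): 0.52, ("A", 2007): 0.51,
    ("B", 2005): 0.40, ("B", 2006): 0.41,
    ("C", 2006): 0.45, ("C", 2008): 0.47,  # no consecutive pair, so dropped
}

pairs = pair_consecutive_seasons(gb_rate)
print(len(pairs), pearson(pairs))
```

A pitcher with three straight qualifying seasons contributes two pairs, which is one reasonable reading of "grouped by player and consecutive year"; other setups could count him differently.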