Identifying Candidates for Regression

There are two main areas you should look at when judging whether a pitcher is due for regression (either good or bad). First, you look for extreme values in areas that we know the pitcher has little or no control over. BABIP (average value for SP = .296, RP = .293) is the premier example. If a starting pitcher has a BABIP less than .275 or greater than .315, then you'd do well to bet on overall regression next season. There are other such statistics, pointed out here as the components with low correlation, but the other major one listed is HR/FB% (average value for SP = 13.9%, RP = 12.6%). Because of its direct impact on runs allowed, extreme swings in HR/FB% can radically change a pitcher's performance by traditional metrics. The percentage of runners left on base (average value for SP = 70.4%, RP = 72.8%) is the last such key statistic, but we have to be more careful with that one because it actually does have some year-to-year correlation (r=0.18), tied mainly to a pitcher's strikeout rate.
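
To make the BABIP rule of thumb concrete, here is a minimal sketch. The .275 and .315 cutoffs and the league averages come from the paragraph above; the function name and the wording of the labels are just illustrative choices, and the rule is only stated for starting pitchers.

```python
# League-average BABIP by role, as quoted in the article.
LEAGUE_BABIP = {"SP": 0.296, "RP": 0.293}

# The article's rule-of-thumb cutoffs, stated for starting pitchers only.
SP_BABIP_LOW = 0.275
SP_BABIP_HIGH = 0.315


def babip_flag(babip: float) -> str:
    """Label a starter's BABIP by the direction regression would push his results.

    The label strings are hypothetical; only the cutoffs come from the article.
    """
    if babip < SP_BABIP_LOW:
        # Unusually few balls in play fell for hits: expect more hits next year.
        return "expect worse"
    if babip > SP_BABIP_HIGH:
        # Unusually many balls in play fell for hits: expect fewer next year.
        return "expect better"
    return "no flag"


print(babip_flag(0.237))  # Jered Weaver's 2006 BABIP -> "expect worse"
print(babip_flag(0.313))  # Weaver's 2007 BABIP -> "no flag"
```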

The second way is much more subtle and admittedly not possible for most people, but it can bring important evidence to the table regarding who might regress. This technique is somewhat the opposite of the above. Instead of looking at the variables that a pitcher has little control over and regressing those to the league mean, we look at the variables that a pitcher has a high degree of control over and inspect the related statistics. This is best illustrated with an example. There's a solidly direct relationship between the number of missed bats a pitcher generates and the number of strikeouts that he gets (r=0.71), and swinging-strike rate is more consistent year over year than strikeout ratio (.77 to .75). Granted, the difference is not much, but what this gives us is a relationship to look at. If a pitcher has many more or fewer strikeouts than his percentage of missed bats would suggest, that provides another hint toward regression. The same holds true for the percentage of pitches that are balls and walk ratio.
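
One way to turn that relationship into an expected strikeout rate is sketched below. It assumes a plain least-squares line fit across the league, which is my assumption and not necessarily the method used for the numbers in this article; the function names are hypothetical as well.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    xs: each pitcher's missed-bat (swinging-strike) percentage.
    ys: each pitcher's strikeout rate.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expected_rate(missed_bat_pct, slope, intercept):
    """Strikeout rate the fitted line predicts for a given missed-bat percentage."""
    return slope * missed_bat_pct + intercept


def pct_over_expected(actual, expected):
    """Fraction by which the actual rate exceeds the expected rate.

    A value of 0.23 means 23% more strikeouts than the missed bats suggest.
    """
    return (actual - expected) / expected
```

Fit the line once on the league's pitchers, then compare each pitcher's actual strikeout rate with `expected_rate` for his missed-bat percentage. A large positive result from `pct_over_expected` hints that the strikeouts may come down. The same three functions work unchanged for ball percentage and walk rate.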

To ferret out these regression candidates, I looked at each of the five categories mentioned above: Ks, BBs, HR/FB%, BABIP and LOB%. If a pitcher was above or below the expected amount by more than half a standard deviation, I made a note of that figure. Most pitchers end up with a grab bag of assorted categories. For example, Erik Bedard appears four times for 2007, for everything but his BABIP. But while Bedard's regressed values suggest he should do worse in the strikeout and LOB% categories, they also suggest he should do better in the walk and HR/FB% categories, pretty much a wash. The pitchers we are looking for are those who appear in more than a few of the categories and whose flags point overwhelmingly in the same direction.
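
The screening step can be sketched roughly as follows. The half-standard-deviation threshold is from the paragraph above; measuring the standard deviation across all pitchers' gaps, requiring at least three flags, and allowing at most one flag in the opposite direction are my own assumptions about how to read "overwhelming majority" and "more than a few."

```python
from statistics import pstdev

CATEGORIES = ("K", "BB", "HR/FB%", "BABIP", "LOB%")


def is_flagged(gap, all_gaps, threshold=0.5):
    """True if a pitcher's gap (actual minus expected) in one category is
    more than `threshold` standard deviations of the league-wide gaps."""
    sd = pstdev(all_gaps)
    return sd > 0 and abs(gap) > threshold * sd


def summarize(outlook, min_flags=3):
    """Summarize a pitcher's flagged categories.

    `outlook` maps each flagged category to "better" or "worse", the
    direction regression suggests for next season. `min_flags` and the
    one-dissenting-flag allowance are assumed cutoffs, not the article's.
    """
    better = sum(1 for d in outlook.values() if d == "better")
    worse = sum(1 for d in outlook.values() if d == "worse")
    if worse >= min_flags and better <= 1:
        return "likely worse"
    if better >= min_flags and worse <= 1:
        return "likely better"
    return "mixed"


# Erik Bedard, 2007: two flags each way, so pretty much a wash.
bedard = {"K": "worse", "LOB%": "worse", "BB": "better", "HR/FB%": "better"}
print(summarize(bedard))  # -> "mixed"
```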


Jered Weaver, 2006.
-Weaver is perhaps the prototypical example. In 2006, Weaver struck out 23% more batters than you'd expect given his percentage of missed bats, left 22% more men on base than normal, allowed a hit on 20% fewer balls in play than the league and had 21% fewer flyballs turn into home runs than you'd expect. He appeared on four of the measurements, and each one portended a worse 2007 than 2006. So what happened? In 2007, Weaver saw his missed bat percentage drop to 7.83%, below league average (7.95%), and his K rate fell to 15.97%, minimally above league average (15.57%). His BABIP went from .237 to .313. He went from stranding 86.2% of baserunners to 73.6%. His HR/FB% actually dropped; it is the only statistic he appears on for 2007, and again one that suggests further regression. In 2006, Jered Weaver posted a 2.56 ERA coupled with a 3.99 FIP. In 2007, Weaver's ERA skyrocketed to 3.91 while his FIP remained relatively stable at 4.14.

Kris Benson, 2003.
-Benson had improved strikeout, LOB and BABIP rates to look forward to according to regression. His LOB% actually fell a bit, but the expected rebounds in strikeout and hit rate did occur and more than offset the extra bad luck on runners scoring, just in time to land Benson an absurd-at-the-time contract in late 2004.

Chris Carpenter, 2004.
-Just to show you that these measurements do not preclude a pitcher from improving, Carpenter exceeded expected rates in walks, LOB% and BABIP while underperforming in HR/FB% in 2004. Sounds like a recipe for a step backward in 2005, right? Well, 2005 saw Carpenter drop his FIP from 3.71 to 2.86 and his ERA from 3.46 to 2.83 as he pitched 242 innings and grabbed himself a nifty Cy Young trophy. But guess what? His walk rate rose, his LOB% fell, his BABIP rose and his HR/FB% fell, 4 for 4.

Shawn Chacon, 2005.
-Hit the trifecta in the triple crown, as I refer to the weakly correlated stats: higher than expected LOB%, lower than expected BABIP and HR/FB%. All three regressed in 2006 and he was predictably terrible.

Shawn Estes, 2003 and Casey Fossum, 2004.
-The reverse triple crown, underperforming in the three stats mentioned with Chacon. All three regressed positively for Estes in 2004 and he shaved nearly a run per 9. Ditto Fossum, except he shaved nearly two full runs.

Tom Glavine, 2004.
-Expected more walks, home runs and hits and fewer stranded runners in 2005. Swing and miss on three of four. The hits came back in force, but the other three did not change and in fact have been very stable for the past few years. Glavine is a good example of a pitcher who breaks the mold.

Oliver Perez, 2003, 2004 and 2006.
-Perez has more entries than any other pitcher during the time period. Regression analysis predicted Oliver's improvement from 2003 to 2004 (though not by that much, see next), his downfall from 2004 to 2005 (too much good luck in '04) and again the improvement in 2007 over 2006 (and is expecting him to take a step back in 2008, but not a dramatic one).

Jaret Wright, 2004.
-Yeah, who didn't see that one coming? I mean besides the Yankees.


Josh Beckett - Beckett was a full standard deviation above the expected K and LOB rates. That's not to say Beckett isn't great, but, and this should come as no surprise, expect a regression away from his 2007 Cy Young effort.

Joe Blanton - For the second year in a row, Blanton's walk and HR/FB rates are below what you would expect. Blanton may just end up being a pitcher who always walks fewer than normal, and the home run rates can be partially explained by Oakland, but it's worth keeping an eye on.

Lenny DiNardo - Can expect improvement in his K/BB ratio and should strand a few more runners.

Zack Greinke - Look for a worsening K/BB ratio and more runners scoring, but fewer home runs allowed should his groundball ratio stay constant.

Jeremy Guthrie - Too low of a BABIP and too high of a LOB% coupled with a higher than expected K rate spells uh-oh for oh-eight.

Ted Lilly - Same as Guthrie above though he should see some improvement in the walk department that could offset a change in strikeouts.

Scott Olsen - Would expect fewer walks and an improvement in each fluky stat: fewer hits and home runs and more stranded baserunners. Too bad nobody would notice since he plays for Florida.

Jake Peavy - See Beckett. Anytime you have a season like Peavy's, there's a good chance that you had a dose of good luck to go along with immense talent. It's the nature of how good the MLB talent pool is as a whole. It's remarkably unusual for any one player to be head and shoulders above everyone else.

Justin Verlander - Could be in line for fewer Ks and stranded runners and a slight uptick in hits and home runs per flyball, but this could fly out the window as he continues to take steps forward in his talent level each year in the rotation.