We're into the stretch run now. I'm not going to go into individual statistics - the idea was never to walk through absolutely everything but rather to provide a solid foundation that facilitates good, logical thinking about sabermetrics. So instead of talking about strikeouts, wins, tRA, xFIP, whatever over the next few days, I'll describe how I think pitching/batting/defence should be evaluated - but in general. We'll start with pitching.
Prerequisites for Understanding: The Isolation Problem, Linear Weights, Base Runs, Replacement Level, Expected Wins/Losses, The Run-Win Conversion, Value, Regression, Correlation, Park Effects, Environment, WPA and LI, Data.
What makes a good pitcher? What makes a bad one? How do we evaluate them? Pitching is a deceptive area of study - our first generation numbers told us that we knew how many games pitchers were responsible for winning, and how many runs they gave up. For a very long time, we were content with this.
And then, quite suddenly, we weren't. What are wins, we ask? And what exactly does ERA tell you? Well, wins tell you how often your position players score more runs over the course of a game than the pitcher and the position players allow. ERA is similarly bizarre, thinking about it: how many runs do a pitcher and his defence concede per nine innings, discounting runs that the scorers think ought not to have counted?
This would, of course, be all well and good if the impact of defence were negligible, or if pitchers had any real control over whether batted balls find gloves or not. But defence matters. It can make average pitchers look like world beaters, and replacement level pitchers look alarmingly valuable. Whenever the ball enters the field of play, the defence is involved, and understanding that and seeking to adjust for it is absolutely critical.
So, what do we need in order to measure pitchers? Ideally, we'd be able to judge them without the defence clouding the issue. We certainly shouldn't involve bats, so wins and losses are out. We must build upon the things that the pitcher is solely responsible for: strikeouts (mostly), walks, HBP, and home runs. After these are taken into account, we can start looking at batted balls - third generation data gives us some insight into how difficult a ball is to field, although it's not as accurate as we'd like. Anyway, my belief is that if a pitcher gives up a drive that's an out 90% of the time with an average defence, we should give him 0.9 of an out, whether or not the ball was actually caught. Credit for the quality of the defence should probably go to the actual defenders rather than the pitcher. Actually doing this is non-trivial, but it's the direction we should be steering our statistics.
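To make the 0.9-of-an-out idea concrete, here is a minimal Python sketch. The `BattedBall` type and its `out_probability` field are hypothetical stand-ins for whatever batted ball data you have; nothing here is an established implementation.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    # Hypothetical input: how often an average defence turns this ball into an out.
    out_probability: float
    # What actually happened on the play.
    was_out: bool


def pitcher_expected_outs(balls):
    """Outs credited to the pitcher: the sum of average-defence out probabilities."""
    return sum(ball.out_probability for ball in balls)


def defence_outs_above_average(balls):
    """Outs actually recorded minus the outs an average defence would expect."""
    actual = sum(1 for ball in balls if ball.was_out)
    return actual - pitcher_expected_outs(balls)


balls = [
    BattedBall(out_probability=0.9, was_out=True),
    BattedBall(out_probability=0.9, was_out=False),  # pitcher still gets 0.9 of an out
    BattedBall(out_probability=0.2, was_out=True),   # the surplus belongs to the fielder
]
print(pitcher_expected_outs(balls))
print(defence_outs_above_average(balls))
```

The pitcher's credit depends only on the quality of the contact he allowed; the gap between actual and expected outs is what gets handed to the defenders.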
We then need to turn this information into runs and outs. Outs are essentially trivial with good enough defensive data, but the run conversion can come from linear weights, Base Runs (this is, of course, more accurate than linear weights), or any other run expectancy tool you can think of. With expected runs and outs, you can figure out how many runs you'd expect a pitcher to give up per nine innings... to a point: we've neglected 'situational pitching'. Personally, I think this is an acceptable oversight, but it may well be that it can make a significant difference to our evaluation of pitchers. Certainly, it's something that will be fairly important to look into down the line.
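As a rough sketch of the linear weights route, the Python below turns expected event counts into expected runs per nine. It assumes run values expressed relative to league average (so outs carry a negative value) and 27 outs per nine innings. The function name and the numbers in the example are made up for illustration and are not real linear weights.

```python
def expected_runs_per_nine(event_counts, run_values, expected_outs, league_ra9):
    """Estimate runs allowed per nine innings from expected events.

    event_counts:  expected (possibly fractional) count of each event type
    run_values:    runs relative to average for each event (assumed inputs)
    expected_outs: total expected outs, including fractional batted ball outs
    league_ra9:    league average runs allowed per nine innings
    """
    if expected_outs <= 0:
        raise ValueError("expected_outs must be positive")
    # Total runs above (positive) or below (negative) average over the sample.
    runs_vs_average = sum(
        count * run_values[event] for event, count in event_counts.items()
    )
    # Scale to 27 outs and add back the league baseline.
    return league_ra9 + 27.0 * runs_vs_average / expected_outs


# Illustrative placeholder values only.
illustrative_values = {"walk": 0.3, "home_run": 1.4, "out": -0.1}
illustrative_counts = {"walk": 10, "home_run": 5, "out": 100}
print(expected_runs_per_nine(illustrative_counts, illustrative_values, 100, 4.5))
```

Swapping in Base Runs would change how the events become runs, but the final step of scaling to nine innings would look much the same.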
So. Expected runs per nine. Combining this with some function of batters faced gives us a certain number of runs above average, which eventually leads us to wins (and don't forget to park/league adjust!). But many prefer a quite elegant shortcut: if we know the expected runs allowed per nine and the league average figure, we can use the Pythagorean expectation to derive an expected winning percentage. This is the number that WAR for pitchers is based on, and ultimately what we want to know. Getting to that point is just a matter of refining the method and using better data: our general theory is laid out pretty cleanly.
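The shortcut can be sketched in a few lines of Python. This assumes the classic exponent of 2 (other exponents are in use) and treats the pitcher as backed by a league average offence. The baseline winning percentage is left as a parameter because the right replacement level figure is a separate question; all names here are hypothetical.

```python
def pythagorean_win_pct(expected_ra9, league_ra9, exponent=2.0):
    """Expected winning percentage of an average team behind this pitcher."""
    if expected_ra9 < 0 or league_ra9 <= 0:
        raise ValueError("run rates must be non-negative, league rate positive")
    scored = league_ra9 ** exponent     # average offence scores at the league rate
    allowed = expected_ra9 ** exponent  # pitcher's expected runs allowed
    return scored / (scored + allowed)


def wins_above_baseline(expected_ra9, league_ra9, innings, baseline_win_pct,
                        exponent=2.0):
    """Wins above a chosen baseline (e.g. replacement level) over the innings pitched."""
    games = innings / 9.0
    win_pct = pythagorean_win_pct(expected_ra9, league_ra9, exponent)
    return (win_pct - baseline_win_pct) * games


print(pythagorean_win_pct(3.0, 4.5))
print(wins_above_baseline(3.0, 4.5, innings=180, baseline_win_pct=0.5))
```

Park and league adjustments would be applied to the run rates before they go into the formula.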
We should be careful to regress our numbers pretty severely when dealing with pitchers, as some of their outcomes are highly luck-dependent (notably home runs per fly ball, and to a lesser extent some ball in play classifications). As with everything we look at, remember that the data at hand never tells the whole story. Regression is the name of the game, and we want to apply it mercilessly when non-correlative statistics come into play. But we should also remember that there is real value in measuring what a pitcher actually has done, as well.
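One common way to regress a rate is to blend the observed figure with the league mean, weighting by sample size. The sketch below does that in Python. The `stabilising_sample` parameter is an assumption you would have to estimate for each statistic, and the example numbers are invented.

```python
def regress_to_mean(observed_rate, sample_size, league_rate, stabilising_sample):
    """Shrink an observed rate toward the league rate.

    stabilising_sample is the (assumed) number of league-average trials added
    to the sample; the larger it is, the harder the statistic is regressed.
    """
    if sample_size < 0 or stabilising_sample < 0:
        raise ValueError("sample sizes must be non-negative")
    total = sample_size + stabilising_sample
    if total == 0:
        return league_rate
    return (observed_rate * sample_size + league_rate * stabilising_sample) / total


# Invented example: a 15% home run per fly ball rate over 100 fly balls,
# regressed toward a 10% league rate with 100 fly balls of league average added.
print(regress_to_mean(0.15, 100, 0.10, 100))
```

Luck-heavy outcomes get a large `stabilising_sample`, so the observed rate barely moves the estimate; skill-heavy outcomes like strikeouts get a small one.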
Things to Remember
- The transition between the rotation and the bullpen sees bullpen pitchers give up more walks, but fewer home runs and strikeouts. This results in replacement level being higher in the bullpen.
- The National League doesn't see the designated hitter; the AL does. Keep this in mind when making league adjustments.
- Therefore, National League pitchers have some value tied up in hitting. Including a pitcher's batting ability can make a big difference in their valuation.
- An interesting technique for measuring bullpen effectiveness is to consider the average leverage of each 'role' in the pen. This has a multiplicative effect on the value of each player - a closer might see his LI at 2.00, for example, meaning that every run saved above average is really counting as double. We can use LI here because it is what is defining bullpen roles in the first place (admittedly, not that well, thanks to the saves rule).
- Remember to include 'scouting-style' information when you think about player value. Stuff, command... these are extremely important things to consider. Don't ever use one number as a crutch - that way lies dogma.
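The leverage point in the list above comes down to a single multiplication. Here it is as a minimal Python sketch; the function name and the runs figure are hypothetical, and only the 2.00 closer LI comes from the example above.

```python
def leverage_adjusted_runs(runs_saved_above_average, average_li):
    """Scale a reliever's runs saved by the average leverage of his role."""
    if average_li < 0:
        raise ValueError("leverage index cannot be negative")
    return runs_saved_above_average * average_li


# A closer with an average LI of 2.00: each run saved counts double.
print(leverage_adjusted_runs(5.0, 2.00))
```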