Now we're really getting into the good stuff.
Prerequisites for understanding: Game state, environment.
Prerequisites for derivation: Game state; database.
The Not-So-Missing Link
Let's go all the way back to the beginning. We started this series by looking at the game state, along with run expectancy and win expectancy. We did so for a reason: it's impossible to understand any of the more modern statistics without a good understanding of the game state (and run expectancy). Here, we have the intermediate step between game state and a useful metric, whether we wish to look at batting, pitching, or defence: Linear Weights. Just as runs require some translation in order to be presented in a measure that has some inherent value, the events on the field must also be converted into runs. How? By going back to our game state and looking at run expectancy.
We know the average number of runs scored over the remainder of an inning in any baserunner/out state. Bases loaded, no out? You're looking at a lot of runs. Empty, with two down? Rather less. With play-by-play data, we can actually look at any class of event and find out the average change said event causes in run expectancy. Add in the average number of runs that scored on the play itself and suddenly you're left with the value, in runs - i.e. the linear weight - of any given event. This is a pretty big deal, as without it we'd have no way to measure the relative importance of, say, walks and singles. Combined with the run/win conversion, linear weights (in run form) bridge the gap between the old baseball stats and value.
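To make that recipe concrete, here is a minimal Python sketch of it: for each class of event, average the change in run expectancy plus the runs that scored on the play. Everything in it is an assumption made for illustration. The run expectancy values and the handful of plays are invented placeholders rather than measured data, and the names `RUN_EXPECTANCY`, `PLAYS` and `linear_weights` are hypothetical. A real derivation would run the same arithmetic over years of play-by-play records.

```python
from collections import defaultdict

# Hypothetical run expectancy table keyed by (bases occupied, outs).
# These numbers are illustrative placeholders, NOT measured values.
RUN_EXPECTANCY = {
    ("---", 0): 0.50,
    ("1--", 0): 0.90,
    ("---", 1): 0.27,
    ("1--", 1): 0.53,
    ("-2-", 1): 0.70,
    ("1--", 2): 0.22,
}

# Invented plays: (event, state_before, state_after, runs_scored_on_play).
# state_after is None when the play ends the inning (run expectancy of 0).
PLAYS = [
    ("single", ("---", 0), ("1--", 0), 0),
    ("single", ("---", 1), ("1--", 1), 0),
    ("single", ("-2-", 1), ("1--", 1), 1),
    ("walk", ("---", 0), ("1--", 0), 0),
    ("strikeout", ("---", 0), ("---", 1), 0),
    ("strikeout", ("1--", 2), None, 0),
    ("home_run", ("1--", 1), ("---", 1), 2),
]


def linear_weights(plays, run_expectancy):
    """Average (change in run expectancy + runs scored) for each event type."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, before, after, runs in plays:
        re_before = run_expectancy[before]
        re_after = 0.0 if after is None else run_expectancy[after]
        totals[event] += (re_after - re_before) + runs
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}


weights = linear_weights(PLAYS, RUN_EXPECTANCY)
for event, value in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{event:10s} {value:+.3f}")
```

Note that outs come out negative without any special handling: a strikeout scores nobody and moves the inning to a state with lower run expectancy, so its average change is below zero.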
Nothing Is Arbitrary
Consider the previous paragraph again. It's critically important to have a good grasp of what it means: that all of our top-line stats are related to runs above or below average by empirical means. There is nothing arbitrary in the exact weighting we have of a home run relative to a triple, or a ground ball to a line drive. Years upon years of data allow us to convert back and forth, or up and down, with ease. A common complaint with modern sabremetrics is the bewildering array of fractional coefficients that dot the scene, but if you look at a formula that's based on linear weights, don't see them as confusing numbers. Instead, look at them as relative values, derived through years of baseball being played.
A Livable Zone
Linear weights is a fantastic tool, but we should be aware of the limitations as we sing its praises. Because we build our run (and out) values on league average data, there's no guarantee that they work in extreme environments. And, in fact, they do not. At all. If a pitcher struck out 100% of the batters he faced and we attempted to estimate his ERA through linear weights, we would end up with our pitcher allowing something like negative three runs per nine innings, a clearly impossible result. Situations don't have to be as extreme as that either: the best pitchers in baseball affect their run environment to the point that linear weights may not accurately reflect the true conversion between their pitching and the runs we'd expect. The takeaway? Linear weights are optimised for the average baseball game, and start to fall apart when you drift too far away from that. They're still usable when a long way from the mean, but as with anything, understanding what's wrong with what we use is just as important as knowing what's right.
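The all-strikeouts breakdown is easy to reproduce with one linear-weights-based pitching estimator, FIP, in its commonly published form. This is a sketch under assumptions, not necessarily the calculation behind the figure quoted above: the function name `fip` is hypothetical, the constant of 3.10 is an assumed typical value (it is recalculated each season to match league ERA), and the "ordinary" stat line is invented purely for contrast.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant=3.10):
    """Fielding Independent Pitching in its commonly published form.

    The constant is an assumed typical value; in practice it is set each
    season so that league-average FIP matches league-average ERA.
    """
    weighted_events = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return weighted_events / innings + constant


# Invented, fairly ordinary season line (hypothetical numbers).
ordinary = fip(home_runs=20, walks=50, hit_by_pitch=5, strikeouts=180, innings=200)

# A pitcher who strikes out every batter: 27 strikeouts per nine innings.
all_strikeouts = fip(home_runs=0, walks=0, hit_by_pitch=0, strikeouts=27, innings=9)

print(f"ordinary line:  {ordinary:.2f}")
print(f"all strikeouts: {all_strikeouts:.2f}")
```

The second estimate lands below zero because the strikeout coefficient was fitted near league-average strikeout rates. Nothing in a linear formula stops the estimate once it reaches the real floor of zero runs.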
Baseruns, wOBA, FIP, tRA, UZR.