We're almost to #10 in our little series here, so I wanted to take stock of what people are liking and not liking. Too short? Too long? Too technical? Let me know!
Prerequisites for Derivation: Game state, database.
Non-Linear Weights
In getting an understanding of how linear weights help us convert on-field events to runs, we also came across a rather interesting problem: in relying upon league average values to derive our run weightings, we start to lose accuracy at the fringes. This is because scoring runs is not actually a linear function - we can reduce it to one for a reasonable result, but the nature of relying on linear weights means that we'll never really overcome this deficit. Instead, we maintain accuracy where it's most important, and sacrifice some elsewhere in favour of having an easily-derived, simple system.
After all, non-linear relationships are a total pain to both derive and understand, right?
Not so fast!
Let's start with what we know must be true about run scoring. Home runs mean at least one run is scored. The rest of the runs are scored by runners on base who manage to advance to home plate. Many baserunners do not manage to score, because the inning ends with them left aboard, or they get caught stealing, or they're gunned down at third base on a single to right. Regardless of what actually happens to individual baserunners, these three statements must hold. (Note that our definition of 'baserunner' excludes those rounding the bases on a home run, simply for ease of writing.)
With the three truths about run scoring above, we can actually construct an entirely theoretical run estimator. This is Base Runs (BsR), and it's a very simple idea: runs can only be scored on home runs or by driving runners in. Here it is, in equation form:
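Spelled out using only the terms from the sentence above (the names are just labels chosen for this sketch, not official notation):

```latex
\text{Runs} = \text{HR} + \text{Baserunners} \times (\text{fraction of baserunners who score})
```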
Now, this appears to be trivial, and fairly unhelpful. We've shifted the problem around from trying to figure out runs as a whole to trying to determine the frequency of baserunners being driven in. Yes, if we stopped here, we wouldn't really have solved anything. Fortunately, Base Runs does not stop here. There appears to be an empirical relationship between the fraction of scoring baserunners (we'll call this number S from now on) and baserunner advancement (A), as well as outs (B):
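The form usually quoted for Base Runs, written with the S, A and B above, is a simple ratio (this is the standard version as I understand it, so treat the exact shape as an assumption):

```latex
S = \frac{A}{A + B}
```

More advancement pushes S towards 1, and more outs pull it towards 0.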
As far as I know, there's no proof that this relationship must be true, but it makes sense: you need to advance baserunners along to home plate in order to score, and adding more outs must mean fewer runs scoring. Best of all, it is (for the most part) immune to environmental effects, so the relationship should hold true all across the possible baseball spectrum. By using Base Runs, you avoid many of the problems we encounter with linear weights, although you do have to calculate average runner advancement per type of play using game state, which is difficult. It's especially useful for accurately measuring pitchers, who frequently stray into territory that linear weights find a little daunting. Not many current statistics actually use Base Runs as their run estimator, but it's an important concept to grasp as part of the thought process of sabermetrics (and of the ways our current crop of stats might be improved).
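To see the non-linearity in action, here's a minimal Python sketch. It assumes the commonly quoted ratio S = A / (A + B) for the scoring fraction, and the function names and input numbers are made up purely for illustration. Real advancement values would have to come from game-state data.

```python
def scoring_fraction(advancement: float, outs: float) -> float:
    """Share of baserunners who score, assuming S = A / (A + B)."""
    return advancement / (advancement + outs)


def base_runs(baserunners: float, advancement: float,
              outs: float, home_runs: float) -> float:
    """Runs = home runs + baserunners * fraction of baserunners who score."""
    return home_runs + baserunners * scoring_fraction(advancement, outs)


# Illustrative inputs only: two offences facing the same 27 outs and
# hitting one home run, the second with twice the baserunners and
# twice the advancement of the first.
low = base_runs(baserunners=10, advancement=5, outs=27, home_runs=1)
high = base_runs(baserunners=20, advancement=10, outs=27, home_runs=1)

print(f"low-offence estimate:  {low:.2f} runs")
print(f"high-offence estimate: {high:.2f} runs")
```

Doubling the baserunners and the advancement more than doubles the runs scored by baserunners, because each runner is also more likely to be driven in. That interaction is exactly what a fixed per-event run value can't capture.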