FanPost

The BaseRuns Model

Last time, I examined some of the issues with OPS and explained why, although it is unsurpassed as a quick-and-dirty measure of offense, there is room for improvement if you are willing to accept more complexity. GPA is one such method, adding a bit of complexity in exchange for a bit more accuracy, but it is only a stepping stone between the beautiful simplicity of OPS and the more accurate but mind-bogglingly complex models out there. Now, let us move to the other end of the spectrum and look at the most accurate model we can find.

Important Note: Most of the explanation of BaseRuns comes from Brandon Heipp's site. I have only tried to reword some of it in the hope of providing the simplest possible explanation that still covers what's useful to know. All credit for originality belongs to David Smyth and Brandon Heipp.

Many people here are at least acoustically familiar with the term linear weights, but may not know what it actually means. Linear weights is, in a nutshell, a linear equation, much like a linear regression: there are certain inputs to the system (hits, home runs, etc.), and each input is assigned its own importance (or weight) toward overall run scoring. With all due apologies to Tim McCarver, as far as run scoring goes, a walk is not, in fact, as good as a home run. A home run contributes more runs, so it deserves more weight in a run estimation model.

Linear weights formulas apply a static run value to each event. To borrow Heipp's example, many linear weights systems say that a walk is worth about 0.33 runs. Over the course of a season, this is a good estimate of the value, in runs, of a walk. The problem is that in a game in which a team draws one walk and makes 27 outs, the walk will not have that value. Most systems assign about -0.1 runs to every out, so the system's prediction for the number of runs scored in this game would be about -2.4 runs (0.33 - 27 * 0.1 = -2.37). This is clearly impossible: a team cannot score fewer than zero runs.
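A minimal Python sketch of that arithmetic, assuming a bare-bones linear weights model that carries only the two values quoted above (0.33 runs per walk, -0.1 runs per out). Real systems have a weight for every event; the constant and function names here are illustrative, not taken from any published system.

```python
# Illustrative weights: only the two values quoted in the text.
WALK_VALUE = 0.33   # runs credited per walk (season-average estimate)
OUT_VALUE = -0.10   # runs credited per out


def linear_weights_runs(walks: int, outs: int) -> float:
    """Static linear-weights estimate: each event adds its fixed value."""
    return walks * WALK_VALUE + outs * OUT_VALUE


# One walk and 27 outs: the fixed weights push the estimate below zero.
print(f"{linear_weights_runs(walks=1, outs=27):.2f}")  # -2.37
```

Because every out subtracts the same amount no matter how few runners are on base, the model has no floor at zero runs.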

This illustrates a shortcoming of most linear weights systems: they were designed to work within a certain range. Now, granted, the range they were designed for corresponds to the range in which most major league teams perform, but baseball has shifted dramatically in just the last 25 years, and while it's nice to have an accurate measure of major league performance, we might also want a system that can be applied to minor league games, which can have vastly different run environments.

BaseRuns (BsR) is not a linear weighting system but a multiplicative one. It was developed by David Smyth about 15 years ago and has undergone some minor tweaks since its inception, but the underlying theory is sound. Every plate appearance ends in one of three ways: the batter makes an out, the batter reaches base (fielder's choices fall into this category), or he hits a home run. If the batter does reach base, there are three more possible outcomes: he scores, he makes an out on the bases (this is where the fielder's choice out comes in), or he is stranded. With that established, we can write an equation for the number of runs scored:


Runs Scored = (# Baserunners * % of baserunners who score) + Home Runs

David Smyth broke his model into four parts: A, B, C, and D. A is the number of baserunners, not counting batters who hit home runs (those are counted in D). B is what's called the "advance factor," which measures how much each event contributes toward advancing runners. C is the number of outs, and D is the number of home runs. In equation form:

A = H + BB + HBP - HR - .5*IBB
B = [1.4*TB -.6*H -3*HR +.1*(BB+HBP-IBB) +.9*(SB-CS-GDP)] * 1.1
C = AB - H + CS + GDP
D = HR

BaseRuns = A*(B/(B + C)) + D
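The formulas above translate directly into code. A minimal Python sketch, assuming BB includes intentional walks (the A and B terms subtract IBB back out, which only makes sense if it does); the `BattingLine` and `base_runs` names are illustrative. The usage line reruns the one-walk, 27-out game from earlier.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats used by the BaseRuns formula (names are illustrative)."""
    ab: int        # at bats
    h: int         # hits
    tb: int        # total bases
    hr: int = 0    # home runs
    bb: int = 0    # walks, assumed to include intentional walks
    ibb: int = 0   # intentional walks
    hbp: int = 0   # hit by pitch
    sb: int = 0    # stolen bases
    cs: int = 0    # caught stealing
    gdp: int = 0   # grounded into double play


def base_runs(s: BattingLine) -> float:
    """BsR = A * B / (B + C) + D, using the A, B, C, D definitions above."""
    a = s.h + s.bb + s.hbp - s.hr - 0.5 * s.ibb  # baserunners other than HR hitters
    b = (1.4 * s.tb - 0.6 * s.h - 3 * s.hr
         + 0.1 * (s.bb + s.hbp - s.ibb)
         + 0.9 * (s.sb - s.cs - s.gdp)) * 1.1   # advance factor
    c = s.ab - s.h + s.cs + s.gdp                # outs
    d = s.hr                                     # home runs always score
    if b + c == 0:
        # Degenerate case (e.g. an empty line): avoid dividing by zero.
        return float(d)
    # B / (B + C) plays the role of "% of baserunners who score".
    return a * b / (b + c) + d


# The one-walk, 27-out game: 27 at bats, no hits, one walk.
walk_game = BattingLine(ab=27, h=0, tb=0, bb=1)
print(f"{base_runs(walk_game):.3f}")  # 0.004
```

Instead of the impossible -2.4 runs, BsR estimates essentially zero: with 27 outs and almost no advancement, the lone baserunner's chance of scoring, B / (B + C) = 0.11 / 27.11, is tiny.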

Note: IBB count only half because, while ordinary walks occur in essentially random situations, intentional walks are usually issued only in situations favorable to the defense, that is, where the run value of putting that batter on base is much lower than normal.

The benefit of BaseRuns (BsR) over other models is that it holds up even in extreme run environments. Again borrowing from Heipp, note that when a solo home run is hit, BsR is alone among the common models in correctly predicting that exactly one run scores; most linear weights formulas credit about 1.4 runs in the same situation. This is because most of those models are built to hold up only in major league run environments, while BsR can adapt to any league, even Little League!
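As a worked check of the solo home run case, here is a single plate appearance (AB = 1, H = 1, TB = 4, HR = 1, every other stat zero) plugged into the A, B, C, D formulas:

```latex
\begin{aligned}
A &= H + BB + HBP - HR - 0.5\,IBB = 1 + 0 + 0 - 1 - 0 = 0\\
B &= [1.4(4) - 0.6(1) - 3(1)] \times 1.1 = 2.0 \times 1.1 = 2.2\\
C &= AB - H + CS + GDP = 1 - 1 + 0 + 0 = 0\\
D &= HR = 1\\
\text{BsR} &= A \cdot \frac{B}{B + C} + D = 0 \cdot \frac{2.2}{2.2 + 0} + 1 = 1
\end{aligned}
```

Adding 27 outs to that plate appearance only raises C; because A = 0, the first term stays at zero and BsR still comes out to exactly one run.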

The great thing is that this adaptability does not come at the price of accuracy when modeling the actual major leagues, which is what we care about most. In comparisons of various models against actual run totals, BsR routinely ranks among the models with the highest correlation and the lowest error.

To finish, here is a table of 2007 BaseRuns totals (runs scored and allowed) and the Pythagorean records based on them.

Team          Pythag W-L    BsR RS-RA

AL EAST
Boston        102-60        900-681
Yankees        97-65        965-791
Toronto        88-74        757-689
Orioles        76-86        772-824
Tampa Bay      71-91        818-924

AL CENTRAL
Cleveland      92-70        823-717
Detroit        88-74        881-807
Minnesota      76-86        709-754
White Sox      70-92        701-809
KC Royals      67-95        677-816

AL WEST
Oakland        85-77        777-735
Anaheim        85-77        781-745
Seattle        78-84        783-814
Texas          75-87        779-837

NL EAST
NY Mets        88-74        817-748
Atlanta        88-74        800-732
Phillies       87-75        906-839
Marlins        75-87        821-889
Nationals      68-94        672-797

NL CENTRAL
Chi Cubs       86-76        759-707
Milwaukee      85-77        807-770
Cincinnati     76-86        789-846
St. Louis      73-89        711-789
Houston        70-92        732-846
Pittsburgh     68-94        711-838

NL WEST
Colorado       91-71        848-749
San Diego      87-75        723-663
LA Dodgers     85-77        741-698
Arizona        77-85        709-750
SF Giants      74-88        666-733
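The W-L column follows from the RS-RA column through a Pythagorean formula, though which variant produced these records isn't stated. A minimal Python sketch, assuming the PythagenPat exponent ((RS + RA) / G)^0.287 and a 162-game season; the function name is illustrative. Under that assumption it reproduces, for example, the Boston, Yankees, and Colorado rows.

```python
def pythagenpat_record(rs: float, ra: float, games: int = 162) -> tuple[int, int]:
    """Pythagorean W-L record, assuming the PythagenPat exponent."""
    rpg = (rs + ra) / games          # total runs per game (both teams)
    exponent = rpg ** 0.287          # PythagenPat: exponent grows with scoring
    win_pct = rs ** exponent / (rs ** exponent + ra ** exponent)
    wins = round(win_pct * games)
    return wins, games - wins


# Boston's BaseRuns line from the table: 900 scored, 681 allowed.
print(pythagenpat_record(900, 681))  # (102, 60)
```

A fixed exponent of 2 would give Boston 103 wins instead of 102, so the choice of variant can move a team by a win or so.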