The BaseRuns Model

Last time I examined some of the issues with OPS and why, although it's unsurpassed as a quick-and-dirty measure of offense, there is room for improvement if you are willing to allow more complexity. GPA is one such method, adding a bit of complexity in exchange for a bit more accuracy, but it is only a stepping stone between the beautiful simplicity of OPS and the more accurate but mind-bogglingly complex models out there. Now let's move to the other end of the spectrum: the most accurate model we can find.

Important Note: Most of the explanation of BaseRuns comes from Brandon Heipp's site. I have only tried to reword it in the hopes of providing the simplest explanation possible that still covers what's useful to know. All credit for originality belongs to Smyth and Heipp.

Many people here are at least acoustically familiar with the term linear weights, but may not know what it actually means. Linear weights is, in a nutshell, a linear regression equation. There are certain inputs to the system (hits, home runs, etc.), and each input is assigned its own importance (or weight) toward overall run scoring. With all apologies to Tim McCarver's ignorance, as far as run scoring goes, a walk is not, in fact, as good as a home run. A home run contributes more runs, so it deserves more weight in a run-estimating model.

Linear weights formulas apply a static run value to each event. To borrow Heipp's example, many linear weights systems say that a walk is worth about 0.33 runs. Over the course of a season, that is a good estimate of the value, in runs, of a walk. The problem is that in a game in which a team draws one walk and makes 27 outs, the walk will not have that value: with no hits behind him, the runner is very unlikely to score. Most systems assign a value of about -0.1 runs to every out, so the system's prediction for this game would be about -2.4 runs (0.33 - 2.7). A team cannot score negative runs, so this is clearly wrong.
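The arithmetic is easy to check with a minimal Python sketch. It is not any published linear weights system: it uses only the static values quoted in this post (0.33 per walk, -0.1 per out, 1.4 per home run), and the `WEIGHTS` table and `linear_weights_runs` function are names made up for illustration.

```python
# Static run values quoted in the post. A real linear weights system also
# assigns values to singles, doubles, steals and so on.
WEIGHTS = {
    "walk": 0.33,
    "out": -0.10,
    "home_run": 1.40,
}


def linear_weights_runs(events: dict[str, int]) -> float:
    """Estimated runs: each event's count times its fixed run value, summed."""
    return sum(WEIGHTS[event] * count for event, count in events.items())


# One walk and 27 outs: the estimate comes out negative.
print(round(linear_weights_runs({"walk": 1, "out": 27}), 2))  # -2.37
```

Because every out subtracts the same fixed amount no matter what else happened, a low-offense game can be pushed below zero.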

This is an example of the shortcomings of most linear weights systems: they were designed to work within a certain range of run environments. Granted, that range corresponds to the one in which most major league teams perform, but baseball has shifted dramatically in just the last 25 years, and while it's nice to have an accurate measure of major league performance, we might also want a system that can be applied to minor league games, which can have vastly different run environments.


BaseRuns (BsR) is not a linear weights system but rather a multiplicative one. It was developed by David Smyth about 15 years ago and has undergone some minor tweaks since its inception, but the underlying theory is sound. Every plate appearance ends in one of three ways: the batter makes an out, the batter reaches base (fielder's choices fall into this category), or he hits a home run. If the batter does reach base, there are three more potential outcomes: he scores, he makes an out on the bases (this is where the fielder's choice out comes in), or he is stranded. With that established, we can write an equation for the number of runs scored:


Runs Scored = (#Baserunners * % of baserunners who score) + Home Runs

Smyth breaks his model into four parts: A, B, C and D. A is the number of baserunners. B is the "advance factor," which describes how much certain events contribute to advancing runners. C is the number of outs, and D is the number of home runs. In equation form:

A = H + BB + HBP - HR - .5*IBB
B = [1.4*TB - .6*H - 3*HR + .1*(BB + HBP - IBB) + .9*(SB - CS - GDP)] * 1.1
C = AB - H + CS + GDP
D = HR

BaseRuns = A*(B/(B + C)) + D

Note: Intentional walks count only half in A because, while normal walks come in essentially random situations, intentional walks are usually issued in situations the defense chooses, where putting that batter on base costs much less than usual.
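Here is the formula above written as a small Python function, a sketch that follows the A, B, C and D definitions exactly as given. The `BattingLine` container, the function names, and the zero-division guard for an empty line are assumptions of this sketch, not part of Smyth's or Heipp's presentation.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats used by BaseRuns; every field defaults to zero."""
    ab: float = 0   # at-bats
    h: float = 0    # hits
    tb: float = 0   # total bases
    hr: float = 0   # home runs
    bb: float = 0   # walks, intentional ones included
    hbp: float = 0  # hit by pitch
    ibb: float = 0  # intentional walks
    sb: float = 0   # stolen bases
    cs: float = 0   # caught stealing
    gdp: float = 0  # grounded into double plays


def base_runs_components(s: BattingLine) -> tuple[float, float, float, float]:
    """Return Smyth's A (baserunners), B (advance factor), C (outs), D (home runs)."""
    a = s.h + s.bb + s.hbp - s.hr - 0.5 * s.ibb
    b = 1.1 * (1.4 * s.tb - 0.6 * s.h - 3 * s.hr
               + 0.1 * (s.bb + s.hbp - s.ibb)
               + 0.9 * (s.sb - s.cs - s.gdp))
    c = s.ab - s.h + s.cs + s.gdp
    d = s.hr
    return a, b, c, d


def base_runs(s: BattingLine) -> float:
    """BaseRuns = A * B / (B + C) + D."""
    a, b, c, d = base_runs_components(s)
    if b + c == 0:
        # Empty line: no advancement and no outs, so only home runs count.
        # This guard is an assumption; the formula leaves 0/0 undefined.
        return d
    score_rate = b / (b + c)  # estimated share of baserunners who score
    return a * score_rate + d
```

Passing in a team's season batting totals gives an estimate of runs scored; passing in the totals a pitching staff allowed gives an estimate of runs allowed.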

The benefit of BaseRuns over other models is that the model holds up even in extreme environments. Again borrowing from Heipp, consider a game in which the only offense is a solo home run: BsR is alone among models in correctly predicting exactly one run, while most linear weights systems predict 1.4 runs. This is because most of those models are built to work only in major league environments, while BsR can adapt to any league, even Little League!
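Running both estimators on the two extreme games discussed so far shows the difference. This sketch repeats the BaseRuns formula in compact form and, as a stand-in for a full linear weights system (an assumption), uses only the three weights quoted in this post: 0.33 per walk, -0.1 per out and 1.4 per home run.

```python
def base_runs(ab=0, h=0, tb=0, hr=0, bb=0, hbp=0, ibb=0, sb=0, cs=0, gdp=0):
    """BaseRuns = A * B / (B + C) + D, with the A-D formulas from the post."""
    a = h + bb + hbp - hr - 0.5 * ibb  # baserunners
    b = 1.1 * (1.4 * tb - 0.6 * h - 3 * hr
               + 0.1 * (bb + hbp - ibb) + 0.9 * (sb - cs - gdp))  # advance factor
    c = ab - h + cs + gdp  # outs
    d = hr                 # home runs
    return d if b + c == 0 else a * b / (b + c) + d


def linear_weights(walks=0, outs=0, home_runs=0):
    """Fixed-value estimate using only the three weights quoted in the post."""
    return 0.33 * walks - 0.1 * outs + 1.4 * home_runs


# A lone solo home run.
print(base_runs(ab=1, h=1, tb=4, hr=1), linear_weights(home_runs=1))  # 1.0 1.4

# One walk and 27 outs.
print(round(base_runs(ab=27, bb=1), 3), round(linear_weights(walks=1, outs=27), 2))  # 0.004 -2.37
```

In the one-walk game, the lone baserunner gets credit for roughly 0.004 runs because the advance factor is tiny next to 27 outs, while the linear version goes negative.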

The great thing is that this adaptability does not come at a price when modeling the actual major leagues, which is what we care about most. In comparisons between various models and actual run scoring, BsR routinely ranks among the models with the highest correlation and the lowest error.

To finish it off, here is a table of 2007 team totals from BaseRuns (runs scored and allowed) and the Pythagorean records based on them.

TEAM          Pyth W/L    BsR RS-RA

AL EAST
Boston          102-60      900-681
Yankees          97-65      965-791
Toronto          88-74      757-689
Orioles          76-86      772-824
Tampa Bay        71-91      818-924

AL CENTRAL
Cleveland        92-70      823-717
Detroit          88-74      881-807
Minnesota        76-86      709-754
White Sox        70-92      701-809
KC Royals        67-95      677-816

AL WEST
Oakland          85-77      777-735
Anaheim          85-77      781-745
Seattle          78-84      783-814
Texas            75-87      779-837

NL EAST
NY Mets          88-74      817-748
Atlanta          88-74      800-732
Phillies         87-75      906-839
Marlins          75-87      821-889
Nationals        68-94      672-797

NL CENTRAL
Chi Cubs         86-76      759-707
Milwaukee        85-77      807-770
Cincinnati       76-86      789-846
St. Louis        73-89      711-789
Houston          70-92      732-846
Pittsburgh       68-94      711-838

NL WEST
Colorado         91-71      848-749
San Diego        87-75      723-663
LA Dodgers       85-77      741-698
Arizona          77-85      709-750
SF Giants        74-88      666-733
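The post doesn't say which Pythagorean exponent produced these records. The sketch below assumes the Pythagenpat exponent, (total runs per game)^0.287, which does reproduce Boston's 102-60 from its 900-681 BaseRuns line; the classic fixed exponent of 2 gives Boston 103 wins, which doesn't match the table. The function names are made up for illustration.

```python
def pythagenpat_exponent(runs_scored: float, runs_allowed: float, games: int) -> float:
    """Assumed exponent: (runs per game, both teams combined) ** 0.287."""
    return ((runs_scored + runs_allowed) / games) ** 0.287


def pythag_record(runs_scored: float, runs_allowed: float, games: int = 162) -> tuple[int, int]:
    """Expected W-L from runs scored and allowed via the Pythagorean formula."""
    x = pythagenpat_exponent(runs_scored, runs_allowed, games)
    win_pct = runs_scored ** x / (runs_scored ** x + runs_allowed ** x)
    wins = round(win_pct * games)
    return wins, games - wins


print(pythag_record(900, 681))  # Boston's BsR totals: (102, 60)
```

Replacing `pythagenpat_exponent` with a constant is all it takes to try a fixed exponent instead.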


Comments


A fascinating linear weights model.
The only criticism I've heard is that it isn't great for determining individual contributions, but it's pretty good at solving the problems associated with Bill James's Runs Created (RC) model.
I will not make jokes in my sig. I will not make jokes in my sig. I will not...

by TIF @ Lookout Landing on Mar 1, 2008 5:01 PM PST

Yeah, it's not great for individual players
but it can work; it just requires a lot of calculations. But if you're using BsR in the first place, you're probably not doing the calculations by hand anyway, so...

by Matthew on Mar 2, 2008 11:52 AM PST

Why doesn't it do a good job with individuals
if it can do a good job of predicting team production? I know it's a decent-sized assumption, but why does it break down a little for individuals?

by Edgar for Pres on Mar 2, 2008 5:17 PM PST

Do what I said above
Take a player. Take a league-average team's line, replace a league-average player's production with the player's expected line, and calculate BaseRuns with both sets of numbers. Subtract the league-average team's BaseRuns from the total for the team with the player, and you have runs above average. Or take the lines of a Safeco-average team's players, subtract a league-average Safeco player's line, add Raul's Marcels in, do BaseRuns for both, repeat, apply your park factors, and you have his park-adjusted value to the team in runs above average. I think linear weights work well enough at the player-season level to be fine, but if you want, BaseRuns (or a Markov process, like Tango's perfect run modeller) works fine.

by chrisisasavage on Mar 2, 2008 6:04 PM PST
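chrisisasavage's recipe can be sketched in a few lines of Python. The function names and the dictionary layout are made up for illustration, the BaseRuns formula is the one from the post, and the park-factor step from the Safeco version isn't modeled. Stat lines are dictionaries of season totals keyed by the post's abbreviations in lowercase.

```python
STATS = ("ab", "h", "tb", "hr", "bb", "hbp", "ibb", "sb", "cs", "gdp")


def base_runs(line: dict[str, float]) -> float:
    """BaseRuns = A * B / (B + C) + D; missing stats count as zero."""
    s = {k: line.get(k, 0) for k in STATS}
    a = s["h"] + s["bb"] + s["hbp"] - s["hr"] - 0.5 * s["ibb"]
    b = 1.1 * (1.4 * s["tb"] - 0.6 * s["h"] - 3 * s["hr"]
               + 0.1 * (s["bb"] + s["hbp"] - s["ibb"])
               + 0.9 * (s["sb"] - s["cs"] - s["gdp"]))
    c = s["ab"] - s["h"] + s["cs"] + s["gdp"]
    d = s["hr"]
    return d if b + c == 0 else a * b / (b + c) + d


def swap_player(team: dict[str, float], removed: dict[str, float],
                added: dict[str, float]) -> dict[str, float]:
    """Team line with one player's stats taken out and another's put in."""
    return {k: team.get(k, 0) - removed.get(k, 0) + added.get(k, 0) for k in STATS}


def runs_above_average(avg_team: dict[str, float], avg_player: dict[str, float],
                       player: dict[str, float]) -> float:
    """BsR of the team with the player swapped in, minus the average team's BsR.

    Positive values mean the player adds runs relative to an average player.
    """
    with_player = swap_player(avg_team, avg_player, player)
    return base_runs(with_player) - base_runs(avg_team)
```

Call `runs_above_average(avg_team, avg_player, projected_player)` with a league-average team line, the league-average player line being removed, and the player's projected line (a Marcel projection, say).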

Basically because run scoring
is a team activity, not a solo one. Barry Bonds cannot draw a walk and then drive himself in; he needs a teammate to do that.

You can soundly apply BsR to individual pitchers and to entire teams. For individual hitters, you should use a linear system. FWIW, there is a linear weighted BsR specially designed for that.

by Matthew on Mar 2, 2008 7:04 PM PST
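Matthew doesn't describe how the linear version of BsR is built. One approach, assumed here rather than taken from the comment, is to take the partial derivative of BaseRuns with respect to each event, evaluated at a team's or league's totals. Because A, B, C and D are all linear in the counting stats, the derivative has a closed form: for an event that changes them by a, b, c and d, the weight is a*B/(B + C) + A*(b*C - B*c)/(B + C)^2 + d. The event list and function names below are hypothetical.

```python
STATS = ("ab", "h", "tb", "hr", "bb", "hbp", "ibb", "sb", "cs", "gdp")

# Change in the counting stats caused by one occurrence of each event
# (a hypothetical event set chosen for illustration).
EVENTS = {
    "single": {"ab": 1, "h": 1, "tb": 1},
    "double": {"ab": 1, "h": 1, "tb": 2},
    "triple": {"ab": 1, "h": 1, "tb": 3},
    "home_run": {"ab": 1, "h": 1, "tb": 4, "hr": 1},
    "walk": {"bb": 1},
    "hit_by_pitch": {"hbp": 1},
    "out": {"ab": 1},
    "stolen_base": {"sb": 1},
    "caught_stealing": {"cs": 1},
}


def components(line: dict[str, float]) -> tuple[float, float, float, float]:
    """Smyth's A, B, C, D for a stat line; missing stats count as zero."""
    s = {k: line.get(k, 0) for k in STATS}
    a = s["h"] + s["bb"] + s["hbp"] - s["hr"] - 0.5 * s["ibb"]
    b = 1.1 * (1.4 * s["tb"] - 0.6 * s["h"] - 3 * s["hr"]
               + 0.1 * (s["bb"] + s["hbp"] - s["ibb"])
               + 0.9 * (s["sb"] - s["cs"] - s["gdp"]))
    c = s["ab"] - s["h"] + s["cs"] + s["gdp"]
    d = s["hr"]
    return a, b, c, d


def base_runs(line: dict[str, float]) -> float:
    """BaseRuns = A * B / (B + C) + D (assumes B + C > 0, true for real totals)."""
    a, b, c, d = components(line)
    return a * b / (b + c) + d


def linear_weights_from_bsr(totals: dict[str, float]) -> dict[str, float]:
    """Run value of one extra event: the partial derivative of BsR at `totals`."""
    big_a, big_b, big_c, _ = components(totals)
    denom = big_b + big_c
    weights = {}
    for name, delta in EVENTS.items():
        # A, B, C and D are linear in the stats, so the components of a
        # one-event line are exactly their partial derivatives for that event.
        a, b, c, d = components(delta)
        weights[name] = (a * big_b / denom
                         + big_a * (b * big_c - big_b * c) / denom ** 2
                         + d)
    return weights
```

Because the weights depend on the totals passed in, they shift with the run environment instead of staying fixed.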

I like taking
an expected batting line for a player and swapping it in for an average player on an average team, using BsR to get runs above average. Linear weights work just fine, though, and are a much simpler way to find a player's value to a team relative to average.

by chrisisasavage on Mar 1, 2008 5:03 PM PST

Davenport's EQA article
on EQA has a comparison of EQA and BaseRuns over various periods.
visiting A's fan.

by rfloh @ Lookout Landing on Mar 1, 2008 11:59 PM PST
