Sabermetrics 101: The Game State, Run Expectancy, and Win Expectancy

I've decided to start writing a sabermetric 'course' akin to what Fett42 gave us in the FanPosts but easier to link to. My hope is that eventually every major concept gets covered and we can autotag this into our posts whenever it comes up. The goal is to give people who haven't encountered this before a quick reference tool where they can look up any unfamiliar concept. With that in mind, I should start from what I think is the top: Game State.

Prerequisites for understanding: None.

Prerequisites for derivation: Database.


The What

Baseball is without a doubt the easiest North American sport to analyse. A major reason is that a game moves through discrete states. There's the score and inning, obviously, but there are also outs and baserunners to take into account. With those four pieces of information, you can describe any baseball game at any time. We call the combination of score, inning, baserunners, and outs the game state. The game state matrix is typically taken to mean the number of runs that an average MLB team will score in the rest of an inning given any combination of baserunners and outs (run expectancy), but a similar concept can be applied to a team's chances of winning any given game (win expectancy).
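As a rough sketch of what a game state looks like as data, the four pieces of information above can be held in one small record. The class name, the field names, and the choice to store the score as a run differential from the batting team's point of view are all assumptions made for illustration, not part of any standard.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GameState:
    """Hypothetical container for one game state (names are illustrative)."""
    inning: int       # 1-based inning number
    top: bool         # True while the visiting team bats (assumed convention)
    score_diff: int   # batting team's runs minus fielding team's runs
    outs: int         # 0, 1 or 2
    bases: tuple      # (first, second, third) occupied flags

    def base_out_key(self):
        # Run expectancy only needs baserunners and outs,
        # so this is the key into a run expectancy matrix.
        return (self.bases, self.outs)


# 8 baserunner configurations x 3 out counts = 24 base/out states.
BASE_OUT_STATES = [
    (bases, outs)
    for bases in product((False, True), repeat=3)
    for outs in range(3)
]
```

Run expectancy uses only the 24 base/out combinations, while win expectancy needs the full record, which is why it has far more states to fill.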

The Why

Why is this important? Every game state has been experienced in the major leagues thousands of times, which lets analysts determine the odds of a team winning a game at any given point, or how many runs can be expected to score in the rest of an inning. If you have the average number of runs expected to score in the rest of an inning from any game state, you can figure out how many runs a stolen base is worth, or a triple, or a strikeout. The game state essentially allows us to relate everything that happens on the diamond back to the major currencies of baseball: winning and runs. Without it, there would be no apples-to-apples comparison between pitching and hitting, walks and doubles, you name it. The game state is the key concept behind linear weights, and therefore understanding what it means is vital to achieving a good grasp on how most modern statistics work.

There are a few major points to keep in mind:

  • Game state matrices (runs or winning) are derived empirically from multiple seasons of data.
  • All players and teams are assumed to be average. According to the matrix, at 0-0, nobody out and nobody on, the worst team in the game will have a 50% chance of beating the best team.
  • This provides a baseline with which players and teams can be evaluated. Outperforming the game state means a team is above average, underperforming means the opposite.
  • The transition between game states can yield average run and out values of any given event (linear weights). Strictly speaking, each transition is one step of a Markov chain.
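To make the last point concrete, here is a minimal sketch of how a transition between two base/out states turns into a run value: the run expectancy after the play, minus the run expectancy before it, plus any runs that scored on the play. Averaging that over every occurrence of an event gives its linear weight. The function names and the numbers in RE_SKETCH are made up for illustration; they are not taken from any published matrix.

```python
from statistics import mean

# Hypothetical run expectancy values, keyed by (bases, outs).
# Bases are written as a string: "1--" is a runner on first only.
# These numbers are placeholders, NOT real data.
RE_SKETCH = {
    ("1--", 0): 0.90,
    ("-2-", 0): 1.15,
    ("---", 1): 0.30,
}


def run_value(re_matrix, before, after, runs_scored):
    """Run value of one play: change in run expectancy plus runs scored.

    `after` is None when the play ends the inning, in which case the
    run expectancy afterwards is zero.
    """
    re_after = 0.0 if after is None else re_matrix[after]
    return re_after - re_matrix[before] + runs_scored


def linear_weight(re_matrix, plays):
    """Average run value over many (before, after, runs_scored) plays."""
    return mean(run_value(re_matrix, b, a, r) for b, a, r in plays)


# A steal of second with nobody out, and the same runner caught stealing.
steal = run_value(RE_SKETCH, ("1--", 0), ("-2-", 0), 0)
caught = run_value(RE_SKETCH, ("1--", 0), ("---", 1), 0)
```

With these placeholder numbers the successful steal is worth roughly a quarter of a run and the caught stealing costs more than twice that, which is the shape of result that real matrices tend to give.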

The How

To determine how many runs score from a specific baserunner/outs situation, one simply needs to find every time that situation occurred, tally the total runs that scored between each instance and the close of the inning, and divide by the number of instances. Win expectancy is derived in much the same manner. The only difficult part of it is data gathering and knowing which seasons to look at, as different run environments naturally yield different results. Ensure that the game state in use is appropriate for the run environment - don't use the 1970s to model happenings in the late 1990s, for example.
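The tallying described above can be sketched in a few lines. This assumes the play-by-play data has already been reduced to one record per play, in game order, holding a half-inning identifier, the baserunners and outs before the play, and the runs that scored on it. That record layout and the function name are assumptions for illustration, not the format of any particular database.

```python
from collections import defaultdict


def run_expectancy(plays):
    """Empirical run expectancy from play-by-play records.

    `plays` is an iterable of (half_inning_id, bases, outs, runs_on_play)
    tuples in game order, where bases/outs describe the state BEFORE the play.
    Returns {(bases, outs): average runs scored from that state to the
    end of the half inning}.
    """
    # Group the plays by half inning, preserving their order.
    innings = defaultdict(list)
    for half_inning_id, bases, outs, runs in plays:
        innings[half_inning_id].append((bases, outs, runs))

    totals = defaultdict(float)  # runs scored after each state, summed
    counts = defaultdict(int)    # number of times each state occurred

    for events in innings.values():
        # Runs still to come in this half inning, including the current play.
        runs_left = sum(runs for _, _, runs in events)
        for bases, outs, runs in events:
            totals[(bases, outs)] += runs_left
            counts[(bases, outs)] += 1
            runs_left -= runs

    return {state: totals[state] / counts[state] for state in counts}
```

Restricting `plays` to the seasons of interest is how the run environment gets matched: feed in only the years you want the matrix to describe.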

Example

Tom Tango has derived this matrix for run expectancy from 1999-2002.

What Follows

WPA, Linear Weights.

Comments


When you say Prerequisite for derivation: database ...

do you mean a play-by-play database of all games?

And just out of curiosity, is this database (whatever it is) freely available?

What're ya gonna do with those pies, boys?

by rickpo on Feb 14, 2010 11:47 PM PST

Retrosheet is freely available

And it contains play-by-play data for most years since around 1955. You can get all of that data by following the steps here:

http://www.hardballtimes.com/main/blog_article/building-a-retrosheet-database-the-short-form/

Or, if you don’t want to go through that hassle, just import some of these SQL ZIP files:

http://www.wantlinux.net/2009/04/retrosheet-baseball-mysql-database-download/

by vivaelpujols on Feb 15, 2010 12:05 AM PST

Do you foresee a time in the not too distant future where differences in skill level will be integrated to Win Probability?

My thought is no, for the following reasons:

- Good teams will have ~60% win percentage, bad teams ~40% with some variation, which is close enough to 50/50
- True talent can never be truly measured. It is always an estimate based on outcomes, age, and probability.
- It would be too difficult and a waste of energy to add those adjustments for each game state, especially since they would be inexact.

But I would like to hear your thoughts. And if you do expect it to be added, why?

...and now I'm here

by CapSea on Feb 14, 2010 11:52 PM PST

There's no real reason to

Doing such a thing would contextualise WPA and make it useless for player valuation (as opposed to useless for measuring talent level, which it currently is). The only reason I can see to implement such a system is for in-game gambling, which seems like a pretty small niche market to me.

by Graham MacAree on Feb 15, 2010 7:12 AM PST

Could be a niche market, but would be a niche market with a huge amount of disposable income

if you could make something like that, no serious gambler could afford to be without it when betting in-game

by seattlebruin on Feb 15, 2010 11:29 AM PST

I've done this a few times.

Bodog.com

This is probably a little different than what you guys are talking about though; this is literally betting on the outcome of every pitch.

by hcoguy on Feb 15, 2010 12:13 PM PST

Great idea Graham.

I think this will help everyone understand the smaller aspects that go into stats, like linear weights and such. You’ve been doing some great work on the site lately too.

by Kirk on Feb 15, 2010 1:13 AM PST

What is a good range of years for the data to take?

I mean, if I were evaluating a player for 2010, should I take the past 10 years? 7? Is there any standard dataset size when talking about this stuff?

by 88fingerslukee on Feb 15, 2010 7:09 AM PST

For run expectancy you only need a few years or even one year of data, because the run/out states happen so often

For win expectancy, you need more data, which is why current win expectancy charts are moving away from empirical data towards more theoretical measures. I’ll explain that in more detail in a later post.

by Graham MacAree on Feb 15, 2010 7:14 AM PST

Is there much variance on run expectancy?

Seeing that the data was from ‘99-’02 and is still being used, I am guessing that the variance is minimal but am curious to know what the range is.

by ToddK on Feb 15, 2010 9:37 AM PST

I really like WPA and WE stuff

It's the thing that first got me interested in Lookoutlanding.

I think one of its biggest shortfalls when the time comes to use it is that it assumes everything is average. It doesn’t know you have the top of the order coming up in the bottom of the ninth or that Mariano Rivera is probably the best closer ever. It is blissfully ignorant of all of this. I don’t mind that it assumes everything is average. It is what it is. I still like it a lot but it makes it harder to confidently use at times.

by Edgar for Pres on Feb 15, 2010 5:30 PM PST

Awesome

Thanks, Graham. I’ve been wishing that I had a better understanding of the terms you guys throw around.

#52 #10 #25 #7

by Cablinasian on Feb 15, 2010 5:38 PM PST

Comments For This Post Are Closed

