I've decided to start writing a sabermetric 'course', akin to what Fett42 gave us in the FanPosts but easier to link to. My hope is that eventually every major concept gets covered, so we can autotag these entries into our posts whenever a concept comes up. The goal is to give people who haven't encountered these ideas before a quick reference where they can look up any unfamiliar concept. With that in mind, I'll start with what I think is the top: Game State.
Prerequisites for understanding: None.
Prerequisites for derivation: Database.
Baseball is without a doubt the easiest North American sport to analyse. A major reason is that a game moves through discrete states. There's the score and inning, obviously, but there are also outs and baserunners to take into account. With those four pieces of information, you can describe any baseball game at any time. We call the combination of score, inning, baserunners, and outs the game state. In practice, "game state matrix" usually refers to just the baserunners and outs (8 base configurations times 3 out counts, or 24 states), with each cell giving the average number of runs an MLB team scores from that point to the end of the inning (run expectancy). A similar matrix that also uses score and inning gives a team's chances of winning the game from any point (win expectancy).
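To make the "24 states" idea concrete, here is a minimal sketch of how a game state might be represented in code. The class and field names are my own invention, and the `home_batting` flag (which half of the inning it is) is an assumption I've added, since the inning number alone doesn't say who is batting. Run expectancy only needs the `BaseOutState` part; win expectancy needs the whole `GameState`.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BaseOutState:
    """Baserunners plus outs: the part of the game state a run expectancy matrix uses."""
    first: bool   # runner on first
    second: bool  # runner on second
    third: bool   # runner on third
    outs: int     # 0, 1 or 2 (the third out ends the inning)

    def label(self) -> str:
        # e.g. runners on first and third with one out -> "1-3, 1 out"
        occupied = [str(n) for n, on in ((1, self.first), (2, self.second), (3, self.third)) if on]
        bases = "-".join(occupied) if occupied else "empty"
        return f"{bases}, {self.outs} out" + ("" if self.outs == 1 else "s")


@dataclass(frozen=True)
class GameState:
    """Full game state: adds score and inning, which win expectancy needs."""
    inning: int
    home_batting: bool  # hypothetical field: True in the bottom half of the inning
    home_runs: int
    away_runs: int
    base_out: BaseOutState


def all_base_out_states() -> list[BaseOutState]:
    """Every combination of baserunners (2**3 = 8) and outs (3): 24 states."""
    return [BaseOutState(f, s, t, outs)
            for outs in range(3)
            for f, s, t in product((False, True), repeat=3)]


states = all_base_out_states()
print(len(states))  # 24

opening = GameState(inning=1, home_batting=False, home_runs=0, away_runs=0,
                    base_out=BaseOutState(False, False, False, 0))
print(opening.base_out.label())  # empty, 0 outs
```

Making the classes frozen means they are hashable, so a state can be used directly as a dictionary key when you build a matrix.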
Why is this important? Common game states have occurred in the major leagues thousands of times, which lets analysts estimate the odds of a team winning a game at any given point, or how many runs can be expected to score in the rest of an inning. If you know the average number of runs expected to score from any base-out state, you can figure out how many runs a stolen base is worth, or a triple, or a strikeout. The game state essentially allows us to relate everything that happens on the diamond back to the major currencies of baseball: winning and runs. Without it, there would be no apples-to-apples comparison between pitching and hitting, walks and doubles, you name it. The game state is the key concept behind linear weights, so understanding what it means is vital to getting a good grasp of how most modern statistics work.
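As a rough sketch of how "how many runs is a stolen base worth" falls out of the matrix: the value of a single play is the run expectancy after it, minus the run expectancy before it, plus any runs that scored on the play. The `(bases, outs)` key format (e.g. `("1__", 0)` for a runner on first, nobody out) and the `run_value` function are hypothetical, and the numbers in `toy_re` are made up for illustration; they are not Tango's or anyone's real values.

```python
def run_value(re_matrix: dict, start: tuple, end: tuple, runs_scored: int) -> float:
    """Runs an event is worth: change in run expectancy plus runs scored on the play.

    States are (bases, outs) keys such as ("1__", 0). An end state with 3 outs
    means the inning is over, so it is worth no further runs.
    """
    def expected(state):
        return 0.0 if state[1] == 3 else re_matrix[state]

    return expected(end) - expected(start) + runs_scored


# Made-up run expectancy values for illustration only, NOT real MLB numbers.
toy_re = {("1__", 0): 0.90, ("_2_", 0): 1.10, ("___", 1): 0.25}

stolen_base = run_value(toy_re, ("1__", 0), ("_2_", 0), 0)
caught_stealing = run_value(toy_re, ("1__", 0), ("___", 1), 0)
print(round(stolen_base, 2), round(caught_stealing, 2))
```

With these toy numbers a steal of second is worth about +0.2 runs and getting caught costs about 0.65, which is the kind of asymmetry that drives break-even stolen base rates.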
There are a few major points to keep in mind:
- Game state matrices (runs or winning) are derived empirically from multiple seasons of data.
- All players and teams are assumed to be average. According to the matrix, at 0-0, nobody out and nobody on, the worst team in the game will have a 50% chance of beating the best team.
- This provides a baseline against which players and teams can be evaluated. Outperforming the matrix's expectations means a team is above average; underperforming means the opposite.
- The transition between two game states gives the run and out value of the event that caused it; averaged over every occurrence of that event, these values are its linear weights. Strictly speaking, each transition is one step in a Markov chain.
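Here is a minimal sketch of that last point, assuming you already have a run expectancy matrix and a list of transitions. Each tuple in `events` is one step of the chain: an event type, the base-out state before and after, and the runs that scored. Averaging the run value and outs recorded per event type gives a linear weight. The function names, the `(bases, outs)` key format, and all the numbers are hypothetical and made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def expected_runs(re_matrix, state):
    """Run expectancy of a (bases, outs) state; three outs ends the inning, so 0."""
    return 0.0 if state[1] == 3 else re_matrix[state]


def linear_weights(events, re_matrix):
    """Average run value and average outs recorded for each event type.

    events: iterable of (event_type, start_state, end_state, runs_scored),
    one tuple per transition (one step of the chain).
    """
    run_values = defaultdict(list)
    outs_recorded = defaultdict(list)
    for event_type, start, end, runs in events:
        value = expected_runs(re_matrix, end) - expected_runs(re_matrix, start) + runs
        run_values[event_type].append(value)
        outs_recorded[event_type].append(end[1] - start[1])
    return {e: (mean(run_values[e]), mean(outs_recorded[e])) for e in run_values}


# Made-up run expectancy values and events for illustration only, NOT real data.
toy_re = {("___", 0): 0.50, ("1__", 0): 0.90, ("___", 1): 0.25, ("1__", 1): 0.50}
toy_events = [
    ("single", ("___", 0), ("1__", 0), 0),
    ("strikeout", ("___", 0), ("___", 1), 0),
    ("strikeout", ("1__", 0), ("1__", 1), 0),
]
weights = linear_weights(toy_events, toy_re)
print(weights)
```

Note that the same event has a different value depending on the state it happens in (the two strikeouts above cost different amounts); the linear weight is the average over all the states the event actually occurred in.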
To determine how many runs are expected from a specific baserunner/outs situation, find every time that situation occurred, tally the runs that scored between each occurrence and the end of the inning, and divide the total by the number of occurrences. Win expectancy is derived in much the same manner, counting wins instead of runs. The only difficult parts are gathering the data and choosing which seasons to use, as different run environments naturally yield different results. Make sure the matrix you use is appropriate for the run environment: don't use 1970s data to model the late 1990s, for example.
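The counting procedure above can be sketched in a few lines. The input format here is hypothetical: a list of half-innings, each an ordered list of `(state_before_play, runs_scored_on_play)` pairs, with states written as `(bases, outs)` keys. Walking each half-inning backwards keeps a running total of runs scored from that play to the end of the inning. This sketch assumes every half-inning was played to three outs; a real derivation would usually drop truncated innings (walk-offs, shortened games) because they cut scoring short.

```python
from collections import defaultdict


def run_expectancy(half_innings):
    """Estimate run expectancy for each base-out state.

    half_innings: list of half-innings; each is an ordered list of plays,
    where a play is (base_out_state_before_play, runs_scored_on_play).
    """
    total_runs = defaultdict(float)
    occurrences = defaultdict(int)
    for plays in half_innings:
        runs_after = 0
        # Walk backwards so runs_after holds the runs scored from this play
        # through the end of the half-inning.
        for state, runs in reversed(plays):
            runs_after += runs
            total_runs[state] += runs_after
            occurrences[state] += 1
    # Average runs to the end of the inning, per state.
    return {s: total_runs[s] / occurrences[s] for s in occurrences}


# Two made-up half-innings for illustration only.
toy_innings = [
    # single, two-run homer, then three strikeouts
    [(("___", 0), 0), (("1__", 0), 2), (("___", 0), 0), (("___", 1), 0), (("___", 2), 0)],
    # 1-2-3 inning
    [(("___", 0), 0), (("___", 1), 0), (("___", 2), 0)],
]
print(run_expectancy(toy_innings))
```

With real play-by-play data you'd feed in several seasons from the same run environment; with only two toy innings the averages are obviously meaningless beyond showing the mechanics.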
Tom Tango has derived this run expectancy matrix from 1999-2002 data.
WPA, Linear Weights.