I feel like Jeff should be writing this one, but hey, it's my series
Prerequisites for derivation: Game state; data.
What is WPA?
We've talked at length already about converting events on the field to runs, and then runs into wins. But what if we don't need to bother with the intermediate conversion? Why not skip run expectancy and work with win expectancy directly? The point, after all, is to determine how many wins a player was worth. We've explored linear weights, and we can use a very similar concept to derive the change in win expectancy associated with a given change in game state. We call this win probability added (WPA).
Historically, for any given game state, we have figures telling us how often the home team won and how often the away team did. In the simplest incarnation of win probability, we simply borrow these average numbers and apply them to a given game. Down by three in the bottom of the third? Don't fret - your team will still win 20% of the time (I made that up). Up by three in the ninth? Your position is more or less secure. And how much was the single that scored two runs to tie the game worth? More than the grand slam after it became a blowout? Less? WPA answers all of these questions.
Now, due to sample size concerns for the really weird games, historical win expectancy data isn't great. After all, it's pretty rare to have the bases loaded while down by eight runs in the bottom of the first, so our data can be badly skewed if, say, one team in four had managed to come back in such a situation. To account for these oddities, we can build a theoretical win expectancy model along the same lines as the historical data. This is the source of the values we see today.
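To make the historical approach concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the game-state tuple `(inning, half, outs, bases, home_lead)`, the function names `build_win_expectancy` and `wpa`, and all of the records, which are made up. It tallies how often the home team won from each state, then reads WPA off as the difference in win expectancy before and after an event.

```python
from collections import defaultdict


def build_win_expectancy(records):
    """Map each game state to (home win rate, number of times the state was seen).

    `records` is an iterable of (state, home_won) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [home wins, occurrences]
    for state, home_won in records:
        counts[state][0] += int(home_won)
        counts[state][1] += 1
    return {state: (wins / games, games) for state, (wins, games) in counts.items()}


def wpa(table, before, after):
    """Change in home win expectancy when the game moves from `before` to `after`."""
    return table[after][0] - table[before][0]


# Hypothetical state layout: (inning, half, outs, bases, home_lead)
trailing = (8, "bottom", 1, "_23", -2)  # home down two, runners on second and third
tied = (8, "bottom", 1, "1__", 0)       # after a two-run single
freak = (1, "bottom", 0, "123", -8)     # bases loaded, down eight in the first

# Made-up history: (state, did the home team go on to win?)
records = (
    [(trailing, True)] * 2 + [(trailing, False)] * 6
    + [(tied, True)] * 4 + [(tied, False)] * 4
    + [(freak, True)] * 1 + [(freak, False)] * 3
)

table = build_win_expectancy(records)
single_wpa = wpa(table, trailing, tied)
print(single_wpa)    # 0.25 wins credited to the hitter in this toy data
print(table[freak])  # (0.25, 4): a 25% comeback rate resting on four games
```

The last line is the sample size problem in miniature: one comeback in four tries produces a number that looks as authoritative as any other entry in the table, which is why carrying the count alongside the rate is a sensible design choice.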
Applications and Difficulties; Leverage Index
You can use win probability added for a number of things. For instance, when is it useful to sacrifice bunt? If, say, bunts work 70% of the time, fail 25% of the time, and result in an error 5% of the time, and we know the win expectancy associated with each of these outcomes, we can work out the average win expectancy after a bunt attempt and see whether a sacrifice is really worth attempting or not. A similar thought process applies to stolen bases.
WPA is a fun measure to play with when you want to look at how much a pitcher or hitter has actually contributed to past wins and losses (the average player, of course, will add no win probability over the course of the season) and indeed can provide a useful measure of 'clutchness'. In fact, we can use WPA to see how critical any situation in a game might be: times with the highest possible swing in WPA are high-leverage situations, and times where an at-bat is irrelevant are low-leverage. We measure this with 'Leverage Index' (LI). The beauty of WPA and leverage is that they let us quantify how tense a moment is and how much it really matters.
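The sacrifice bunt question above boils down to a weighted average, which can be sketched as follows. The 70/25/5 split comes from the example in the text; the win expectancy figures and the names `expected_win_expectancy`, `bunt_outcomes`, and `swing_away_we` are hypothetical, invented purely to show the arithmetic.

```python
def expected_win_expectancy(outcomes):
    """Average win expectancy over (probability, resulting win expectancy) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * we for p, we in outcomes)


# Probabilities from the example in the text; win expectancies are made up.
bunt_outcomes = [
    (0.70, 0.62),  # bunt works: runner moves up, one more out
    (0.25, 0.52),  # bunt fails: out recorded, runner stays put
    (0.05, 0.72),  # error: everybody safe
]

# Assumed average win expectancy if the batter swings away instead.
swing_away_we = 0.61

bunt_we = expected_win_expectancy(bunt_outcomes)
should_bunt = bunt_we > swing_away_we

print(round(bunt_we, 3))  # 0.6
print(should_bunt)        # False: with these invented numbers, let him hit
```

Swap in stolen base success and caught stealing rates, with their own win expectancies, and the same function covers the steal decision.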
This also means that apart from some very specific circumstances (generally related to bullpen usage), it's a toy rather than a useful analysis tool. WPA is context-dependent, meaning that teammates' contributions cannot be easily separated from one another. It makes isolating pitchers from their defences even more difficult. And damningly, it's less stable than the comparable statistics that aren't based on win expectancy. This means that we're introducing huge elements of luck when we look at WPA. That's not to say it isn't a very fun toy - it's great to look back at key moments of the game and see where momentum and emotions were building, cresting, then dropping off. It's also a lot of fun to see who contributed most over the season. But be careful never to buy into the idea that WPA is more predictive than our advanced statistics derived by different paths: they have solved the isolation problem; WPA has not.