I've been promising everyone a full tRA writeup for some time now, so here it is. Enjoy.
We don't really have a widely available, coherent metric that tells us how good a pitcher is, independent of his home park and the defence behind him (and if anyone feels tempted to say 'ERA' here, read Dave Cameron's article on pitcher evaluation first). FIP and xFIP are the most commonly used general pitching stats we have, and they're not really good enough, as they only look at three possible outcomes of a plate appearance: K, BB, HR.
There is therefore a distinct motivation for the construction of a metric which takes into account every action a pitcher is responsible for, and turns those numbers into runs, based around a highly logical and transparent mathematical framework.
Theory and Method
There are essentially only eight possibilities for the state of a baseball at the instant the contest between batter and pitcher has resolved itself. The batter may walk, strike out, or be hit by a pitch. He may also hit a line drive, a ground ball, an outfield fly, a popup, or a home run. Others, such as bunts and intentional walks, are essentially subsets of the more important outcomes. These possibilities can be regarded as being governed by the pitcher, provided that there is a large enough sample size.
tRA is built around knowing how many runs and outs each of these events is worth, and ideally we want to know this for each year so as to account for different run-scoring environments. Before we delve into the metric itself, let's see how we might accomplish this, with a little help from Matthew the Data Fairy and his magic.
Using play-by-play data we can, in any given year, determine the average number of outs made on a given type of play by simply going through the logs, counting outs and dividing by the number of plays of that type. Fairly straightforward, although a small correction factor has to be introduced to deal with outs made on the bases. An example table for 2008 is shown below:
Runs are slightly more tricky. We have to introduce a run expectancy matrix in order to work out how many runs -should- score from any given game situation (bases empty, no out, etc.). In general, they look something like this, but it's certainly possible to build your own, again based on play-by-play data. Once the matrix is derived, we can work out the run value of any given play by looking at the following:
play_run_value = runs_scored + (run_expectancy_after - run_expectancy_before)
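To make that concrete, here is a minimal Python sketch of the formula, plus the averaging step that turns individual plays into a value per event type. The run expectancy numbers, the state encoding and the function names are all made up for illustration; they are not the real matrix or the real 2008 values.

```python
from collections import defaultdict

# Hypothetical run expectancy values keyed by (bases, outs); NOT real data.
# bases is a string such as "---" (empty) or "1--" (runner on first).
RUN_EXPECTANCY = {
    ("---", 0): 0.50,
    ("1--", 0): 0.90,
    ("---", 1): 0.27,
    ("1--", 1): 0.52,
}

def play_run_value(runs_scored, state_before, state_after):
    """runs_scored + (run_expectancy_after - run_expectancy_before)."""
    re_before = RUN_EXPECTANCY[state_before]
    # A state of None marks the end of the inning, where expectancy is zero.
    re_after = 0.0 if state_after is None else RUN_EXPECTANCY[state_after]
    return runs_scored + (re_after - re_before)

def average_run_values(plays):
    """Average the run value of each event type over a list of plays."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, runs_scored, before, after in plays:
        totals[event] += play_run_value(runs_scored, before, after)
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}

# Each play: (event type, runs scored, state before, state after).
plays = [
    ("BB", 0, ("---", 0), ("1--", 0)),  # leadoff walk
    ("GB", 0, ("---", 0), ("---", 1)),  # groundout
    ("GB", 0, ("---", 0), ("1--", 0)),  # ground-ball single
    ("HR", 1, ("---", 1), ("---", 1)),  # solo home run
]
values = average_run_values(plays)
```

Note how the two ground balls average out: one is an out and one is a hit, and the event's value is the mean of the two, which is how a batted-ball type ends up with a single run value regardless of what the defence did with it.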
With a little effort/fairy dust, the yearly average value of each type of play can be determined. So far for 2008 it looks a little like this:
If these are combined with the frequency with which a pitcher gives up each outcome (after making some park adjustments based on these numbers [spreadsheet]) and multiplied by total batters faced (TBF), we can determine how many runs that pitcher would have given up in a neutral park in front of an average defence. From the outs table shown earlier we can also figure out how many outs/innings he would have been expected to pitch through. tRA can then be determined as follows:
tRA = expected_runs/expected_outs*27
which gives us the expected runs a pitcher will give up per 9 innings pitched (27 outs).
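As a rough sketch of the arithmetic, assuming the park-adjusted frequencies are already in hand: the per-event run and out values below are placeholders rather than the actual 2008 tables, and the event keys and function names are my own.

```python
# Hypothetical runs and outs per event; NOT the real 2008 tables.
RUN_VALUES = {"K": -0.10, "BB": 0.30, "HBP": 0.35, "LD": 0.40,
              "GB": 0.05, "OF_FLY": 0.05, "POPUP": -0.10, "HR": 1.40}
OUT_VALUES = {"K": 1.00, "BB": 0.00, "HBP": 0.00, "LD": 0.30,
              "GB": 0.80, "OF_FLY": 0.85, "POPUP": 0.98, "HR": 0.00}

def expected_totals(frequencies, tbf):
    """Expected runs and outs for a pitcher's outcome frequencies and TBF."""
    runs_per_batter = sum(f * RUN_VALUES[e] for e, f in frequencies.items())
    outs_per_batter = sum(f * OUT_VALUES[e] for e, f in frequencies.items())
    return runs_per_batter * tbf, outs_per_batter * tbf

def tra(frequencies, tbf):
    """tRA = expected_runs / expected_outs * 27."""
    expected_runs, expected_outs = expected_totals(frequencies, tbf)
    return expected_runs / expected_outs * 27

# An invented pitcher line: share of batters faced ending in each outcome.
line = {"K": 0.20, "BB": 0.08, "HBP": 0.01, "LD": 0.14,
        "GB": 0.32, "OF_FLY": 0.20, "POPUP": 0.03, "HR": 0.02}
pitcher_tra = tra(line, tbf=800)
```

TBF cancels out of the ratio, so it doesn't change tRA itself; it matters for the run and out totals, which is what the counting versions of the stat are built on.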
tROA measures how many runs are saved by a given pitcher compared to an average pitcher from the same grouping (e.g. NL SP, etc.). This metric ignores the work done on determining how many outs a pitcher should be expected to record, but it does allow for measuring pitchers in terms of their overall run value (which can then be converted to wins). tRA+ is simply pitcher_tRA/league_tRA, facilitating a quick evaluation of a pitcher compared to league average.
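In code, both are one-liners. The tRA+ line follows the ratio given above. The tROA function is only my guess at the bookkeeping: it assumes the "average pitcher" baseline is the group's expected runs per batter faced, scaled to the pitcher's TBF. The function and parameter names are hypothetical.

```python
def troa(pitcher_expected_runs, tbf, group_runs_per_batter):
    """Runs saved relative to an average pitcher facing the same batters.

    Assumption: the baseline is the group's expected runs per batter
    faced multiplied by this pitcher's TBF. Positive means runs saved.
    """
    return group_runs_per_batter * tbf - pitcher_expected_runs

def tra_plus(pitcher_tra, league_tra):
    """tRA+ as pitcher_tRA / league_tRA."""
    return pitcher_tra / league_tra

# Invented numbers, purely to show the calls.
runs_saved = troa(pitcher_expected_runs=80.0, tbf=800, group_runs_per_batter=0.12)
ratio = tra_plus(pitcher_tra=4.0, league_tra=4.5)
```

With the ratio written this way, a value below 1 means the pitcher is allowing fewer expected runs than league average.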
Another point worth considering is regression towards average. Certain pitching stats are known to fluctuate quite wildly from year to year. To correct for this, every outcome is regressed towards the mean based on its year-by-year correlation and the total batters that a pitcher has faced on the season, with less regression applied the larger the sample size. The actual values to which regression is applied are as follows:
K%, BB%, HBP%, GB per ball in play%, IFF per ball in air%, LD per ball in air%, and HR per FB%
The order is extremely important, as influencing GB% will have an effect on LD% later, and so on, sometimes causing regression away from the mean in unusual situations.
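Here's a small Python sketch of why the chain matters. The shrinkage formula is a common generic form and not necessarily the one tRA* uses, the `stabiliser` constant is a hypothetical stand-in for whatever the year-by-year correlations imply, and the way I've nested the rates (home runs counted among balls in play and among fly balls) is my reading of the list above.

```python
def regress(observed, league_mean, sample_size, stabiliser):
    """Shrink an observed rate towards the league mean.

    Assumed form: the observed rate gets weight n / (n + k), so a larger
    sample means less regression. k (stabiliser) is hypothetical.
    """
    weight = sample_size / (sample_size + stabiliser)
    return weight * observed + (1 - weight) * league_mean

def outcome_frequencies(rates):
    """Convert the chained rates into per-batter outcome frequencies."""
    k, bb, hbp = rates["K"], rates["BB"], rates["HBP"]
    in_play = 1.0 - k - bb - hbp            # assumed to include home runs
    gb = in_play * rates["GB_per_BIP"]
    in_air = in_play - gb                   # everything not on the ground
    iff = in_air * rates["IFF_per_air"]
    ld = in_air * rates["LD_per_air"]
    fb = in_air - iff - ld                  # outfield flies, incl. home runs
    hr = fb * rates["HR_per_FB"]
    return {"K": k, "BB": bb, "HBP": hbp, "GB": gb,
            "IFF": iff, "LD": ld, "OF_FLY": fb - hr, "HR": hr}

# Invented rates for illustration only.
rates = {"K": 0.20, "BB": 0.08, "HBP": 0.01, "GB_per_BIP": 0.45,
         "IFF_per_air": 0.10, "LD_per_air": 0.35, "HR_per_FB": 0.10}
before = outcome_frequencies(rates)

# Regress only the ground-ball rate; LD per ball in air is left untouched.
adjusted = dict(rates)
adjusted["GB_per_BIP"] = regress(0.45, league_mean=0.55,
                                 sample_size=400, stabiliser=400)
after = outcome_frequencies(adjusted)
```

Even though the LD rate itself wasn't changed, `after["LD"]` differs from `before["LD"]`, because moving the ground-ball rate changes how many balls are in the air to begin with. Each step feeds the next, which is why the order can't be shuffled.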
Once a pitcher's line has been regressed, the same algorithms used to generate tRA and tROA are applied again to give tRA* and tROA*.
Unfortunately I can't give you guys any spreadsheets to play around with this time, but Matthew and I (OK, mostly Matthew) are working on making everything accessible online. We'll be sure to let you know when it's ready.
Hopefully we can appease you with some 2008 leaderboards and old 2007 player cards, though...
If there are any questions, that's what the comments section is for. Oh, and many thanks to Matthew for his work on this too.