Trying to Track True Team Talent

The right way to project teams in 2008 is to build those teams up from scratch. The single biggest flaw in projection is using past results as a baseline. The best thing you can do when trying to think about the 2008 season is to ignore the team (not player) results from 2007. The Angels won 94 games last year? 100% irrelevant. They scored 822 runs? 100% irrelevant. The problem with using team totals from 2007 is that it assumes that 2007 represents a true talent level, and it doesn't, not even close.

However, this is hard to do. As people, we crave numbers. It's just how our brains are wired, and we'll subconsciously give credence to the first numbers we're exposed to. Take any debate class, talk to an experienced salesman or read a negotiations book and you'll butt up against this time after time. It's one of the most powerful human urges, and even people who are aware of it cannot be fully free of the impulse to give weight to the first number they hear or see. These first numbers are known as anchors, and because of them you'll face a lot of resistance when trying to project 2008 teams while ignoring 2007 results.

"The Ms won 88 games last year and added Silva and Bedard. That's like 10 extra wins right there so they should be a 98 win team easy."

Now, as I've said, it's nearly impossible to completely ignore last year's win total. But one thing we can try to do in order to ease the pressure of comparison, and also because it makes for an interesting exercise, is to "correct" the 2007 win totals as best we can towards the actual true talent level of the team. That's what we'll look at here, broken down by steps.

Actual win-loss record is where most people start and finish. Now, most of us know that actual win-loss record tells us something about the team's inherent quality, but that there are better measures. It's akin to ERA for evaluating pitchers: better than nothing, but there are much better metrics available. For teams, that means ignoring the actual wins and losses and focusing instead on runs scored and allowed.

Pythag record attempts to find the expected won-loss record of a team based on how many runs they score and allow.

Pythag Record = (RS^2/(RS^2+RA^2))x162
RS = runs scored, RA = runs allowed
Note: This is the basic formula. There is a more accurate version that replaces the exponent (2 in this case) with the average number of runs (both teams combined) scored per game (RPG) raised to .29. So if RPG was 9.8 the exponent would be 9.8^.29, or 1.94.
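
As a minimal sketch of the formula above, here it is in Python with both exponent options. The function and parameter names are my own for illustration and don't come from any particular library.

```python
def exponent_from_rpg(rpg):
    """More accurate exponent: combined runs per game raised to .29."""
    return rpg ** 0.29


def pythag_wins(rs, ra, games=162, exponent=2.0):
    """Expected wins from runs scored (rs) and runs allowed (ra)."""
    win_pct = rs ** exponent / (rs ** exponent + ra ** exponent)
    return win_pct * games


# The exponent example from the note: an RPG of 9.8.
print(round(exponent_from_rpg(9.8), 2))

# A hypothetical team that scores exactly as many runs as it allows
# comes out at .500 no matter which exponent is used.
print(pythag_wins(750, 750))
```

Passing exponent=exponent_from_rpg(rpg) instead of the default 2.0 switches from the basic version to the more accurate one.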

Knowing a team's pythag record is incredibly useful in-season, but not that useful once the season is over. Studies have shown that teams are much more likely to regress towards their pythag record over the rest of the season than to continue playing at their current winning percentage. For example, if 81 games into a season Team X has an actual record of 35-46 (.432 W%) and a pythag record of 41-40 (.506 W%), then that team is more likely to win 50.6% of its next 81 games (resulting in a year-end record of 76-86, a .469 W%) than 43.2%.

Important Note: This is the definition of regression: the team starts playing at its "true level" for the rest of the season. We DO NOT expect the team to win 47 of its next 81 games in order to finish the season with a .506 W%. In other words, if you flip a fair coin five times in a row and get all tails, you still expect the head% to be 50% for your flips going forward. You don't expect to then see 5 heads to balance out the totals.
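
To make the Team X arithmetic concrete, here is a small Python sketch. The helper name rest_of_season is hypothetical. It keeps the wins already banked and applies the pythag W% only to the games still to be played.

```python
def rest_of_season(wins, losses, true_pct, season_games=162):
    """Project the final record assuming the team plays at true_pct
    for its remaining games. Wins already banked are left untouched."""
    remaining = season_games - (wins + losses)
    final_wins = wins + true_pct * remaining
    return final_wins, final_wins / season_games


# Team X: 35-46 actual, 41-40 pythag through 81 games.
final_wins, final_pct = rest_of_season(35, 46, true_pct=41 / 81)
print(round(final_wins), round(final_pct, 3))

# The mistaken "balancing out" view: wins needed over the last 81 games
# to drag the full-season W% all the way up to the pythag W%.
wins_needed = (41 / 81) * 162 - 35
print(round(wins_needed))
```

The first calculation is regression as described above. The second is the coin-flip mistake, where the team would have to play well above its true level to make up for the first half.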

However, the turnover and aging inherent in moving from one season to another leads pythag record to be next to useless in projecting forward. It's better than the actual win-loss record, but still not very good. We do like keeping track of expected performance in terms of runs, but just as we have a problem in step 1 with the actual won-loss record instead of expected won-loss record, we have a problem with pythag because it is based on actual runs scored and allowed instead of expected runs scored and allowed.

We looked at BaseRuns previously, so I would suggest reviewing that if you have questions about why it works well as a run estimator. What we want to do here is take the actual batter-pitcher outcomes (e.g. triples) and use them to come up with expected runs scored and allowed. This allows us to strip out some luck and also happens to mitigate some of the "blowout effect" that is so often a criticism of pythag record. However, we still have another step to take: though we've solved the actual runs scored and allowed problem, we did so by relying on the actual batter-pitcher outcomes, which means we're still subject to some luck factors. Is there a way to correct for some of that? Yes.

tRA has been well-explained by Graham, so go check that out, or ask him directly if you have any questions. As a quick summary, tRA attempts to quantify the aspects within the pitcher's control (e.g. Ks and BBs), which we've done before in other metrics (e.g. FIP). It also assigns run values to batted ball types (based on league averages), so that instead of relying on the actual number of doubles, triples and home runs that a pitcher allows, we can get an expected run value of those outcomes based on a pitcher's GB/LD/FB/IF profile.

Now, this isn't 100% robust because not every groundball is the same, but on the run prevention side that's not a huge issue, because a large enough sample size means we can get away with assuming a normal distribution. If we were looking at the offensive side, it would be a legitimate issue. Ichiro's groundballs are not the same as Richie Sexson's groundballs. If you want an example of how this causes problems, look at PrOPS or PECOTA.

Nevertheless, back to tRA. It is park and defense neutral, which is great for allowing us to look at how pitchers fared by themselves. For this exercise, though, we're more concerned with how the team did as a whole unit in terms of run prevention, so we need to add in the expected contributions from the park and defense.

The park part is easy: just pick your favorite park factor. I survey BR, BP and Heipp's site to try and get a consensus rating. Defense is much, much tougher, and frankly the only thing we can do at the moment to get an expected number is to take last year's actual total (I use THT's Plus/Minus here) and regress it by some factor towards the league mean. The factor I'm going to use for now is 50%. This is the shakiest part of the whole process and the reason that I also present the results without it in the examples below. I welcome comments on how better to account for expected defense.

In order to neutralize the year, here's the desired process:

-Use tRA, park factors, and regressed team defense to estimate expected runs allowed
-Use BaseRuns to estimate expected runs scored
-Plug those expected values into Pythag to end up with expected wins and losses

and you end up with a reasonable estimate for a team's true talent level for that year.

Let's use some concrete examples to help illustrate the process. We'll look at two teams of interest: the 2007 Angels and 2007 Mariners.

The Angels finished 2007 with a record of 94-68.
The Mariners finished 2007 with a record of 88-74.
The Angels scored 822 runs and allowed 731 for a pythag record of 90-72.
The Mariners scored 794 runs and allowed 813 for a pythag record of 79-83.
According to BaseRuns, the Angels should have scored 781 runs and allowed 745, a pythag record of 85-77.
According to BaseRuns, the Mariners should have scored 783 runs and allowed 814, a pythag record of 78-84.

According to tRA, the Angel pitchers should have allowed 700 runs (park + defense neutral).
According to tRA, the Mariner pitchers should have allowed 749 runs (park + defense neutral).
According to Park Factors, the Angels' home park is neutral.
According to Park Factors, the Mariners' home park suppresses runs by 4%.
According to THT, the Angel defense cost the team 39.2 runs so that's 20 runs regressed.
According to THT, the Mariner defense cost the team 51.2 runs so that's 26 runs regressed.
Angels: 700 x 1.00 + 20 = 720 runs allowed (739 if you leave defense alone).
Mariners: 749 x 0.98 + 26 = 760 runs allowed (785 if you leave defense alone).
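
The runs-allowed arithmetic above can be sketched in Python as follows. The function name and its parameters are illustrative. Halving the home park effect (so a 4% park becomes a 0.98 multiplier) is my reading of the numbers used above, on the assumption that only half of a team's games are played at home.

```python
def expected_runs_allowed(tra_runs, home_park_effect, defense_runs_cost,
                          defense_regression=0.5):
    """Combine park-and-defense-neutral runs with park and defense terms.

    home_park_effect: e.g. -0.04 for a park that suppresses runs by 4%.
    Assumption: half of a team's games are at home, so only half of the
    home-park effect applies to the season total.
    defense_runs_cost: runs the defense cost the team (positive = bad).
    defense_regression: share of the defensive total kept after
    regressing toward a league mean of zero (1.0 = no regression).
    """
    season_park_factor = 1 + home_park_effect / 2
    return tra_runs * season_park_factor + defense_runs_cost * defense_regression


# 2007 Angels and Mariners, with defense regressed 50%.
angels_ra = expected_runs_allowed(700, 0.00, 39.2)
mariners_ra = expected_runs_allowed(749, -0.04, 51.2)
print(round(angels_ra), round(mariners_ra))

# The same teams with the defensive totals left alone.
angels_ra_raw = expected_runs_allowed(700, 0.00, 39.2, defense_regression=1.0)
mariners_ra_raw = expected_runs_allowed(749, -0.04, 51.2, defense_regression=1.0)
print(round(angels_ra_raw), round(mariners_ra_raw))
```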

Angels regressed run profile, 781 RS 739 RA, pythag = 85.5 wins.
Angels regressed run+def profile, 781 RS 720 RA, pythag = 87.6 wins.
Mariners regressed run profile, 783 RS 785 RA, pythag = 80.8 wins.
Mariners regressed run+def profile, 783 RS 760 RA, pythag = 83.4 wins.
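
For anyone who wants to check these figures or swap in their own inputs, here is a short Python sketch using the basic pythag formula (exponent of 2). The dictionary layout and names are just one way to organize it.

```python
def pythag_wins(rs, ra, games=162):
    """Basic pythag: expected wins from runs scored and runs allowed."""
    return rs ** 2 / (rs ** 2 + ra ** 2) * games


# (expected runs scored, expected runs allowed) for each profile above.
profiles = {
    "Angels run": (781, 739),
    "Angels run+def": (781, 720),
    "Mariners run": (783, 785),
    "Mariners run+def": (783, 760),
}

wins = {label: pythag_wins(rs, ra) for label, (rs, ra) in profiles.items()}
for label, w in wins.items():
    print(f"{label}: {w:.1f} wins")

# Gap between the two teams: raw pythag versus the run+def profiles.
raw_gap = pythag_wins(822, 731) - pythag_wins(794, 813)
corrected_gap = wins["Angels run+def"] - wins["Mariners run+def"]
print(f"raw pythag gap: {raw_gap:.1f}, run+def gap: {corrected_gap:.1f}")
```

The gap at the end compares the two teams under raw pythag (actual runs) and under the run+def profiles.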

How does that help for projection? It doesn't and I want to be extra clear on that. All this intends to do is come up with a figure for runs scored and runs allowed if we were to re-play the 2007 season (knowing what we know now, i.e. playing time) a million times. You shouldn't use this as a basis for projecting 2008.

It's interesting (at least to me), and it's helpful insofar as it dissuades the notion that the Mariners were really an 88 win team and the Angels were really a 94 win team in 2007. It's also a step above just looking at their pythag records, which would leave you with an amplified idea of the difference between the two teams.

Anything else you use it for is not recommended and possible side effects include: angry bees, being stabbed, looking like an idiot, being ridiculed and general douchebaggery. If your urge to misuse these numbers lasts longer than four hours please consult a local mine shaft.