Lookout Landing: An SB Nation Community




Trying to Track True Team Talent

The right way to project teams in 2008 is to build those teams up from scratch. The single biggest flaw in projection is using past results as a baseline. The best thing you can do when trying to think about the 2008 season is to ignore the team (not player) results from 2007. The Angels won 94 games last year? 100% irrelevant. They scored 822 runs? 100% irrelevant. The problem with using team totals from 2007 is that it assumes that 2007 represents a true talent level, and it doesn't, not even close.

However, this is hard to do. As people, we crave numbers; it's just how our brains are wired, and we'll subconsciously give credence to the first numbers we're exposed to. Take any debate class, talk to an experienced salesman or read a negotiations book and you'll butt up against this time after time. It's one of the most powerful human urges, and even people who are aware of it cannot be fully free of the impulse to give weight to the first number they hear or see. These numbers are known as anchors, and because of them you'll face a lot of resistance when trying to project 2008 teams while ignoring 2007 results.

"The Ms won 88 games last year and added Silva and Bedard. That's like 10 extra wins right there so they should be a 98 win team easy."

Now, as I've said, it's nearly impossible to completely ignore last year's win total. But one thing we can try to do in order to ease the pressure of comparison, and also because it makes for an interesting exercise, is to "correct" the 2007 win totals as best we can towards the actual true talent level of the team. That's what we'll look at here, broken down by steps.


ACTUAL WIN-LOSS RECORD
Here's where most people start and finish. Now, most of us know that actual win-loss record tells us something about the team's inherent quality, but that there are better measures. It's akin to ERA for evaluating pitchers: better than nothing, but there are much better metrics available. For teams, that means ignoring the actual wins and losses and focusing instead on runs scored and allowed.

PYTHAG RECORD
Pythag record attempts to find the expected won-loss record of a team based on how many runs they score and allow.


Pythag Record = (RS^2/(RS^2+RA^2))x162
RS = runs scored, RA = runs allowed
Note: This is the basic formula. There is a more accurate version that replaces the exponent (2 in this case) with the average number of runs (both teams combined) scored per game (RPG) raised to .29. So if RPG was 9.8 the exponent would be 9.8^.29, or 1.94.
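
As a rough sketch, both versions of the formula could be written in Python like this. The function names are my own and not from any library, and the 162-game default simply mirrors the formula above.

```python
def pythag_exponent(runs_per_game: float) -> float:
    """Variable exponent: combined runs per game (both teams) raised to .29."""
    return runs_per_game ** 0.29


def pythag_wins(rs: float, ra: float, exponent: float = 2.0, games: int = 162) -> float:
    """Expected wins from runs scored (rs) and runs allowed (ra)."""
    win_pct = rs ** exponent / (rs ** exponent + ra ** exponent)
    return win_pct * games


# Exponent for the 9.8 RPG case mentioned in the note.
print(round(pythag_exponent(9.8), 2))  # 1.94

# A team that scores exactly as many runs as it allows projects to .500.
print(pythag_wins(750, 750))  # 81.0
```

Passing `exponent=pythag_exponent(rpg)` to `pythag_wins` gives the more accurate version; leaving the default of 2 gives the basic one.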

Knowing a team's pythag record is incredibly useful in-season, but not that useful once it's over. Studies have shown that teams are much more likely to regress towards their pythag record over the rest of the season than to continue playing at whatever their current winning percentage stands at. For example, if 81 games into a season Team X has an actual record of 35-46 (.432 W%) and a pythag record of 41-40 (.506 W%), then that team is more likely to win 50.6% of their next 81 games (resulting in a year end W% of .469) than 43.2%.


Important Note: This is the definition of regression: that the team starts playing at their "true level" the rest of the season. We DO NOT expect the team to win 47 of their next 81 games in order to finish the season with a .506 W%. In other words, if you flip a fair coin five times in a row and get all tails, you still expect the head% to be 50% for your flips going forward. You don't expect to then see 5 heads to balance out the totals.
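
To make the Team X arithmetic concrete, here is a small Python sketch. The function name is hypothetical, and it assumes the pythag W% is the team's true level for the remaining games.

```python
def rest_of_season(wins: int, losses: int, true_win_pct: float, games: int = 162):
    """Project the final record, assuming the team plays at true_win_pct from here on."""
    remaining = games - (wins + losses)
    final_wins = wins + true_win_pct * remaining
    return final_wins, final_wins / games


# Team X: 35-46 after 81 games, with a .506 pythag W%.
final_wins, final_pct = rest_of_season(35, 46, 0.506)
print(round(final_wins, 1), round(final_pct, 3))  # 76.0 0.469

# What "balancing out" would require instead: about 47 wins in the last 81 games.
print(round(0.506 * 162 - 35))  # 47
```

The past 81 games stay in the books as they happened; only the remaining games are projected at the .506 level.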

However, the turnover and aging inherent in moving from one season to another leads pythag record to be next to useless in projecting forward. It's better than the actual win-loss record, but still not very good. We do like keeping track of expected performance in terms of runs, but just as we have a problem in step 1 with the actual won-loss record instead of expected won-loss record, we have a problem with pythag because it is based on actual runs scored and allowed instead of expected runs scored and allowed.

BASERUNS
We looked at BaseRuns previously so I would suggest reviewing that if you have questions about why it works well as a run estimator. That's what we want to do here: take the actual game batter-pitcher outcomes (e.g. triples) and use them to come up with expected runs scored and allowed. This allows us to strip out some luck and also happens to mitigate some of the "blowout effect" that is so often a critique of Pythag record. However, we still have another step to take, because though we've solved the actual runs scored and allowed problem, we did so by relying on the actual batter-pitcher outcomes, which means we're still subject to some luck factors. Is there a way to correct for some of that? Yes.

tRA
tRA has been well-explained by Graham so go check that out, or ask him directly if you have any questions, but just to offer a quick summary, tRA attempts to quantify the aspects within the pitcher's control (e.g. Ks and BBs) which we've done before in other metrics (e.g. FIP), but also to assign run values to batted ball types (based on league averages) so that instead of relying on the actual number of doubles, triples and home runs that a pitcher allows, we can get an expected run value of those outcomes based on a pitcher's GB/LD/FB/IF profile.

Now, this isn't entirely 100% robust because not every groundball is the same, but on the run prevention side, that's not a huge issue because a large enough sample size means we can get away with assuming a normal distribution. If we were looking at the offensive side, it would be a legitimate issue. Ichiro's groundballs are not the same as Richie Sexson's groundballs. If you want an example of how this causes problems, look at PrOPS or PECOTA.

Nevertheless, back to tRA, it is park and defense neutral which is great for allowing us to look at how pitchers fared by themselves, but for this exercise we're more concerned with how the team did as a whole unit in terms of run prevention, so we need to add in the expected contributions from the park and defense.

The park part is easy: just pick your favorite park factor. I survey BR, BP and Heipp's site to try and get a consensus rating. Defense is much, much tougher, and frankly the only thing we can do at the moment to get an expected number is to take last year's actual total (I use THT's Plus/Minus here) and regress it by some factor towards the league mean. The factor I'm going to use for now is 50%. This is the shakiest part of the whole process and the reason that I present the results without it in the examples below. I welcome comments on how better to account for expected defense.

WORKING BACKWARDS
In order to neutralize the year, here's the desired process:


-Use tRA, park factors, and regressed team defense to estimate expected runs allowed
-Use BaseRuns to estimate expected runs scored
-Plug those expected values into Pythag to end up with expected wins and losses

and you end up with a reasonable estimate for a team's true talent level for that year.

EXAMPLES
Let's use some concrete examples to help illustrate the process. We'll look at two teams of interest: the 2007 Angels and 2007 Mariners.

The Angels finished 2007 with a record of 94-68.
The Mariners finished 2007 with a record of 88-74.
The Angels scored 822 runs and allowed 731 for a pythag record of 90-72.
The Mariners scored 794 runs and allowed 813 for a pythag record of 79-83.
According to BaseRuns, the Angels should have scored 781 runs and allowed 745, a pythag record of 85-77.
According to BaseRuns, the Mariners should have scored 783 runs and allowed 814, a pythag record of 78-84.

According to tRA, the Angel pitchers should have allowed 700 runs (park + defense neutral).
According to tRA, the Mariner pitchers should have allowed 749 runs (park + defense neutral).
According to Park Factors, the Angels' home park is neutral.
According to Park Factors, the Mariners' home park suppresses runs by 4%.
According to THT, the Angel defense cost the team 39.2 runs so that's 20 runs regressed.
According to THT, the Mariner defense cost the team 51.2 runs so that's 26 runs regressed.
Angels: 700 x 1.00 + 20 = 720 runs allowed (739 if you leave defense alone).
Mariners: 749 x 0.98 + 26 = 760 runs allowed (785 if you leave defense alone).

Angels regressed run profile, 781 RS 739 RA, pythag = 85.5 wins.
Angels regressed run+def profile, 781 RS 720 RA, pythag = 87.6 wins.
Mariners regressed run profile, 783 RS 785 RA, pythag = 80.8 wins.
Mariners regressed run+def profile, 783 RS 760 RA, pythag = 83.4 wins.
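
For anyone who wants to check the arithmetic, the whole chain above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function and parameter names are mine, the basic exponent of 2 is used, league-average defense is treated as zero runs, only half of the home park effect is applied (since half the games are on the road), and runs allowed are rounded to whole runs as in the lines above.

```python
def pythag_wins(rs, ra, exponent=2.0, games=162):
    """Basic Pythag: expected wins from runs scored and runs allowed."""
    return rs ** exponent / (rs ** exponent + ra ** exponent) * games


def expected_runs_allowed(tra_runs, home_park_factor, defense_cost, keep=0.5):
    """Neutral tRA runs, adjusted for park and (partially regressed) defense.

    keep is the share of last year's defensive runs retained after regressing
    toward a league mean of zero; keep=1.0 leaves defense alone.
    """
    # Only half the schedule is played at home, so apply half the park effect.
    season_park_factor = 1 + (home_park_factor - 1) / 2
    runs = tra_runs * season_park_factor + defense_cost * keep
    return round(runs)  # whole runs, matching the arithmetic above


# (BaseRuns RS, tRA runs, home park factor, THT defensive runs cost), from the post
teams = {
    "Angels": (781, 700, 1.00, 39.2),
    "Mariners": (783, 749, 0.96, 51.2),
}

results = {}
for name, (rs, tra_runs, park, defense_cost) in teams.items():
    for label, keep in (("run", 1.0), ("run+def", 0.5)):
        ra = expected_runs_allowed(tra_runs, park, defense_cost, keep)
        results[(name, label)] = (ra, round(pythag_wins(rs, ra), 1))
        print(name, label, rs, ra, results[(name, label)][1])
```

Changing `keep` is the easy way to see how sensitive the final win totals are to the defensive regression factor, which is the shakiest input here.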

CONCLUSION
How does that help for projection? It doesn't and I want to be extra clear on that. All this intends to do is come up with a figure for runs scored and runs allowed if we were to re-play the 2007 season (knowing what we know now, i.e. playing time) a million times. You shouldn't use this as a basis for projecting 2008.

It's interesting (at least to me), and it's helpful insofar as it dispels the notion that the Mariners were really an 88 win team and the Angels were really a 94 win team in 2007. It's also a step above just looking at their pythag record, which would leave you with an amplified idea of the difference between the two teams.

Anything else you use it for is not recommended and possible side effects include: angry bees, being stabbed, looking like an idiot, being ridiculed and general douchebaggery. If your urge to misuse these numbers lasts longer than four hours please consult a local mine shaft.


Comments


That is much closer than I would have thought
The Angels/Mariners numbers, that is.

Do you publish these on THT?

...and now I'm here

by Librocrat on Mar 4, 2008 12:59 PM PST   0 recs

Nope
I make use of BaseRuns when we do that THT Dartboard though, but that's weighted against the team's actual record.

by Matthew on Mar 4, 2008 1:14 PM PST to parent up   0 recs

Good post.
As a longtime reader, I must say this has been a damn auspicious start of the season for Lookout Landing.

This might be the general douchebaggery you mentioned, but could we at least say that on the whole Seattle/Anaheim needed to only Improve/Decline a combined 5 games this past offseason to switch spots in the standings in 2008? I know it's messier than that, but if there's any truth to that statement it would make me a lot sunnier about the Ms season.

by John Morgan on Mar 4, 2008 2:22 PM PST   0 recs

Basically yes
The 2007 Ms and 2007 Angels were about 5 games apart in true talent level.

by Matthew on Mar 4, 2008 2:40 PM PST to parent up   0 recs

Preaching to the choir
But that's an outstanding post, Matthew.  The timing is really well done, too, as USSM had a link to the Baseball Analysts AL West Round Table discussion where it looked like too many of those analysts (Dave excluded) liked to use past performances to project future results.

It's a bad habit that's hard to break, but it is way too easy for people to do when they don't know how to project the future.  In that regard, this should be linked and preserved for future "required reading".

I will not make jokes in my sig. I will not make jokes in my sig. I will not...

by TIF on Mar 4, 2008 2:37 PM PST   0 recs

One comment
Taking the actual talent of players by using the 1B, 2B, HR, etc. that they gave may also not be a totally accurate measure. I'm sure if Sexson replayed last season, he would probably do better, and I think his production was below his talent level. Do I actually know a better way of finding what it is? No, and I think using BaseRuns is probably as good as we are going to get. I'm happy somebody did this.

I do wonder why you regressed the defense to the mean. I don't know if there is a better way to do it. Maybe you could just look up all the different metrics (UZR, etc.), weight them somehow (by accuracy and trustworthiness) and then average them. Still pretty sketchy, but I think regressing everything to the mean isn't really going to give you a good idea of the actual talent of the defense.

by Edgar for Pres on Mar 4, 2008 4:08 PM PST   0 recs

because
Offense: Yeah, if there was a suitable tool for this, I would have used it. PrOPS comes closest, trying to account for bad luck when it comes to BABIP and the like, but the problem it has, as I mentioned, is that it treats all GB/LD/FB as the same so it introduces just as much error as it erases.

There's a big need for a system that can handle this robustly.

Defense: Defense is highly variable so it has to be regressed in order to come up with an expected result. If the 2007 Mariners played 1,000 seasons, it's highly unlikely that they'd be that bad defensively on average.

by Matthew on Mar 4, 2008 4:20 PM PST to parent up   0 recs

For Defense
if you regress everybody to the mean nobody will be hurt.  If THT would have predicted the mariners were -5 runs on the season because they were really lucky and they were actually -15 runs on the season then regressing to the mean doesn't help.  I think taking as much data as you can and putting together your best guess and realizing there is some noise in our answer is probably the best we can do.  If all the systems say we are that bad, then we probably are.  If only one does then the other ones will help to average it out.

As long as I brought up noise, it might also be good to think about how close this method estimates the true talent of the team.  Is it within 5 runs, 10 runs, 25 runs?  I don't think we can easily figure it out but our gut intuition probably is decent at least.  I'd guess we are probably within around 10-20 runs of the correct number most of the time.  Uncertainty is starting to become one of the things I'm becoming more interested in but I haven't figured out a good way to go about it yet.

by Edgar for Pres on Mar 4, 2008 4:54 PM PST to parent up   0 recs

Wow
"I'd guess we are probably within around 10-20 runs of the correct number most of the time."

I would guess the uncertainty is much, much greater. Maybe - MAYBE - 20 runs in each component (offense, pitching, defense), but it's gotta be much higher in the aggregate. If you could reliably get within 1-2 wins each year, you'd go to Vegas and you'd become wealthy.

by marc w on Mar 4, 2008 5:22 PM PST to parent up   0 recs

Well i was saying
10-20 runs for each component and yeah probably closer to 20 runs.

by Edgar for Pres on Mar 4, 2008 6:14 PM PST to parent up   0 recs

This would be true
if we were looking for the actual results of the '07 defense. It's a whole 'nother beast to find expected results. For that, you have to regress heavily.

by Matthew on Mar 4, 2008 5:34 PM PST to parent up   0 recs

Good point about the actual results business
I don't like the significant regression but I guess we don't have too much else.  If you had to ask me, an OF with Ibanez/Guillen and an IF with Sexson (and Betancourt throwing all over the place) is gonna do pretty bad no matter how many times you replay last year but -26 runs is way more realistic than -51 runs so I guess in the end it looks fine.

by Edgar for Pres on Mar 4, 2008 6:22 PM PST to parent up   0 recs

did you err?
Fantastic post, Matthew, but something looks a bit off:

You said that Safeco suppresses runs by 4%, but in your tRA calculation, you use 0.98, implying a 2% effect.  

If you use 0.96 instead of 0.98, you get 745 runs allowed with defense factored in, or 769 runs without defense.

by Nadingo on Mar 5, 2008 8:10 AM PST   0 recs

Only half the games are played at home
Other half are in what's essentially a neutral environment

by Graham on Mar 5, 2008 8:19 AM PST to parent up   0 recs

Thanks.
I erred.

I erred big time.

by Nadingo on Mar 5, 2008 10:35 AM PST to parent up   0 recs

Thanks Matt, it really sunk in the 2nd time.
I'm not the biggest math guy (my eyes go blurry and I drool a little even though I understand the math), so I appreciated the lack of intricate math details in your write-up.

Is there a similar article on how you can use a reasonable true talent level for 2007 for 2008 projections?

Also, with all the regression being done, are there teams that have a true talent level of 95 wins?  

by Jed MC on Mar 6, 2008 11:58 AM PST   0 recs

Thanks.
Is there a similar article on how you can use a reasonable true talent level for 2007 for 2008 projections?

Not really, if you're working from (your best guess at) true talent level in 2007 and want to move toward a 2008 projection, you can get away with the whole +/- thing. It's just all about using those regressed values.

For example, let's use the 783-760 run profile for the Ms and build a quick '08 projection.

  1. Do we expect the true talent level of any returning members to dramatically change? We do think Ichiro and Vidro will lose some value, but that should be balanced by small increases from Sexson/Lopez. Remember, we're already dealing with regressed values, so we're only talking about dramatic shifts; usually injury-caused.
  2. Player changes.
-Weaver/Horacio/Baek/Fear (combined 232 expected runs allowed in '07) replaced by Silva/Bedard (combined 171 expected runs allowed in '07).
-Sherrill replaced by run of the mill BP arm (for us) is about 5 runs worse.
-Guillen replaced by Wilkerson.

We gain 61 runs from the rotation (assuming no collective change in talent level of Wash+Felix+Tits), lose 5 of that in the pen and then factor in whatever you think the offense+defense change from Guillen to Wilk will be.

Also, with all the regression being done, are there teams that have a true talent level of 95 wins?

Oh yes, Boston was well over, NYY ~97, Cleveland right at 95. There's a ton of teams in the high 80s though, pushing near 90.

by Matthew on Mar 6, 2008 1:42 PM PST to parent up   0 recs
