One of my least favorite baseball analysis tools is the player comparison table. They're neat when used properly, but they're pretty much never used properly, because people like to cherry-pick the endpoints of the data they're using to make their own favorite players look good. No one ever includes dates next to the words "Player A". Why not? Would it give away "Player A"'s secret identity? No. What it'd do is reveal the illusion behind the magic trick, the sample size screwery that lets you favorably compare Mike Carp to Albert Pujols. Beware an un-dated comparison table.
That said, here is an un-dated comparison table.
| | AVG | OBP | SLG |
| --- | --- | --- | --- |
| Player A | .276 | .362 | .552 |
| Player B | .274 | .366 | .558 |
The first thing that you probably notice about that comparison table is, man, those are some terrible statistics. (Sorry about the use of batting average, but the data source I'm using doesn't offer much else.) The second thing that you probably notice is, wow, Player A and Player B are really similar! The third thing that you probably notice is not actually something you notice, it's something I tell you: Player A and Player B are the same player. Ha! Got you! Boy, do I feel smart!
Yep, that's right: Player A and Player B are both Michael Saunders, just in different time frames. Player A is Saunders from Opening Day 2013 to the beginning of his mid-May megaslump. Player B is Saunders from the All-Star Break until today. Do you remember how excited you were about Michael Saunders before his offense walked off a cliff in May? He's doing that exact same thing again. At this point in the post, you should feel good about Michael Saunders.
I'm gonna go ahead and spoil the ending a bit here: at all points in this post, you should feel good about Michael Saunders. This is partly because Michael Saunders is a young and exciting baseball player, and partly because I have an irrational affection for outfielders like Saunders and am writing this post from a biased point of view. I like guys who generate enough value with their gloves and legs to be valuable at corner positions even without above-average offense. I especially like them when they also have the tools to produce above-average offense. Michael Saunders fits both of those bills. (Duck joke.)
A little more than a week ago, I was extremely saddened to discover that overLLord Scott Weber does not share my pro-Saunders bias. Below, find a relevant quote:
> Logan, I think you're the only Mariners fan in the world who still cares about Michael Saunders.
>
> -Scott Weber
Naturally, I couldn't let this stand. So I set out to defend my favorite avian Mariner... with math! This post is the result.
We're all familiar with the Michael Saunders story: he sucked, he redid his swing, he looked better for a year, ZiPS thought he would regress, Mariners fans said "but he redid his swing!", 2013 rolled around, he was awesome for three weeks and Mariners fans felt vindicated, and then he got hurt and ZiPS turned out to be right about the regression thing. Except, here's the thing: now, two months after his injury, he's suddenly caught fire again. In almost exactly the same way as he caught fire at the beginning of the year. So what gives? What changed in the middle? Well, he got injured.
Wait a minute. How exactly does a shoulder injury affect a player, anyways? Could Saunders' injury be responsible in part for his midseason slump? We're all on board with the idea that returning too soon from the DL can lead a player to suck. So when it comes to shoulder injuries, how soon is too soon? Is there any chance that we can blame Saunders' midseason struggles on his broken wing?
This is the part of the post where I describe the methodology that I used to answer the question. Ostensibly I'm doing this to allow you to check my work, or to help you along in your efforts to duplicate it. However, since you'd be insane to try to copy what I did, and since there's no way I'm going to go back and check it, I'm really just telling you about my methodology because I want to complain about how much of a pain in the ass it was. Indulge me for a minute.
***Griping Begins Here***
In order to get a better picture of the effects of a shoulder injury, I wanted to collect as large a sample of shoulder injuries as I could. This meant finding an injury archive. Unfortunately, there aren't really any good injury histories out there on the internet, so I had to improvise a bit. Using the Wayback Machine internet archive and Yahoo! Sports' MLB Fantasy Update page, I raided Yahoo's DL archives and created a list of every position player shoulder injury that the Yahoo! page has listed since 2002 (before 2002, no data exists). This totaled out to about 180 different injuries, which I listed by date, type, arm, and player name. I then identified each batter's handedness and determined whether the injured shoulder was his front (pitcher-facing) shoulder or his back shoulder in the batter's box.
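The front/back call is the only genuinely fiddly piece of bookkeeping in that list, so here's a minimal sketch of the rule in Python. The record fields are hypothetical, but the geometry isn't: a right-handed batter's front (pitcher-facing) shoulder is his left, and a lefty's is his right.

```python
def shoulder_position(bats: str, injured_side: str) -> str:
    """Classify an injured shoulder as 'front' or 'back' in the batter's box.

    bats: 'R' or 'L' (batter handedness)
    injured_side: 'R' or 'L' (which shoulder was hurt)
    """
    if bats not in ("R", "L") or injured_side not in ("R", "L"):
        raise ValueError("expected 'R' or 'L'")
    # A righty's left shoulder faces the pitcher; a lefty's right shoulder does.
    front_side = "L" if bats == "R" else "R"
    return "front" if injured_side == front_side else "back"

# e.g. Saunders bats left and hurt his right shoulder -> front shoulder
assert shoulder_position(bats="L", injured_side="R") == "front"
```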
My list complete, I started to cull. I wanted to study players with an established baseline of major league performance to which I could compare their post-injury performance, so everyone who hadn't collected at least 500 MLB PAs before the injury was cut from the list. I also wanted to look at players who had gotten back on the field in about the same timeframe as Saunders, so everyone who had more than a four-month gap between their injury and their return was likewise cut. Between those two rounds of deletions, I got rid of over half of my list of 180 players, leaving myself with about 80 actually-studiable injuries.
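For the curious, the culling amounts to a two-condition filter. A sketch, with hypothetical field names ('pre_injury_pa', 'months_out') standing in for whatever an injury record actually carries:

```python
MIN_BASELINE_PA = 500  # need an established big-league track record
MAX_MONTHS_OUT = 4     # keep returns on roughly Saunders' timetable

def studiable(injury: dict) -> bool:
    """Keep an injury only if it survives both culling passes."""
    return (injury["pre_injury_pa"] >= MIN_BASELINE_PA
            and injury["months_out"] <= MAX_MONTHS_OUT)

# toy record: 1,200 career PAs, back on the field in 2 months -> keep it
assert studiable({"pre_injury_pa": 1200, "months_out": 2})
```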
I was then faced with a couple of questions. First: what should I use for an "established baseline of performance"? Should it be career numbers? That seemed like a bad plan, since there were guys like Mike Piazza on the list who'd gotten injured late in their careers when their offense was already in decline. Should it be single-season numbers from the year before the injury? That also seemed like a bad plan, given how many part-time players were on the list. I ultimately decided to go with the stats from each player's most recent 500 plate appearances before the injury.
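Here's a sketch of what "most recent 500 plate appearances" means in practice, assuming a chronological per-player game log with hypothetical 'date' and 'pa' fields: walk backward from the injury date, collecting games until the PA total clears 500.

```python
from datetime import date

def baseline_window(game_log: list[dict], injury_date: date,
                    target_pa: int = 500) -> list[dict]:
    """Walk backward from the injury, collecting games until ~500 PAs."""
    window, total = [], 0
    for game in reversed(game_log):
        if game["date"] >= injury_date:
            continue  # skip the injury day and everything after it
        window.append(game)
        total += game["pa"]
        if total >= target_pa:
            break
    return list(reversed(window))  # back to chronological order
```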
Second, and more importantly: how should I divide up the plate appearances post-injury? I knew I wanted to separate them into buckets based on when they'd happened, but I was having a tough time deciding between dividing them by time or by number of plate appearances. On the one hand, players heal with time, not with at-bats, so from an intuitive biological standpoint it made more sense to go with time as the independent variable. On the other hand, number of plate appearances presented a more attractive consistency of sample size. In the end I decided that lumping the players together would erase most sample-size concerns and divided the data by months post-injury rather than by number of plate appearances post-injury.
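Mechanically, the bucketing boils down to integer division on elapsed days. A minimal sketch, assuming 30-day buckets as an approximation of the month-by-month splits:

```python
from datetime import date

def month_bucket(game_date: date, injury_date: date) -> int:
    """0 = first 30 days post-injury, 1 = days 30-59, and so on."""
    return (game_date - injury_date).days // 30

# toy example: games 20 and 45 days after an injury land in buckets 0 and 1
injury = date(2013, 4, 10)
assert month_bucket(date(2013, 4, 30), injury) == 0
assert month_bucket(date(2013, 5, 25), injury) == 1
```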
Having made up my mind, I went to baseballmusings.com's Day By Day Database and just put my nose to the grindstone manually harvesting split-by-month data for all of the players on the list. Finding the right dates to get as close as possible to 500 pre-injury plate appearances was frustratingly time-consuming, but there wasn't really anything I could do about it. At any rate, you can find the final resulting spreadsheet here.
***Griping Ends Here***
The results:
Pretty obviously, players returning from shoulder injuries posted ISOs significantly below their pre-injury marks for at least the first two months after their return, and their OPS slumped accordingly. For some reason, the second month back is worse than the first pretty much no matter how you split up the data; I'm not sure exactly why. I checked to see if the quick-returning players felt the effects less, but no such luck: even the guys who got back inside the first month were worse in their second month back than in their first, so it seems like it's probably just a data fluke.
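For anyone replaying the spreadsheet math: ISO is just SLG minus AVG, which reduces to extra bases per at-bat, so the "lump everyone together" step is just summing numerators and denominators across a bucket before dividing. A sketch, with hypothetical field names:

```python
def pooled_iso(lines: list[dict]) -> float:
    """Extra bases per at-bat across a pooled set of batting lines.

    ISO = SLG - AVG = (TB - H) / AB = (2B + 2*3B + 3*HR) / AB
    """
    extra_bases = sum(l["doubles"] + 2 * l["triples"] + 3 * l["homers"]
                      for l in lines)
    at_bats = sum(l["ab"] for l in lines)
    return extra_bases / at_bats

# compare, e.g., pooled_iso(baseline_lines) vs. pooled_iso(month_two_lines)
```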
Splitting the data up further didn't lead to a whole lot of interesting conclusions, but there was one thing worth bringing up. Check out the front/back shoulder splits.
It seems like, for whatever reason, injuring the front shoulder has a bigger effect on power production than injuring the back shoulder. (Remember, Saunders injured his front shoulder.)
So what do we have here? What we don't have is a particularly ironclad conclusion. There are a few too many irregularities in the data-harvesting and averaging processes, and it would certainly be weird to conclude that the second month after a return from the DL is worse than the first in terms of ISO production. But what we do have is some evidence that vindicates Michael Saunders a bit. He's not the only guy who's hurt his front shoulder and seen his power vanish for the next two months. In fact, pretty much everyone who's hurt his front shoulder has seen his power vanish for the next two months. 2013 Michael Saunders is more "rule" than "exception".
I'm not saying Michael Saunders is guaranteed to be better than he's been so far this year, and I'm not saying he's as good as his hot streak. We've just recently learned the dangers of saying that Michael Saunders is as good as his hot streak: May 2013, anyone? But I'm saying he's probably better than you give him credit for, and he's definitely better than Scott gives him credit for.
Ha. Take that, Scott.