Justin Smoak and the Distance-ISO Correlation

Word's going around the blogosphere that Justin Smoak isn't very strong. Well... is he?

Otto Greule Jr
why have i never seen this photo before

There is a large mountain of evidence that Justin Smoak is just not strong enough to be a productive major league first baseman.

-Dave Cameron, USS Mariner

April 21, 2013

Context is everything in baseball.

-Dave Cameron, USS Mariner

May 6, 2013

Even the best statisticians can't predict the future, and I am not one of the best statisticians.

-Logan Davis, Lookout Landing

March 23, 2013

As a Mariners fan and sportswriter, I can't help but notice certain similarities between myself and Seattle right-hander Brandon Maurer. Like Maurer, I'm a young and relatively inexperienced newcomer trying to perform well in a high-profile environment. We both have promising skills and potential to improve, but it should be obvious to anyone who's paying attention that we're far from finished products. Maurer knows very little about how to get lefties out; I know very little about how to write programs that efficiently parse data for me. We've had some high points (think Maurer's start against Anaheim) and some low points (think the statistical analysis in my Seager article). Like Maurer, I'm trying my hardest to improve. Unlike Maurer, I do not have a rockin' beard.

Perhaps the biggest advantage that Maurer and I have in common is the opportunity to learn from our betters. Maurer can draw pitching inspiration from Felix's sequencing and between-start routines; likewise, I can draw writing inspiration from the articles of Dave Cameron.

A couple of weeks ago, Cameron pronounced Justin Smoak officially dead. He didn't mince words: "not a long term answer to any question a Major League team should be asking". The biggest strike against Smoak is, and has been for a while, his lack of extra-base power. Even since his "breakout" on September 1 of last year, Smoak has struggled to hit the ball hard, producing only 17 extra base hits in 229 plate appearances. This implies a certain lack of muscle.

It makes a lot of sense to argue that Justin Smoak isn't strong. We've all watched the games, we've all seen the warning-track flyouts, we've all looked at the surprisingly non-stellar minor league numbers... it stands to reason. It also helps to explain how the Mariners could've been fooled into trading Cliff Lee for him. Justin Smoak looks like a strong guy, or at least like a guy who could become strong. He's a big ol' country boy, and big ol' country boys have a reputation for being able to hit the ball a long ways. It's not hard to believe that the Mariners' scouts, and everyone else, could've been tricked by his build into thinking that he'd eventually develop the kind of big-time power that he hasn't yet shown. Presented with the evidence of Cameron's article, I pretty much instantly agreed.

But I also saw an opportunity to expand. Those of you who have been following me on my journey as a budding baseball blogger have probably noticed that I occasionally find new toys and get really interested in using them to analyze things. So it was with SW%/CT%/ISO comparisons, and so it is with Baseball Heat Maps' batted ball distance and angle data. I wanted an excuse to play around with the data set some more. Proving that Justin Smoak isn't strong and would thus never hit for power seemed like an ideal opportunity! So I set out to do so.

My main goal was to figure out how strong Smoak is relative to other first basemen (his competition), with the secondary objective of proving a correlation between batted ball distance and offensive success and using Smoak's batted ball numbers to project his offensive production. As a measure of strength, I decided to use average batted ball distance on fly balls, line drives, and home runs - solid contact in the air. I also narrowed down the search to players who got at least 300 plate appearances between 2010 and 2012, because I didn't want to have to muck about with offensive numbers from different eras.
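For anyone who wants to try this at home, here's a rough sketch of that filtering step in Python. It is not the script I actually ran: the record layout, the ball-type labels, and the "Hitter A/B/C" rows are placeholders I made up for illustration, and real Baseball Heat Maps exports will look different.

```python
from collections import defaultdict

# Assumed labels for "solid contact in the air"; real data may use other codes.
AIR_CONTACT = {"fly_ball", "line_drive", "home_run"}
MIN_PA = 300  # plate appearance cutoff over 2010-2012


def average_air_distance(batted_balls, plate_appearances, min_pa=MIN_PA):
    """Average distance (ft) on fly balls, line drives, and home runs,
    for players with at least min_pa plate appearances."""
    totals = defaultdict(lambda: [0.0, 0])  # player -> [sum of feet, count]
    for player, ball_type, distance_ft in batted_balls:
        if ball_type not in AIR_CONTACT:
            continue  # ignore grounders, bunts, etc.
        if plate_appearances.get(player, 0) < min_pa:
            continue  # not enough playing time to qualify
        totals[player][0] += distance_ft
        totals[player][1] += 1
    return {player: total / count for player, (total, count) in totals.items()}


def rank_by_distance(averages):
    """Sort players from longest to shortest average distance."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up placeholder rows, NOT real Gameday data.
sample_balls = [
    ("Hitter A", "fly_ball", 300.0),
    ("Hitter A", "ground_ball", 90.0),
    ("Hitter A", "home_run", 400.0),
    ("Hitter B", "line_drive", 250.0),
    ("Hitter C", "fly_ball", 320.0),
]
sample_pa = {"Hitter A": 600, "Hitter B": 450, "Hitter C": 120}

sample_averages = average_air_distance(sample_balls, sample_pa)
for rank, (player, feet) in enumerate(rank_by_distance(sample_averages), start=1):
    print(rank, player, round(feet, 2))
```

Hitter C gets dropped for falling short of 300 plate appearances, and Hitter A's ground ball doesn't count toward his average.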

The results, once I managed to harvest all of the data, were unsurprising. Smoak came in 42nd out of 58 first basemen, with only 5 of the 16 first basemen below him being at least average hitters. The company he keeps on this particular list is hardly inspiring.

Rank  Name            Distance (ft)  ISO   wRC+
37    Allen Craig     262.95         .215  135
38    Matt Downs      262.80         .186  98
39    Anthony Rizzo   262.51         .157  99
40    Yonder Alonso   262.22         .131  112
41    Jesus Guzman    262.09         .169  121
42    Justin Smoak    261.16         .153  90
43    Mark Kotsay     260.87         .118  86
44    Justin Morneau  260.65         .184  121
45    Xavier Nady     260.04         .110  69
46    Adam Lind       259.79         .181  93
47    Jorge Cantu     259.65         .125  74

So the initial returns backed up the hypothesis: my first set of data implied that Justin Smoak isn't strong enough to be a successful major league first baseman. Just for the hell of it, and because I knew that that table would make for the world's least satisfying article, I embarked on a quest to complete my secondary objective.

It turns out that batted ball distance and wRC+ don't correlate very well (r^2=.176). There are simply too many other factors, from BB% to K% to BABIP, for everything to line up nicely. Somewhat disheartened, I took my ambition down a notch and decided to compare just distance and ISO.

This time the correlation was excellent - in fact, quite a bit better than expected. It turns out that about 54% of the variance in ISO can be explained by batted ball distance alone (r = .732, r^2 = .536). Aside from the Rockies, who I had to throw out because their home park is so extreme that it messes with the correlation, pretty much every player on the list had an ISO that fell quite close to the ISO predicted by a simple best fit line.
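If you'd like to see what goes into a number like that, here's a minimal sketch of the correlation and best fit line in plain Python. The function names are mine, and the inputs are just the eleven rows from the table above, not the full list of 58 first basemen. Because that slice is so narrow, the r it spits out won't match the .732 I got for the whole set.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Unadjusted distance (ft) and ISO for ranks 37-47 in the table above.
distance = [262.95, 262.80, 262.51, 262.22, 262.09, 261.16,
            260.87, 260.65, 260.04, 259.79, 259.65]
iso = [.215, .186, .157, .131, .169, .153, .118, .184, .110, .181, .125]

r = pearson_r(distance, iso)
slope, intercept = best_fit_line(distance, iso)

# ISO the line would predict for Smoak's unadjusted 261.16 feet.
predicted_smoak_iso = slope * 261.16 + intercept

print("r =", round(r, 3), "r^2 =", round(r * r, 3))
print("predicted ISO at 261.16 ft:", round(predicted_smoak_iso, 3))
```

Squaring r gives the share of the variation in ISO that the line accounts for, which is where the 54% figure comes from.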

A-ha!, says I. Here's the piece de resistance! I whipped up a pretty graph of the correlation, stuck Smoak's face on it to show exactly where he sits (with the intent of reinforcing his lack of strength relative to the average major league first baseman), and got ready to write this post. You can see the graph below.


Only, I saw the picture, and I got to thinking. Smoak's not that far from the middle, and he has (for some reason or other) historically performed below his projected ISO. The result condemned his strength as expected, but it didn't condemn it as harshly as I thought it would. It seems, based on that graph, like Smoak's not all that far from being a decent power hitter.

And then I thought about Safeco. Maybe that could explain the discrepancies between projected ISO and actual ISO! It hadn't been my original intent, but if I could improve my detected correlation between batted ball distance and ISO by adjusting for park, that'd be a cool little analytical side project. So I began a journey to make the graph look nicer.

The Baseball Heat Maps data comes from MLB Gameday. It's not terribly accurate, and it's certainly not Hit F/X, but it's the best data set that we have publicly available at the moment. Still, there are some improvements left to be made - like park adjustments. It's generally accepted that every 1000 feet of altitude adds 4 feet to a well-hit ball out to center field, as does every 10 degrees of air temperature. Using Matthew Carruth's MLB temperature map and a set of elevation data that I really hope is accurate, I added my own park factors to the batted ball data to get a new set of distances. While I was at it, I created park factors for ISO based on an average of the 2010-2012 seasons. What I produced yielded a far more interesting set of results.
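To give a sense of the kind of adjustment I mean, here's a simplified sketch of the rule of thumb above. It is not my exact spreadsheet. The baseline of sea level and 70 degrees is an assumption I picked for the example, I'm assuming the temperatures are in Fahrenheit, and the function name is made up.

```python
# Rule of thumb from the paragraph above: a well-hit ball gains about 4 feet
# per 1000 feet of altitude and about 4 feet per 10 degrees of temperature.
FEET_PER_1000_FT_ALTITUDE = 4.0
FEET_PER_10_DEGREES = 4.0


def park_adjusted_distance(raw_ft, altitude_ft, temp_f,
                           ref_altitude_ft=0.0, ref_temp_f=70.0):
    """Restate a batted ball's distance as if it had been hit in a neutral park.

    ref_altitude_ft and ref_temp_f define "neutral"; both defaults are
    assumptions chosen for illustration, not values from the article.
    """
    altitude_bonus = FEET_PER_1000_FT_ALTITUDE * (altitude_ft - ref_altitude_ft) / 1000.0
    temp_bonus = FEET_PER_10_DEGREES * (temp_f - ref_temp_f) / 10.0
    # Remove whatever help (or hurt) the park gave the ball.
    return raw_ft - altitude_bonus - temp_bonus


# A 300-foot fly ball at 1000 feet of elevation on an 80-degree day got about
# 8 feet of help, so it counts as a shorter ball in neutral conditions.
print(park_adjusted_distance(300.0, altitude_ft=1000.0, temp_f=80.0))

# The same ball at sea level on a 60-degree day was held back about 4 feet.
print(park_adjusted_distance(300.0, altitude_ft=0.0, temp_f=60.0))
```

A hitter's adjusted average would then just be the mean of his adjusted batted balls, which is why a cool park near sea level drags a guy's raw numbers down.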

Before I show you the actual results, I am obligated by my conscience to inform you that you should by no means take this data as gospel truth. For one thing, the Gameday data is fuzzy and not perfectly reliable. For another thing, I applied temperature and altitude park factors in a way that's really far too general to be used in precise analysis. For a third thing, I consciously removed outliers from the graph (though in my defense, they were all Rockies, and Coors Field is a nightmare to make park factors for).

But despite the inaccuracies both inherent to the data set and caused by my inexperience, I do feel that the adjustments I made improved the data. Want to know why? Here's a list of the top 20 strongest first basemen of the last three years, first by the adjusted distance, second by the unadjusted distance:


The only somewhat unexpected name on the left is Aubrey Huff, who got a big boost from playing in San Francisco. The right features Eric Hinske and Lyle Overbay, neither of whom seems to belong. I personally think the left list is closer to how I would rank MLB's first basemen by strength, but in any case they aren't very different, so you can be sure that my edit didn't horribly muck up the data set.

In fact, adjusting for park significantly improved the correlation, boosting the r value from .732 to .808 (so r^2 rises from about .54 to about .65) and making the graph quite a bit prettier. But even setting aside looks, adjusting for park had a big impact. Check out the new placement of our friend Justin:


After adjusting for Safeco, Smoak ranks 30th out of 58 first basemen in terms of average distance on home runs, fly balls, and line drives. It's not a leap into the top tier, but it's certainly a big bump up from 42nd. His park-adjusted ISO is up in the much friendlier .175 range, and the discrepancy between park-adjusted ISO and ISO predicted by batted ball distance vanishes. (If you're wondering why there seem to be fewer data points in this picture, there's a cluster of five right behind Smoak's head.)

Once you adjust for context, Smoak's position among 2010-2012 first basemen looks a lot better. Now, instead of dwelling near the bottom of the list with the Adam Linds and Jorge Cantus of the world, he's hanging out in the middle tier, just below some pretty big names.

Rank  Name            Adj. Distance (ft)  Adj. ISO  wRC+
25    Lance Berkman   265.62              .214      139
26    Kevin Youkilis  265.54              .200      127
27    Matt LaPorta    264.26              .148      89
28    Billy Butler    264.13              .178      131
29    Matt Downs      263.58              .188      98
30    Justin Smoak    263.33              .177      90
31    Derrek Lee      263.09              .171      109
32    Jesus Guzman    262.84              .183      121
33    Anthony Rizzo   262.24              .162      99
34    Eric Hosmer     262.20              .153      97
35    Gaby Sanchez    261.41              .169      103

Out of all the first basemen who hit the ball further in the air than Smoak did over the 2010-2012 period, only four had wRC+ marks below the league average. Those four were Matt Downs (98), Lyle Overbay (97), Eric Hinske (94), and Matt LaPorta (89). Only LaPorta, Overbay, and the AT&T-challenged Huff managed ISOs below .170. There are a few big-name first basemen who don't hit the ball as far as Smoak does, namely Allen Craig, Paul Konerko, and Justin Morneau. Billy Butler, Kevin Youkilis, Lance Berkman, and Mark Teixeira barely eke him out. What I'm saying is, the situation isn't nearly as dire as that first chart makes it seem.

This is what the big-time sabermetricians mean when they say it's important to look at context when evaluating baseball players. Simply glancing at Smoak's batted ball distance numbers relative to the rest of the league's would give you the impression that he isn't anywhere near strong enough to succeed as a first baseman, but that's not the case. Smoak just happens to play in a particularly extreme context, and as a result his distance numbers look bad despite actually being middle-of-the-pack. When I set out to find out how far Justin Smoak could hit the ball, I was expecting to see him at the bottom of the list with the James Loneys and Adam Kennedys and Casey Kotchmen of the league. Instead, I found him smack dab in the middle. My expectations were colored by watching him play in an extremely pitcher-friendly context.

I'm not going to use this chart to declare Dave Cameron wrong. For one, this data doesn't really support that - even if most of the hitters stronger than Smoak were good, Matt LaPorta isn't exactly an encouraging precedent. For two, this data is hardly the be-all and end-all of evaluating hitter strength (a .808 correlation coefficient still isn't 1). For three, I think that challenging the conclusions of the managing editor of Fangraphs is a bit above my pay grade. Brandon Maurer probably doesn't offer King Felix advice on throwing sliders.

But I will suggest that maybe this Justin Smoak thing isn't as open-and-shut as Dave makes it out to be. I no longer believe that Smoak has a future as a productive starting first baseman in Seattle, given that the park appears to be murdering both his batted ball distance and his ISO, but to be honest I haven't believed that for a while. I didn't even believe that when I wrote this, which I did mostly just to cheer you guys up. But in a warm, elevated, hitter-friendly park... who's to say? Smoak might do better in Texas than Mitch Moreland has been. He certainly hits the ball further and walks more. Even if he's only a part-time player, I don't see much reason why Smoak doesn't deserve a job but Casey Kotchman and James Loney do. He'll stick around in the majors, as long as his agent does his job.

Justin Smoak is no longer part of the Mariners' core - he just isn't what they thought he was when they acquired him from Texas. But neither is he a total loss. Once you look at the big picture, considering Smoak's extreme context, he isn't nearly as worthless as he often seems. He's not awful. He's just a prospect who came up to the big leagues and didn't quite work out.

It happens all the time.

As a dejected Justin Smoak watches yet another inning-ending fly ball settle into the left fielder's glove, Brandon Maurer looks on from the dugout. He hopes that won't be him in three years, walking slowly back to the bench, head low, eyes lower. Smoak gets down the stairs and starts to take his gloves off; Wedge pats him on the back. Dear God, Maurer thinks as he slips on his own glove and walks out to the mound, if you're listening: don't let that be me.

He throws just a few warmup pitches to Shoppach, before the umpire gives the sign. Showtime. The batter steps into the box. Maurer looks in. Shoppach's fingers go down: slider. He takes a deep breath, and rears back, and hurls it to the plate...