# John Buck, platoon split regression test case

The Mariners have reportedly acquired backup catcher John Buck - and by extension a rather befuddling statistics problem.

In just the latest piece of evidence that the Mariners are building their entire offseason plan around Lookout Landing op-eds, Jon Heyman has it that Jack Z and co. have agreed to exchange exactly one million small-b-bucks for exactly one large-b-Buck. As Scott covered yesterday, this deal is practically unassailable. Sure, he's uncomfortably similar to 2013 backup Kelly Shoppach and 2012 should've-been-backup Miguel Olivo. Sure, he's a bad pitch framer. But it's a million dollars, guys. The Mariners are paying John Buck to be worth less than a fifth of a win above replacement. The only way this signing can go wrong is if Mike Zunino implodes or gets hurt, in which case the Mariners' season is BUBAR anyways.

Well, actually, that's not true. There's one other way this could go wrong. You, dear reader, could open John Buck's Fangraphs Splits page and have your head explode. Because Scott Weber, Michael Barr, 2013-commenter-of-the-year nominee paulcl, and yours truly have combined to spend several hours trying to figure out what's going on with those L/R splits and have all independently come to the same conclusion: this shit makes no sense.

The problem is deceptively simple. For his career, Buck actually has a fairly standard platoon split: a 90 wRC+ vL and an 83 wRC+ vR combine to generate an 85 wRC+ in total. For the sake of getting down to one number, we can calculate Buck's performance vs. each handedness as a percentage of his overall performance:

$vL:\frac{.315\;wOBA}{.306\;wOBA} = 103\%$

$vR:\frac{.303\;wOBA}{.306\;wOBA} = 99\%$

Subtract and we find that Career!Buck's got a 3.92% split, which is a little small for a right-handed hitter. (If you're interested, you can read all about platoon split math here and here.) Regress against 2200 PA vL of league average Split% (the 6.11% figure in the formulas below), as is recommended by The Book, and the results dictate that we should project Buck to post a slightly below average 5.4% split in 2014.
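For anyone who wants to check the arithmetic, here's a minimal Python sketch of the two steps just described. The function names are mine, not anything from Fangraphs or The Book. The 6.11% league-average split is borrowed from the formulas later in the post, and the 1059 career PA vL is an assumption: it's what you get by summing the PA vL column of the year-by-year table.

```python
LEAGUE_SPLIT_PCT = 6.11  # league-average split, as used in the post's formulas
REGRESSION_PA = 2200     # PA vL of league average to regress against


def split_pct(woba_vl, woba_vr, woba_overall):
    """Platoon split as a percentage of overall wOBA (positive = better vs. LHP)."""
    return 100 * (woba_vl - woba_vr) / woba_overall


def regress_split(observed_pct, pa_vl,
                  league_pct=LEAGUE_SPLIT_PCT, regression_pa=REGRESSION_PA):
    """PA-weighted average of the observed split and the league-average split."""
    return (observed_pct * pa_vl + league_pct * regression_pa) / (pa_vl + regression_pa)


career_split = split_pct(0.315, 0.303, 0.306)
career_pa_vl = 1059  # assumption: sum of the PA vL column in the yearly table
career_regressed = regress_split(career_split, career_pa_vl)

print(f"{career_split:.2f}% demonstrated, {career_regressed:.1f}% regressed")
```

Under those assumptions this lands on the same 3.92% and 5.4% quoted above.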

If only I could end this post there! In the interest of honesty, though, I feel compelled to show you Buck's year-by-year splits:

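| Year | PA vL | wOBA vL | PA vR | wOBA vR | Split% |
|------|-------|---------|-------|---------|--------|
| 2004 | 74 | .275 | 184 | .313 | -12.58% |
| 2005 | 130 | .345 | 300 | .271 | +25.26% |
| 2006 | 130 | .337 | 279 | .291 | +38.75% |
| 2007 | 88 | .323 | 311 | .321 | +2.20% |
| 2008 | 132 | .337 | 286 | .279 | +15.08% |
| 2009 | 50 | .287 | 152 | .350 | -18.80% |
| 2010 | 90 | .478 (!!) | 347 | .311 | +48.27% |
| 2011 | 132 | .258 | 398 | .317 | -19.47% |
| 2012 | 125 | .248 | 273 | .300 | -18.31% |
| 2013 | 108 | .267 | 323 | .294 | -9.41% |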

That's a truly WTB-worthy table. Despite Buck's below-average normal career split, the dude has posted a similar single-season split exactly once: in 2007. Other than that, he's flip-flopped between the two extremes of "giant standard platoon split" and "giant reverse platoon split", with almost no middle ground. So which do we believe: the career average, or the more recent data?

Here two axioms of baseball prognostication come into conflict. In order to use the largest possible sample size, we should probably pay attention to all of the data... but in order to take into account the possibility of a change in skill set over time, we should weight the more recent data more heavily. The problem is, the recent data is in direct (and enormous) conflict with the larger sample size. So what's to be done?

The easiest answer is to regress the more recent data to the mean. To the math-mobile!

For the last three reverse-split years:

$\frac{.257\;wOBA(vL)}{.292\;wOBA} - \frac{.305\;wOBA(vR)}{.292\;wOBA} = -16.44\%\;(demonstrated)$

$\frac{(-16.44\%)(365PA) + (6.11\%)(2200 PA)}{365+2200} = 2.90\%\;(regressed)$
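If it seems odd that a -16.44% split turns into a positive projection, it helps to look at how little weight 365 PA actually carries against 2200 PA of league average. A rough sketch (the helper name `observed_weight` is made up for illustration):

```python
REGRESSION_PA = 2200  # PA vL of league average, per the post


def observed_weight(pa_vl, regression_pa=REGRESSION_PA):
    """Fraction of the regressed estimate that comes from the player's own data."""
    return pa_vl / (pa_vl + regression_pa)


for pa in (365, 2200):
    print(f"{pa:>5} PA vL -> {observed_weight(pa):.1%} own data")
```

With 365 PA, only about 14% of the projection comes from Buck himself; he'd need 2200 PA vL before his own numbers counted as much as the league's.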

Even regressed, that's basically half the league average split. The thing is, though, that over those same three years Buck has a .211 BABIP against lefties, and 365 PA is well below the BABIP stabilization threshold. This makes me more than a little skeptical. It seems advisable to increase the sample size a bit. Perhaps if we expand back to the last five years? That would include the first of the recent reverse-split years, but also get Buck over the 500 PA line against left-handed pitching.

The results, then:

$\frac{.300\;wOBA(vL)}{.308\;wOBA} - \frac{.311\;wOBA(vR)}{.308\;wOBA} = -3.57\%\;(demonstrated)$

$\frac{(-3.57\%)(505 PA) + (6.11\%)(2200 PA)}{505+2200} = 4.30\%\;(regressed)$

This seems like a reasonably happy middle ground; Buck's splits from the last five years can be regressed to project a smaller-than-average standard platoon split. But we've got a SNABU here, too. If we're throwing out all of the data before five years ago, what we're saying is that we think there's a distinct change in either Buck's skillset or in the way pitchers approach him that occurred between 2008 and 2009. However, as far as I can tell by digging around on Fangraphs, there isn't any such drastic change. The actual significant "turning point" was between 2009 and 2010, when pitchers drastically reduced the number of fastballs thrown to Buck and also increased the number of out-of-zone pitches. Buck being a hitter who thrives against fastballs and struggles against offspeed stuff, that could have a real effect on his performance.

It looks like we're doing this again. In the words of many frustrated internet users: BML.

So. The last four years?

$\frac{.301\;wOBA(vL)}{.305\;wOBA}-\frac{.307\;wOBA(vR)}{.305\;wOBA} = -1.97\%\;(demonstrated)$

$\frac{(-1.97\%)(455PA)+(6.11\%)(2200PA)}{455+2200} = 4.73\%\;(regressed)$
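To line the three windows up side by side, here's a small Python sketch. The PA vL values come straight from the year-by-year table and the wOBA figures from the formulas above; everything else (the names, the dictionary layout) is just my own scaffolding. Each window's PA vL total is summed from the table and used in both the numerator and the denominator of the regression.

```python
# PA vL by season, copied from the year-by-year table in the post.
PA_VL = {2004: 74, 2005: 130, 2006: 130, 2007: 88, 2008: 132,
         2009: 50, 2010: 90, 2011: 132, 2012: 125, 2013: 108}

# First season of each window -> (wOBA vL, wOBA vR, overall wOBA), as quoted above.
WINDOWS = {
    2011: (0.257, 0.305, 0.292),  # last three years
    2009: (0.300, 0.311, 0.308),  # last five years
    2010: (0.301, 0.307, 0.305),  # last four years
}


def regressed_split(first_year, league_pct=6.11, regression_pa=2200):
    """Return (PA vL, demonstrated split %, regressed split %) for a window."""
    pa = sum(n for year, n in PA_VL.items() if year >= first_year)
    vl, vr, overall = WINDOWS[first_year]
    observed = 100 * (vl - vr) / overall
    regressed = (observed * pa + league_pct * regression_pa) / (pa + regression_pa)
    return pa, observed, regressed


for start in sorted(WINDOWS, reverse=True):
    pa, observed, regressed = regressed_split(start)
    print(f"{start}-2013: {pa} PA vL, "
          f"{observed:+.2f}% demonstrated, {regressed:+.2f}% regressed")
```

Moving the cutoff by a single season swings the sample between 365, 455, and 505 PA vL, which is a big part of why the endpoint matters so much here.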

...and that's almost back to the career average. But you guessed it: there are a couple of catches. For one thing, BIS data is really unreliable year-to-year, and PitchF/X classification algorithms didn't get decent until recently. For another, in 2010 Buck had an insane outlier year against left-handed pitching, posting a .500 BABIP despite an 18.2% line drive rate. That's completely crazypants. Can we really base our projections on that one year? Is that even what we're doing, by including it as just one of the four years in our analysis? I don't know. I don't even know. TARBU.

It doesn't seem like there's any one sensible endpoint after which we can begin to look at regressions of Buck's split data for the purposes of projecting future performance. Seasons are pretty arbitrary endpoints, anyways; for all we know, Buck made a great big honking adjustment in mid-2009. It doesn't seem reasonable to suggest that Buck has reverse platoon splits, given that only one set of regressed numbers produced that result and that that set relied on a .211 BABIP vL, but it also doesn't seem reasonable to suggest that his career splits are representative of true talent in spite of recent results. The truth, as per usual, is probably somewhere in the middle. I'd feel comfortable projecting Buck for a smaller-than-average normal platoon split, which I guess does make him more desirable than your average backup. It's kind of unintuitive, given his recent performance, but regression is king.

Still, at this point I'd like to invite you to go back up and look at that table again. Yep. In ten years as a starting catcher, John Buck has produced a reasonable-seeming single-season platoon split once. He's a backup now, and he's going to get something like 200 or 300 plate appearances, and over that sample size anything can happen. The only honest answer to this question is no answer at all: I have no idea what kind of L/R splits John Buck is going to post in 2014.

Really, though, it shouldn't matter. As inconsistent and mind-boggling as Buck has been in this one respect, in terms of overall production he's been pretty good about staying between 70 and 100 wRC+. That's a damn sight better than all-glove no-bat youngster Jesus Sucre, and it's a damn sight better than all-bad no-good oldster Humberto Quintero, and it's perfectly acceptable for a backup catcher. Certainly it's worth a one-year, one-million contract.

So rejoice! John Buck may be confusing, but at least he's a good backup. At least the Mariners didn't overpay him.

At least he's not WBB.