Framing the Framing Debate
Defense, the final frontier. Okay, not really, but it makes for a decent lead off. Over the last couple of years, various analysts have made tremendous progress in quantifying defense. We now have some solid estimators in +/- and UZR. Another one, SAFE, might be on the way to accessibility soon as well. Nevertheless, much work remains. Specifically, evaluating the defense of catchers remains a murky field filled with some great ideas but little consensus. I have been coming back to this subject off and on for the last four years or so and my general breakdown for a catcher's defensive responsibilities includes the following four categories.
Fielding range on pop ups, bunts and short ground balls has been covered under David Pinto's PMR for years (2006, 2007, 2008), and it is measurable under practically any defensive system out there. These types of plays, however, are both relatively rare and simple to execute. That brings us small sample sizes and only minute measurable differences in ability. Specifically, in the case of pop-ups, the vast majority have large enough hang times that every Major League catcher makes a successful play.
Catching here typically has referred to preventing wild pitches and passed balls. To measure that, there are a couple of good methodologies out there. Dan Turkenkopf has a good annual series looking at stopping balls in the dirt (2009 version) and David Gassko has done some good work in the past on wild pitches and passed balls in a more general sense (THT Link). The thought processes in the two linked articles are sound. If Turkenkopf combined his blocks in the dirt with a secondary look at all other pitches, I would be extremely pleased with the overall picture that could provide.
Throwing out base runners is a more complicated system, though still doable. Essentially, it is the same process as John Walsh and I use for measuring outfielder arms. The key here is that it is much more valuable for a catcher to throw out a base runner that is attempting to steal than it is to prevent the attempt in the first place. Attempting a stolen base is usually a losing proposition for the offensive team so catchers that have a reputation for throwing out base runners well (e.g. Ivan Rodriguez) will suffer because runners will attempt to steal less often against them. This damages the defensive team's chances to remove said runners from the base paths. Therefore, you must note how often the catcher sees opportunities to throw runners out.
This brings us to the final point, the mystical cERA. Jeff wrote a well thought out post on the subject (Link) which reminds us that while researchers have shown that, to date, any cERA effect lies below the level of detection, that threshold is important. It remains high enough that a meaningful difference might be hiding in the noise, much in the way that clutch hitting or a pitcher's ability to influence BABIP might.
Ever since that post, I have been thinking more about the possible aspects of cERA. It has always been my preference to deconstruct complex systems as much as possible and then go from there. What sort of skills, not covered in any of the above three categories, would help the team suppress runs allowed? I came up with three.
There is certainly a coaching aspect that contributes to the catcher-pitcher relationship. I group together tasks such as knowing how to keep the pitcher's mental frame of mind in check or how to spot minor mechanical kinks into this sub-category and dismiss it as far as defense goes. To me, these are coaching skills and should be the topic for another discussion.
There are also skills that I choose to lump under the vague term "game management." Chief among these would be pitch sequencing, for which I am comfortable giving the catcher some credit, just not all. After all, the catcher acts as a guide, but it is the pitcher who confirms the pitch and is the one who throws it. I cannot in good conscience sit here, after years of blasting Felix for piping in fastball after fastball, and then claim that catchers deserve full credit for calling pitches. That would be hypocritical and, I feel, an illogical rationale.
The final grouping is one that I feel should actually go under the catching category, but quantifying it remains more elusive. It would encompass all the little things that a catcher can do in order to generate more strike calls. Generally, people refer to this as framing. The goal is to use whatever combination of skill, cajoling or repetition is possible to get an umpire to call a borderline pitch a strike.
People around the net routinely vilified Kenji Johjima for his framing abilities. Personally, I felt it had more to do with a personal bias against him than with anything tangible. Those same people typically lauded Rob Johnson, perhaps only as a reaction to Johjima, for his apparently superb capacity to gather more strike calls.
As I am well known to say, hogwash to subjective statements! Let us get some data in here! Keep reading for more than you probably wanted.
If Rob Johnson actually was so much better at framing pitches than Kenji Johjima then people making that statement should have no worries that the evidence supporting that would be apparent, right? Given the level of vitriol hurled at Johjima's catching style, and the subsequent amount of love heaped on Rob Johnson for his ability, it should be obvious in the data. Before any knuckleheads drag it out, no, I am not talking about the cERA statistic as typically presented.
There are many problems with that statistic. Going over them is a waste of time, but the big one to keep in mind is that catchers do not catch the same number of pitches from each pitcher. After all, you can put the greatest frame in the world on a Carlos Silva pitch and it does nothing to detract from the fact that Carlos Silva threw it. And a masterpiece from Felix Hernandez can stand by itself.
I am writing of data pertaining to the actual question at hand: did Rob Johnson frame pitches in a better manner than Kenji Johjima did? Was he better at getting pitches called a strike? Critics of Johjima loved to state that he was terrible at getting the low strike called because of his style of dropping his glove downward after receiving. Ergo, if I were to plot every pitch caught by each catcher, there should be a clear difference. The mind, and its selective memory, a powerful thing be.
Below is such a plot for Kenji Johjima. It is every pitch caught by Johjima in 2009, recorded by Pitch F/X, which the batter took. The red dots represent called strikes, the blue dots, called balls.
I have zoomed in on the important part of the plot; everything outside the zone was an obvious ball. The black box represents a rough approximation of the rulebook strike zone. The grouping of red dots in relation to that zone is not surprising. Several studies of umpire tendencies have shown that they call the strike zone a few inches wider than the actual plate. The width of the zone portrayed here is already wider, stretching an extra inch each way in order to account for "the black." In reality, it appears that pitchers get about two extra inches to each side.
Now, if this were a TV broadcast, this report would be awful because I have given the critics the sound bite they crave right at the start. I will make it even worse in a few sentences since this is text and I get to indulge in sarcasm. Look at the bottom of the strike zone. Do you see all those blue dots? They were right! Johjima consistently costs his pitchers the low strike!
That statement is the victim of a classic lack of context. Before anyone trumpets victory, Johjima needs to be compared directly to Rob Johnson. Below is the same plot, under the same constraints, for Rob Johnson.
Rob Johnson doesn't get the low strike either! Thank you, human umpires. In addition to umpires typically calling a bigger zone on the horizontal axis, they also frequently call the zone too small on the vertical axis. As illustrated, the strike zone should be taller than it is wide. In reality, it is wider (roughly 22 inches) than it is tall (roughly 18 inches).
Time for a quick tangent. Doesn't it appear that Rob Johnson caught against many more left-handed batters than Kenji Johjima? There are certainly a lot more pitches located on the left-handed side of the strike zone. As it turns out, Kenji caught a fraction (less than 1%) more lefty batters.
Now, those are some sweet plots and they make for some compelling evidence I think, but they do not do the whole trick. Visually comparing the two, after all, is just one step better than what the people at the beginning were doing. What would be truly useful is quantifying any significant difference between the two.
Here is where it begins to get more complicated with statistical jargon. Please do not fret; I will do my best not to lose anyone. I analyzed those plots and divided them into grids of 0.1 inch by 0.1 inch squares. Then I fit an equation (using a kNN process, for those who care) to tell me how likely it is that a pitch landing in each grid square would be called a strike.
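For anyone who wants to see the mechanics, here is a minimal sketch of a k-nearest-neighbors strike probability estimate. It is an illustration only, not the code behind the plots in this post: the function names, the `(px, pz, is_strike)` tuple layout, the choice of `k=25` and the use of plain Euclidean distance are all assumptions made for the example.

```python
import math

def knn_strike_prob(pitches, x, z, k=25):
    """Estimate P(called strike) at (x, z) from the k nearest taken pitches.

    pitches: list of (px, pz, is_strike) tuples, locations in inches
    (hypothetical layout chosen for this sketch).
    """
    if not pitches:
        raise ValueError("need at least one pitch")
    # Sort by straight-line distance to the query point, keep the k closest.
    nearest = sorted(pitches, key=lambda p: math.hypot(p[0] - x, p[1] - z))[:k]
    # The estimate is simply the share of those neighbors called strikes.
    return sum(1 for p in nearest if p[2]) / len(nearest)

def probability_grid(pitches, x_range, z_range, step=0.1, k=25):
    """Map each grid-cell index (i, j) to the estimate at that cell's center."""
    nx = int(round((x_range[1] - x_range[0]) / step))
    nz = int(round((z_range[1] - z_range[0]) / step))
    grid = {}
    for i in range(nx):
        for j in range(nz):
            cx = x_range[0] + (i + 0.5) * step
            cz = z_range[0] + (j + 0.5) * step
            grid[(i, j)] = knn_strike_prob(pitches, cx, cz, k)
    return grid
```

Sorting every pitch for every cell is slow on a full season of data at a 0.1 inch step; a spatial index would be the usual fix, but the brute-force version keeps the idea visible.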
Are you falling asleep yet? I know it's dry, so here are some colors to distract you. I took those probabilities and contoured them up, heat map style! The black lines represent the boundaries of the 25%, 50% and 75% areas for called strikes. That is, everything inside the middle (50%) circle was more likely to be called a strike than a ball. Kenji is on the left, Rob on the right:
Now these are certainly pretty and marginally useful just to see where the various likelihoods are, but on their own, they get no closer to what I was talking about before - a quantifiable difference. That is okay though; they were just an intermediary step. The next two steps are the important ones.
First, I finally get around to doing what would seem natural; I subtracted the two plots from each other. For every square in the grid, I took the probability that an umpire would call the pitch a strike for Kenji Johjima and subtracted away the probability that an umpire would call the pitch a strike for Rob Johnson. If the two were equally likely to get a pitch called a strike, the net result would be zero. If the equation predicted Kenji to more likely get a strike call then it would be a positive number (he would have a higher probability), and similarly it would be negative if the probability favored Rob Johnson.
Am I a master of foreshadowing, or what?
My final graph is a plot of said differences. I re-instated the hypothetical strike zone again to provide some reference. The blue dots are grids that hold a preference for Kenji Johjima by at least 10%; the red dots a 10% or greater probability difference for Rob Johnson.
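The subtraction step can be sketched in a few lines. This assumes each catcher's probabilities are stored as a dictionary keyed by grid cell, which is a convenience for the example and not necessarily how the original analysis stored them; `difference_grid` and `notable_cells` are made-up names.

```python
def difference_grid(prob_a, prob_b):
    """Per-cell P(strike | catcher A) minus P(strike | catcher B).

    Only cells present in both grids are compared.
    """
    shared = prob_a.keys() & prob_b.keys()
    return {cell: prob_a[cell] - prob_b[cell] for cell in shared}

def notable_cells(diff, threshold=0.10):
    """Split cells into those favoring A or B by at least `threshold`."""
    favors_a = sorted(c for c, d in diff.items() if d >= threshold)
    favors_b = sorted(c for c, d in diff.items() if d <= -threshold)
    return favors_a, favors_b
```

With Johjima as catcher A and Johnson as catcher B, `favors_a` would correspond to the blue dots and `favors_b` to the red ones.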
Before passing judgment, I find it satisfying that some patterns are present in this plot. There are indeed areas where it looks like one catcher works the umpire better than the other does. That makes more sense to me than if the differences had been more randomly scattered.
Now, there are clearly more blue dots than red ones, but that's still not a quantifiable answer. For one, Rob Johnson has fewer zones of advantage, but they might be of greater magnitude. In addition, they might come in areas that see more pitches, such as the dots off the left side of the strike zone. The final step is taking this grid and finding out how many pitches in 2009 landed in each spot. To get a total difference in predicted strikes called, it is simply a matter of going square by square and multiplying the number of pitches in that square by the difference in probability for the two catchers.
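That square-by-square weighting might look like the sketch below. The helper names, the dictionary-of-cells layout and the binning convention (cell index from the lower-left corner of the grid) are assumptions for illustration.

```python
import math
from collections import Counter

def count_pitches_by_cell(locations, x_min, z_min, step=0.1):
    """Count how many (px, pz) pitch locations fall in each grid cell."""
    counts = Counter()
    for px, pz in locations:
        cell = (math.floor((px - x_min) / step),
                math.floor((pz - z_min) / step))
        counts[cell] += 1
    return counts

def expected_extra_strikes(diff, pitch_counts):
    """Sum over cells of (probability difference) x (pitches seen there)."""
    return sum(d * pitch_counts.get(cell, 0) for cell, d in diff.items())
```

A cell with a large probability gap but no pitches contributes nothing, which is why counting colored dots alone cannot settle the question.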
Based on the above plots, Kenji Johjima's predicted pattern of framing would have resulted in 13.5 additional strikes being called if he had caught all of the pitches studied. Taking Dan Turkenkopf's figure of .161 runs per added strike, that difference is worth 2.05 additional runs for every 10,000 pitches not swung at, which is right about one season's worth from a full time catcher.
Two runs.
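For those checking the arithmetic, the conversion from strikes to runs can be sketched as follows. Only the 13.5 strikes and the .161 runs per strike come from the figures above; the number of taken pitches in the sample is not stated in this post, so the 10,600 used in the example is a back-calculated assumption, not a reported count.

```python
def runs_per_10k(extra_strikes, runs_per_strike, taken_pitches):
    """Scale a strike difference over a sample to runs per 10,000 taken pitches."""
    total_runs = extra_strikes * runs_per_strike
    return total_runs * 10_000 / taken_pitches

# 13.5 strikes at .161 runs each is roughly 2.17 runs over the whole sample.
sample_runs = 13.5 * 0.161

# ASSUMPTION: about 10,600 taken pitches, chosen only because it is the sample
# size that would make the per-10,000 rate land near the 2.05 quoted above.
rate = runs_per_10k(13.5, 0.161, 10_600)
```

Either way you slice it, the answer stays in the neighborhood of two runs.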
In conclusion, from a thought experiment perspective - which is what this began as - Kenji Johjima looks to have been a better framer of pitches than Rob Johnson was last season, at least by the method that I decided to measure it by. Secondly, that difference was minuscule, even before adjusting for any possible regression.
I am not going to make any overly broad statements about the value in framing pitches across all catchers. I only looked at these two and only this single year. It very well could turn out to be a simple fluke that they ended up so similar. Nevertheless, aside from the obvious Mariner interest, isn't it interesting that two catchers who generated such vastly different perceptions of their framing abilities not only ended up being so close in value, but actually in reverse of the mainstream opinion?
As a last piece, I created two GIFs that flip back and forth between the two comparison graphs so that you can see some of the differences in a different way. Here is the link to the general pitch plots and here is the link to the differing contour maps.
Comments
I don't have any questions as this seemed to me to be very well thought out
Though I can’t help but feel that this
Specifically, in the case of pop-ups, the vast majority have large enough hang times that every Major League catcher makes a successful play.
was JI bait
There is a formatting problem right under the jump
Now, back to reading….
Excellent article
One question, though: how accurate would it be to make up sort of a “Catcher Strike Zone” statistic consisting of the area of a circle centered on the center of the rulebook strike zone large enough to encompass, say, 90% or 95% of a catcher’s called strikes? Is there just too much noise at the fringes for that to work out?
This is great work
And it just makes me hate Rob Johnson even more.
Wow, an actual data-based attempt to quantify catcher pitch framing!
This is amazing.
This is a good piece, and I LOVE the graphics
However, you need to make some adjustments before you can use the results. For one, differences that you see between Kenji and Johnson are most likely a combination of (random variation from umpires + pitcher biases + framing skill).
To control for the variation from umpires, somewhat, it would help to consult this study by Jeff Zimmerman at BtB. Check out the spreadsheet at the end for individual umpire zones. You would probably be able to do a better job at determining umpire zones by using the neural net type technique you used for this piece, but that may be way more computationally intensive and harder to use, so it’s up to you. At any rate, umpires have huge variance in the zones that they call, so it would help immensely to normalize to each individual umpire zone.
Another thing you have to look at is batter hand and pitch_type. Curveballs and sliders have a tendency to “bend” around the zone, and if one catcher caught more curveballs than another, he would be biased against by simply looking at the px and pz fields in Pitch f/x.
Furthermore, you have to control by count. There are significant differences in how strikes are called depending on the count (Allen and Hale)
Basically, you have to control for all of the variables that you can think of. Or else, there is simply way too much bias in the data for it to be useful at all. The “conclusion”, I guess, is that there is very little difference between Kenji and Johnson in terms of game calling ability, at least last year. I don’t think you can state that with any sort of confidence given all of the potential biases in the data.
Don’t take this the wrong way – I love work like this and I think this is a good start. However, you need to spend a lot more time on manipulating the data for it to be of any use. It just depends on how much time you want to spend on studying this issue…
by vivaelpujols on Feb 1, 2010 8:55 PM PST
I am not going to control for the individual umpire zones.
They’re pretty similar across the board and I feel that’s far overfitting the data sets and introducing more bias than it gets rid of. There’s no systematic bias in place for which catcher caught with which umpire so I’m comfortable assuming it’s a wash over a full season.
I did look at batter hand, pitch type and counts. I didn’t find any differences. This isn’t a journal article. I’m trying to keep it readable and entertaining for everyone. Stuff that turns out to be insignificant, and both catchers caught similar number of pitch types, in similar counts to similar batter handedness (remarkably similar, really), gets trashed or else this turns into a 5,000-word thesis and nobody gleans anything from it aside from stat students.
Don’t read this too harshly, but I would appreciate if in the future, you asked before assuming I didn’t give due diligence. You’re coming here to comment on my post, at least show me that bit of respect.
Yeah, there was a discussion about this on Tango's blog.
If everything in sabermetrics had to be submitted and peer reviewed before it could be built upon, we would just now be using OPS+ or something.
That wasn't exactly the point of the discussion (which I was wholeheartedly a part of, FYI!)
What I did was EXACTLY what Tango, et al, advocate for, which is quick and to the point criticisms and questions of the original article not being bogged down by the semantics of the Peer Review process. Matthew posted his piece at 11 PM and I commented 55 minutes later with my concerns. Nobody is asking for all sabermetric work to go under a rigorous peer review process, but at the very least (especially given that it is posted online, with comments enabled) it should be subject to criticism.
He didn't ask for criticisms.
It was clear that the focus of this work was comparing two Mariners catchers to each other and dispelling the popular belief that Rob Johnson has an amazing “framing” ability that Kenji Johjima lacked. Almost none of the readers understood it as trying to quantify a new aspect of catcher defense in a way that hadn’t been done before, thus revolutionizing the way we evaluate catchers.
...
He didn’t ask for criticisms.
You don’t have to ask.
He posited a question at the beginning of the article:
did Rob Johnson frame pitches in a better manner than Kenji Johjima did? Was he better at getting pitches called a strike?… Ergo, if I were to plot every pitch caught by each catcher, there should be a clear difference.
He then plotted out the data, gave his analysis, and made a conclusion that there was little difference in the value of the two catchers’ framing ability last year.
That is practically the definition of the scientific method. I had legitimate reservations (and possibly still do) as to the method that he used in his study, and thus commented my concerns. Nowhere did I assume this was “trying to quantify a new aspect of catcher defense in a way that hadn’t been done before”. I was simply commenting on the conclusions made in this article alone.
You have no standing to criticize my criticisms of Matthew’s piece. It was a scientific study – meaning making conclusions based off of objective data – and thus is subject to criticism. I hardly think my comment is going to turn this study into a 5000 word thesis on the merits of Johjima vs. Johnson.
by vivaelpujols on Feb 1, 2010 10:03 PM PST
Brief tangent re peer review ...
… Peer review was really necessary when the ability to publish was a limited resource. The most valuable function of the peer review process was to prevent the limited print resource from being wasted on work that was of a low quality.
With web publishing that constraint is removed. There really is no reason not to simply publish and let work be reviewed by everyone instead of by a select peer group. The best work will still be recognized and rewarded, but it will have the advantages of being critiqued by a broader audience.
The changes will not occur without resistance, because it means that the anointed priests of peer review will be forced to relinquish their current privilege, but it will happen whether they like it or not.
We actually have a good example of the changes already in the sports world, where the internet has broken the monopoly on information and analysis that was previously concentrated in the print media. There are those vestiges of the Old Guard who still take every opportunity they can to rail against the perceived bloggers in pajamas in their mother’s basements. But the revolution has reached the stage where they just sound like grumpy old men.
In the original article,
you made no mention of batter hand or pitcher hand, no mention of pitch counts or types, no mention of umpires. You simply showed the aggregate pitch locations and a predicting function of whether or not it was a strike or not based solely on that data, then extrapolated from that. That is not the correct way to do this kind of study and you know it.
Would it have been that hard to control for the other variables, or at least give them due mention as possible confounding factors in the OP? Would it really have turned this into a 5000 word thesis to say “I then controlled for other variables, such as count, batter and pitcher hand, and pitch type”, or if you didn’t, say “of course there are other variables that could massively affect the results of this study, such as…”?
I apologize for assuming you didn’t look at those things. However, if you are not even going to mention them in the article, how do you expect anyone to think differently?!
By the way, for other readers, Dan Turkenkopf took a look at this same issue back in 2007.
http://www.beyondtheboxscore.com/2008/4/5/389840/framing-the-debate
You’ll notice that he found a much bigger difference across all pitchers than Matthew found here with Johjima and Johnson. I’d like to see him repeat this method using more data and better data control.
So we apparently have a little bit of a problem here, and I'm not talking about Matthew's piece or your issues with it
The tone you take as a guest to Lookout Landing is absurd. Very few people here know who you are except as the loudmouth who comes in here, argues with people and tells them what to do in shaping their work to be more in line with your expectations. You are not recognised as any sort of expert here, and the tone you take, which may be appropriate for your other haunts, therefore comes off as sublimely arrogant (ironic from me, I know).
You’re a competent analyst and you bring interesting thoughts to the table, but you have to understand that no matter how good you are, writing like you do ‘at home’ here is going to do you absolutely no favours in the eyes of the residents of this blog. Instead of being considered as someone who brings something to the table, the vast majority of LL looks at you as an annoyance. You are not Matthew’s equal here, just like he would not be yours on your home turf.
by Graham MacAree on Feb 2, 2010 7:48 AM PST
That's fair; tone issues will always get in the way, but
I’d think we need to make clear that people should feel comfortable trying to understand a study like this, and part of that is trying to think of things that might skew the results. In this case, Matthew had accounted for most all of them, and the remainder are unlikely to add anything worthwhile (esp. in comparison with the time they’d take… the Turkenkopf article accounts for umpire and not a whole lot happens when he does). But what if it was something significant? Yes, yes, you can point that out without being aggressive or presumptuous, but I presume it’s important that people point such things out. Right?
Wow, if this is the standard
Then LL is a lot more unfriendly to comments than I thought. You guys do some awesome sabermetric work here; I would think you would want it to be discussed. If the standard is that we have to show extreme deference to the author and not make any critical comments or questions, that makes it tough for those of us who are generally lurkers and not regular accepted members of the LL community to participate in any meaningful fashion.
Nick’s tone was very balanced as far as I can tell. He went out of his way several times to compliment Matthew’s work and raised a number of legitimate concerns without becoming personal or negative.
I guess I’d like to know what the standard for comments by visitors is. I am a regular visitor to the site, but not a regular commenter nor a Mariner fan. I have occasionally dropped a comment in a thread here or there, but perhaps I’ll refrain in the future.
Just to clarify a bit..
The way your post (and Matthew’s response to Nick) reads, Graham, is basically that you sabermetric guys can kiss off, this is a Mariners blog and we want our work judged to that standard (at which it admittedly blows the socks off anything). We don’t want you all dragging sabermetric discussion in here because the clientele don’t appreciate it.
That’s fine, but if so, state that up front. And if so, I think it’s a shame but recognize that it’s your blog and your choice.
Being complimentary in one sentence and then massively disrespectful in the next does not make for a 'balanced' tone
This is Matthew’s house, and it’s completely possible to ask questions and raise issues without coming across like a belligerent know-it-all. I fail to see how a chastisement over poor tone is anything close to,
If the standard is that we have to show extreme deference to the author and not make any critical comments or questions…
by Graham MacAree on Feb 2, 2010 3:08 PM PST
I think it's the "This is Matthew's house" attitude
Clearly, you feel that way, and that’s fine. Rigorous sabermetric back-and-forth can take place elsewhere, I guess. If there has to be deference to somebody, then it’s no longer about evidence.
If saying “you need to make some adjustments before you can use the results” is “massively disrespectful”, we clearly have different standards. I take similar comments from Peter Jensen, Alan Nathan, Tom Tango, MGL, etc., all the time on my work. I consider that constructive criticism and don’t worry about whether I’m in my house (THT) or their house (the Book Blog).
What I’m saying is that Nick took a tone in his first post that is very, very common at other sabermetric outlets and fosters the discussion there. Restricting that kind of “tone” is going to lead to me bowing out of making substantive comments here because I don’t want to worry about an author getting their feelings hurt. I agree that the discussion went south after Nick’s initial post, and that’s unfortunate, although Nick is hardly the only one at fault there.
I know that there are some blogs where you can’t question the work or comments of certain people. I didn’t realize that this was one of them.
Really?
You’re questioning my comments here and I’m hardly eviscerating you for it. Clearly we have different standards for ‘constructive criticism’ and ‘being a jackass,’ and if that means you’re unwilling or unable to participate in the discussion here, that’s a shame. But the standards here are what they are, and we expect people to be able to modify their tone for their setting.
The default response should not be ‘You didn’t do [x], you need to’ but rather ‘I didn’t notice you including [x]. Why is that?’ One is a command, the other is a question that could well lead to constructive criticism.
by Graham MacAree on Feb 2, 2010 3:28 PM PST
That's a helpful distinction
because it doesn’t rely on ‘turf’ or territory or whatever. I’ll admit that I didn’t know how to interpret that in your comment, and clearly Mike didn’t get it either. If it’s ‘Be respectful and polite’ then I think everyone would generally know how to act.
The 'turf' thing is mainly a metaphor for audience expectations and knowledge
Is it ok for me to go over to VEB or whatever and go on my ERA scale rant? I don’t think so – seems pretty rude to me to wander into someone else’s audience and start talking down to people.
by Graham MacAree on Feb 2, 2010 3:51 PM PST
I think that the posters at VEB would agree with you 100% on the ERA scale thing
And I would personally love if you or others would come to VEB and comment on the details and methodologies of the more stats inclined posts. That’s one of the reasons I come here, is that I love to talk about sabermetrics and there is a lot of related discussion here.
I agree with Mike that the “turf” thing is somewhat unusual, for lack of a better word. I comment on a ton of blogs that aren’t my “own” (The Book Blog, BCB, DRays Bays, BtB) with posts very similar to the one that started this thread, and I haven’t ever been chastised for it.
I don't feel as though I have earned any respect from your community
I may have the requisite expertise to argue with the best of them, but there’s no way I can take my LL tone abroad. People there don’t know me, they have no reason to trust me, and if I talk like I do here elsewhere there’s no way they’ll ever want to discuss anything with me. A lot of the way we talk is based on familiarity, and I feel quite strongly that if you’re in unfamiliar territory, err on the side of respectfulness.
Of course, that could be because my natural tone is extremely annoying and I have to remind myself to be nice lest someone internet-punch me in the face or whatever.
by Graham MacAree on Feb 2, 2010 6:46 PM PST
From my limited experience in seeing you on other sites (mostly BtBs and McC)
I’ve found your tone more welcoming and even keeled (if that makes sense) than sometimes happens here. If I recall correctly, you had a somewhat fruitless argument at McC sometime last season regarding the usefulness (or uselessness more like) of WHIP and though people were being dense/wrong in responding to you, you remained very calm and balanced in your responses.
I’m sure any community would welcome such an attitude.
I don't mean that you don't let people disagree here
What I mean is that you apparently don’t want the tone that is typically used at other sabermetric outlets like THT and the Book Blog. The “I really like what you did here, but you didn’t consider X, and for this work to be generally applicable, you’ll have to do that.” Or you neglected to think about Y. Or even, you’re wrong about Z.
In the sabermetric world, Nick and I are Matthew’s peers. If we can’t act like peers here, then it changes the forum to something where academic and nuts-and-bolts discussion is harder. In those types of forums, what Nick said is not taken as a “command”. Maybe the typical sabermetric tone doesn’t translate well to typical LL tone, but it’s very unfortunate, IMO, that that resulted in Nick getting jumped on for what are considered very tame comments and typical tone elsewhere.
By that I don’t mean that getting personal and rude with people is cool. I just don’t see that Nick did that (until subsequent posts where Matthew and others had dished back to him).
There's a pretty crucial distinction between here and THT or the Book Blog
It’s the audience. Matthew is essentially giving a seminar to a group of non-experts. My going back into bioengineering courses to have a shouting match with the lecturer would be interesting to me (and perhaps the lecturer), but of virtually no use to anyone else, who will just think I’m some random asshole instead of an expert in the field.
Yes, that does make it more difficult to have a no-holds-barred back and forth, but I don’t think it’s impossible. We’re all smart enough to recognise issues if they’re brought up, but it can be done much more respectfully than the ‘standard’ tone.
by Graham MacAree on Feb 2, 2010 3:49 PM PST up reply actions 2 recs
And let it be said that we're notorious for having a bad tone here ourselves
Especially me.
But then again, so does the rest of the sabermetric community. We’re making an effort to overcome our own issues at LL, and it is my hope that this is seen as a good thing rather than a bad thing. We’re more than happy to have visitors join us, and clearly the more expertise, the better. VEP’s been around a lot and he’s more than welcome to stay. You’re of course very welcome to join us too. But this isn’t Matthew, VEP, me and you at a bar (where according to my academic experience, the real work gets done), so if we can adjust our dialogue to match I think everyone will be happier.
by Graham MacAree on Feb 2, 2010 3:59 PM PST up reply actions 2 recs
Hey Graham
I realized while thinking about this on my drive home from work that I was missing some key thoughts from what I wanted to say as well as thinking about this a bit more from your perspective. I’ll chime back in later this evening when I have more time to write, but I appreciate very much your willingness to engage me on this.
This is ridiculous.
I responded to Nick’s concerns and then asked Nick to afford me the respect that is due anyone, peer or not, of questioning the author before making assumptive criticisms.
The tone Nick took in no way fosters discussion, I don’t care where you are. Other viewpoints foster discussion. Valid criticisms foster discussion. Questions foster discussion. None of those need to be couched in assumptions or condescension to be effective.
It has nothing to do with peer level, academic status or even where you are. It’s basic decency and respect. It’s common courtesy. And insinuating that asking for a more polite tone is equivalent to censorship is flat out insulting.
by Matthew on Feb 2, 2010 4:11 PM PST up reply actions 2 recs
Matthew
I am not intending to be insulting. I’ll write more later when I don’t have a baby in my arms.
I know you're not, Mike.
I’m just stating how it comes across to me.
So, I've been thinking about this
And wondering why I cared so much about it.
(I’m responding here to Graham and Matthew, so I won’t use the second person any more even though I’m responding to your post, Matthew.)
Lookout Landing has gone from being a site that I visited on occasion for some good reads to being one of my few visit-every-day baseball sites. You guys continue to put out top-notch work over and over again. It’s a pleasure to read and a great place to learn. Matthew, Jeff, and Graham are all very sharp cookies.
I was particularly excited to see the topic of catcher defense discussed by Jeff last week, and I enjoyed some of the comments to his post. Then I see that Matthew has another post about catcher defense today, and it is a really good one. Imagine my excitement.
I’ve been pondering for a while how one could go about measuring catcher defense. It’s not an easy problem. But surely with the tools we have or are starting to get, there would be a way to do it. So I really like to see posts like these two on game calling and pitch framing. Even if we’re not going to get all the answers to start with, I really enjoy having the thoughts in my brain on the topic stirred around and having new seeds of ideas planted.
When I read Matthew’s article, I had some of the same questions that Nick had. So I was disappointed to see, rather than a discussion based on the points he brought up, an argument ensued about his tone.
I do think he could have been more polite or phrased things differently. But I didn’t think he was trying to be rude or condescending. I guess part of that is because I know Nick and know how he likes to dissect the work of others into pieces so that he can figure out what he can do to extend or modify it. And I read his initial comment in that light. I didn’t take it to mean that he was finding flaws with Matthew’s article. On the contrary, I saw him thinking out loud about what one might need to do to make a catcher framing metric from Matthew’s work.
Now, I also understand a bit of Matthew’s frustration with Nick’s comment. I don’t know how many studies I’ve done where the first comment someone made was a list of things I should have looked at. My initial response (in my head, hopefully) to that is often something like this: “You moron, you know I do this as a hobby, right? And I considered about fifteen things you haven’t even thought of yet. Not to mention the fact that I have to write this article to entertain a broader audience, not as your personal research assistant.”
But then I (hopefully) remember that I probably wouldn’t have gotten a comment like that if the person didn’t like what I wrote and it hadn’t provoked them to think deeper about the topic. So then I try to take that as a compliment and respond to the substance of their ideas and assume that they probably weren’t thinking any of the insulting things I felt like they had communicated.
So, all that to say that I really would like to see this topic go farther in discussing the substance of Matthew’s ideas. This comment thread may not be the best place for that. After a little reflection, I think I understand Graham’s comments about tone a little better. Particularly the analogy about what is appropriate in a lecture hall vs. a bar. And I know that a community that is made up of a wide spectrum of fans takes some work on the part of you guys to lead that in the right direction, and I actually think you have done quite a good job of that here at LL.
So, I apologize for my earlier “I’m gonna take my ball and go home” comments. I was upset at a topic that is near and dear to me being derailed, but that was not really an appropriate response on my part.
I’m hopeful to be a good contributor here as I am able.
by Mike Fast on Feb 2, 2010 6:56 PM PST up reply actions 7 recs
I wasn't trying to be condescending with my original post
You are correct that I did assume that you did not consider the variables I mentioned, and that may have seemed like I was not giving you the respect that you deserve, but like Mike says, it really is the common tone on a lot of other blogs. I didn’t mean to disrespect you, you are an excellent analyst.
I was a bit taken aback by your reply to me, and I responded more hastily than I would have liked to. I apologize for my second post in reply to you.
This is a completely erroneous comment. I mean completely.
You can raise an issue without insulting the intelligence of the person you’re replying to with your tone, is the point. People do it all the time here.
I'm sorry
My comments were not intended to be condescending or aggressive, I was just trying to highlight some possible issues with the study. I probably should have asked Matthew first before assuming he didn’t consider them.
At any rate, I’ve made some 700 comments on LL, and read the site daily. Almost all of them are on technical matters, and I try to be constructive. Your position on “home turf” is somewhat new to me, I’ll admit, but I can see you believe it (and judging by the rec’s on this post, so do many others!), so I’ll respect that in the future.
That is great, and I mean that, but...
Keep that in mind when you respond to the casual posters as well here; they are not statisticians, they are your audience, they’re the people who ultimately need to be convinced to follow your viewpoint (before they trust your position of authority.) So explain your thoughts before you try to take an authoritative position, and treat those who largely share your viewpoints with the same respect you would expect from them.
by OlSalty on Feb 2, 2010 7:00 PM PST up reply actions 2 recs
In case you missed it...
…Matthew specifically replied to you saying he looked at the factors you’re mentioning between Johnson and Johjima. He did not see fit to include them in the blog post because they proved to be, in his estimation, non-factors. There appeared to be an equal distribution of hitter handedness, pitch types caught, et cetera.
He didn’t say it in the original post, but the first time you raised this concern, he assured you that he did his “due diligence.” Now you’re just being argumentative for no reason. There’s no noise involved here. Matthew’s analysis of Johjima and Johnson is sound.
By the way, the BTB article you linked to shows RAA/150 pitches caught. Johjima ranks at -0.63, or -42.67 per season. Johnson, theoretically, would plot at about -0.67, assuming he is 2 runs worse than Joh; thus he still falls within the distribution. Problem solved. Don’t be lazy.
by harkening on Feb 2, 2010 2:42 PM PST up reply actions 1 recs
Funny you should mention that
I’m in the process of working on that followup right now.
But there are some crazy park effects that foul the whole thing up. So Matthew’s look at Johjima and Johnson might be more correct.
The first thing you’ll see from me, perhaps this week, is the result of the parks study. I think that one’s going to be pretty controversial, especially when you see my headline.
by Dan Turkenkopf on Feb 2, 2010 7:13 PM PST up reply actions
Could Uniformity be the difference?
What I notice in the two GIFs is the difference in how the pitches are grouped. Rob seems to have a more uniform, or consistent, zone in the “heat” chart and has fewer outliers in the scatter plot. It could be possible that Johnson’s ability to frame a pitch results in a more consistent zone. If nothing else I think it can be assumed pitchers like consistency, hence why they may prefer Johnson behind the plate.
I am not quite sure what you are saying here,
but the locations recorded by pitch f/x are where the pitch itself crossed the threshold of the plate, not where the catcher framed it. Any groupings in the scatter plots are the result of the pitches handled.
More or less, what I’m saying is that we might be looking at a graphical representation of a pitcher’s confidence, based on who the catcher is. Uniform was a bad choice of words, but pitches seem to be around the strike zone more in Johnson’s graphs than in Kenji’s.
It could also just be that Johnson caught pitchers with better control last year (Felix comes to mind) than Kenji did.
screw my worde choice today...
*pitcher’s control, not confidence.
Although increased confidence could lead to increased control.
Sorry for the double post
Great work!
This was exceptional and very informative. Thank you.
Very cool stuff
Looks like a lot of work. It would be cool to see a league-wide study like this to get a better feel of stuff but that’d probably take forever.
It’s also pretty interesting how different the pitch distribution between Kenji and Johnson looks.
I feel like I agree with your final conclusions where framing probably is a pretty small contribution. I have to figure helping to call/locate pitches must make a pretty big impact but that is super hard to test since two people are deciding what to throw and not just one.
It would be awesome if this could be automated for a broader use
so that individual catchers could be compared to the whole body of MLB. Something in the back of my head is telling me that there is something intrinsically wrong with that idea, so maybe that wouldn’t be as awesome as I’m thinking. Either way, it would be good to automate it once it is refined so that each pitcher could be compared to the MLB average.
Creative analysis.
Is this all original? I don’t think I’ve ever seen a study like this.
What the hell is with the pitch on Kenji's scatterplot at ~(1.5, 1.25)? How on earth does that get called a strike?
It's hard to convince people to let you eat them if you're an asshole. - Thingray
If I had a brain cramp that big I'd probably die from an aneurysm.
It's hard to convince people to let you eat them if you're an asshole. - Thingray
Yeah, but Tschida is not like most folks.
(I only named him because he is not my favorite ump, I actually don’t know who called the pitch in question. It wouldn’t surprise me, though, if it was him.)
I was gonna comment on that.
If you measure framing by which catcher lets their pitchers get away with murder, Kenji’s got more and greater outliers by far. Look at all that shit at (-1.7, 2.5).
De Gutibus non disputandum est
by Bearskin Rugburn on Feb 2, 2010 9:19 AM PST up reply actions
Also Kenji at ~(.25, 2.5), wow.
I go to law school. Therefore, I have no life.
by andrewgolfsalot on Feb 2, 2010 9:48 AM PST up reply actions
This one completely baffles me.
Any way of tracking back which pitch this was?
I think those are probably Pitch f/x errors
There are a non-trivial number of times that the Pitch f/x operators attach the wrong pitch description to the pitch data. Most of the time, this is what happens with the called strikes that are clearly balls and vice versa.
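One way to make the "probable data error" idea concrete is to flag called strikes that land far outside any plausible zone. This is only a rough sketch: the zone limits, the 0.5 ft threshold, the field names and the sample coordinates are all assumptions for illustration, not values from the article or from Pitch f/x itself.

```python
# Hypothetical zone limits in feet (assumed, not taken from the article).
HALF_PLATE = 0.83   # half the plate width plus roughly one ball radius
ZONE_BOTTOM = 1.5   # assumed generic bottom of the zone
ZONE_TOP = 3.5      # assumed generic top of the zone


def miss_distance(px, pz):
    """Distance in feet from a pitch location to the nearest zone edge (0 if inside)."""
    dx = max(abs(px) - HALF_PLATE, 0.0)
    dz = max(ZONE_BOTTOM - pz, pz - ZONE_TOP, 0.0)
    return (dx ** 2 + dz ** 2) ** 0.5


def suspicious_called_strikes(pitches, threshold=0.5):
    """Return called strikes that missed the assumed zone by more than `threshold` feet."""
    return [
        p for p in pitches
        if p["call"] == "called_strike" and miss_distance(p["px"], p["pz"]) > threshold
    ]


# Made-up sample pitches; coordinates loosely echo the outliers discussed in the thread.
sample = [
    {"px": 1.5, "pz": 1.25, "call": "called_strike"},
    {"px": -1.7, "pz": 2.5, "call": "called_strike"},
    {"px": 0.25, "pz": 2.5, "call": "ball"},
    {"px": 0.9, "pz": 2.0, "call": "called_strike"},
]

for pitch in suspicious_called_strikes(sample):
    print(pitch)
```

Pitches flagged this way would be candidates for a manual check against the game log, not automatic deletions, since a genuinely bad call and a mislabeled pitch look identical in the data.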
by vivaelpujols on Feb 2, 2010 11:58 AM PST up reply actions
I wonder how much influence
“pitcher mix” has on these results. In the same way that it affects CERA, who you happen to catch may influence which strike calls you get. For example, if you catch more Felix, then you may get more calls at the edges of the strike zone because he’s better at nibbling the corners than a guy with less control.
Given the (unexpected, to me at least) influence of pitchers on base stealing, perhaps pitchers similarly have a major role in determining which balls and strikes get called? This would be tough to look at; I think that you’d probably want to build a pitcher factor into your model which predicts strike probabilities, since stratifying by pitcher is probably going to run into sample size issues.
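To make the "pitcher mix" concern concrete, here is a crude sketch of the stratified version: compare a catcher's called strikes to what each pitcher's own overall strike rate would predict. It is deliberately the simple approach that the comment warns will run into sample size problems, not the modeled pitcher factor it recommends, and the field names (`pitcher`, `catcher`, `strike`) are hypothetical.

```python
from collections import defaultdict


def strikes_above_expected(pitches, catcher):
    """Called strikes a catcher received minus the number expected from his pitcher mix.

    Each pitch is a dict with hypothetical keys: "pitcher", "catcher", and
    "strike" (1 if a taken pitch was called a strike, 0 if called a ball).
    """
    # Overall called-strike rate for each pitcher, across all catchers.
    # Note: this baseline includes the catcher's own pitches, which pulls
    # the result toward zero; a careful study would correct for that.
    totals = defaultdict(lambda: [0, 0])  # pitcher -> [strikes, taken pitches]
    for p in pitches:
        totals[p["pitcher"]][0] += p["strike"]
        totals[p["pitcher"]][1] += 1

    actual = 0.0
    expected = 0.0
    for p in pitches:
        if p["catcher"] == catcher:
            strikes, taken = totals[p["pitcher"]]
            actual += p["strike"]
            expected += strikes / taken
    return actual - expected


# Toy data: one pitcher, two catchers, four taken pitches.
toy = [
    {"pitcher": "A", "catcher": "X", "strike": 1},
    {"pitcher": "A", "catcher": "X", "strike": 1},
    {"pitcher": "A", "catcher": "Y", "strike": 1},
    {"pitcher": "A", "catcher": "Y", "strike": 0},
]
print(strikes_above_expected(toy, "X"), strikes_above_expected(toy, "Y"))
```

With only a handful of pitches per pitcher-catcher pair the per-pitcher rates are very noisy, which is exactly why a single model with a pitcher term is suggested above instead.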
I think this would be more of an issue with umpires influenced by reputation...
I imagine Jamie Moyer in his prime probably got the benefit of the doubt on a lot of calls out-of-the-zone.
Greg Maddux Seemed To...
And I can recall many times hearing the talking heads mention in a game with a young pitcher vs. an established “control” pitcher that the young guy wasn’t getting the same calls. That’s just anecdotal though.
Pitchers with more control will historically be favoured by the umpires
But part of the reason they are perceived to have more control in the first place is because of accidental umpire bias.
by Graham MacAree on Feb 2, 2010 4:14 PM PST up reply actions
Looking at those heat maps
I can’t help but think we should cover Rob Johnson in mud.
by jtopps on Feb 2, 2010 11:50 AM PST reply actions 1 recs
Why do you blame Rob Johnson for people's erroneous perception of Joh's framing?
I mean, Johnson isn’t a very good player but I fail to see how this article makes you hate him more. Unless you still think that Joh is terrible and take this to mean that Johnson is even more terrible.
De Gutibus non disputandum est
by Bearskin Rugburn on Feb 2, 2010 12:46 PM PST up reply actions
Just a reference to the movie Predator
nothing more, nothing less.
Well now I feel like a bag of assholes
De Gutibus non disputandum est
by Bearskin Rugburn on Feb 2, 2010 4:33 PM PST up reply actions
No worries, mate
Wasn’t that good of a joke/reference anyway…
Wow
I’ve been reading LL for a year now and this is the first time I’ve been so compelled as to comment. This is really cool analysis.
Thank you.
Matthew, this is great work.
I wonder if it would be possible for you to plug a guy who is generally perceived as being an ace catcher into this equation to see how they compare to RJ and Johjima. Maybe Varitek or Yadier Molina. I’d just love to know if the difference remains negligible, or whether it’s a skill with a broad distribution but you happen to be looking at two similar players.
De Gutibus non disputandum est
by Bearskin Rugburn on Feb 2, 2010 12:49 PM PST reply actions 1 recs
Beyond the Boxscore did this...
…and in between acting like a complete asshole, vivaelpujols actually linked to it above. The data is from 2007, but the post can still be found here.
And saying that, I’m running the risk of being perceived as an ass, too. Longtime reader via USS Mariner, but this is my first time commenting as I just joined the LL community a week or so ago.
Matthew—
This is great stuff.
Jebus, you don't need to walk on pins and needles because you're linking to a relevant study
It’s fascinating stuff, and while I can’t imagine why the effect is so large, it’s still a fascinating study.
Short version: Kenji was one of the worst in baseball. If that’s true, and if the magnitude of the effect is reasonably large, then you could understand why the M’s might want to try Johnson even if they thought his hitting and CS% would cost runs.
However, Matthew’s study shows a slight problem with that……
Not pins and needles...
…but in referring to vivaelpujols as an asshole, I was breaching an expectation of civility. The link originally was his (see above). I wanted to head that off early.
BTB’s study was for 2007, during which Kenji was catching Miguel Batista, ChaSeung Baek, Ryan Feierabend, Jeff Weaver…in other words, not a great staff.
It makes me curious if Kenji’s numbers would be significantly different when using Matthew’s data here, given the staff changes. In other words, is the framing debate as such affected by pitchers caught. I know Matthew found this to be negligible in terms of Johnson/Johjima—to be expected; it’s the same pitching staff they’re catching—but across the game, it’s a curious question.
It's not in the term
it’s in how it is used.
Not so good example: You (insert name here) are an asshole.
Acceptable example: You (name) are acting like an asshole.
Your usage was the latter.
Yes
That’s one thing I’ve learned over 15 years of reading BB’s and blogs — stay away from direct insults to a person. It should be perfectly acceptable, however, to opinionate on the content of their post, even if it comes off as critical or insulting. Unfortunately, that sometimes leads people to feel that they are being attacked personally. But it’s quite often difficult to have meaningful discussion without criticism of a person’s content or attitude. Keep the focus on the content, not the person.
by nathaniel dawson on Feb 2, 2010 4:56 PM PST up reply actions
Whoa, hey, thought I had a post here...Anyway, good point about the staff
but as Graham points out above, just the strike/ball calls probably result more from umps not quite knowing the ‘real’ strike zone than from ‘this pitcher is terrible, therefore I won’t give him the corner.’ And anyway, as Turkenkopf controlled for umpire bias, I’d think that much of the former type of error would be eliminated.
I’m still stunned the effect was as large as he’s reporting, but it certainly gives you an understanding of why the M’s would turn to Johnson over Johjima, despite the offensive gap (and defensive gap in more measurable categories!). Of course, Matthew’s research points out that they may have replaced a terrible pitch-framer with… an equally terrible pitch-framer.
(Oh, and in 2007, Batista was decent, and Cha-Seung Baek was always decent but criminally underrated)
I would like to see Dan's study repeated with 2009 data
I think the 2007 data had a lot of errors (plus a smaller sample size) and that might have been part of the reason that there was such a wide spread.
I would like to see someone look at it the way Dan did, while controlling for all of the factors I mentioned in my first post here. I’m not sure I have the technical expertise to pull that off without massive amounts of inefficient work and coding (and I don’t even have catcher data set up with Pitch f/x), but I think it would be good if someone did a really careful study on this issue.
It just defies belief
that the difference between the best and worst catchers due to framing is 25 wins. WINS!
Even with a relatively small sample size, it’s hard to explain results like that without thinking there’s something seriously questionable about the method.
I don't think there was necessarily anything drastically wrong with the method
There were some variables left unaccounted for, but nothing too drastic I don’t think. It’s probably more to do with the Pitch f/x data quality.
I should have added
“… or something seriously questionable about the raw data.” I think Dan may have addressed this, above.
Yeah... remember Jeff's comment about weirdness in LA data in Graham's post about Kotchman
That is, that the entire Angels team had fewer than 40% of pitches thrown to them classified as strikes in 2007…. that must be some of the ‘park effects’ Dan mentioned. That might explain why Jeff Mathis, say, got ‘credit’ for getting more pitches called as strikes, though it wouldn’t explain why Kenji Johjima looked so bad (the M’s were 2nd worst in strike percentage by their pitching staff).
I don’t know, but I’m curious to see what Dan found.
I'm pretty sure Jeff was referring to FanGraphs' plate discipline stats
IIRC, it was Zone%. That uses BIS data which is separate from Pitch f/x.
by vivaelpujols on Feb 2, 2010 11:19 PM PST up reply actions
Ah, got it.
The correlations there would be interesting…
Must be the smog.
Mariners/D Broncos/BSU Broncos fan in Seattle
by appleshampoo on Feb 3, 2010 10:34 AM PST up reply actions
I don't have my data with me, but for example, Texas has consistently had a Strike Factor of 62
That means umpire “misses” are strikes 38% more often than average in Texas. And that sort of pattern plays out in a bunch of parks.
by Dan Turkenkopf on Feb 3, 2010 6:08 AM PST up reply actions
See, that's fascinating.
Do the BIS data generally line up with the pitch-fx data for these outlier parks (like Anaheim in ’07)?
This is neat.
Would swinging strike percentage and pitchers play a role in this too, when you are comparing only two players?
...and now I'm here
Not sure how SwStr% would play a role
Pitchers probably, on the basis that better pitchers tend to get friendlier calls, but that would only benefit Johnson even more and I don’t believe it would have a particularly large impact given the limited nature of this comparison.
My thought was that it would play a role due to the pitchers on this particular team.
For example, if Rob Johnson catches Felix, and no one can hit the royal curve (which I assume would drop low in the zone), they would probably swing and miss or it would be called for a ball…. I don’t have a complete thought on this, only that perhaps there is a difference when looking at only two players.
...and now I'm here
I think I see what you're getting at,
but would that affect whether those curves that were laid off by the hitter be called a ball or a strike?
That I have no idea.
And I would guess no, but at the same time it would be tough to tell. Say Felix has one particular pitch (the curve) that is often called a ball because of the huge drop, and only Johnson catches it, that could affect the numbers. If Kenji always caught Jakubauskas, who always pitched in a straight line, umpires would be more likely to get the call right. How that relates to SwStr%… You know, to be honest I have no idea. Maybe just that players are more likely to swing at the pitches that were more likely to become strikes. Really, I suppose the question is whether there is too small a sample size of pitchers themselves, rather than pitches thrown. Arguably that would make creating a statistic for framing even more impossible though.
...and now I'm here
Did some of the comments get lost or something?
Aaron King is still my homeboy... iffy mechanics and all
McFAQ for all you newcomers out there.
Matthew, can you clarify something for me?
You said, “Secondly, that difference was minuscule, even before adjusting for any possible regression.” Am I reading too much into this, to think that you’re saying the difference should decrease, rather than increase? Isn’t it just as likely that the difference between most catchers is larger than Joh v. Johnson 2009, rather than smaller?
He is only discussing the difference between Joh and Johnson in that statement
He is saying that he calculated the difference between them to be small and then when you regress that result, it gets even smaller.
by Edgar for Pres on Feb 2, 2010 7:17 PM PST up reply actions
How can you regress the result?
There’s only one data point: 2 runs.
Either I don’t understand something about regression (likely), or you’re making a big assumption in saying that the difference will get smaller. I’m trying to figure out which is the case.
Because regression is toward the league mean,
whatever that would be in this case. And since both catchers had a similar sample size from 2009, simplistically, no matter what the league average is, the difference between the two of them would get smaller.
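The arithmetic behind that claim can be sketched in a few lines. This assumes the simplest form of regression toward the mean, where each observed value is pulled toward the league mean by the same factor; the `reliability` value of 0.5 is purely hypothetical (in practice it would depend on sample size), and the 2-run gap is the figure quoted in this thread.

```python
def regress(observed, league_mean, reliability):
    """Shrink an observed value toward the league mean.

    `reliability` is an assumed weight between 0 and 1: 1 keeps the
    observation as is, 0 replaces it with the league mean.
    """
    return league_mean + reliability * (observed - league_mean)


# Two catchers separated by 2 runs, with the same (hypothetical) reliability.
catcher_a = 0.0
catcher_b = -2.0
reliability = 0.5

for league_mean in (-5.0, 0.0, 5.0):
    gap = regress(catcher_a, league_mean, reliability) - regress(
        catcher_b, league_mean, reliability
    )
    print(league_mean, gap)
```

The league mean cancels out of the subtraction, so the regressed gap is just `reliability` times the observed gap. That is why it shrinks wherever the league average sits, as long as both catchers get the same weight.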
