
More on Defense

Sexy defense

Problems that have unsatisfactory answers dig at my brain and flip my mental switch from furious multi-tasking to single-minded obsession. If I get my claws into something, I cannot stand to just have the answer; I have to fully understand the reasoning and derivation behind arriving at it. It makes me very particular about which science programs I will watch, for instance. It also led directly to all the statistics work I have done on baseball, all because I wanted to know how bad Joel Pineiro's 2006 swinging strike rate was.

Lately, that nagging challenge has been defense. It began here and motivated me to construct batting average on balls in play (BABIP) on a runs scale in my database so that I could use it in my series previews. That was not intended to be complete, but the question of how to make it better kept eating at me, hence this post.

What I had looked at was simply a team's overall defensive BABIP versus the league average. However, that omits the reality that defenses do not all face the same number and types of batted balls. Case in point: Angels pitchers have a low line drive rate. Since line drives have the highest BABIP of any type, looking only at overall BABIP overrates the Angels' defense because it had fewer line drives to contend with. We want to credit the pitchers for suppressing line drives (which tRA does), not the fielders.

Therefore, what I have done now is separate all the batted ball types and look at each team's rate of converting them into outs against the league average for that isolated type. This way, it doesn't matter how many line drives the Angels' defenders have seen, only how many of them they turned into outs compared to how many the league did. The other adjustment I made was to include errors, which BABIP ignores. Again, ignoring them makes sense from a pitching perspective, but not a defensive one, and it was a glaring omission on my part earlier.

Defensive BABIP + Errors by Batted Ball Type
Org      GB    FB    LD    IF    BT
Seattle  .239  .160  .726  .016  .364
MLB      .257  .174  .719  .020  .414

These two changes had a profound effect on the final numbers. Consider the Angels and Mariners. They have nearly identical defensive BABIPs but the Angels have a line drive rate about five points lower than the Mariners. Defensively, that makes the Mariners' BABIP more impressive than the Angels' equal BABIP overall. Much more, as it turns out. While I used to separate them by about half a run, they are now an amazing 42 runs apart!
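To make the batted-ball-mix point concrete, here is a minimal sketch rather than the post's actual calculation. It uses the MLB rates from the table for ground balls, fly balls and line drives, and two made-up batted-ball mixes that differ only by five percentage points of line drives. The mixes, the names, and the choice to leave out pop flies and bunts are all assumptions for illustration.

```python
# MLB (hits + errors) rates by batted ball type, taken from the table above.
LEAGUE_RATES = {"GB": 0.257, "FB": 0.174, "LD": 0.719}

def overall_rate(mix, rates):
    """Overall rate for a defense that is exactly league average on every type."""
    return sum(share * rates[ball_type] for ball_type, share in mix.items())

# Hypothetical mixes (shares sum to 1); only the LD/FB split differs.
mix_more_ld = {"GB": 0.45, "FB": 0.35, "LD": 0.20}
mix_fewer_ld = {"GB": 0.45, "FB": 0.40, "LD": 0.15}

rate_more_ld = overall_rate(mix_more_ld, LEAGUE_RATES)
rate_fewer_ld = overall_rate(mix_fewer_ld, LEAGUE_RATES)
gap = rate_more_ld - rate_fewer_ld

print(f"{rate_more_ld:.3f} vs {rate_fewer_ld:.3f} (gap {gap:.3f})")
```

Both hypothetical defenses field every type at exactly the league rate, yet the one that sees fewer line drives shows a much lower overall figure, which is why the comparison is made type by type.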

I will throw a more math-heavy explanation of the whole process below the jump, but for now here's some information on how the Mariners fare. Compared to the league, the Mariners have the 4th best BABIP on ground balls, 9th best on fly balls, 22nd best on line drives, 14th best on pop flies and 8th best on bunts. Altogether, they end up 5th best in the league with a +19.8 run rating.


Define:

A = team hits + errors allowed on ground balls in play
B = team number of ground balls in play
C = league hits + errors allowed on ground balls in play
D = league number of ground balls in play
E = run value of ground ball hit in play

Then the value for each team on each batted ball type is given by:  (A/B - C/D) * B * E
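As a rough illustration of that formula, the sketch below plugs in ground-ball numbers. The rates (.239 for Seattle, .257 for MLB) come from the table, and E uses the roughly 0.83 runs quoted for ground balls further down. The raw counts A, B, C and D are hypothetical, chosen only to reproduce those rates, and the function name is mine.

```python
def runs_vs_league(team_hits_errors, team_bip, league_hits_errors, league_bip, run_value):
    """(A/B - C/D) * B * E, using the definitions listed above."""
    team_rate = team_hits_errors / team_bip
    league_rate = league_hits_errors / league_bip
    return (team_rate - league_rate) * team_bip * run_value

# Hypothetical counts: the post gives only the rates, not the raw totals.
A, B = 239, 1000      # team hits + errors, team ground balls in play
C, D = 7710, 30000    # league hits + errors, league ground balls in play
E = 0.83              # approximate run value of an extra ground-ball hit

gb_runs = runs_vs_league(A, B, C, D, E)
print(round(gb_runs, 2))
```

With the formula as written, a negative result means the team allowed fewer hits and errors than a league-average defense would have on the same number of balls. The headline rating quoted for Seattle is positive for a good defense, so the reported figure presumably flips the sign; that is my reading, not something the post states.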

E was derived in a two-step process. First, I needed the average value of each type of hit. Using a Markov calculator and adding and subtracting hits of each type, I arrived at the following values:

Single = 0.8 runs
Double = 1.1 runs
Triple = 1.5 runs
Home run = 1.8 runs

Next I had to figure out how often each batted ball type went for each type of hit, given that it did go for a hit. For ground balls, 92% of hits are singles, 8% doubles, a fraction for triples and an even smaller fraction for a home run. Then it's just matching up those odds with the values above to get the average run amount of an extra ground ball hit (roughly 0.83 runs). This was repeated across all batted ball types.
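The weighting step for ground balls can be sketched as follows. The hit values and the 92%/8% split come from the text. The small triple and home run shares are not given, so this sketch leaves them out and lands slightly under the quoted figure of roughly 0.83. The names are mine.

```python
# Run value of each hit type, from the list above.
HIT_VALUES = {"single": 0.8, "double": 1.1, "triple": 1.5, "home_run": 1.8}

# Share of ground-ball hits by type. Triples and home runs are omitted
# because the post does not give their (small) fractions.
GB_HIT_MIX = {"single": 0.92, "double": 0.08}

def average_hit_value(mix, values):
    """Weighted average run value of a hit, given the mix of hit types."""
    total_share = sum(mix.values())
    return sum(share * values[kind] for kind, share in mix.items()) / total_share

gb_value = average_hit_value(GB_HIT_MIX, HIT_VALUES)
print(round(gb_value, 3))
```

Adding the missing triple and home run shares would nudge the result up toward the quoted value, since both are worth more than a single or double.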


Comments


This is great

anything that gives us a different perspective on defense is worth having. Quantifying defense is quite difficult so I, and most everyone who watches the Mariners, knew there was no way the M’s defense is as bad as UZR indicates.

[link]http://intoxiwinningsports.tumblr.com/[/link]

by superkid20 on Jun 16, 2011 9:27 AM PDT reply actions  

The M's total defense for the season is as bad as UZR (and DRS) indicates

As it was awful early, especially when the M’s ran out the Bradley, Langerhans, Ichiro outfield. There were many reasons for the horrible start but defense (or lack thereof) definitely played a large part in it.

The defense has basically been better than league average since then, which is what we see when we watch games now.

by CMC_Stags on Jun 16, 2011 9:43 AM PDT up reply actions  

Interesting

besides CMC_Stags’ hypothesis being unsubstantiated hyperbole – would you be able to refute his position by offering a time scale – changes in time over the course of a season?

Law of Logical Argument
Anything is possible if you don't know what you are talking about.

by blacknoiseNW on Jun 16, 2011 11:24 AM PDT up reply actions  

You are the best!

I’m surprised this hasn’t been done before — it’s always seemed to me like the most logical way to do it. I don’t think you can measure individual performance this way, as it’s then prone to many of the same sample issues as other defensive statistics, but it’s a great way to measure team defensive performance.

Now you just need a catchy acronym and you can post it on statcorner!

"Satisfaction is the enemy of success." SanFranPreps

by perfectstrat on Jun 16, 2011 9:33 AM PDT reply actions  

I really like this.

At first glance I can see how some people might think it’s too complicated. But really, this is pretty straightforward and simple.

by room13 on Jun 16, 2011 9:35 AM PDT reply actions  

I really like this approach

When I read your Defensive Conundrum article, my very first thought was that you should try separating batted ball types, so I’m glad to see that you’ve done this. Is there any chance we could see a leaderboard?

The possible existence of some types of pitcher BABIP skill could make things difficult (i.e. if certain types of pitchers (high-strikeout pitchers, extreme flyballers/groundballers, knuckleballers (ha!)) exhibit some sustainable difference in their component GB/LD/FB BABIPs), but you would hope that this would be a fairly muted effect on the team level.

by MangoLiger on Jun 16, 2011 9:45 AM PDT reply actions  

BT = Bunts?

MLB BABIP on bunts + errors is .414?! Are not enough players using this to get on base? Didn’t Michael Saunders use bunts as a good OBP tool last year? How many did he attempt this year?

by God of Biscuits on Jun 16, 2011 9:59 AM PDT reply actions  

That seems odd to me as well

Matthew, have you included sacrifice bunts and flies?

by Graham MacAree on Jun 16, 2011 10:14 AM PDT up reply actions  

Bunts no, flies yes

I do not consider a sacbunt to be a legitimate ball in play.

by Matthew on Jun 16, 2011 10:59 AM PDT up reply actions  

Obviously.

But as the intent of the hitter is changed and often the defensive alignment is shifted inward, it seems silly to me to group them in. As far as I know, nobody includes sac bunts in BABIP calculations.

by Matthew on Jun 16, 2011 5:29 PM PDT up reply actions  

Does this figure in ballparks?

My natural thought would be that parks would cause differences in the BABIP on the various types.

by Vegasexpat on Jun 16, 2011 10:00 AM PDT reply actions  

There is no explicit park correction,

but by looking solely at rates, there’s at least some correcting for parks/scoring biases. As for, say, the infield grass being longer/shorter in some parks and therefore affecting GB BABIP, no, I am not making an allowance for that at this time.

by Matthew on Jun 16, 2011 11:01 AM PDT up reply actions  

Does this assume every non home run hit ball is defendable?

I’m not sure how to articulate what I’m thinking and I know you know a lot more than I ever will, but how do you account for the hit it where they aren’t factor? Or is there consideration for hitters placing a ball out of a defender X’s known range? Or do my questions really matter?

Fuck the Angels

by InSpokane on Jun 16, 2011 10:04 AM PDT reply actions  

In general, BABIP doesn't include home run balls, even though Guti (and Carp) defend them.

Otherwise, I think all balls in play are accounted for. I don’t think batters (or inversely pitchers) have complete control to follow the Keeler Directive of hitting balls where the fielder ain’t.

by yuniform on Jun 16, 2011 10:14 AM PDT up reply actions  

I think a more interesting problem with a defensive metric like this

Is purely undefendable hits. Does a double hit off the top of the green monster count against the defense?

by wetzelcoal on Jun 16, 2011 10:16 AM PDT up reply actions  

I think what yuniform is saying

is that as long as a pitcher lacks control over where the ball goes, then the rate of undefendable balls should stabilize for all pitchers over a large enough sample size (within batted-ball types). If that’s true, then this is a pretty good way of figuring out relative quality of team defense.

by isaac_spaceman on Jun 16, 2011 10:27 AM PDT up reply actions  

Though obviously the park and strategic factors InSpokane says matter.

It looks like Matthew’s system doesn’t specifically account for them, as of yet.

by yuniform on Jun 16, 2011 10:33 AM PDT up reply actions  

Yes, it does.

And I don’t account for it. That’s what things like UZR and zone-based systems try to account for and by and large, I’m not sure they add anything through the subjective noise

by Matthew on Jun 16, 2011 11:04 AM PDT up reply actions   1 recs

Amazing data, absolutely amazing.

From this though, how can we improve our LD BABIP? Is this still inflated from the beginning of the season with Bradley/Langerhans/Ichiro OF? What has it looked like in the last 30 days?

by EdgarBoneJr on Jun 16, 2011 10:20 AM PDT reply actions  

Those were my thoughts too.

In September, if the Guti-innings have dwarfed Langerhans’, then the numbers will have to look better.
The other categories give us some pretty numbers: between 7 and 20 percent better than league average. Intuitively, the results feeeeel right, which is a nice bonus.

by fiftyone on Jun 16, 2011 10:50 AM PDT via mobile up reply actions  

Seems to me LD BABIP is highly fielder independent

There’s a pretty small sphere of influence for a fielder on a line drive.

by timc on Jun 16, 2011 11:53 AM PDT up reply actions  

But a very large sphere of influence on the borderline catchable ones.

Granted, nobody’s going to catch a screamer fifteen feet above the second baseman’s head. The stuff in the gaps, however…

by fiftyone on Jun 16, 2011 12:48 PM PDT via mobile up reply actions  

This is what people should rec.

Not MS Paint images of stick people or puns.

by katal on Jun 16, 2011 10:31 AM PDT reply actions   4 recs

In other words,

Very awesome & impressive work, Matthew.

by katal on Jun 16, 2011 10:32 AM PDT up reply actions  

I don't want or mean for a conversation about recs to distract from Matthew's work

But I’d contend that rec’ing both types of posts makes them appear equal in quality, and I am not sure that should be the case.

by katal on Jun 16, 2011 10:36 AM PDT up reply actions  

If only we had an additional method

of using our words to express the appropriate type of appreciation. alas.

by Snuffleupagus on Jun 16, 2011 10:43 AM PDT up reply actions   1 recs

Thank you for this, Matthew

this is the kind of work that makes me thankful that I’m a Mariners fan, and I can be exposed to this while reading about my favorite team.

However, I’m afraid that the interest in this kind of metric will probably be limited because of its inability to measure individual players. It’s great to assess team defense in what appears to be a much more accurate manner, but as fans we like to point to the individuals inside that team who are top and bottom.

by Snuffleupagus on Jun 16, 2011 10:45 AM PDT reply actions  

This is awesome; thanks Matthew.

In breaking out balls in play into their individual components, did you notice any trends or correlations based on rates with respect to sheer volume?

For instance, did defenses that face higher/lower individual type totals compared to the league average turn them into outs at a considerably higher/lower rate than the league average?

by ThomasG on Jun 16, 2011 11:01 AM PDT reply actions  

It looks like groundball rates must dominate overall

I haven’t looked at the raw numbers, but GB numbers have to be pretty overwhelming for the M’s to end up 5th best in the league given that they’re 4th best on groundballs and 9th best or worse on everything else. I’d expect that to hold true across all teams, leaving anything that looks like a trend in the other batted ball types open to interpretation as mere short-term weirdness.

by J0SER on Jun 16, 2011 1:20 PM PDT up reply actions  

An issue wherein certain teams give up significantly more/less LDs

Take the Angels for example. We don’t want to give their fielders ‘credit’ for the batted ball distribution their pitchers have yielded, but we have to be pretty sure that the pitchers have ACTUALLY given up fewer line drives.

Looking at FanGraphs, the Angels have consistently given up fewer LDs than just about anyone. From 2003-2011, they’ve got the 2nd lowest LD rate. Same if you look at 2003-2007 and 2008-2011 (they’re 2nd and then 5th) or pretty much any 3-year period in between. Either there are a lot fewer LDs hit there for some reason, against all manner of pitchers, or the stringers just ‘call’ fewer LDs there.

Their LD park factor on statcorner is 80 for LHB and RHBs. 80!

by marc w on Jun 16, 2011 1:38 PM PDT up reply actions  

I think the stringer issue is what Jeff was getting at

Some people have noted weird scorer biases occurring in certain parks, where LD% is lower or higher in specific parks than in others, which is annoying because LD% is pretty important and yet highly subjective (at least relative to a GB). One interesting question would be whether there’s a park effect on LD% or a scorer bias effect on LD% that is suppressing (or increasing) LD% in a certain park.

by JLC on Jun 16, 2011 2:35 PM PDT up reply actions  

That's what I'm talking about!

I’m saying I can’t believe LD park effects are roughly on par with HR park factors. A possible explanation is stringer bias, and we’ve seen that corroborated by various studies – like the one Colin Wyers did at THT showing a relationship between LD% and the height of the press box.

by marc w on Jun 16, 2011 3:21 PM PDT up reply actions  

Yeah. In my mind, that is the final piece of the puzzle.

For those following along, consider this incredibly simplified example:


Say this is the overall average for batted ball buckets in MLB with the green line representing the average for the category. Now, we can confirm that in Anaheim there is a lower amount of line drives called. If that’s a park effect (something to do with the batter’s eye for example that makes it harder to square up the ball), then I don’t think it’s an issue re: this system.

However, if it’s because the MLBAM stringers in Anaheim have stricter standards for what a line drive is, then that is a problem. Anaheim’s batted ball buckets could end up looking like this:

By shifting the LD/FB line over to the left, you’d see fewer line drives and more fly balls, which matches the observed data. Consequently though, you’d also hypothetically see an increase in BOTH averages for LD and FB BAbip. Since it’s those averages that are in use here, it would make ANA look worse at fielding both LDs and FBs (and they do rate very poorly in those two categories), when in reality it could just be a byproduct of the classification bias.

The problems then are: how to tell the difference between a park effect and a scorer effect and if it’s a scoring effect, how to control for it. I don’t know the answer to either of those.

by Matthew on Jun 16, 2011 2:43 PM PDT up reply actions   6 recs
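The classification effect described in the comment above can be reproduced with a toy model. Everything in this sketch is an assumption for illustration: the hit-probability curve, the launch-angle range, one batted ball per degree, and the two LD/FB cutoffs are invented, not taken from any real data.

```python
def hit_probability(angle):
    """Hypothetical chance a ball at this launch angle falls in:
    peaks at 15 degrees and floors at 0.10."""
    return max(0.10, 0.75 - 0.03 * abs(angle - 15))

def bucket_average(angles):
    """Average hit probability over one ball at each launch angle given."""
    return sum(hit_probability(a) for a in angles) / len(angles)

def ld_fb_averages(cutoff, low=10, high=45):
    """Label angles in [low, cutoff) as line drives, [cutoff, high] as fly balls."""
    line_drives = range(low, cutoff)
    fly_balls = range(cutoff, high + 1)
    return bucket_average(line_drives), bucket_average(fly_balls)

# Same batted balls, two labelling standards.
neutral_ld, neutral_fb = ld_fb_averages(cutoff=25)  # typical stringer
strict_ld, strict_fb = ld_fb_averages(cutoff=20)    # stricter about line drives

print(f"LD: {neutral_ld:.3f} -> {strict_ld:.3f}")
print(f"FB: {neutral_fb:.3f} -> {strict_fb:.3f}")
```

The borderline balls fall for hits less often than the average line drive but more often than the average fly ball, so relabelling them raises both bucket averages even though no fielder did anything differently.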

Or actually, I have an idea how to tell the difference between the two, but I'd need hit f/x data

so I can make actual versions of the graphs above. If the cutoff points between batted balls roughly align, then it’s probably a park effect (seeing fewer line drives). If they don’t match, then it’s more likely a scorer effect (seeing more balls at the 58° called fly balls instead of line drives).

by Matthew on Jun 16, 2011 2:47 PM PDT up reply actions  

What's your best guess as to the magnitude of an effect the batter's eye can have on LD%?

I have absolutely no idea, but I’m just stunned by the variance in LD park effects on Statcorner. ~79-80 to ~130+?

I suppose a better batter’s eye would result in better/more contact, but an 80 LD factor, especially when the HR factors are low-but-not-crazy-low (and the wOBA park factor is even closer to average).

by marc w on Jun 16, 2011 3:28 PM PDT up reply actions  

Pretty big MOE on LD factors (+/- 10) for 3-year samples I found.

And I have no guess because park factors are so inter-related. Small foul territory (park factor) boosts LD rate (by reducing IF rates).

While stuff like high outfield walls (scorer factor) probably boosts LD rate because fly balls that hit off the wall probably seem more like a ‘line drive’ to a scorer. (Actually this would make for an interesting test).

It’s all such a mess.

by Matthew on Jun 16, 2011 3:34 PM PDT up reply actions  

The high-on-the-wall thing would make an interesting test

but even if the scorers theoretically passed it, it’s so infrequent as to have little effect on the overall LD rates. It’s things like catch/no-catch and range bias that we really have to worry about.

by marc w on Jun 16, 2011 3:40 PM PDT up reply actions  

All I'm worried about is correcting it, and I don't know how.

Even if I assume it’s 100% scorer bias, how do I adjust LD/FB totals and rates by, say, the 20% for ANA? I haven’t figured that out.

by Matthew on Jun 16, 2011 3:46 PM PDT up reply actions  

Couldn't it be done simply by comparing home/away statistics?

I’m not sure that, for statistical purposes, it matters whether LD depression is caused by the park itself or stringer bias; it’s still an effect on all measurements coming out of that park that needs to be corrected for. In that case, comparing the Angels in Anaheim to the Angels anywhere else (or any other team playing in Anaheim compared to their respective home parks) should give you the data needed to correct for it, right?

And once you’ve got that data, you can start separating out stringer biases from actual park effects through a multi-year analysis comparing fluctuations in LD rates to who’s scorekeeping from MLBAM (if their identities and games scored are available).

Or am I completely missing the point here?

by Tube on Jun 16, 2011 5:06 PM PDT up reply actions  

It does matter what it's caused by.

Comparing home v away is too noisy for my tastes. You halve your samples and home teams have a natural advantage in fielding (knowing the quirks of their park) so you’d need to establish some sort of home-fielding advantage.

But that’s not static year-to-year either because the home team gets new fielders, the park might change, etc. So you end up having to take half the data and neutralize for home advantage and then regress that because you’re drawing against a small sample, then apply it and then probably regress the whole thing again. It may be possible, but yuck.

We do not get the identities of the stringers.

by Matthew on Jun 16, 2011 5:37 PM PDT up reply actions  

That's a great visualization, Matthew.

I’ve thought of the same thing with regards to batted ball types that pitchers allow, i.e. not BABIP as the Y axis, but frequency of launch angle.

It also looks a lot like a field goal, but that’s probably just me.

by nathaniel dawson on Jun 16, 2011 7:20 PM PDT up reply actions  

Does MLBAM use Hit f/x data?

That seems like the next biggest step in correcting batted ball biases. Even then though, Hit f/x reportedly doesn’t account for spin on the ball. Are you in the group that thinks a stopwatch is the best way to correct for this?

"Satisfaction is the enemy of success." SanFranPreps

by perfectstrat on Jun 16, 2011 2:27 PM PDT up reply actions  

Hit f/x doesn't exist yet

and when it does, we won’t get it, reports about it are probably no longer accurate and I am not going to offer an opinion rooted in ignorance of the system.

by Matthew on Jun 16, 2011 2:29 PM PDT up reply actions  

Yes, there's not much reason to make Hit f/x or Field f/x data public.

That’s too bad, as we could do a lot with it. Think of the revolution pitch f/x has had on the game; it’d be great if we could come close to that with defensive evaluations.

"Satisfaction is the enemy of success." SanFranPreps

by perfectstrat on Jun 16, 2011 2:33 PM PDT up reply actions  

Is the title referring to what Yuni plays?

Seriously, however, this is an amazing article and I think it does a lot better job at explaining defense than UZR does.

by Mariner John on Jun 16, 2011 1:35 PM PDT reply actions   6 recs

This is (a really insightful use of data) slash awesome

Fantastic, Matthew. This is really eye-opening.

We'll always have 2001

by 116 on Jun 16, 2011 2:27 PM PDT reply actions  

And now, a question.

I have a question about the IP part of BABIP, sparked by your mention of park factors in your earlier post.

You talk about the fact that Oakland has lots of foul territory and thus should yield a lower BABIP than some other parks.

If “In Play” only includes balls hit between the foul lines, how do balls caught outside the foul lines affect the stat? I only have a surface understanding of this, so please forgive any ignorance.

Mostly I’m wondering if BABIP or some other stat has the ability to capture the defensive abilities of players (primarily outfielders, but others to a lesser extent) who can make that long run to foul territory, reach or dive into the stands, and come up with an amazing catch.

Can anyone enlighten me?

We'll always have 2001

by 116 on Jun 16, 2011 2:42 PM PDT reply actions  

in play has nothing to do with the foul lines

It means defendable i.e. not over the fence (either fair or foul)

by Matthew on Jun 16, 2011 2:48 PM PDT up reply actions  

Ah. Just goes to show, don't just speed-read the first definition you google.

Damn you, Hardball Times. Should’ve checked a few other references. Thanks, Matthew.

We'll always have 2001

by 116 on Jun 17, 2011 1:39 AM PDT up reply actions  

I'm pretty sure DRS includes some adjustment for OFs reaching over walls.

Not sure about dives into the stands or anything like that, though. I mean, theoretically, it’s all about how the ball is scored by the BIS/STATS stringer. If they put the location of the ball 2 feet inside the stands, then a player making a catch there is going to be rewarded because 95%+ of all balls hit to that location aren’t fielded.
There may be issues with how accurately all of this happens.

by marc w on Jun 16, 2011 3:31 PM PDT up reply actions  

Epic. I take my hat off to you, sir.

I had been meaning to look into UZR etc at some point but I don’t think I’ll bother now.
Thanks for this.

by Aussie Mariner on Jun 16, 2011 4:44 PM PDT reply actions  

Math make brain hurt

Though it’s way cool that some people can actually get something out of this.

by Aly Edge on Jun 16, 2011 8:42 PM PDT reply actions  

Great article. This is much more satisfying than the early season UZRs.

I think it passes the intuitive sense “smell test,” so kudos on that as well.

by VivaAyala on Jun 17, 2011 9:30 AM PDT reply actions  
