Problems that have unsatisfactory answers dig at my brain and flip my mental switch from furious multi-tasking to single-minded obsession. If I get my claws into something, I cannot stand to just have the answer; I have to fully understand the reasoning and derivation behind it. It makes me very particular about which science programs I will watch, for instance. It also led directly to all the statistics work I have done on baseball, all because I wanted to know how bad Joel Pineiro's 2006 swinging strike rate was.
Lately, that nagging challenge has been defense. It began here and motivated me to construct batting average on balls in play (BABIP) on a runs scale in my database so that I could use it in my series previews. That was never intended to be complete, but the question of how to make it better kept eating at me, and hence this post.
What I had looked at was simply a team's overall defensive BABIP versus the league average. However, that ignores the reality that defenses do not all face the same number and types of batted balls. Case in point: Angels pitchers have a low line drive rate. Since line drives have the highest BABIP of any batted ball type, looking only at overall BABIP overrates the Angels defense because it had fewer line drives to contend with. We want to credit the pitchers for suppressing line drives (which tRA does), not the fielders.
Therefore, what I have done now is separate all the batted ball types and compare each team's rate of converting them into outs against the league average for that isolated type. This way, it doesn't matter how many line drives the Angels defenders have seen, only how many of them they turned into outs compared to how many the league did. The other adjustment I made was to include errors, which BABIP ignores. Leaving them out makes sense from a pitching perspective, but not from a defensive one, and it was a glaring omission on my part earlier.
Table: Defensive BABIP + Errors by Batted Ball Type
These two changes had a profound effect on the final numbers. Consider the Angels and Mariners. They have nearly identical defensive BABIPs, but the Angels have a line drive rate about five points lower than the Mariners. Defensively, that makes the Mariners' BABIP more impressive than the Angels' equal overall BABIP. Much more, as it turns out. Where my earlier method separated the two teams by about half a run, they are now an amazing 42 runs apart!
I will throw a more math-heavy explanation of the whole process below the jump, but for now here's some information on how the Mariners fare. Compared to the league, the Mariners have the 4th best BABIP on ground balls, 9th best on fly balls, 22nd best on line drives, 14th best on pop flies and 8th best on bunts. Altogether, they end up 5th best in the league with a +19.8 run rating.
Taking ground balls as the example, let:
A = team hits + errors allowed on ground balls in play
B = team number of ground balls in play
C = league hits + errors allowed on ground balls in play
D = league number of ground balls in play
E = run value of ground ball hit in play
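Then the value for that team on ground balls is given by: (A/B - C/D) * B * E
The same calculation is repeated for every other batted ball type. As written, a positive result means the team allowed more runs than a league-average defense would have on the same balls in play, so the sign has to be flipped to read it as runs saved, as in the Mariners' +19.8 above.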
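For anyone who prefers to see the formula as code, here is a minimal Python sketch of it. The function name, argument names and the sample inputs are all my own placeholders for illustration; the numbers are made up and are not real team or league data.

```python
def runs_vs_league(team_he, team_bip, lg_he, lg_bip, run_value):
    """Compute (A/B - C/D) * B * E for one batted ball type.

    team_he   -- A: team hits + errors allowed on this batted ball type
    team_bip  -- B: team balls in play of this type
    lg_he     -- C: league hits + errors allowed on this type
    lg_bip    -- D: league balls in play of this type
    run_value -- E: run value of a hit on this type

    A positive result means more runs allowed than a league-average
    defense would have allowed on the same number of balls in play.
    """
    team_rate = team_he / team_bip    # A / B
    league_rate = lg_he / lg_bip      # C / D
    return (team_rate - league_rate) * team_bip * run_value


# Hypothetical inputs, chosen only to show the arithmetic.
example = runs_vs_league(team_he=240, team_bip=1000,
                         lg_he=7500, lg_bip=30000,
                         run_value=0.83)
print(round(example, 1))
```

With these invented inputs the team allows hits or errors on 24% of ground balls against a league rate of 25%, which over 1,000 ground balls comes to about 8.3 fewer runs allowed than average. Summing the result over all batted ball types, and flipping the sign, would give a total in runs saved.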
E was derived in a two-step process. First, I needed the average value of each type of hit. Using a Markov calculator and adding and subtracting hits of each type, I arrived at the following values:
Single = 0.8 runs
Double = 1.1 runs
Triple = 1.5 runs
Home run = 1.8 runs
Next I had to figure out how often each batted ball type went for each type of hit, given that it did go for a hit. For ground balls, about 92% of hits are singles and about 8% are doubles, with a small fraction going for triples and an even smaller fraction for home runs. Weighting the values above by those frequencies gives the average run value of an extra ground ball hit (roughly 0.83 runs). I repeated this for every batted ball type.
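The weighting step can be sketched in a few lines of Python. The names here are placeholders of mine, and because the exact triple and home run shares for ground balls are not listed above, this sketch assumes they are zero and uses only the 92% and 8% figures.

```python
# Run values per hit type, as listed above.
HIT_VALUES = {"single": 0.8, "double": 1.1, "triple": 1.5, "home_run": 1.8}


def average_hit_value(hit_mix, hit_values=HIT_VALUES):
    """Frequency-weighted average run value of a hit.

    hit_mix maps hit type -> share of hits of that type. Shares are
    normalized by their total, so they need not sum exactly to 1.
    """
    total_share = sum(hit_mix.values())
    weighted = sum(hit_values[hit] * share for hit, share in hit_mix.items())
    return weighted / total_share


# Ground ball hit mix. ASSUMPTION: triples and home runs are left out
# because their exact shares are not given.
gb_mix = {"single": 0.92, "double": 0.08}
gb_value = average_hit_value(gb_mix)
print(round(gb_value, 3))
```

This comes out to 0.824 runs, a touch under the roughly 0.83 quoted above. The gap presumably reflects the omitted triple and home run shares plus rounding in the listed percentages and run values.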