Lookout Landing: An SB Nation Community

Adjusted IP and Outs: More Pitching Analysis

As most of you know, I'm inordinately fond of adjusting a pitcher's runs given up by the outcomes a pitcher himself controls, namely K/BB/HBP/GB/OFB/IFB/HR. I'm going to assume everyone reading is familiar with my work on this here and here (and if you're not, I just linked it for you).

There's a problem with this approach, however, especially when we then translate it into a per 9 IP measurement: Although we've adjusted the runs the pitcher has given up we still haven't adjusted for outs, meaning that using their actual IP might give you a less than accurate result. Obviously, this is a problem for tRA, which is a R/9IP analogue.

How do we fix this? The same way we adjust for runs: find out how many outs we should expect for each batted ball type, and then use those expected outs rather than actual innings pitched.

I use values as follows:

K: 1.00 (α)
BB: 0.00 (β)
HBP: 0.00 (γ)
LD: 0.26 (δ)
GB: 0.81 (ε)
OFB: 0.867 (ζ)
IFB: 0.971 (η)
HR: 0.00 (θ)

These numbers were derived from a THT article and a Replacement-Level-Yankees post. You'll note that the only values that are exactly the same as in the sources I cited are LD and IFB. This is because a) double plays aren't accounted for (i.e. the numbers in the links are likelihood of at least 1 out, rather than the number of outs) and b) HR are bundled in with outfield flies.

I got around the first problem by summing double plays over an average season (2006, to be specific, because all the other stats I use are from that season) and dividing by total grounders, giving an extra 0.078 outs per ground ball. The more observant amongst you will have noticed that this technique ignores the distinction between ground ball double plays and every other type - I did it like this because otherwise it would be impossible. I don't think the assumption that every double play comes off a ground ball will lead to much inaccuracy.
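The double-play adjustment can be sketched as follows. This is only an illustration: the league totals are invented so that the ratio lands on the 0.078 quoted above, and the 0.732 "at least one out" rate is simply 0.81 minus 0.078 rather than a figure taken from the cited sources.

```python
def outs_per_ground_ball(p_at_least_one_out, double_plays, ground_balls):
    """Expected outs per grounder: the chance of at least one out,
    plus the extra out contributed by double plays (all assumed to
    come off ground balls, as in the post)."""
    return p_at_least_one_out + double_plays / ground_balls


# Hypothetical league totals, chosen only so that DP/GB = 0.078.
LEAGUE_DP = 3900
LEAGUE_GB = 50000

# 0.732 is back-derived (0.81 - 0.078), not a sourced number.
gb_coeff = outs_per_ground_ball(0.732, LEAGUE_DP, LEAGUE_GB)
print(round(gb_coeff, 3))
```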

The second problem's just some pretty simple algebra:

[OFB%*ζ+IFB%*η]/[OFB%+IFB%]=0.79*[HR%+OFB%]/[OFB%]

yields ζ=0.867 given rough league averages for all of those values.
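As a rough sketch of that algebra: the left-hand side is a frequency-weighted average of the OFB and IFB out values, so once the right-hand side is known it can be rearranged for ζ. The function name and the league shares below are assumptions for illustration (the actual league averages are not given in the post); the 0.87 figure for the right-hand side is the one quoted in the comments.

```python
def solve_zeta(pooled, eta, ofb_share, ifb_share):
    """Solve (ofb*zeta + ifb*eta) / (ofb + ifb) = pooled for zeta."""
    return (pooled * (ofb_share + ifb_share) - eta * ifb_share) / ofb_share


# Hypothetical league shares of outfield flies and infield flies.
OFB_SHARE = 0.28
IFB_SHARE = 0.04

zeta = solve_zeta(pooled=0.87, eta=0.971, ofb_share=OFB_SHARE, ifb_share=IFB_SHARE)

# Plugging zeta back into the weighted average recovers the pooled value.
check = (OFB_SHARE * zeta + IFB_SHARE * 0.971) / (OFB_SHARE + IFB_SHARE)
print(round(zeta, 3), round(check, 3))
```

With the real league frequencies in place of the made-up shares, this is the step that produces the ζ used in the table of values.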

So now we have expected outs per event, so we can sum up expected outs as follows:

xOuts = Batters_Faced*[K%*α+BB%*β+...+HR%*θ]

which simplifies (since lots of our out values are 0 or 1) to:

xOuts = Batters_Faced*[K%+LD%*δ+GB%*ε+OFB%*ζ+IFB%*η]
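Since Batters_Faced times an event's percentage is just the count of that event, the sum can be written directly over event counts. A minimal sketch using the out values listed earlier; the pitcher's line is invented for illustration.

```python
# Expected outs per event, as listed in the post.
OUT_VALUES = {
    "K": 1.00, "BB": 0.00, "HBP": 0.00, "LD": 0.26,
    "GB": 0.81, "OFB": 0.867, "IFB": 0.971, "HR": 0.00,
}


def expected_outs(event_counts):
    """Sum of (event count * expected outs per event)."""
    return sum(OUT_VALUES[event] * n for event, n in event_counts.items())


# Hypothetical season line (850 batters faced in total).
line = {"K": 150, "BB": 50, "HBP": 5, "LD": 120,
        "GB": 280, "OFB": 200, "IFB": 25, "HR": 20}

x_outs = expected_outs(line)
print(round(x_outs, 3))  # 605.675
```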

Sorted? Not quite. Not all of a pitcher's outs are made as the result of a plate appearance - there's baserunning to be considered. Here's where I make the more worrying of my two assumptions: pitchers are all equally adept at controlling the running game. This is obviously untrue, but I have no pitcher-by-pitcher data for PO/CS, and so this is the way I had to do it. So few outs are made this way compared to actually pitching that it's unlikely to skew the results very much.

Instead, I just took total CS (again from 2006) and divided it by total not-outs (there's an iterative process somewhere in here, but it's messy and I don't want to talk about it). This gives us 0.016 (κ) CS for every not-out*. Our modified xOuts is then as follows:

xOuts' = Batters_Faced*[1-(1-(K%+LD%*δ+GB%*ε+OFB%*ζ+IFB%*η))*(1-κ)]
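Reading the formula as "one minus the share of batters who neither made an out at the plate nor got caught stealing", the adjustment can be sketched like this. The function name and the example inputs are assumptions; κ = 0.016 is the value from the post.

```python
KAPPA = 0.016  # caught stealing per not-out, from the post


def adjusted_expected_outs(batters_faced, x_outs, kappa=KAPPA):
    """xOuts' = BF * [1 - (1 - out_rate) * (1 - kappa)]."""
    out_rate = x_outs / batters_faced
    return batters_faced * (1 - (1 - out_rate) * (1 - kappa))


# Hypothetical pitcher: 850 batters faced, 605.675 expected outs at the plate.
bf, x_outs = 850, 605.675
x_outs_adj = adjusted_expected_outs(bf, x_outs)

# Equivalent view: plate outs plus kappa times the expected not-outs.
alt = x_outs + KAPPA * (bf - x_outs)
print(round(x_outs_adj, 2), round(alt, 2))
```

The second form makes the size of the correction easy to see: it adds about 1.6 outs for every 100 batters who reach base.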

And then we can finally get tRA again by doing a calculation that gives us expected runs per 27 expected outs, like this:

tRA = [xRuns/xOuts']*27
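The factor of 27 is just nine innings of three outs each, so this is the usual per-nine-innings scaling applied to expected outs instead of recorded innings. A small sketch with invented inputs:

```python
def tra(x_runs, x_outs_adj):
    """Expected runs per 27 expected outs."""
    return x_runs / x_outs_adj * 27


def runs_per_nine(runs, innings):
    """Conventional per-nine rate, for comparison."""
    return runs / innings * 9


# Hypothetical pitcher: 90 expected runs over 600 expected outs.
print(round(tra(90, 600), 2))            # 4.05
# 600 outs is 200 innings, so the two scalings agree.
print(round(runs_per_nine(90, 200), 2))  # 4.05
```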

This ends up being a much more robust calculation than the previous methods, because it more or less nullifies the effect defence and park** have on how many innings a pitcher works, which it was already doing for runs. It should, therefore, be significantly more accurate than previous versions of tRA.

Let's show you an example of what it all ends up looking like, using Mariner data from last year:

xIP is expected innings pitched (xOuts'/3), O-xO is Outs-xOuts', and the rest of those columns should be familiar to you. Isn't it interesting how badly our defence managed to screw Jeff Weaver? At least they got outs for Ho while giving up a tonne of runs, but they completely bailed on both making plays and preventing runs for poor Weaver.
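For anyone rebuilding those two columns, here is a minimal sketch. It assumes innings are recorded in the usual box-score notation, where the digit after the point counts thirds of an inning; the pitcher's numbers are invented, not taken from the Mariners data.

```python
def outs_from_ip(ip):
    """Convert box-score innings ('146.2' = 146 and two thirds) to outs."""
    whole, _, thirds = ip.partition(".")
    return int(whole) * 3 + int(thirds or 0)


def x_ip(x_outs_adj):
    """Expected innings pitched from expected outs."""
    return x_outs_adj / 3


def outs_minus_expected(ip, x_outs_adj):
    """O-xO: outs actually recorded minus expected outs."""
    return outs_from_ip(ip) - x_outs_adj


# Hypothetical pitcher: 146.2 IP recorded, 455 expected outs.
print(outs_from_ip("146.2"))                         # 440
print(round(x_ip(455.0), 1))                         # 151.7
print(round(outs_minus_expected("146.2", 455.0), 1)) # -15.0
```

A negative O-xO, as in this made-up line, would mean the defence turned fewer balls into outs than the batted ball mix suggested.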

NB: The xRuns here is park adjusted and thus not what tRA is calculated with. Just so you know.

An interesting offshoot of this work is that it's generated team Outs-xOuts and xRuns-Runs. These could be useful as measures of team defence, and what's even more curious is that the relationship between O-xO and xR-R isn't nearly as strong as I was expecting. That's a story for another day, however.

*Unless you are the 2008 Mariners. >:(
**This is not entirely true. Some parks reduce the likelihood of being an out. The effect is, however, small, and I don't have easy access to those numbers anyway.

Comments

Quick Q

If most of the Greek letters are simply coefficients representing the EV of any given ball in play, why did we solve for alpha and beta in that first equation and where did that equation come from?

Also, what does that alpha mean, since it seems to disappear in the xOuts calculation when it's taken to be equal to 1 for the value of a K?

by seattlebruin on Apr 9, 2008 8:48 AM PDT   0 recs

Disregard the "why" portion of my Q

I reread and answered it, but could you explain the methodology for that calculation and why you used alpha/beta? I'm feeling brain dead

by seattlebruin on Apr 9, 2008 8:49 AM PDT to parent up   0 recs

Uh, yeah, I just forgot to change the values when I copy-pasted

Should be the coefficients of OFB and IFB respectively.

The methodology is pretty simple: I had the coefficients for overall fly balls (0.79), popups (0.97), and home runs (0.00) and the relative frequency of each. What I didn't have was the coefficient for outfield fly balls in particular.

Step by step:

1) Remove home runs to get a coefficient of (OFB+IFB): 0.79*[HR%+OFB%]/[OFB%] = 0.87
2) Determine the proportion of IFB and OFB, weighted by outs: [OFB%*ζ+IFB%*η]/[OFB%+IFB%]
3) Solve for ζ with η = 0.97 and league average values taken from my spreadsheet.

Sorry about the mistake, that must have made it really difficult to puzzle out.

by Graham on Apr 9, 2008 9:04 AM PDT to parent up   0 recs

Ahhhhhhh

that makes much more sense now =)

by seattlebruin on Apr 9, 2008 9:08 AM PDT   0 recs

Yeah... I'll admit when I saw O-xO in that table

I thought of one of those Asian smiley faces my friends make and wondered "what the hell is a British guy doing using Asian AIM smileys?"

by seattlebruin on Apr 9, 2008 9:15 AM PDT to parent up   0 recs

There also seems to be a relationship between sucking

and defense.

E.G. when you suck at pitching like Ho, your teammates don't try as hard on D as they do when say, Overpaid Jarrod Washburn is pitching and they know they'll get a lot of exercise in the OF.

by seattlebruin on Apr 9, 2008 9:16 AM PDT   0 recs

Livan Hernandez proves the opposite:

O-xO: 13
xR-R: 31

And he still allowed more than 5.11 runs per nine. Mental.

by Graham on Apr 9, 2008 9:21 AM PDT to parent up   0 recs

Hmmm, I was going to say that I would expect it to be correlated to FB%

as in when the OFs think they'll actually have a chance to catch the ball they'll try harder, but it looks like Livan gave up even more screamers than even Ho and Weaver last season

by seattlebruin on Apr 9, 2008 9:26 AM PDT to parent up   0 recs

I wouldn't be surprised if there was a correlation though

cause the really shitty pitchers are going to give up "less fieldable" balls than a good pitcher. Maybe this is just bullshit I'm making up (very possible) but for non-MLB quality pitchers this seems like a good hypothesis.

by Edgar for Pres on Apr 9, 2008 9:52 AM PDT to parent up   0 recs

Less fieldable balls = more line drives

This should account for that effect

by Graham on Apr 9, 2008 9:57 AM PDT to parent up   0 recs

Yeah line drives probably take care of most of it

But deep fly balls and hard hit ground balls also would lead to more runs. If you are a really crappy pitcher, you'd expect to see fewer softly hit ground balls and soft and easy to field pop flies.

by Edgar for Pres on Apr 9, 2008 3:35 PM PDT to parent up   0 recs

I was kinda guessing that had been looked at

I still think it's a little amazing that we can classify all hits into such broad groups.

by Edgar for Pres on Apr 9, 2008 6:22 PM PDT to parent up   0 recs

Hmmm, it might be interesting to do a quick analysis

between tRA and xR-R, although you'd have to normalize xR-R somehow (perhaps xR-R/xO?)

I wouldn't be surprised if Hernandez is the outlier more than the rule, although he somehow has gained a reputation as a decent pitcher despite his LD rate

by seattlebruin on Apr 9, 2008 9:57 AM PDT to parent up   0 recs

Don't understand this:

" Hmmm, it might be interesting to do a quick analysis

between tRA and xR-R, although you'd have to normalize xR-R somehow (perhaps xR-R/xO?)"

Please explain?

by Graham on Apr 9, 2008 10:00 AM PDT to parent up   0 recs

I think what I'm trying to say is that I think Hernandez might be the outlier in terms of

defensive support in relation to pitching ability.

In that case, it might be interesting to see someone quickly plot and see if tRA and xR-R has any correlation over say, a season (2006 or whatever, since I'm assuming based on your stated assumption that's where all this data came from). The only problem I would see is that xR-R is an absolute value rather than one relative to the amount pitched, so it would make sense that you would see a lot of variation in pitchers who had only pitched 100 innings instead of ~200. In that case, I would want to make xR-R a ratio by dividing by the total number of outs recorded (e.g. figure out how much the D helped/hurt the pitcher in relation to amount pitched) and then see if that has any correlation with the expected runs against.

Did that make any sense at all?

by seattlebruin on Apr 9, 2008 10:05 AM PDT to parent up   0 recs
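The check proposed in this comment could be sketched roughly as below: scale each pitcher's xR-R by outs so that workloads are comparable, then correlate the result with tRA. The pitcher rows are invented for illustration and the helper names are assumptions; nothing here reflects the actual 2006 data.

```python
from math import sqrt


def per_out(diff, outs):
    """Scale a counting difference (e.g. xR-R) by outs."""
    return diff / outs


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical rows: (tRA, xR-R, outs).
pitchers = [(3.8, 6.0, 610), (4.5, -4.0, 540), (5.6, -12.0, 300), (4.1, 2.0, 450)]

tra_values = [t for t, _, _ in pitchers]
support = [per_out(diff, outs) for _, diff, outs in pitchers]
r = pearson(tra_values, support)
print(round(r, 3))
```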

Hmm there's also a chance I'm utterly retarded

because you would expect a much stronger relationship between D and xO-O rather than xR-R

by seattlebruin on Apr 9, 2008 10:07 AM PDT to parent up   0 recs

Initial hypothesis:

xO-O ~ infield
R-xR ~ outfield

Won't be as simple as that, but enh.

by Graham on Apr 9, 2008 10:15 AM PDT to parent up   0 recs

I haven't done a proper analysis

And you're right, I probably should, but I've spent a lot of time looking at this and haven't noticed any trend or correlation one way or the other. I'll get around to a proper analysis at some point though

by Graham on Apr 9, 2008 10:14 AM PDT to parent up   0 recs

Because I am awesome...

There's basically no correlation between runs over average and strength of defence. Correlation between O-xO and xR-R is pretty damn strong though, more than I thought from just going through the numbers.

by Graham on Apr 9, 2008 2:14 PM PDT to parent up   0 recs

Yeah, it does

A cursory glance through the data didn't show that, though, so I thought I'd stick it in a graph to make sure.

by Graham on Apr 9, 2008 2:34 PM PDT to parent up   0 recs

Yes, Graham, you're awesome

keep telling yourself that and maybe someday it will be true!

Thanks for the data though - I guess I was way out on a limb to suspect that, although it seemed interesting that the two pitchers who received the least support from our D happened to be two of the worst starters of all time.

by seattlebruin on Apr 9, 2008 3:15 PM PDT to parent up   0 recs

It's generally pretty small

Although for some reason there's a large effect on the AL relievers (drops avg tRA by like 0.07).

by Graham on Apr 9, 2008 9:59 AM PDT to parent up   0 recs

Good Stuff as usual....

However, I think to test this you need to run this for every pitcher on every team and determine what the aggregate "O-xO" turns out to be. This value should land somewhere between 5 and -5 I would imagine. If it doesn't then there probably need to be some tweaks in the formula.

by PLU Tim on Apr 9, 2008 10:03 AM PDT   0 recs

It's off by 15 over every team for an entire year

There were about 130000 outs last year, so 15 is an error of slightly more than 0.01%

by Graham on Apr 9, 2008 10:08 AM PDT to parent up   0 recs

Always negative?

If so, you can use that I think. You can find the mean of the differences and then find out how much each team deviated from that mean. I'd be interested to see what order the teams would land to see if it is in-line with other team defensive ratings.

by PLU Tim on Apr 9, 2008 10:13 AM PDT to parent up   0 recs

I may have phrased that wrong

It's not off by 15 per team, it's off by +15 summed over the whole of MLB, meaning we can say the average fielding team is at 0 outs.

by Graham on Apr 9, 2008 10:16 AM PDT to parent up   0 recs

thinking about this....

I guess I am making the assumption that there is roughly an equal amount of positive defensive contributions as there are negative. If the trend in MLB is to value a bat over a glove then I suppose the aggregate O-xO would be considerably negative.

by PLU Tim on Apr 9, 2008 10:10 AM PDT to parent up   0 recs

Well, I base it around MLB average so you're right, it should sum close to zero

But what you just mentioned does seem to happen in the AL - there's a pretty big gap between the AL xO-O and the NL's: it's about 300, which is crazy. It's about the same for R-xR

by Graham on Apr 9, 2008 10:12 AM PDT to parent up   0 recs

Holy Crap....

learn to space delimit.

by PLU Tim on Apr 9, 2008 10:17 AM PDT to parent up   0 recs

Enlighten us

I know there's a way, but don't know how to do it

by seattlebruin on Apr 9, 2008 10:18 AM PDT to parent up   0 recs

I know how to import data

Fangraphs is just set up to make automating anything nearly impossible.

by Graham on Apr 9, 2008 10:22 AM PDT to parent up   0 recs

Wow. I was going to include some snarky comment about how

I hoped you didn't actually do that for the entire league but...

by seattlebruin on Apr 9, 2008 10:17 AM PDT to parent up   0 recs

So....

What you're saying is that the aggregate O-xO in the AL was something like -142 but in the NL it was +157 (example figures to be roughly 300 difference at +15).

I wonder if this can largely be attributed to sacrifices and weakly hit balls by the pitchers in the NL.

by PLU Tim on Apr 9, 2008 10:33 AM PDT to parent up   0 recs

Potentially

but you would expect pitchers to hit a lot of ground balls and DHs to hit more LDs (they ARE professional hitters after all), which would reduce the number of expected outs in the AL and increase it in the NL, no?

Seems to me like this would render the league differences between xO-O moot, but obviously there must be an explanation for it. I would have to assume that the NL simply values fielding more heavily than the NL unless there is some huge number of bunts somewhere to explain it all since a bunt is a grounder with a much higher chance of becoming an out than your run of the mill grounder

by seattlebruin on Apr 9, 2008 10:38 AM PDT to parent up   0 recs

NL values fielding more than the AL

the NL does not value fielding more than itself.

by seattlebruin on Apr 9, 2008 10:39 AM PDT to parent up   0 recs

I have considered this

I thought that the R-xR difference between AL and NL could be explained by different run environments yielding different run values on plays.

The same is possible for xO-O, certainly. It's something I really want to get around to investigating.

by Graham on Apr 9, 2008 10:40 AM PDT to parent up   0 recs

Hmmm, I think this seems more likely

especially that the value of a grounder would be significant (statistically) lower in the NL than the AL

by seattlebruin on Apr 9, 2008 10:42 AM PDT to parent up   0 recs

*seems like it would be

me + coherent thought today = fail

by seattlebruin on Apr 9, 2008 10:42 AM PDT to parent up   0 recs

Of course,

actually deriving all this info is something I'm not good enough at Retrosheet to manage sooo

by Graham on Apr 9, 2008 10:44 AM PDT to parent up   0 recs

Sacrifice Hits....

Per BB-REF...

The NL last year had 1045 Sac Hits.
The AL last year had 495 Sac Hits

I assume that this stat excludes Sac Flies because in the AL, Sac Flies is actually a higher number than Sac Hits.

You can go 2 ways with this...

1. Figure out the % of Sac Hits (all of which I assume are groundballs) that are attributed to bunts

or

2. Assume all of the Sac Hits are "easy plays" and don't differentiate between the bunt and a typical groundball. The number of Sac "grounders" in the AL and NL should be similar I would assume. Thus, the difference would almost entirely be attributed to the bunt.

So, for simplicity's sake, I'd go with #2.

Then:

1. Remove these totals from the "groundballs" and recalculate the xO of a groundball.
2. Use the Sac Hits total and assign an xO value of 1 to it.
3. Recalc everything and see if anything cool happens....

Just my .02

by PLU Tim on Apr 9, 2008 11:32 AM PDT to parent up   0 recs
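The recalculation suggested in this comment might look something like the sketch below: treat every sacrifice hit as a ground ball worth exactly one out, pull those out of the grounder pool, and see what out value is left for ordinary grounders. The sacrifice totals are the ones quoted in the comment; the league ground ball totals and the function name are invented for illustration.

```python
def gb_coeff_without_sacs(gb_coeff, ground_balls, sac_hits, sac_out_value=1.0):
    """Out value of non-sacrifice grounders once sac hits (each assumed
    to be worth sac_out_value outs) are removed from the pool."""
    total_outs = gb_coeff * ground_balls
    return (total_outs - sac_out_value * sac_hits) / (ground_balls - sac_hits)


GB_COEFF = 0.81  # overall ground ball out value from the post

# Sac hits from the comment; ground ball totals are hypothetical.
leagues = {"NL": (35000, 1045), "AL": (30000, 495)}

for name, (gb, sh) in leagues.items():
    print(name, round(gb_coeff_without_sacs(GB_COEFF, gb, sh), 4))
```

Because sacrifices are assumed to be near-certain outs, removing them can only lower the out value of the remaining grounders, and the drop is larger in the league with more bunts.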

Hmmm

I think I'm going to look at the bigger picture and just try to figure out the difference in run/out values of those seven stats between leagues. I expect that the biggest change will be in GB because of the bunting thing though, yep

by Graham on Apr 9, 2008 11:41 AM PDT to parent up   0 recs

Small but important point

As most of you know, I'm inordinately fond of adjusting a pitcher's runs given up by the outcomes a pitcher himself controls, namely K/BB/HBP/GB/OFB/IFB/HR.

Those are the outcomes that the pitcher and the hitter control. Over the course of a full season that's not a big deal, but over the course of a month or two, I'm less convinced that the effects of opposing lineups wash out entirely.

Your general procedure for determining the appropriate number of IP seems right to me, though.

by ubelmann on Apr 9, 2008 11:26 AM PDT   0 recs

Yep, you're right

Ideally you control for opponent, but using that data would add a whole new level of complexity to the stat, and I'm not sure it'd be of much benefit over the whole season.

If and when I start seriously deploying this on the shorter term, information about the quality of batters will certainly have to be added.

by Graham on Apr 9, 2008 11:30 AM PDT to parent up   0 recs

You might be able to get away with a weighted average of current season and Marcel's

And sort of reverse PrOPSing their batted ball profile?

I'll burn this bridge when I come to it, anyway.

by Graham on Apr 9, 2008 11:42 AM PDT to parent up   0 recs

what about for rookies or part time players though?

I say screw it. It's not going to be worth it unless somebody ponies up $$$

by Matthew on Apr 9, 2008 11:46 AM PDT to parent up   0 recs

Another problem...

...is figuring out what the appropriate baseline is to adjust to. Bedard and Felix are going to consistently see different lineups over the course of the season. Last year in the AL, RHB had a .743 OPS overall and LHB had a .774 OPS overall. It seems like there could be some inherent platoon advantage to being a LHP (and subsequently getting those LHB--who tend to have bigger platoon splits anyway--out of the lineup) and you should adjust "league average hitting" differently for RH starters and LH starters, since they seemingly will face different sets of hitters. (This, of course, is probably a small effect, like everything else mentioned in this subthread.)

by ubelmann on Apr 9, 2008 2:40 PM PDT to parent up   0 recs

Comments For This Post Are Closed