
Rough Confidence Intervals for Handed Park Factors

Note: if you do not care about statistical details, just ignore this post (again).

UPDATE: I gave this some more thought, and I now have what seems like a more logical model.

I'll admit I'm not fully boned up on statistical models. I'm a programmer now, and what I know is coding and databases. Still, some people expressed interest in having a margin of error for the park factors I presented in an earlier post. I'm unsure of the exactly correct method for this, but below is my best guess.

All of these factors are based on discrete outcomes, like rolling a die. You cannot roll a die and have it land on both 3 and 6. Likewise, an at bat cannot end in both a strikeout and a fly ball. When each chance either produces a given outcome or does not, the number of times that outcome occurs is modeled by a binomial distribution.
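As a sketch of that model, assuming every chance is independent and has the same probability p of ending in the outcome (and q = 1 − p of not), the probability of seeing exactly k such outcomes in n chances is:

$$
P(X = k) = \binom{n}{k}\, p^{k}\, q^{\,n-k}, \qquad q = 1 - p
$$

Here X is the count of the outcome, for example the number of ground balls among all chances.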

To use some concrete data, Safeco saw 2,665 ground balls hit by left-handed hitters out of 7,648 possible chances. That gives us a ~35% chance of a ground ball (2,665 / 7,648 ≈ 34.8%). In statistical shorthand, this 35% is called p, the probability of the outcome occurring. Its complement, the probability of the outcome not occurring (1 − p, here ~65%), is called q. The number of trials, 7,648, is called n.

The standard deviation of a binomial proportion (strictly, its standard error) is the square root of (p × q / n). Using the normal approximation, about 95% of the data lies within 1.96 standard deviations of the mean, so the standard deviation is multiplied by 1.96 to get the plus/minus around the observed rate. Here that works out to about 1 percentage point (1.96 × sqrt(0.35 × 0.65 / 7,648) ≈ 0.011), so the 95% confidence interval for the probability that a left-handed batter hits a ground ball in Safeco is 35% +/- 1%, or (34%, 36%).
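The interval calculation above can be sketched in a few lines of Python. This is a minimal illustration of the normal-approximation interval described in the text; the function name `binomial_ci` is hypothetical, and the only inputs are the Safeco ground ball counts quoted above.

```python
import math


def binomial_ci(successes: int, trials: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (p, low, high) using the normal approximation to the binomial."""
    p = successes / trials          # observed rate of the outcome
    q = 1.0 - p                     # rate of the outcome not occurring
    se = math.sqrt(p * q / trials)  # standard error of the observed rate
    half_width = z * se             # +/- for the chosen confidence level
    return p, p - half_width, p + half_width


# Safeco, left-handed hitters, ground balls (counts from the post)
p, low, high = binomial_ci(2665, 7648)
print(f"{p:.3f} ({low:.3f}, {high:.3f})")
```

The half-width comes out just over 0.01, which rounds to the 35% +/- 1% quoted above.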

Repeating the same calculation away from Safeco yields a 95% interval of (35%, 37%). The park factor, however, is a ratio of these two rates, so I had to find out how much the widest possible spread would alter it. I divided 34% [rate in Safeco minus the error] by 37% [rate away from Safeco plus the error], and 36% [rate in Safeco plus the error] by 35% [rate away from Safeco minus the error]. Half the difference between those two ratios (which turns out to be 6 points) gives me the +/- for the factor itself.
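The worst-case ratio step can be sketched the same way. This mirrors the approach described in the post rather than a textbook interval for a ratio; the function name `ratio_bounds` is hypothetical, and it is fed the rounded endpoints quoted above because the unrounded away-from-Safeco counts are not given here.

```python
def ratio_bounds(in_low: float, in_high: float,
                 away_low: float, away_high: float) -> tuple[float, float, float]:
    """Worst-case spread of a park factor (scaled by 100) from two rate intervals."""
    lowest = 100 * in_low / away_high   # smallest in-park rate over largest away rate
    highest = 100 * in_high / away_low  # largest in-park rate over smallest away rate
    plus_minus = (highest - lowest) / 2  # half the difference is the +/-
    return lowest, highest, plus_minus


# Rounded 95% intervals for LH ground balls: in Safeco, then away from Safeco
low, high, plus_minus = ratio_bounds(0.34, 0.36, 0.35, 0.37)
print(f"({low:.1f}, {high:.1f}), +/- {plus_minus:.1f}")
```

With these rounded endpoints the half-difference is about 5.5 points; the 6 points in the text presumably reflects the unrounded rates.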

Therefore, the park factor for left-handed ground balls in Safeco Field is more accurately written as 97 +/- 6, or (91, 103). With 95% confidence, I can say that Safeco Field has a left-handed ground ball factor somewhere between 91 and 103.

Please note that these intervals are valid only for stadia in use over the full sample, that is, from 2007 onward. From a Seattle perspective that's fine, but if we ever start talking about New Yankee Stadium, its confidence intervals would be much wider, since it has far fewer chances in the sample.
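To see why fewer chances means a wider interval, hold the rate p fixed (an assumption for illustration) and write the half-width of the 95% interval as a function of the number of chances n:

$$
h(n) = 1.96\sqrt{\frac{p\,q}{n}}, \qquad h\!\left(\frac{n}{4}\right) = 1.96\sqrt{\frac{4\,p\,q}{n}} = 2\,h(n)
$$

So a park with a quarter as many chances would, for the same underlying rate, have an interval about twice as wide.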

95% confidence intervals, as +/- points of park factor (LH hitters, RH hitters):
For strikeouts, +/- 11, 10
For walks, +/- 15, 14
For hit by pitch, +/- 49, 43
For ground balls, +/- 6, 5
For fly balls, +/- 9, 7
For line drives, +/- 11, 10
For infield flies, +/- 21, 15
For singles, +/- 10, 8
For doubles, +/- 17, 17
For triples, +/- 48, 71
For home runs, +/- 27, 21

Here's a link back to the Safeco thread, and one to the Tacoma thread as well.