Open Statistical Question Thread
Every few days or so I get an email from a reader asking about some of the numbers I (and others) use on the website. They usually go something like "hey Jeff, I was wondering about one of the statistics you used the other day, but I didn't want to ask about it and look stupid. What is the meaning of (statistical acronym)?"
I'm fine with getting those emails, and I try to reply to them as best I can. But since the common theme seems to be that people don't want to ask questions here either because (A) they don't know where to ask, or (B) they don't want to be made fun of, I thought I'd take a measure to rectify the situation by opening this thread. In the comments below, you're welcome to ask questions on any statistic about which you're unclear - no matter how simple - and we'll see to it that you get a good answer. Don't get FIP? Not sure what counts as a good groundball percentage? Curious about the league-average BA? This is your thread. While I don't know how popular it's going to be (if at all), my hope is that this'll help get people caught up while eventually serving as a kind of partial FAQ for people new to analysis.
Go to town. (And please, let's keep this thread on topic.)
Comments
Completely subjective
but what are the best statistical methods for evaluating pitchers? I’d love to be able to finally explain to my buddy why ERA is a lousy metric without looking like a total asshat.
by BrianL on Apr 21, 2008 10:15 PM PDT 0 recs
Currently I'd have to go with tRA
But I’m biased because Graham is awesome.
Yesterday's Pants
A blog-thingy about the Mariners and stuff.
by BrettJMiller on
Apr 21, 2008 10:17 PM PDT
0 recs
I'm partial to Graham's tRA, but anything from K/BB to xFIP provides way more information than ERA
This is an excellent primer.
by Jeff on
Apr 21, 2008 10:20 PM PDT
0 recs
tRA I believe, but FIP is easier to find
The main point to hammer home is that you want to eliminate any possible interaction from the defense when you evaluate a pitcher. Fielding batted balls is 95%+ the burden of the fielders, not the pitcher.
by Matthew on
Apr 21, 2008 10:21 PM PDT
0 recs
Excellent.
I understand and have tried to explain to my friend that ERA is extremely dependent on the fielders behind the pitcher and other factors outside of his control. This should help me illustrate my point more.
by BrianL on
Apr 21, 2008 10:23 PM PDT
0 recs
The hard part is convincing people that batted balls are not the responsibility of the pitcher
It’s a very hard concept to grasp and without accepting it, it’s tough to get people to see past ERA.
Also worth mentioning, errors (and consequently, earned runs) are horrible horrible numbers because they are at the sole discretion of the official scorer who is accountable to pretty much nobody and also will never charge an error when, say, a ball drops untouched by an outfielder who misplays it.
by Matthew on
Apr 21, 2008 10:33 PM PDT
0 recs
[An official scorer] will never charge an error when, say, a ball drops untouched by an outfielder who misplays it.
Funnily enough, I believe this can now be legally scored as an error. I have yet to see it, though.
Barry Bonds died for your sins.
by JI on
Apr 21, 2008 10:39 PM PDT
0 recs
I hate that my seats are in RF sometimes. I got a fifth-row seat to that disaster.
Yesterday's Pants
A blog-thingy about the Mariners and stuff.
by BrettJMiller on
Apr 21, 2008 10:41 PM PDT
0 recs
Batted balls ARE partially the responsibility of the pitcher.
Pitchers DO have control over their BABIP. It’s just that the spread in talent is small enough that random variation often overshadows the talent. That’s why BABIP is regressed so heavily to the mean. Other skills with wider variation, such as K% or BB%, aren’t regressed so heavily.
FIP is a shortcut wherein you regress BABIP 100% of the way to the mean and you regress K%, BB%, and HR% 0% of the way to the mean. That’s something that people should understand.
stat-addled alien overlord
by salb918 on
Apr 22, 2008 6:02 AM PDT
0 recs
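A rough sketch of what "regressing X% of the way to the mean" looks like in practice. The function name and all the rates below are made up for illustration; only the 100% / 0% split comes from the comment above.

```python
def regress(observed, league_mean, fraction):
    """Pull an observed rate toward the league mean.

    fraction=1.0 replaces the observed rate with the league mean;
    fraction=0.0 leaves the observed rate untouched.
    """
    return observed + fraction * (league_mean - observed)


# Hypothetical pitcher and league rates, for illustration only.
babip = regress(observed=0.260, league_mean=0.300, fraction=1.0)   # FIP-style: fully regressed
k_rate = regress(observed=0.250, league_mean=0.170, fraction=0.0)  # FIP-style: taken at face value
halfway = regress(observed=0.260, league_mean=0.300, fraction=0.5) # something in between

print(babip, k_rate, halfway)
```

Seen this way, FIP is just the two extreme settings of `fraction`; a projection system would pick something in between for each component.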
I hope we as a community
Aren’t intimidating people into not asking these questions….I always thought we were pretty willing to help people understand these things without judging them for it….Let’s be honest, we were all ignorant of this statistical rationale at some point in our lives as baseball fans and we shouldn’t be making fun of people for trying to expand their knowledge of the sport.
by OlSalty on Apr 21, 2008 10:18 PM PDT 0 recs
That’s why I started coming here back in 2006. I could ask the most retarded question and get a good answer.
I still don’t get it all but I sure get it better than I did.
by InSpokane on
Apr 22, 2008 10:04 AM PDT
0 recs
Value Over Replacement Player
Basically when you determine the league average for statistics, it is how a player ranks relative to the middle of the pack.
by OlSalty on
Apr 21, 2008 10:22 PM PDT
0 recs
i understand that...
..but what i’m wondering is what comes into play here? vidro ranks high above the league average DH in terms of AVG but sucks in terms of OPS/SLG. each player has a VORP stat. what comes into play for that stat? is it purely offensively based? is defense included? does the calculation of it differ between positions? is there a VORP for both pitchers and non-pitchers?
by sirbrianwilson on
Apr 21, 2008 10:25 PM PDT
0 recs
VORP does not consider defense abilities, but does factor in position
Not sure on park. However, the formula itself is proprietary so there’s not much we can say about it. There are also many issues lingering around the use of “replacement level” in any statistic. Comparing relative to league average is almost always better.
by Matthew on
Apr 21, 2008 10:29 PM PDT
0 recs
so...from both yours and jeff's comments...
...i take away the point that VORP is useless. if it doesn’t factor in defensive stats, then everything is relative. this is the type of stat that i could see putting ibanez in the limelight…
also…jeff mentioned that the formula for calculating “replacement level” is “secret.” wtf?
by sirbrianwilson on
Apr 21, 2008 10:32 PM PDT
0 recs
it's not useless
There are just better measurements now. VORP was very important when it came out.
by Matthew on
Apr 21, 2008 10:34 PM PDT
0 recs
maybe it's not useless..
...but, as a weathered statistician, i must bring into question when a formula only accounts for half of the equation (just offense, not defense). and what is the calculation for “replacement level”?
ha…one thing i love about this new SBN format is that i can see new replies as i’m typing this one. Jeff just commented how it’s not worth the argument since VORP is an old stat. I can understand that.
My new question: What 3 stat categories are the best for evaluating pitching (because i’m sure there isn’t some grand-stat that explains all) and what 3 are most important for all other positions (DH not included).
I’m mainly looking for something, in terms of position players, that displays both offensive and defensive quality/worth…
by sirbrianwilson on
Apr 21, 2008 10:37 PM PDT
0 recs
*edit
should have written, “I must bring a formula into question when it…”
most apologies.
by sirbrianwilson on
Apr 21, 2008 10:46 PM PDT
0 recs
Err it deliberately
omits defense because it is meant only to be used as an offensive stat.
Would you say OPS / OPS+ / EqA / Batting Runs / Batting Runs above Average / BaseRuns are all “useless”? They all also omit half of the equation.
ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524
by rfloh on
Apr 22, 2008 7:11 AM PDT
0 recs
Is there anything BP does that's still relevant?
Barry Bonds died for your sins.
by JI on
Apr 21, 2008 10:40 PM PDT
0 recs
EqA is pretty good
other than that? I don’t think so. I’m also biased. (Disclosure: I work for THT)
by Matthew on
Apr 21, 2008 10:43 PM PDT
0 recs
eh, I suppose
PECOTA is no longer any more accurate than most other good prediction systems out there.
by Matthew on
Apr 21, 2008 10:59 PM PDT
0 recs
I second this
since wOBA is pretty hard to find, EqA is a good one-number offensive statistic.
by Jeff on
Apr 21, 2008 10:49 PM PDT
0 recs
GPA
GPA is just about as good as EqA or wOBA, and is readily available on the THT site (and it’s adjusted for ballpark). I personally like it better than wOBA because, like EqA, it fits the scale of a batting average, which is more intuitive for most fans than the scale of OBP.
by studes on
Apr 22, 2008 7:14 AM PDT
0 recs
The big difference
How do you convert GPA to runs? I have no idea. Does anyone?
How do you convert wOBA to runs? Divide by 1.15, then multiply by expected plate appearances. Easy – I can do that almost in my head, and with a calculator or spreadsheet, it takes three seconds.
Those of us who want THT to display wOBA aren’t just agitating because we want something that is 0.4% more accurate – there’s a purpose to having wOBA published – we know how to turn it into a linear weights run value method, and that’s something that is worth having available to the statistical community. (And yes, I know I can get it from firstinning – thanks Chris! You’re a lifesaver)
by davidcameron on
Apr 22, 2008 7:39 AM PDT
0 recs
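For anyone who wants the wOBA-to-runs step spelled out, here is a minimal sketch. The comment above gives the short version (divide by 1.15, multiply by PA); the usual form first subtracts a league-average wOBA so the result comes out as runs above average. That subtraction, the function name, and every number in the example are assumptions added here, not something stated in the thread.

```python
def woba_to_runs_above_average(woba, league_woba, plate_appearances, scale=1.15):
    """Convert wOBA to runs above average.

    scale=1.15 is the divisor quoted in the comment above.
    league_woba is an assumed baseline; look up the real one for the season.
    """
    return (woba - league_woba) / scale * plate_appearances


# Hypothetical hitter: .370 wOBA over 600 PA in a .330 wOBA league.
runs = woba_to_runs_above_average(woba=0.370, league_woba=0.330, plate_appearances=600)
print(round(runs, 1))
```

Because the conversion is a straight line, the same function works for a projected wOBA and an expected number of plate appearances.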
simple
Our glossary tells you how to convert GPA to runs.
by studes on
Apr 22, 2008 7:59 AM PDT
0 recs
You work for THT...
so tell them to publish wOBA, and make a good park adjusted version. I use OPS+ as my go-to single statistic, which is silly because it kinda sucks, but Bpro has a very unwieldy site. It is a travesty that no one publishes wOBA.
The A's colors are green and gold.
by mikeA on
Apr 21, 2008 11:38 PM PDT
0 recs
no one?
FirstInning.com has published wOBA for a while. Not sure if it’s park-adjusted, though
by Bdo on
Apr 22, 2008 5:43 AM PDT
0 recs
Thanks,
hadn’t been going there. Great pitching stats too.
The A's colors are green and gold.
by mikeA on
Apr 22, 2008 3:52 PM PDT
0 recs
Different people have different "replacement level" baselines
I haven’t used VORP in years. At this point explaining its quirks is probably more trouble than it’s worth.
by Jeff on
Apr 21, 2008 10:34 PM PDT
0 recs
I think VORP, when you get down to its core,
is essentially based in linear weights. I could be wrong about that, though.
stat-addled alien overlord
by salb918 on
Apr 22, 2008 6:05 AM PDT
0 recs
Batting Average is useless
OBP and SLG are the only two stats that matter in determining a player’s offensive production. OPS (combination of OBP and SLG) is all that matters because it determines how often you get on base and how many bases you produce per hit. BA just determines how often you put a ball in play that puts you on the basepaths, and that discounts both extra-base hits and walks.
And yeah, for a specific VORP per position metric, ask Jeff or Matthew or Graham, they know this stuff :)
by OlSalty on
Apr 21, 2008 10:30 PM PDT
0 recs
batting average isn't useless
just vastly overrated
Barry Bonds died for your sins.
by JI on
Apr 21, 2008 10:41 PM PDT
0 recs
It's pretty useless with all the other superior stats readily available these days
by OlSalty on
Apr 21, 2008 10:42 PM PDT
0 recs
yes
but you can make a pretty good argument that Ichiro and Rod Carew were more productive than your typical .370/.430 guy.
Barry Bonds died for your sins.
by JI on
Apr 21, 2008 10:44 PM PDT
0 recs
true
but .320/.350/.450 is more valuable than .250/.350/.450 both in terms of run production and also in light of the excellent old versus new player skills article Derek posted at USSM today.
by Matthew on
Apr 21, 2008 10:44 PM PDT
0 recs
OPS favors sluggers without OBP skills too much.
.300/.400/.450 is an .850 OPS
.300/.350/.500 is an .850 OPS
The one with the .400 OBP is more valuable.
OBP, SLG, and OPS aren’t the ONLY stats that matter, but they’re closer than a lot of stats people tend to use traditionally
Yesterday's Pants
A blog-thingy about the Mariners and stuff.
by BrettJMiller on
Apr 21, 2008 10:43 PM PDT
0 recs
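A quick sketch of the two batting lines above. OPS treats a point of OBP and a point of SLG as equal, so the two lines tie. GPA (mentioned elsewhere in this thread) is computed as (1.8 × OBP + SLG) / 4, which is one common way of giving OBP extra weight; that formula is brought in here for illustration and is not something the comment above relies on.

```python
def ops(obp, slg):
    """On-base plus slugging: equal weight on both components."""
    return obp + slg


def gpa(obp, slg):
    """Gross Production Average: OBP weighted 1.8x, scaled to look like a batting average."""
    return (1.8 * obp + slg) / 4


on_base_guy = {"obp": 0.400, "slg": 0.450}
slugger = {"obp": 0.350, "slg": 0.500}

print(ops(**on_base_guy), ops(**slugger))   # identical OPS
print(gpa(**on_base_guy), gpa(**slugger))   # the .400 OBP line comes out ahead
```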
And its park factors are too simple
Exhibit A: Safeco Field
There’s no reason to judge Johjima and Ibanez using identical park factors.
Barry Bonds died for your sins.
by JI on
Apr 21, 2008 10:45 PM PDT
0 recs
Exactly, people forget sometimes that in park factors not all players are affected the same way.
Kenji and Raul both play in a pitcher’s park. Safeco actually helps lefties like Raul despite Safeco overall favoring pitchers… It’s a pretty obvious example but it’s a good one.
Yesterday's Pants
A blog-thingy about the Mariners and stuff.
by BrettJMiller on
Apr 21, 2008 10:47 PM PDT
0 recs
Actually there's a very good case for this
Park factors are important for two things: measuring how valuable an individual run produced is, due to run environment, and trying to understand the true skill of a player.
The second requires different park factors by handedness, etc., but the first just needs a blanket run factor.
by Graham on
Apr 21, 2008 10:55 PM PDT
0 recs
VORP is an offensive metric that assigns players a certain run value over that which you could expect from a “replacement-level player” (basically the kind of guy you can find hanging around in AAA). It’s weighted for position, so if you have a 1B and a SS who post identical batting lines, the SS will have a higher VORP, because a replacement-level shortstop will be worse at the plate than a replacement-level first baseman.
It’s a complicated, secret formula that…I’ll be honest, not too many people use it anymore. It’s almost completely useless for pitchers, and for hitters, you’re better off looking at simpler, more intuitive stuff like BA/OBP/SLG and line drive rate and all that good stuff.
by Jeff on
Apr 21, 2008 10:28 PM PDT
0 recs
Edit:
VORP is an offensive metric for hitters and, obviously, a pitching metric for pitchers. But you should never look at VORP for pitchers anyway.
by Jeff on
Apr 21, 2008 10:29 PM PDT
0 recs
It's essentially the following formula
[(Player runs created per plate appearance) - (position replacement level RC/PA)] * PA
Replacement level is defined by a certain % of league average, depending on the position. VORP is obviously more complex than this, but the above is a pretty good approximation.
by Graham on
Apr 21, 2008 10:58 PM PDT
0 recs
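Here is a small sketch of the approximation in the comment above, nothing more. The real VORP formula is proprietary, so the function name, the league rate, and the replacement percentages below are all hypothetical placeholders chosen only to show the positional effect.

```python
def approx_vorp(runs_created, plate_appearances, league_rc_per_pa, replacement_pct):
    """Approximate runs above a replacement-level player at the same position.

    replacement_pct is the fraction of league-average RC/PA that defines
    replacement level for the position (hypothetical values below).
    """
    player_rate = runs_created / plate_appearances
    replacement_rate = replacement_pct * league_rc_per_pa
    return (player_rate - replacement_rate) * plate_appearances


LEAGUE_RC_PER_PA = 0.12  # made-up league average

# Identical batting lines, different positions (percentages are invented).
shortstop = approx_vorp(90, 600, LEAGUE_RC_PER_PA, replacement_pct=0.75)
first_base = approx_vorp(90, 600, LEAGUE_RC_PER_PA, replacement_pct=0.85)

print(round(shortstop, 1), round(first_base, 1))
```

With the same production, the shortstop scores higher because the replacement bar at his position is set lower, which is the point made earlier in the thread.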
Also
I’m not very clear on the popular fielding metrics being used. I hear things like RZR being tossed around, but I’ve got no idea what that means and how it’s used to evaluate a defender.
In the first diary I opened up here, I asked for some information on former and current Mariner fielders. I think Jeff tossed out that Lopez was a 0 < X < 10 defender.
I have absolutely no idea what that means.
by BrianL on Apr 21, 2008 10:22 PM PDT 0 recs
RZR = revised zone rating
available at THT. You’ll also see UZR (ultimate zone rating, developed by MGL) and PMR (probabilistic model of range, developed by David Pinto). They all try to measure the same thing in roughly the same way, and none of them is that robust in figuring out an individual fielder’s actual value.
There’s also SAFE (spatial aggregate fielding evaluation developed by a team of Penn Statistics researchers [disclaimer: I was one of them]) that does a better job than any of the above three, but requires data from BIS and thus it’s only available for 2003-6.
As far as fielding evaluation, we’re pretty good at figuring out how a team as a whole ranks, and can do a decent job at differentiating between infield defense and outfield defense as a unit, but getting to individual players requires at least 3 years of rankings from all the above to have any idea.
0 < x < 10 means that Jeff believes Jose Lopez’s defense is worth between 0 and +10 runs above average for a 2B.
by Matthew on
Apr 21, 2008 10:42 PM PDT
0 recs
If HR/FB% is out of a pitcher's control,
what’s the point of ever looking at FIP when xFIP exists?
by naviomelo on Apr 21, 2008 10:22 PM PDT 0 recs
FIP is better for relievers, for whom the HR/FB% "rule" doesn't really apply
and a lot of people still aren’t entirely convinced that pitchers fluctuate around an HR/FB of 11% anyway, so they have trouble stomaching xFIP. FIP isn’t bad, as long as you keep an eye on the home run rate to make sure it isn’t wonky/unsustainable.
by Jeff on
Apr 21, 2008 10:24 PM PDT
0 recs
Addendum
While HR/FB% is at least mostly out of a pitcher’s control, it is still heavily influenced by the park, so while xFIP might be better for figuring out how a pitcher might produce in a neutral environment, FIP might be better for evaluating a pitcher going forward on said team.
by Matthew on
Apr 21, 2008 10:28 PM PDT
0 recs
home park is included
xFIP is adjusted to the pitcher’s home park.
by studes on
Apr 22, 2008 7:16 AM PDT
0 recs
right of course.
Why do I always forget that?
Any thoughts on modifying xFIP to regress HR/FB (adjusted for home park) say 80% towards league average instead of 100% since we know there is a little bit of pitcher control over it?
by Matthew on
Apr 22, 2008 9:33 AM PDT
0 recs
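To make the 80% idea concrete, here is a rough sketch. It uses the commonly published FIP weights (13 per HR, 3 per BB, 2 per K) and the 11% HR/FB figure mentioned earlier in this thread. The 3.2 constant, the function names, and the pitcher's line are assumptions for illustration, and the sketch skips the home-park adjustment that the real xFIP applies.

```python
def fip_like(home_runs, walks, strikeouts, innings, constant=3.2):
    """FIP-style estimate from HR, BB and K (constant is an assumed league adjustment)."""
    return (13 * home_runs + 3 * walks - 2 * strikeouts) / innings + constant


def regressed_home_runs(home_runs, fly_balls, league_hr_per_fb=0.11, fraction=1.0):
    """Regress a pitcher's HR/FB toward league average, then convert back to HR."""
    observed = home_runs / fly_balls
    rate = observed + fraction * (league_hr_per_fb - observed)
    return rate * fly_balls


# Hypothetical pitcher: 30 HR on 200 fly balls (15% HR/FB), 60 BB, 180 K, 200 IP.
hr, fb, bb, k, ip = 30, 200, 60, 180, 200

plain_fip = fip_like(hr, bb, k, ip)
full_xfip = fip_like(regressed_home_runs(hr, fb, fraction=1.0), bb, k, ip)
partial_xfip = fip_like(regressed_home_runs(hr, fb, fraction=0.8), bb, k, ip)

print(round(plain_fip, 2), round(partial_xfip, 2), round(full_xfip, 2))
```

The 80% version lands between FIP and xFIP, a little closer to xFIP, which is all the proposal amounts to arithmetically.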
Well, not really
To me, there has to come a point where you don’t turn a stat into a projection system. Even though I invented it, I’m not the biggest fan of xFIP, for just that reason. I like either a simple stat, or a full-blown projection system.
by studes on
Apr 22, 2008 9:43 AM PDT
0 recs
:P back atcha
...which goes against most of the basic tenets of good information display!
by studes on
Apr 22, 2008 10:33 AM PDT
0 recs
Modern user-friendliness appears to be based on a single study
done by Apple in 1986.
by Llewdor on
Apr 22, 2008 11:38 AM PDT
0 recs
Because it is in dispute just how much
“out of a pitcher’s control it is.”
MGL and Tango in response to Dave Cameron’s article on pitching components:
The author also says that there is little (or was it "no") evidence that a pitcher has any control over the percentage of fly balls that are home runs. That is completely false. Pitchers very much have control over the distances of their fly balls and hence the percentage of them that go for home runs. This is easy to verify in a number of ways, of course. I don’t know where this idea came from that pitchers have little control over HR/FB, but it seems to be "going around" in some sabermetric circles.
I think there is a misconception that pitchers have "little or no" control over those HR/FB and that a pitcher’s true HR rate is almost completely a function of his FB rate. That is simply not true. In fact, in the research I have done, there is an indication that a pitcher’s true HR rate is very much independent of his FB rate. Or at least as important.
Here is some data on pitchers in 04 and 05. I ran a regression of HR/FB in 04 on HR/FB in 05 for pitchers with at least 500 TBF per year. The average TBF per year was 772 and the average FB was 209. A FB was any air ball, not including line drives (according to STATS). The "r" was .232.
If I only use outfield fly balls, which is defined as all air balls, but not line drives (again, according to STATS), to only the OF locations (an average of 176 OF flies in the 772 TBF), I get an "r" of .222.
The number of pitchers in the sample is 88.
If I decrease the min TBF to only 200 (an average of 530 TBF), I get an "r" of .081 for 229 pitchers.
That sounds like what THT got. We need a larger sample to decrease the uncertainty of these "r’s".
If we increase the sample to include 98 on 99, 00 on 01, and 02 on 03, we get 890 pitchers with at least 200 TBF per year with an "r" of .181 (rather than the .08 with a smaller sample).
For pitchers with TBF greater than 499, we have 317 pitchers with an "r" of .190 (rather than the previous .232 with the smaller sample).
Nothing is park adjusted.
I re-ran the correlations for players who switched teams from one year to the other, in order to make sure that the correlation was not being significantly influenced by the park HR factor. For 94-05 (regressing 94 on 95, 96 on 97, etc.) data, there were 85 pitchers who had at least 500 TBF in each of two consecutive years and switched teams. The "r" was .203, so the suggestion is that the "r’s" that we are getting are NOT due to the parks only or even mainly.
In MGL’s first study, he did FB=209, r=.232. This gives us an "x" for the equation of x / (x+FB) of 692.
In this case, 692 FB corresponds to about 2500 TBF.
In his next study, he did FB=176, r=.222 . The "x" value is 617. TBF corresponds to about 2700.
In his next study, he did r=.081, with an unknown number of FB, but which I will guess is FB=120. The "x" value is 1361.
Now, there is a danger in how you do a correlation, if you don’t weight each sample appropriately. This is why with a straight regression, you want your samples to have a similar number of "n", amongst themselves, and in the paired sample. There is a better way to do it otherwise, but more complicated.
Looking at the first two studies, we see that r=.50, when TBF is around 2500 or so, meaning about 580 IP.
In my last sample of players who switched teams, the average TBF was 746 and the average FB was 197. "R" was .205 for 85 pitchers. In the larger sample of all pitchers, the "r" was .190 and the sample size was 407, the average TBF was 760 and the average fb was 196.
I would use the "r" and the other data from the second sample, since it is much larger and the uncertainty from the "r" is much smaller.
There’s also some other stuff, specifically #30, where he shows the relationship between GB% and HR / FB% is weak.
ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524
by rfloh on
Apr 22, 2008 7:33 AM PDT
1 recs
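The "x" values in the quoted passage can be reproduced with a few lines. This sketch assumes the relationship implied there: the year-to-year r for a sample averaging FB fly balls is FB / (FB + x), so x / (x + FB) is the share to regress toward the mean. Solving for x gives FB × (1 − r) / r. The function names are made up, and the FB=120 input is the quoted author's own guess.

```python
def regression_constant(r, fly_balls):
    """Solve r = FB / (FB + x) for x, the number of fly balls of 'league average' to add."""
    return fly_balls * (1 - r) / r


def expected_r(fly_balls, x):
    """Year-to-year correlation implied by a sample of fly_balls, given x."""
    return fly_balls / (fly_balls + x)


studies = [
    ("all air balls, 500+ TBF", 0.232, 209),
    ("outfield flies only", 0.222, 176),
    ("200+ TBF, FB guessed at 120", 0.081, 120),
]

for label, r, fb in studies:
    print(label, round(regression_constant(r, fb)))

# When a pitcher's fly ball count equals x, r reaches .50.
print(expected_r(692, 692))
```

That last line is the reasoning behind the "r=.50 at around 2500 TBF" remark: it is the sample size at which a pitcher's own HR/FB and the league average get equal weight.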
Yeah, I have single season r at 0.21 or so for HR/(Ball in air)
But that’s by far the weakest year to year relationship for any of the 7 major measurables (even lower than HBP%!)
by Graham on
Apr 22, 2008 7:38 AM PDT
0 recs
covered
This was covered in both the ‘06 and ‘07 THT Annuals. Both the binomial and year-to-year correlations were between .1 and .2. Pretty weak.
by studes on
Apr 22, 2008 8:03 AM PDT
0 recs
What are the best metrics for hitters? Pitchers?
WPA? VORP?
FIP?
We don't negotiate with terrorists.
by Mariner John on Apr 21, 2008 10:28 PM PDT 0 recs
WPA shows how much they contributed regarding situation...
Just offhand, I’d assume that WPA isn’t very useful for evaluating a player’s skill. It’s all about results in certain situations, and results-based analysis is not really a good way to evaluate a player.
Yesterday's Pants
A blog-thingy about the Mariners and stuff.
by BrettJMiller on
Apr 21, 2008 10:29 PM PDT
0 recs
I know that WPA isn't a good predictor FWIW.
We don't negotiate with terrorists.
by Mariner John on
Apr 21, 2008 10:30 PM PDT
0 recs
For pitchers, look at tRA (ask Graham), FIP, xFIP, or simpler stuff like strikeout/walk/GB rates
For hitters, the best stuff is probably Tango’s linear weights, but since that method is complicated and not really available anywhere, you’re okay just looking at the raw batting line, adjusting for park, and checking to see that the BABIP is sustainable.
by Jeff on
Apr 21, 2008 10:33 PM PDT
0 recs
Batting Runs on BBRef
is a linear weights measure, park adjusted, but without SBs and CS.
For many players, exceptions being guys like Ichiro, Reyes, etc., it should work fine.
ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524
by rfloh on
Apr 22, 2008 7:35 AM PDT
0 recs

