Lookout Landing: An SB Nation Community

Open Statistical Question Thread

Every few days or so I get an email from a reader asking about some of the numbers I (and others) use on the website. They usually go something like "hey Jeff, I was wondering about one of the statistics you used the other day, but I didn't want to ask about it and look stupid. What is the meaning of (statistical acronym)?"

I'm fine with getting those emails, and I try to reply to them as best I can. But since the common theme seems to be that people don't want to ask questions here either because (A) they don't know where to ask, or (B) they don't want to be made fun of, I thought I'd take a measure to rectify the situation by opening this thread. In the comments below, you're welcome to ask questions on any statistic about which you're unclear - no matter how simple - and we'll see to it that you get a good answer. Don't get FIP? Not sure what counts as a good groundball percentage? Curious about the league-average BA? This is your thread. While I don't know how popular it's going to be (if at all), my hope is that this'll help get people caught up while eventually serving as a kind of partial FAQ for people new to analysis.

Go to town. (And please, let's keep this thread on topic.)

Comments

Completely subjective

but what are the best statistical methods for evaluating pitchers? I’d love to be able to finally explain to my buddy why ERA is a lousy metric without looking like a total asshat.

by BrianL on Apr 21, 2008 10:15 PM PDT

Currently I'd have to go with tRA

But I’m biased because Graham is awesome.

Yesterday's Pants
A blog-thingy about the Mariners and stuff.

by BrettJMiller on Apr 21, 2008 10:17 PM PDT

tRA I believe, but FIP is easier to find

The main point to hammer home is that you want to eliminate any possible influence of the defense when you evaluate a pitcher. Fielding batted balls is 95%+ the burden of the fielders and has very little to do with the pitcher.
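
A minimal sketch of the usual FIP calculation, to show why it sidesteps the defense: only home runs, walks (plus hit batters) and strikeouts go into it. The constant is an assumption here; it is normally set each season so that league-average FIP matches league-average ERA, and 3.10 is just a placeholder.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching.

    Only HR, BB + HBP and K enter the formula, so nothing here depends on
    what the fielders do with balls in play. `constant` is an assumed
    placeholder; in practice it is chosen per season so that league FIP
    lines up with league ERA.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season: 200 IP, 20 HR, 50 BB, 5 HBP, 180 K
print(fip(hr=20, bb=50, hbp=5, k=180, ip=200.0))
```

With the placeholder constant, that hypothetical line works out to (260 + 165 - 360) / 200 + 3.10 = 3.425.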

by Matthew on Apr 21, 2008 10:21 PM PDT

Excellent.

I understand and have tried to explain to my friend that ERA is extremely dependent on the fielders behind the pitcher and other factors outside of his control. This should help me illustrate my point more.

by BrianL on Apr 21, 2008 10:23 PM PDT

The hard part is convincing people that batted balls are not the responsibility of the pitcher

It’s a very hard concept to grasp and without accepting it, it’s tough to get people to see past ERA.

Also worth mentioning: errors (and consequently earned runs) are horrible, horrible numbers, because they are at the sole discretion of the official scorer, who is accountable to pretty much nobody and will never charge an error when, say, a ball drops untouched by an outfielder who misplays it.

by Matthew on Apr 21, 2008 10:33 PM PDT

[An official scorer] will never charge an error when, say, a ball drops untouched by an outfielder who misplays it.

Funnily enough, I believe this can now legally be scored as an error, though I have yet to see it.

Barry Bonds died for your sins.

by JI on Apr 21, 2008 10:39 PM PDT

Batted balls ARE partially the responsibility of the pitcher.

Pitchers DO have control over their BABIP. It’s just that the spread in talent is small enough that random variation often overshadows the talent. That’s why BABIP is regressed so heavily to the mean. Other skills with wider variation, such as K% or BB%, aren’t regressed so heavily.

FIP is a shortcut wherein you regress BABIP 100% of the way to the mean and you regress K%, BB%, and HR% 0% of the way to the mean. That’s something that people should understand.
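
To make the "regress X% of the way to the mean" idea concrete, here is a minimal sketch. `weight` is the fraction of the way an observed rate is pulled toward the league mean: 1.0 is full regression, 0.0 is none. The league and pitcher rates below are hypothetical placeholders, not real figures.

```python
def regress_to_mean(observed: float, league_mean: float, weight: float) -> float:
    """Move an observed rate `weight` of the way toward the league mean.

    weight = 1.0 discards the observation entirely (full regression);
    weight = 0.0 takes the observation at face value (no regression).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return observed + weight * (league_mean - observed)


# Hypothetical league and pitcher rates, for illustration only
league = {"babip": 0.300, "k_pct": 0.170}
pitcher = {"babip": 0.260, "k_pct": 0.220}

# FIP's implicit shortcut: BABIP regressed 100%, K% regressed 0%
print("BABIP estimate:", regress_to_mean(pitcher["babip"], league["babip"], 1.0))
print("K% estimate:", regress_to_mean(pitcher["k_pct"], league["k_pct"], 0.0))

# A more moderate, purely illustrative choice: regress BABIP 70% of the way
print("BABIP, 70% regressed:", regress_to_mean(pitcher["babip"], league["babip"], 0.7))
```

The 0.7 is only there to show an in-between case; the right weight depends on how much of the observed spread is talent versus noise and on how large the sample is.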

stat-addled alien overlord

by salb918 on Apr 22, 2008 6:02 AM PDT

I hope we as a community

Aren’t intimidating people into not asking these questions….I always thought we were pretty willing to help people understand these things without judging them for it….Let’s be honest, we were all ignorant of this statistical rationale at some point in our lives as baseball fans and we shouldn’t be making fun of people for trying to expand their knowledge of the sport.

by OlSalty on Apr 21, 2008 10:18 PM PDT

Value Over Replacement Player

Basically, once you determine the league average for a statistic, it’s how a player ranks relative to the middle of the pack.

by OlSalty on Apr 21, 2008 10:22 PM PDT

i understand that...

..but what i’m wondering is what comes into play here? vidro ranks high above the league average DH in terms of AVG but sucks in terms of OPS/SLG. each player has a VORP stat. what comes into play for that stat? is it purely offensively based? is defense included? does the calculation of it differ between positions? is there a VORP for both pitchers and non-pitchers?

by sirbrianwilson on Apr 21, 2008 10:25 PM PDT

VORP does not consider defense abilities, but does factor in position

Not sure about park. However, the formula itself is proprietary, so there’s not much we can say about it. There are also many issues lingering around the use of “replacement level” in any statistic. Comparing relative to league average is almost always better.

by Matthew on Apr 21, 2008 10:29 PM PDT

so...from both yours and jeff's comments...

...i take away the point that VORP is useless. if it doesn’t factor in defensive stats, then everything is relative. this is the type of stat that i could see putting ibanez in the limelight…

also…jeff mentioned that the formula for calculating “replacement level” is “secret.” wtf?

by sirbrianwilson on Apr 21, 2008 10:32 PM PDT

it's not useless

There are just better measurements now. VORP was very important when it came out.

by Matthew on Apr 21, 2008 10:34 PM PDT

maybe it's not useless..

...but, as a weathered statistician, i must bring into question when a formula only accounts for half of the equation (just offense, not defense). and what is the calculation for “replacement level”?

ha…one thing i love about this new SBN format is that i can see new replies as i’m typing this one. Jeff just commented how it’s not worth the argument since VORP is an old stat. I can understand that.

My new question: What 3 stat categories are the best for evaluating pitching (because i’m sure there isn’t some grand-stat that explains all) and what 3 are most important for all other positions (DH not included).

I’m mainly looking for something, in terms of position players, that displays both offensive and defensive quality/worth…

by sirbrianwilson on Apr 21, 2008 10:37 PM PDT

*edit

should have written, “I must bring a formula into question when it…”

most apologies.

by sirbrianwilson on Apr 21, 2008 10:46 PM PDT

Err it deliberately

omits defense because it is meant only to be used as an offensive stat.

Would you say OPS / OPS+/ EQA / Batting Runs / Batting Runs above Average / BaseRuns are all “useless”? They all also omit half of the equation.

ZIPS: Milledge: 466 HR, 485 2B, 2282 hits, 278-379-524

by rfloh on Apr 22, 2008 7:11 AM PDT

EqA is pretty good

other than that? I don’t think so. I’m also biased. (Disclosure: I work for THT)

by Matthew on Apr 21, 2008 10:43 PM PDT

[we know]

by JI on Apr 21, 2008 10:47 PM PDT

...PECOTA

duh

by JI on Apr 21, 2008 10:48 PM PDT

eh, I suppose

PECOTA is no longer any more accurate than most other good prediction systems out there.

by Matthew on Apr 21, 2008 10:59 PM PDT

still relevant

but yeah

by JI on Apr 21, 2008 11:38 PM PDT

I second this

since wOBA is pretty hard to find, EqA is a good one-number offensive statistic.

by Jeff on Apr 21, 2008 10:49 PM PDT

GPA

GPA is just about as good as EqA or wOBA, and is readily available on the THT site (and it’s adjusted for ballpark). I personally like it better than wOBA because, like EqA, it fits the scale of a batting average, which is more intuitive for most fans than the scale of OBP.

by studes on Apr 22, 2008 7:14 AM PDT

The big difference

How do you convert GPA to runs? I have no idea. Does anyone?

How do you convert wOBA to runs? Divide by 1.15, then multiply by expected plate appearances. Easy – I can do that almost in my head, and with a calculator or spreadsheet, it takes three seconds.
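
A minimal sketch of that conversion. One step the comment leaves implicit is an assumption here, though it matches the standard version of this calculation (usually called wRAA): subtract the league-average wOBA first, so the result is runs above an average hitter. The 1.15 divisor comes from the comment; the .335 league wOBA in the example is a hypothetical placeholder.

```python
def woba_to_runs_above_average(woba: float, league_woba: float, pa: int,
                               woba_scale: float = 1.15) -> float:
    """Convert a wOBA into runs above an average hitter over `pa` plate appearances.

    Assumes the league-average wOBA is subtracted before dividing by the scale.
    """
    return (woba - league_woba) / woba_scale * pa


# Hypothetical hitter: .381 wOBA over 600 PA, league average assumed to be .335
runs = woba_to_runs_above_average(0.381, 0.335, 600)
print(round(runs, 1))
```

Here (.381 - .335) / 1.15 = .040 runs per PA, or about 24 runs over 600 PA.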

Those of us who want THT to display wOBA aren’t just agitating because we want something that is 0.4% more accurate – there’s a purpose to having wOBA published – we know how to turn it into a linear weights run value method, and that’s something that is worth having available to the statistical community. (And yes, I know I can get it from firstinning – thanks Chris! You’re a lifesaver)

by davidcameron on Apr 22, 2008 7:39 AM PDT

You work for THT...

so tell them to publish wOBA, and make a good park adjusted version. I use OPS+ as my go-to single statistic, which is silly because it kinda sucks, but Bpro has a very unwieldy site. It is a travesty that no one publishes wOBA.

The A's colors are green and gold.

by mikeA on Apr 21, 2008 11:38 PM PDT

no one?

FirstInning.com has published wOBA for a while. Not sure if it’s park-adjusted, though

by Bdo on Apr 22, 2008 5:43 AM PDT

Thanks,

hadn’t been going there. Great pitching stats too.

by mikeA on Apr 22, 2008 3:52 PM PDT

Different people have different "replacement level" baselines

I haven’t used VORP in years. At this point explaining its quirks is probably more trouble than it’s worth.

by Jeff on Apr 21, 2008 10:34 PM PDT

I think VORP, when you get down to its core,

is essentially based in linear weights. I could be wrong about that, though.

by salb918 on Apr 22, 2008 6:05 AM PDT

Batting Average is useless

OBP and SLG are the only two stats that matter in determining a player’s offensive production. OPS (the combination of OBP and SLG) is all that matters because it captures how often you get on base and how many bases you produce per at-bat. BA just measures how often you put a ball in play that puts you on the basepaths, and that discounts both extra-base hits and walks.

And yeah, for a specific VORP per position metric, ask Jeff or Matthew or Graham, they know this stuff :)

by OlSalty on Apr 21, 2008 10:30 PM PDT

batting average isn't useless

just vastly overrated

by JI on Apr 21, 2008 10:41 PM PDT

yes

but you can make a pretty good argument that Ichiro and Rod Carew were more productive than your typical .370/.430 guy.

by JI on Apr 21, 2008 10:44 PM PDT

true

but .320/.350/.450 is more valuable than .250/.350/.450 both in terms of run production and also in light of the excellent old versus new player skills article Derek posted at USSM today.

by Matthew on Apr 21, 2008 10:44 PM PDT

OPS favors sluggers without OBP skills too much.

.300/.400/.450 is an .850 OPS
.300/.350/.500 is an .850 OPS

The one with the .400 OBP is more valuable.
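
A quick check of that arithmetic, plus one existing way of giving OBP more weight than OPS does: GPA (mentioned elsewhere in this thread) is commonly defined as (1.8 × OBP + SLG) / 4. The 1.8 weighting is that stat's convention, not a universal constant.

```python
def ops(obp: float, slg: float) -> float:
    """On-base plus slugging: OBP and SLG weighted equally."""
    return obp + slg


def gpa(obp: float, slg: float) -> float:
    """GPA: weights OBP 1.8x as heavily as SLG, then divides by 4
    so the result sits on a batting-average-like scale."""
    return (1.8 * obp + slg) / 4


lines = {".300/.400/.450": (0.400, 0.450), ".300/.350/.500": (0.350, 0.500)}
for label, (obp, slg) in lines.items():
    print(f"{label}: OPS {ops(obp, slg):.3f}, GPA {gpa(obp, slg):.4f}")
```

Both lines print an identical .850 OPS, while GPA puts the .400 OBP line ahead (roughly .29 vs .28).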

OBP, SLG, and OPS aren’t the ONLY stats that matter, but they’re closer than a lot of the stats people have traditionally used.

by BrettJMiller on Apr 21, 2008 10:43 PM PDT

And its park factors are too simple

Exhibit A: Safeco Field

There’s no reason to judge Johjima and Ibanez using identical park factors.

by JI on Apr 21, 2008 10:45 PM PDT

Exactly. People sometimes forget that park factors don’t affect all players the same way.

Kenji and Raul both play in a pitcher’s park. Safeco actually helps lefties like Raul despite Safeco overall favoring pitchers… It’s a pretty obvious example but it’s a good one.

by BrettJMiller on Apr 21, 2008 10:47 PM PDT

Actually there's a very good case for this

Park factors are important for two things: measuring how valuable an individual run produced is, due to run environment, and trying to understand the true skill of a player.

The second requires different park factors by handedness, etc., but the first just needs a blanket run factor.
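
A minimal sketch of the two uses Graham describes. All park-factor numbers are hypothetical placeholders (not real Safeco Field figures), and the home/road blend is an assumption: it treats half of a player's games as home games and the road parks as neutral on average.

```python
# Hypothetical park factors (1.00 = neutral, below 1.00 = suppresses offense).
# Placeholders only, not real Safeco Field numbers.
BLANKET_RUN_FACTOR = 0.95
FACTOR_BY_HANDEDNESS = {"L": 0.99, "R": 0.91}


def park_adjust(raw: float, park_factor: float, home_share: float = 0.5) -> float:
    """Neutralize a raw offensive number for the player's home park.

    Assumes `home_share` of games are at home and that road parks
    average out to neutral (1.00).
    """
    effective_factor = home_share * park_factor + (1 - home_share) * 1.0
    return raw / effective_factor


raw_runs = 80.0  # hypothetical runs created by a hitter

# Use 1: how valuable is a run in this environment? One blanket factor.
print("blanket:", round(park_adjust(raw_runs, BLANKET_RUN_FACTOR), 1))

# Use 2: what is the hitter's underlying skill? Use the factor for how he bats.
for bats, factor in FACTOR_BY_HANDEDNESS.items():
    print(f"bats {bats}:", round(park_adjust(raw_runs, factor), 1))
```

With these made-up factors, the same 80 raw runs get a bigger boost for a right-handed hitter than for a left-handed one, which is the Johjima/Ibanez point in miniature.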

by Graham on Apr 21, 2008 10:55 PM PDT

VORP is an offensive metric that assigns players a certain run value over that which you could expect from a “replacement-level player” (basically the kind of guy you can find hanging around in AAA). It’s weighted for position, so if you have a 1B and a SS who post identical batting lines, the SS will have a higher VORP, because a replacement-level shortstop will be worse at the plate than a replacement-level first baseman.

It’s a complicated, secret formula that…I’ll be honest, not too many people use it anymore. It’s almost completely useless for pitchers, and for hitters, you’re better off looking at simpler, more intuitive stuff like BA/OBP/SLG and line drive rate and all that good stuff.

by Jeff on Apr 21, 2008 10:28 PM PDT

Edit:

VORP is an offensive metric for hitters and, obviously, a pitching metric for pitchers. But you should never look at VORP for pitchers anyway.

by Jeff on Apr 21, 2008 10:29 PM PDT

It's essentially the following formula

((Player runs created per plate appearance) - (position replacement level RC/PA)) * PA

Replacement level is defined by a certain % of league average, depending on the position. VORP is obviously more complex than this, but the above is a pretty good approximation.
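
Graham's approximation written out as a minimal sketch. The replacement-level fractions by position are hypothetical placeholders (the actual VORP values aren't published), and runs created per PA is taken as an input rather than computed.

```python
# Hypothetical replacement levels, as a fraction of league-average RC/PA.
# Placeholders for illustration, not the real VORP baselines.
REPLACEMENT_FRACTION = {"1B": 0.85, "SS": 0.70}


def approx_vorp(player_rc_per_pa: float, league_rc_per_pa: float,
                position: str, pa: int) -> float:
    """(player RC/PA - replacement RC/PA at the position) * PA."""
    replacement_rc_per_pa = REPLACEMENT_FRACTION[position] * league_rc_per_pa
    return (player_rc_per_pa - replacement_rc_per_pa) * pa


# Identical batting lines at 1B and SS: the shortstop comes out ahead,
# because a replacement-level shortstop hits worse than a replacement-level 1B.
for pos in ("1B", "SS"):
    print(pos, round(approx_vorp(0.14, 0.12, pos, 600), 1))
```

With these made-up inputs, the first baseman comes out around 22.8 runs and the shortstop around 33.6, mirroring Jeff's 1B/SS example.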

by Graham on Apr 21, 2008 10:58 PM PDT

Also

I’m not very clear on the popular fielding metrics being used. I hear things like RZR being tossed around, but I’ve got no idea what that means and how it’s used to evaluate a defender.

In the first diary I opened up here, I asked for some information on former and current Mariner fielders. I think Jeff tossed out that Lopez was a 0 < X < 10 defender.

I have absolutely no idea what that means.

by BrianL on Apr 21, 2008 10:22 PM PDT

RZR = revised zone rating

available at THT. You’ll also see UZR (ultimate zone rating, developed by MGL) and PMR (probabilistic model of range, developed by David Pinto). They all try to measure the same thing in roughly the same way, but none of them is that robust at figuring out an individual fielder’s actual value.

There’s also SAFE (spatial aggregate fielding evaluation developed by a team of Penn Statistics researchers [disclaimer: I was one of them]) that does a better job than any of the above three, but requires data from BIS and thus it’s only available for 2003-6.

As far as fielding evaluation goes, we’re pretty good at figuring out how a team as a whole ranks, and can do a decent job at differentiating between infield defense and outfield defense as units, but getting to individual players requires at least 3 years of rankings from all the above to have any idea.

0 < x < 10 means that Jeff believes Jose Lopez’s defense is worth between 0 and +10 runs above average for a 2B.

by Matthew on Apr 21, 2008 10:42 PM PDT

If HR/FB% is out of a pitcher's control,

what’s the point of ever looking at FIP when xFIP exists?

by naviomelo on Apr 21, 2008 10:22 PM PDT

FIP is better for relievers, for whom the HR/FB% "rule" doesn't really apply

and a lot of people still aren’t entirely convinced that pitchers fluctuate around an HR/FB of 11% anyway, so they have trouble stomaching xFIP. FIP isn’t bad, as long as you keep an eye on the home run rate to make sure it isn’t wonky/unsustainable.
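
A minimal sketch of xFIP as described here: the same formula as FIP, except actual home runs are replaced by fly balls times a league HR/FB rate (the ~11% Jeff mentions). The constant is an assumed placeholder, and this version leaves out the home-park adjustment that some implementations apply.

```python
def xfip(fb: int, bb: int, hbp: int, k: int, ip: float,
         league_hr_per_fb: float = 0.11, constant: float = 3.10) -> float:
    """FIP with home runs replaced by expected home runs (FB * league HR/FB).

    `league_hr_per_fb` and `constant` are assumed placeholder values.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    expected_hr = fb * league_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season: 200 IP, 250 fly balls, 50 BB, 5 HBP, 180 K
print(xfip(fb=250, bb=50, hbp=5, k=180, ip=200.0))
```

The 250 fly balls become 27.5 expected home runs, giving (357.5 + 165 - 360) / 200 + 3.10 = 3.9125 with these assumptions.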

by Jeff on Apr 21, 2008 10:24 PM PDT

Addendum

While HR/FB% is at least mostly out of a pitcher’s control, it is still heavily influenced by the park. So while xFIP might be better for figuring out how a pitcher might produce in a neutral environment, FIP might be better for evaluating a pitcher going forward on his current team.

by Matthew on Apr 21, 2008 10:28 PM PDT

home park is included

xFIP is adjusted to the pitcher’s home park.

by studes on Apr 22, 2008 7:16 AM PDT

right of course.

Why do I always forget that?

Any thoughts on modifying xFIP to regress HR/FB (adjusted for home park) say 80% towards league average instead of 100% since we know there is a little bit of pitcher control over it?
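
Matthew's proposal, sketched: rather than swapping in the league HR/FB outright (100% regression), keep 20% of the pitcher's deviation from it. The 0.80 weight is the number floated in the comment, not an established value, and the league and pitcher rates are hypothetical.

```python
def regressed_hr_per_fb(pitcher_hr: int, pitcher_fb: int,
                        league_hr_per_fb: float = 0.11,
                        regression: float = 0.80) -> float:
    """Pull a pitcher's HR/FB `regression` of the way toward the league rate.

    regression = 1.0 reproduces plain xFIP's assumption (league rate only).
    """
    if pitcher_fb <= 0:
        raise ValueError("need at least one fly ball")
    observed = pitcher_hr / pitcher_fb
    return observed + regression * (league_hr_per_fb - observed)


# Hypothetical pitcher: 20 HR on 250 fly balls (8% HR/FB)
rate = regressed_hr_per_fb(20, 250)
print(round(rate, 4), round(rate * 250, 1))
```

For that hypothetical pitcher, 80% regression gives a 10.4% HR/FB, or about 26 expected home runs instead of the 27.5 full regression would assign.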

by Matthew on Apr 22, 2008 9:33 AM PDT

Well, not really

To me, there has to come a point where you don’t turn a stat into a projection system. Even though I invented it, I’m not the biggest fan of xFIP, for just that reason. I like either a simple stat, or a full-blown projection system.

by studes on Apr 22, 2008 9:43 AM PDT

Because it is in dispute just how much

“out of a pitcher’s control it is.”

MGL and Tango in response to Dave Cameron’s article on pitching components:


The author also says that there is little (or was it "no") evidence that a pitcher has any control over the percentage of fly balls that are home runs. That is completely false. Pitchers very much have control over the distances of their fly balls and hence the percentage of them that go for home runs. This is easy to verify in a number of ways, of course. I don’t know where this idea came from that pitchers have little control over HR/FB, but it seems to be "going around" in some sabermetric circles.
I think there is a misconception that pitchers have "little or no" control over their HR/FB and that a pitcher’s true HR rate is almost completely a function of his FB rate. That is simply not true. In fact, in the research I have done, there is an indication that a pitcher’s true HR rate is very much independent of his FB rate. Or at least as important.
Here is some data on pitchers in 04 and 05. I ran a regression of HR/FB in 04 on HR/FB in 05 for pitchers with at least 500 TBF per year. The average TBF per year was 772 and the average FB was 209. A FB was any air ball, not including line drives (according to STATS).

"r" was .232.

If I only use outfield fly balls, which is defined as all air balls, but not line drives (again, according to STATS), to only the OF locations (an average of 176 OF flies in the 772 TBF), I get an "r" of .222.

The number of pitchers in the sample is 88.

If I decrease the min TBF to only 200 (an average of 530 TBF), I get an "r" of .081 for 229 pitchers.

That sounds like what THT got. We need a larger sample to decrease the uncertainty of these "r’s".

If we increase the sample to include 98 on 99, 00 on 01, and 02 on 03, we get 890 pitchers with at least 200 TBF per year with an "r" of .181 (rather than the .08 with a smaller sample).

For pitchers with TBF greater than 499, we have 317 pitchers with an "r" of .190 (rather than the previous .232 with the smaller sample).

Nothing is park adjusted.



I re-ran the correlations for players who switched teams from one year to the other, in order to make sure that the correlation was not being significantly influenced by the park HR factor.

For 94-05 (regressing 94 on 95, 96 on 97, etc.) data, there were 85 pitchers who had at least 500 TBF in each of two consecutive years and switched teams. The "r" was .203, so the suggestion is that the "r’s" that we are getting are NOT due to the parks only or even mainly.

In MGL’s first study, he did FB=209, r=.232

This gives us an "x" for the equation of x / (x+FB) of 692.

In this case, 692 FB corresponds to about 2500 TBF.

In his next study, he did FB=176, r=.222 . The "x" value is 617. TBF corresponds to about 2700.

In his next study, he did r=.081, with an unknown number of FB, but which I will guess is FB=120. The "x" value is 1361.

Now, there is a danger in how you do a correlation, if you don’t weight each sample appropriately. This is why with a straight regression, you want your samples to have a similar number of "n", amongst themselves, and in the paired sample. There is a better way to do it otherwise, but more complicated.

Looking at the first two studies, we see that r=.50, when TBF is around 2500 or so, meaning about 580 IP.

In my last sample of players who switched teams, the average TBF was 746 and the average FB was 197. "R" was .205 for 85 pitchers.

In the larger sample of all pitchers, the "r" was .190, the sample size was 407, the average TBF was 760, and the average FB was 196.

I would use the "r" and the other data from the second sample, since it is much larger and the uncertainty from the "r" is much smaller.

There’s also some other stuff, specifically #30, where he shows the relationship between GB% and HR / FB% is weak.
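
The arithmetic in Tango's reply can be reproduced with a short sketch. If the year-to-year correlation for a sample with FB fly balls is r = FB / (FB + x), solving for x gives x = FB(1 − r)/r, and r reaches .50 when FB equals x. The 120 FB for the third study is Tango's own guess, carried over as-is.

```python
def solve_x(fb: float, r: float) -> float:
    """Solve r = fb / (fb + x) for x, the regression constant in fly balls."""
    if not 0 < r < 1:
        raise ValueError("r must be strictly between 0 and 1")
    return fb * (1 - r) / r


def expected_r(fb: float, x: float) -> float:
    """Year-to-year correlation implied by `fb` fly balls per season."""
    return fb / (fb + x)


# (FB, r) pairs from the quoted studies; the 120 FB figure is Tango's guess
studies = [(209, 0.232), (176, 0.222), (120, 0.081)]
for fb, r in studies:
    print(fb, r, round(solve_x(fb, r)))

# With x somewhere around 650 (between the first two estimates),
# r = .50 needs roughly that many fly balls
print(expected_r(650, 650))
```

This recovers the quoted values of roughly 692, 617 and 1361.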

by rfloh on Apr 22, 2008 7:33 AM PDT

Yeah, I have single season r at 0.21 or so for HR/(Ball in air)

But that’s by far the weakest year to year relationship for any of the 7 major measurables (even lower than HBP%!)

by Graham on Apr 22, 2008 7:38 AM PDT

covered

This was covered in both the ‘06 and ‘07 THT Annuals. Both the binomial and year-to-year correlations were between .1 and .2. Pretty weak.

by studes on Apr 22, 2008 8:03 AM PDT

What are the best metrics for hitters? Pitchers?

WPA? VORP?

FIP?

We don't negotiate with terrorists.

by Mariner John on Apr 21, 2008 10:28 PM PDT

WPA shows how much they contributed regarding situation...

Just offhand, I’d assume that WPA isn’t very useful for evaluating a player’s skill. It’s all about results in certain situations, and results-based analysis is not really a good way to evaluate a player.

by BrettJMiller on Apr 21, 2008 10:29 PM PDT

For pitchers, look at tRA (ask Graham), FIP, xFIP, or simpler stuff like strikeout/walk/GB rates

For hitters, the best stuff is probably Tango’s linear weights, but since that method is complicated and not really available anywhere, you’re okay just looking at the raw batting line, adjusting for park, and checking to see that the BABIP is sustainable.
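
For the last step, checking whether a hitter's BABIP is sustainable, here is a minimal sketch of the usual BABIP formula, (H − HR) / (AB − K − HR + SF). The stat line is hypothetical, and what counts as "sustainable" depends on the hitter's own track record and batted-ball profile rather than one fixed cutoff.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play.

    Home runs and strikeouts are removed because they never give the
    defense a chance to make a play; sacrifice flies are added back as
    balls in play.
    """
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical line: 160 H, 20 HR, 550 AB, 100 K, 5 SF
print(round(babip(h=160, hr=20, ab=550, k=100, sf=5), 3))
```

That hypothetical hitter is at 140 / 435, or about .322, which you would then compare against his past seasons.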

by Jeff on Apr 21, 2008 10:33 PM PDT

Batting Runs on BBRef

is a linear weights measure, park adjusted, but without SBs and CS.

For many players (exceptions being guys like Ichiro, Reyes, etc.), it should work fine.

by rfloh on Apr 22, 2008 7:35 AM PDT