2009 Open Statistical Analysis Question Thread

EDIT(11:20AM): For anyone new to LL or advanced analysis, this is the perfect time to ask any question, no matter how basic. I can't stress this strongly enough.

We've discussed some more advanced nuts-and-bolts stuff in the comments, but even if you have a question like "what are tRA and wOBA, and why do you use them instead of FIP and OPS?", ask, ask, ask!

We had one of these last year, and it seemed to work out pretty well for getting people's questions answered, however basic.

I think it would be a good idea to do this again this year, especially with the influx of new posters to Lookout Landing. It would be nice to have a place where you can ask any advanced analysis question and have it answered by someone with great knowledge of this area.

Jeff, Matthew, and Graham have been great at answering questions, and another thing I'm hoping will happen here is that other knowledgeable non-mod posters will be able to contribute and help out as well. I know that at least for me personally, I've learned a ton just in the last year, to the point where I'd be comfortable fielding at least basic questions regarding advanced analysis and metrics.

So feel free to ask any question and we'll do our best to answer them. I'd ask that the regulars really take it easy on everyone in this thread and just do your best to help the asker understand, no matter how basic the query.

Of course this doesn't mean that I don't have my own questions that I'd like to have answered, and honestly, part of this thread was me being selfish enough to make a FanPost hoping to get a few answers out of everyone.

- How does pitching tRA convert to runs above/below average? Is it just a simple [(tRAplayer-tRAleague)*xO]/27?

- What's the definition of replacement level for a pitcher? For a hitter, we know it's ~20 runs per 600 PA, or about .033 runs/PA. Is there a similar sliding scale for pitchers? Is it different for relievers and starters? And if there is one, what is that number?

- It's my understanding that positional adjustments are defined so that a player who hits at the average level for his position and is exactly an average fielder will always be worth 2 WAR. Why do we value positions like C and SS over corner OF and 1B? Shouldn't a 2 WAR player be a 2 WAR player, regardless of position? I understand scarcity, but given the positional adjustment, shouldn't a 2 WAR catcher be just as available as a 2 WAR LF?

- How does positional adjustment work if a guy plays a lot of positions? Does he just get each positional adjustment multiplied by percentage of playing time at a given spot?

- Are positional adjustments based on ~600 PA? Do you get a bigger adjustment if you play, say CF, and get 700 PAs in a season?

Also, when this is all said and done, I'll try to compile the answers and post them in a new FanPost so we'll have a lot of good information stuck in one spot.


Comments


Does Statcorner have pitching tRA for each individual game?

I was just wondering how Felix looked according to tRA last night. Lots of swinging strikes and K’s but also lots of hard hit balls.

I’ve been watching on the site how each game has been affecting our pitchers’ tRAs, but I was hoping that game-by-game data might be available.

by Sec 108 on May 5, 2009 10:23 AM PDT

5.36 tRA for the game.

It would be lower if tRA took individual pitch results (swinging strikes, etc.) into account. Two-thirds of his pitches were strikes and he had a nearly 20% missed-bat rate, which is just insanely good.

by Matthew on May 5, 2009 12:38 PM PDT

Speaking of tRA, will there be an update to the formula based on 2009 BIP data?

Does the formula continually update, and how far back do you use data from? Is it just some sort of sliding 10-20-30-40 scale or whatever to account for potential run-scoring changes?

by seattlebruin on May 5, 2009 11:06 AM PDT

It keeps each season's BIP run values separate, but they change extremely little year to year.

2009 will be updated relatively soon. Right now, it is still using mostly 2008.

by Matthew on May 5, 2009 11:13 AM PDT

Non-unrelated to that...

I’ve noticed a number of times guys make reference to the tRA for a single-game performance. But I’d trained myself (rightly or wrongly) to read very little into single game rates – it seems obvious that in a single game a guy’s OPS can fly from one extreme to the other, and I wouldn’t read so much into Bedard posting 3 walks and only 5 strikeouts in a single game so long as it was a one-off.

I suppose that my question, then, is whether single game tRA in isolation is any more reliable (relative to season-long tRA) than single game figures for other metrics, and if so what it is that makes it so. I have no reason to doubt that it is, but don’t have the confidence in my own understanding to explain why/how if asked.

Ta!

by MarkE on May 5, 2009 12:33 PM PDT

It's more reliable than, say, single-game ERA

but it’s still a very small sample. Generally, I just use it to eyeball the game from a 30,000-foot view.

by Matthew on May 5, 2009 12:36 PM PDT

Okay, thanks

That helps – I figured there was a massive trick I was missing.

Hmmm… I think that’s got another question brewing in my head about tRA variation game-to-game, but I guess that’s partially covered by your earlier response to Sec 108 and I’ll hold fire.

by MarkE on May 5, 2009 12:39 PM PDT

If you want to evaluate a single game

look at Strike%, SwStrike%, and the BIP distribution. That tells you pretty much everything you need to know.

by Jeff Sullivan on May 5, 2009 12:50 PM PDT
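Jeff's three-part single-game check can be sketched in a few lines. This is a minimal illustration, not how StatCorner or any other site computes these numbers: the `Pitch` record and its result labels are hypothetical, and counting fouls and balls in play as strikes and dividing swinging strikes by all pitches are assumed conventions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Pitch:
    # Hypothetical pitch record; field names and labels are assumptions,
    # not a real PITCHf/x or StatCorner schema.
    result: str            # "ball", "called_strike", "swinging_strike", "foul", or "in_play"
    batted_ball: str = ""  # "GB", "LD", "FB", or "IFFB" when result == "in_play"


# Assumed convention: anything that is not a ball counts toward Strike%.
STRIKE_RESULTS = {"called_strike", "swinging_strike", "foul", "in_play"}


def single_game_summary(pitches):
    """Return Strike%, SwStrike% (swinging strikes / all pitches) and the BIP mix."""
    if not pitches:
        raise ValueError("need at least one pitch")
    n = len(pitches)
    strikes = sum(p.result in STRIKE_RESULTS for p in pitches)
    swinging = sum(p.result == "swinging_strike" for p in pitches)
    bip = Counter(p.batted_ball for p in pitches if p.result == "in_play")
    total_bip = sum(bip.values())
    return {
        "strike_pct": strikes / n,
        "swstrike_pct": swinging / n,
        # Share of balls in play by type; empty if nothing was put in play.
        "bip_distribution": {kind: count / total_bip for kind, count in bip.items()},
    }
```

For example, a four-pitch sequence of ball, called strike, swinging strike, and a ground ball in play gives a 75% Strike%, a 25% SwStrike%, and a BIP mix that is 100% grounders.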

More general than specific

This time of year, people always use the “small sample size” caveat when talking about performance. What is considered an acceptable sample size for pitchers? For batters? For fielders?

Nice Guys Finish Third - Hopelessly lost, but makin' good time.

by pdb on May 5, 2009 10:32 AM PDT

Depends on the statistic you are considering.

Which is why I get frustrated when people just blindly shout SSS to early season analysis. Shades of gray. Always shades of gray.

by Matthew on May 5, 2009 10:50 AM PDT

Are those per-stat parameters defined on StatCorner or elsewhere?

because for a relative novice like myself, it’s good to know more than “65 PA is too small” when trying to interpret data.

(And no, I’m not trying to corner you into defining acceptable sample sizes per stat; I’m just wondering if that has been defined anywhere that I can look up.)

by pdb on May 5, 2009 10:56 AM PDT

Not really.

Basically, it’s more about understanding what the actual sample is. For instance, I can talk about Felix’s swinging strike rate after five starts and instead of decrying that it’s a sample size of just five starts, you need to realize that since I’m referring to an individual pitch statistic, the actual sample size is ~500.

by Matthew on May 5, 2009 11:04 AM PDT
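Matthew's point about the unit of the sample can be illustrated with a binomial standard error. This treats each pitch as an independent trial with a fixed swinging-strike probability, which is a simplifying assumption, and the 12% rate below is a made-up example value.

```python
import math


def rate_standard_error(p, n):
    """Binomial standard error of an observed rate p over n independent trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)


# Hypothetical swinging-strike rate, used only for illustration.
p = 0.12
per_start = rate_standard_error(p, 5)    # if "five starts" were really the sample
per_pitch = rate_standard_error(p, 500)  # the ~500 pitches actually thrown
print(f"n=5: ±{per_start:.3f}  n=500: ±{per_pitch:.3f}")
```

Because the error shrinks with the square root of n, 100 times as many trials cuts it by a factor of 10.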

Makes sense, thanks

by pdb on May 5, 2009 11:07 AM PDT

Usually you determine the needed sample size by taking past examples and seeing at what point in the season a stat equals its final value

For example, a batter ends the season at .300. At what point during the season is the batter at or near that .300 average? Sometimes this overlaps seasons, and then the math gets a little fuzzy due to aging effects.

by Jeff Zimmerman on May 5, 2009 1:23 PM PDT
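Here is one literal reading of the method described above, applied to a single hitter's game log: find the earliest cumulative at-bat total after which the running batting average never again strays more than a tolerance from its final value. The game-log inputs and the 10-point tolerance are assumptions, and a single player's path says little about how reliable a stat is across players.

```python
def stabilization_point(hits_by_game, at_bats_by_game, tol=0.010):
    """Earliest cumulative AB total after which the running average stays within
    `tol` of the final average for the rest of the season (None if no at-bats)."""
    if len(hits_by_game) != len(at_bats_by_game):
        raise ValueError("game logs must be the same length")

    # (cumulative AB, running average) after each game, once there is at least one AB.
    running = []
    hits = at_bats = 0
    for game_hits, game_abs in zip(hits_by_game, at_bats_by_game):
        hits += game_hits
        at_bats += game_abs
        if at_bats:
            running.append((at_bats, hits / at_bats))
    if not running:
        return None

    final = running[-1][1]
    point = None
    # Walk backwards from season's end; stop at the first game outside the band.
    for cumulative_abs, avg in reversed(running):
        if abs(avg - final) > tol:
            break
        point = cumulative_abs
    return point
```

For a hitter who starts 4-for-4 and then goes 0-for-4, 1-for-4, and 1-for-4, the running average only settles near his final .375 after all 16 at-bats with a 10-point band, or after 12 with a 50-point band.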

What?

This makes little sense to me.

by Matthew on May 5, 2009 3:10 PM PDT

I'm glad I'm not the only one

but I don’t know enough about statistics (the mathematical science, not the baseball numbers) to know whether it was right or not. Didn’t seem too likely though.

by pdb on May 5, 2009 3:15 PM PDT

Regarding sample size,

I can’t find the link to the original article, but the bottom of this link summarizes the findings of an article I read a while ago and liked a lot.

What Pizza Cutter does is find the point at which the r-squared for different statistics reaches the .50 barrier, the point at which half of the variation in a stat is predictable.

Hitters

Strikeout rate/Contact rate*: 150 PA
LD%: 150 PA
Walk rate: 200 PA
GB%: 200 PA
GB/FB: 200 PA
FB%: 250 PA
Home run rate: 300 PA
HR/FB: 300 PA
BABIP: Doesn’t reach a 0.50 r-squared at 650 PA or below.
Batting average: Doesn’t reach a 0.50 r-squared at 650 PA or below. Pizza Cutter guesses it would at around 1000 PA.
*Note: Pizza Cutter also lists a stat called "contact rate," which stabilizes at 100 PA, but this is different from the contact rate that we use. I believe this refers to contact rate on a per-pitch basis as opposed to a per-at-bat basis.

Pitchers

K/PA: 150 BF
GB%: 150 BF
LD%: 150 BF
FB%: 200 BF
GB/FB: 200 BF
K/BB: 500 BF
IF FB%: 500 BF
BB/PA: 550 BF
BABIP: Doesn’t reach a 0.50 r-squared at 650 BF or below.
HR/FB: Doesn’t reach a 0.50 r-squared at 650 BF or below.

by Sokojoe on May 5, 2009 1:58 PM PDT
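The r-squared idea can be sketched with a split-half test: take each player's first N plate appearances, split them into odd- and even-numbered PAs, and correlate the two halves' rates across players. An r-squared of .50 corresponds to a correlation of about .71. This is a common way to do it, but treating it as exactly Pizza Cutter's procedure is an assumption (for example, whether N counts both halves or one), and the simulated players below are made-up data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def split_half_r_squared(players, pa):
    """r-squared between odd- and even-numbered PA rates over each player's first `pa` PAs.

    `players` is a list of per-PA 0/1 outcome lists (1 = the event, e.g. a strikeout).
    Each half contains about pa / 2 plate appearances.
    """
    if pa < 2:
        raise ValueError("need at least two PAs to split")
    odd_rates, even_rates = [], []
    for outcomes in players:
        if len(outcomes) < pa:
            raise ValueError("every player needs at least `pa` plate appearances")
        sample = outcomes[:pa]
        odd, even = sample[0::2], sample[1::2]
        odd_rates.append(sum(odd) / len(odd))
        even_rates.append(sum(even) / len(even))
    r = pearson(odd_rates, even_rates)
    return r * r


def simulate_players(n_players, n_pa, seed=0):
    """Made-up data: each player gets a hypothetical true rate between 10% and 30%."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        true_rate = rng.uniform(0.10, 0.30)
        players.append([1 if rng.random() < true_rate else 0 for _ in range(n_pa)])
    return players


players = simulate_players(300, 650)
for pa in (100, 200, 300, 400, 500, 600):
    print(pa, round(split_half_r_squared(players, pa), 2))
```

The printed r-squared climbs as the sample grows; how fast it reaches .50 depends on how spread out true talent is relative to the random noise, which is why different stats stabilize at different PA totals.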

It’s interesting that K/PA stabilizes so much more quickly than BB/PA. Is that correct? I thought K%, BB%, and GB% were all stats that stabilized quickly and in a similar time frame.

When I was a kid I used to pray every night for a new bicycle. Then I realized God doesn’t work that way, so I stole one and prayed for forgiveness. - Emo Philips

Proud father of Juan Carlos Perez. Think Albert Pujols at second.

by marcello on May 5, 2009 4:28 PM PDT

Acceptable for what?

It is impossible to accumulate enough PAs in a single season to be 100% – or heck, even 80% sure of a player’s true talent level. I think for a full-time position player a full season’s PAs is enough to get you to about 60% sure.

Everything in baseball is a sample. All samples are by definition uncertain.

by cwyers on May 5, 2009 9:54 PM PDT

From the SC glossary!
Using tRA as the benchmark, the formula is (lgTRA * xOuts / 27) - xRuns.
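A minimal sketch of the glossary formula, assuming (as the formula implies) that a pitcher's expected runs are his tRA scaled by expected outs, xRuns = tRA × xOuts / 27. Under that assumption, the glossary version equals the formula in the original post with the sign flipped, so a better-than-league pitcher comes out positive. The example pitcher is made up.

```python
def expected_runs(tra, x_outs):
    """xRuns implied by tRA, taken here as expected runs per 27 expected outs (assumption)."""
    return tra * x_outs / 27


def pitcher_runs_above_average(tra, lg_tra, x_outs):
    """(lgtRA * xOuts / 27) - xRuns: positive means better than league average."""
    return lg_tra * x_outs / 27 - expected_runs(tra, x_outs)


# Hypothetical pitcher: 3.50 tRA vs. a 4.50 league tRA over 540 expected outs (180 IP).
print(pitcher_runs_above_average(3.50, 4.50, 540))  # 20.0
```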

-The definition of replacement level for pitching came from Tango on the Book Blog somewhere. I cannot find the link right now.

-Can you re-word this question or break it into multiple questions? I’m not sure what you are asking.

-Yes. Positional adjustments are per a full season of play, so multiple positions are broken down by % of playing time there.

-Positional adjustments are based on games played, I believe, not PAs. Replacement level, however, is based on PAs; I think about 650.

by Matthew on May 5, 2009 10:49 AM PDT
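For hitters, the replacement-level figure quoted in the original post (about 20 runs per 600 PA) scales linearly with playing time. A minimal sketch, assuming runs above replacement is simply batting runs above average plus that PA-based credit; the 20/600 rate is the post's approximation, and the pitcher equivalent is left open in this thread.

```python
REPLACEMENT_RUNS_PER_PA = 20 / 600  # about .033 runs/PA, the hitter figure quoted in the post


def replacement_runs(pa, runs_per_pa=REPLACEMENT_RUNS_PER_PA):
    """Runs separating an average hitter from a replacement-level one over `pa` PAs."""
    return pa * runs_per_pa


def runs_above_replacement(runs_above_average, pa):
    """Batting runs above average plus the PA-scaled replacement credit."""
    return runs_above_average + replacement_runs(pa)


# A league-average hitter (0 runs above average) at Matthew's ~650 PA:
print(round(runs_above_replacement(0.0, 650), 1))  # 21.7
```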

I'm confused as to why it seems like we'd rather have an average C prospect than an average 1B prospect, say

obviously, we look for a bat at an unusual position (Jeff Clement at C, for example), but it seems as if a projected 3 WAR C would be more valuable than a projected 3 WAR 1B.

Obviously, a roughly league-average bat at C will be worth a little over 3 WAR, but it seems to me that a C worth 3 WAR with his bat and a 1B worth 3 WAR should be equally difficult to find. In that case, why does it seem as if the C carries more value?

I’m having a really tough time wording this question. If you still have no idea what I’m talking about, I’ll try again later when my thoughts will hopefully be more coherent =(

by seattlebruin on May 5, 2009 10:58 AM PDT

What you are hinting at has to do with scarcity.

To rephrase you more simply, does WAR take into account scarcity? Is it just as hard to find a 3 WAR catcher as it is to find a 3 WAR 1B?

by abender20 on May 5, 2009 10:59 AM PDT

YES, exactly

thanks, and the follow-up would be “why? Shouldn’t positional adjustments make scarcity a non-factor?”

by seattlebruin on May 5, 2009 11:01 AM PDT

I'm glad you asked this question, because I was asking that myself.

If positional adjustments don’t properly account for scarcity, shouldn’t it be relatively easy to scale WAR based on some sort of positional WAR+ to take that into account?

by abender20 on May 5, 2009 11:03 AM PDT

Is wOBA essentially the batting side of tRA?

Because when I read the explanations for both, they seem to be fairly similar. However, it may just be my general unfamiliarity with the systems.

Also, if they are indeed fair equivalents, can we figure some form of comparison?

And finally (sorry for asking three), are we able to calculate wOBA against, or would that pretty much be tRA?

by Robert Lintott on May 5, 2009 10:51 AM PDT

Not really, no.

tRA takes the batted ball types into account, not the actual results of the batted balls (e.g. single, fly out, etc.). wOBA is based on the actual results of the balls in play.

by Matthew on May 5, 2009 10:53 AM PDT

So what do we have that is a good predictive offensive stat?

BABIP seems like it might work well, based merely on the fact that “x is/is not sustainable” is a fairly safe claim to make. I know that OPS didn’t place enough value on OBP, but would it be predictive in that same vein?

by Robert Lintott on May 5, 2009 11:28 AM PDT

The value of an IBB is 0...

…relative to the average PA. In other words, for the average hitter, the value of the IBB is roughly .12 runs, because that’s the average value of a plate appearance. That’s how Tango (and Fangraphs) figures wOBA; I can’t speak for Statcorner.

by cwyers on May 5, 2009 9:56 PM PDT

It has to do with a conversation

where we discussed that ignoring IBBs altogether was a weakness of wOBA, because not all IBBs are equal. Some are random (the player hits in front of the pitcher, or is walked to set up a DP, etc.), while others are drawn because the opposition is terrified to pitch to the batter. The latter is a skill and should be reflected as such, but it’d be a pain in the ass to figure that shit out longhand, and the gain would be minimal.

by JI on May 6, 2009 9:31 AM PDT

Okay.

To convert wOBA to R/PA you use:

(wOBA-lgwOBA)/1.15 + lgRPA

The 1.15 constant can shift around a little bit.

IOW, wOBA is simply R/PA scaled to look like OBP.

The run value of an IBB is identical to the R/PA for that batter. (This is actually a useful oversimplification, but it’ll do: the IBB is traditionally issued in circumstances where the runs-per-win value of the PA in question is substantially different than normal. So you have to “correct” the run value of the IBB if you want to compare it to the run value of other stats.)

So if wOBA is essentially runs minus IBB runs divided by PA minus IBBs, what you end up with is a player’s R/PA (scaled).

Let’s look at Bonds in 2004, where he had a wOBA of .538, league average of .329, scaling factor of 1.18:

(.538-.329)/1.18=.18

So each of Bonds’ 120 IBBs would be rated at .18 runs above average, for a total of 21.6 runs above average (120 × .18) contributed to his team by his IBBs alone.

Let’s look at Ichiro in 2004 as well – he was third on the IBB leaderboard that season, with 19. He had a wOBA of .379. So:

(.379-.329)/1.18=.04

So that gives us .04 runs per IBB, or 19*.04=0.76 runs contributed to his team above average by the IBB.

wOBA does credit a hitter for the value created by his IBB, in proportion to the value of the IBB for that hitter.

by cwyers on May 6, 2009 10:15 AM PDT
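cwyers's arithmetic as a short sketch. The 1.15 and 1.18 scale factors and the 2004 league wOBA are the ones quoted in the comment, the 120 IBBs for Bonds is the count implied by its 21.6 runs at .18 apiece, and the function names are just for illustration. Without the intermediate rounding to .18 and .04 runs per IBB, the totals come out slightly different (about 21.3 runs for Bonds and 0.8 for Ichiro).

```python
def woba_runs_above_average_per_pa(woba, lg_woba, scale=1.15):
    """Runs above average per PA implied by wOBA: (wOBA - lgwOBA) / scale."""
    return (woba - lg_woba) / scale


def woba_to_runs_per_pa(woba, lg_woba, lg_runs_per_pa, scale=1.15):
    """(wOBA - lgwOBA) / scale + lgR/PA."""
    return woba_runs_above_average_per_pa(woba, lg_woba, scale) + lg_runs_per_pa


def ibb_runs_above_average(woba, lg_woba, ibb, scale=1.15):
    """Treat each IBB as worth the batter's own runs above average per PA."""
    return ibb * woba_runs_above_average_per_pa(woba, lg_woba, scale)


# 2004 figures from the comment: league wOBA .329, scale factor 1.18.
bonds = ibb_runs_above_average(0.538, 0.329, ibb=120, scale=1.18)
ichiro = ibb_runs_above_average(0.379, 0.329, ibb=19, scale=1.18)
print(f"Bonds: {bonds:.1f} runs, Ichiro: {ichiro:.1f} runs")
```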

Thanks Matthew

I’m trying to update. I thought I was doing so well when I used OPS. And then, here comes wOBA. The shift is slow…

by Robert Lintott on May 5, 2009 11:53 AM PDT

prOPS is the batting side of tRA

It’s problematic, though. tRA basically assumes that the average run value of an LD/GB/etc. will be the same for each pitcher, which is fair given the sample size. It’s clearly not the same for each hitter, however, because some are fast and some are slow and some are weak and some are strong, and there’s no way to even that out over the course of a year.

by Graham MacAree on May 5, 2009 11:47 AM PDT

So to get something like prOPS working, you'd essentially group players by profile to

get a large enough pool of current data with which to approximate run values for batted balls, as hitter profiles would shift over time as well. That’s ugly.

What is the basis behind tOPS?

by abender20 on May 5, 2009 11:50 AM PDT

I hadn't thought about that problem, but it makes sense.

But given the fast/slow/strong/weak problem, why doesn’t that get settled by the doubles, triples, etc.? It seems that if you hit a double, you are either quite fast or quite strong on your (presumably) line drive. Shouldn’t the results tell us a lot of this? I have to be missing something.

by Robert Lintott on May 5, 2009 11:55 AM PDT

And we'd get Adrian Beltre looking pretty bad last year...

Yeah, that makes sense. Though I feel there ought to be a way to merge the two. Somehow combine the BABIP or batted balls with results. Probably way too much work, but something we could see in the future?

by Robert Lintott on May 5, 2009 11:58 AM PDT

'Regress rv/batted ball heavily to each player's 3 year average'

That’s the combination there. Past RV/batted ball depends on past results, with a big enough sample that you can be fairly sure things aren’t just luck. There’s some noise in the system, but it’s much better than making no correction at all.

by Graham MacAree on May 5, 2009 12:00 PM PDT
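Graham's "regress heavily to each player's 3 year average" can be sketched as a weighted average that adds k phantom batted balls at the player's three-year run value. The constant k and the example numbers are hypothetical; a larger k means heavier regression. Because it is a running weighted average, it also updates naturally as each new batted ball comes in.

```python
def regressed_run_value(observed_rv, n_observed, prior_rv, k=200):
    """Shrink this season's run value per batted ball toward a prior.

    observed_rv: this season's average run value per batted ball of one type
    n_observed:  how many such batted balls the player has this season
    prior_rv:    the player's three-year average run value for that type
    k:           hypothetical regression constant ("phantom" batted balls at the prior)
    """
    if n_observed + k <= 0:
        raise ValueError("n_observed + k must be positive")
    return (observed_rv * n_observed + prior_rv * k) / (n_observed + k)


# Hypothetical hitter: .40 runs per line drive on 50 LDs this year, .36 over three years.
print(round(regressed_run_value(0.40, 50, 0.36), 3))  # 0.368
```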

Would you get problems with players who just hit the wall?

I know it’s pitching, but Silva comes to mind, where he had a few decent/average years, and then just showed his true colors (clear, from the grease).

by Robert Lintott on May 5, 2009 12:02 PM PDT

Yes, but most players who 'hit the wall' tend to be unlucky in conjunction with their skills declining

So it makes sense to cut them a bit of a break. Plus, the run values would be updating with each PA, so it’d figure out what’s what eventually.

by Graham MacAree on May 5, 2009 12:03 PM PDT

And how long do you think it is until we get this?

Or is it tOPS? Because it sounds amazingly helpful, even when statistical noise is accounted for.

by Robert Lintott on May 5, 2009 12:05 PM PDT

I bet I could do it fairly quickly.

I just don’t have much incentive right now.

by Matthew on May 5, 2009 12:06 PM PDT

Well then, thanks for the info you two.

Good to know the stuff is out there. Also good to know that soon my wOBA will be as out-of-date as my OPS. Wheeeee

by Robert Lintott on May 5, 2009 12:10 PM PDT

Have you noticed any trends in LD rates, either by park or by year?

Using batted ball types to measure pitching effectiveness makes six kinds of sense, of course, but the definition of a line drive still seems sort of nebulous. Thus, while we understand that ERA can be affected by the occasional judgment call of the official scorer, a pitcher’s (or a hitter’s) batted ball profile can be affected by several judgment calls each game.

It’s also the sort of thing that really should even out over the course of a year, but you could still have a scorer at one park who calls more or fewer line drives than another. Have you checked the MLB data against BIS and/or STATS? Does MLB do this? Is it BIS or STATS that’s got the hybrid category of fliners?

by marc w on May 5, 2009 11:12 AM PDT

I would like to look at a comparison of data but I do not have access to BIS or STATS

as those cost insane amounts of money.

I could still do a comparison study of batted ball rates separated by park and compare players home and away and get some sort of answer on seeing if there’s considerable scoring bias.

by Matthew on May 5, 2009 11:15 AM PDT
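Matthew's home/away idea could start as simply as pooling the same players' line-drive counts at home and on the road and checking whether the gap is bigger than chance. The two-proportion z-test is a standard choice but is our assumption here, the counts are made up, and a gap by itself can't separate scorer bias from a genuine park effect.

```python
import math


def ld_rate(line_drives, balls_in_play):
    return line_drives / balls_in_play


def home_away_ld_check(home_ld, home_bip, away_ld, away_bip):
    """Return (home LD% / away LD%, two-proportion z statistic) for one team's players."""
    p_home = ld_rate(home_ld, home_bip)
    p_away = ld_rate(away_ld, away_bip)
    # Pooled LD% under the null hypothesis that home and road rates are equal.
    pooled = (home_ld + away_ld) / (home_bip + away_bip)
    se = math.sqrt(pooled * (1 - pooled) * (1 / home_bip + 1 / away_bip))
    return p_home / p_away, (p_home - p_away) / se


# Hypothetical season: 200 LDs in 1,000 home BIP vs. 180 in 1,000 road BIP.
ratio, z = home_away_ld_check(200, 1000, 180, 1000)
print(f"ratio {ratio:.3f}, z {z:.2f}")
```

In this made-up case, the home park shows about 11% more line drives, but a z of roughly 1.1 is well within what chance alone could produce, so one season of one park wouldn't be enough to call it scorer bias.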

From Dave's explanation at FG
- Are positional adjustments based on ~600 PA? Do you get a bigger adjustment if you play, say CF, and get 700 PAs in a season?
The position adjustments are then scaled to match the games played at each position for a particular player. This way, players that spend time at multiple positions get a hybrid adjustment based on their playing time at the respective spots.

http://www.fangraphs.com/blogs/index.php/explaining-win-values-part-three

by R.J. Anderson on May 5, 2009 11:12 AM PDT
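The games-played scaling in Dave's explanation can be sketched as a weighted sum. The per-162-game values below are roughly the ones FanGraphs used around 2009, but treat them as illustrative assumptions and check the linked article for the actual figures.

```python
# Illustrative runs per 162 games at each position (assumed values; see the linked article).
POSITIONAL_ADJUSTMENT_PER_162 = {
    "C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5,
}


def positional_adjustment(games_by_position, adjustments=POSITIONAL_ADJUSTMENT_PER_162,
                          full_season_games=162):
    """Sum each position's full-season adjustment, prorated by games played there."""
    return sum(adjustments[pos] * games / full_season_games
               for pos, games in games_by_position.items())


# Hypothetical utility man: 100 games at SS and 50 at 3B.
print(round(positional_adjustment({"SS": 100, "3B": 50}), 2))  # 5.4
```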

I have to say, that this is one of my favorite parts of the site

If you’re willing to put just a modicum of effort into it, you will not find more helpful stat-gurus anywhere. It really is awesome, and it makes watching/following baseball even better.

by Robert Lintott on May 5, 2009 12:08 PM PDT

Completely agree

I kind of alluded to it the other day, but as someone who had never (and still hasn’t) thrown a baseball or swung a bat, LL has an amazing level of accessibility to a complete novice with a pinch of numerical appreciation, and has helped me learn more about the game than any amount of hours watching MLB.tv.

I honestly don’t think I’d be half as into this game if it weren’t for this place, so thanks to J/M/G (and the other contributors/commenters) for providing a great platform to learn from.

by MarkE on May 5, 2009 12:47 PM PDT

Where's the best place to track down pitches/plate appearance data?

I can’t find it at Fangraphs (boy, is that something that should be in the Plate Discipline section), and I broke Statcorner trying to bring up the Batter Batted Ball sheets. Some script keeps going awry, and that’s not working out. Where else is this available?

by abender20 on May 5, 2009 9:00 PM PDT

Hit f/x

When hit f/x arrives, we will be able to get information regarding the trajectory and speed of batted balls, correct? Once this information is available, will wOBA kind of be secondary in terms of evaluating hitters? Also, will it upgrade the accuracy of a BIP profile for a pitcher and therefore improve tRA?

by borgy on May 8, 2009 9:26 PM PDT

Ok, I've got a real basic one. I've seen the Win Expectancy Charts on Fangraphs and have read

The Book, but I don’t understand the basis of the data on the charts. What year(s) are they based on? Do they assume the teams are exactly average MLB teams, do they make any adjustments for park? I’m just curious as to what they take into account.

Waiting to spit out the "Doublemint Twins".

by Sinking Away on May 11, 2009 7:46 PM PDT

The genesis of the charts was looking at games from the Retrosheet era (roughly 1950-present day)

Since then, they have been mostly replaced with figures generated out of complex statistical models. They assume teams are average. They make adjustments for park.

by Matthew on May 11, 2009 10:11 PM PDT
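The original "count what actually happened" approach behind the charts can be sketched as a lookup table: group game situations and record how often the home team went on to win from each one. The state tuple layout is a hypothetical choice, not Retrosheet's format, and this sketch leaves out the statistical modeling and park adjustments Matthew mentions.

```python
from collections import defaultdict


def win_expectancy_table(observations):
    """Map each game state to the fraction of times the home team won from it.

    observations: iterable of (state, home_team_won) pairs, where state is any
    hashable tuple, e.g. (inning, half, outs, runners_on, home_minus_away_runs).
    """
    wins = defaultdict(int)
    seen = defaultdict(int)
    for state, home_won in observations:
        seen[state] += 1
        wins[state] += int(home_won)
    return {state: wins[state] / seen[state] for state in seen}


# Hypothetical records: bottom of the 9th, 2 outs, bases empty, tied.
state = (9, "bottom", 2, "---", 0)
table = win_expectancy_table([(state, True), (state, False), (state, True), (state, False)])
print(table[state])  # 0.5
```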

That answers that. Thank you.

by Sinking Away on May 12, 2009 8:57 PM PDT

How much should I trust MILB BABIP and LD %?

Considering how erratic the method for recording minor league LD% is, how much would you trust a BABIP without proper context? I am of the belief that, due to the greater disparity of talent levels in the minor leagues, it is very hard to trust any sort of batted ball data.

Also, I recently read a rather intense conversation on the Out of the Park Baseball forums about DIPS and what constitutes an outcome under the control of a pitcher or fielder. I am interested in what is generally considered a pitcher-controlled outcome: infield pop-up %, line drive %, etc.?

Oh, one more thing: for HR/FB%, is there a rate at which it stabilizes for pitchers, or does it vary for each one?

by tdot mariner fan on May 13, 2009 1:38 PM PDT

HR/FB% is pretty much static for all pitchers after park adjustment year to year

At a rate of somewhere around 11-12% of fly balls leaving the yard. I don’t know the actual league average on that this year, but it’s probably close to that.

From my understanding of it, you can usually expect an MLB hitter to stay around his individual career BABIP from year to year. For a minor league player it’s hard to say, though: you probably don’t have a lot of good data on him yet, he’s still developing, and with the data-recording problems you mentioned, it’s hard to know just how reliable those numbers are.

GB%, FB%, and LD% are all under a pitcher’s control.

by OlSalty on May 14, 2009 1:23 AM PDT

Don't trust MiLB LD% and don't take BABIP as seriously as you would in the Majors

A pitcher has huge control over things like strikeouts, walks, and groundballs, and less (but some) control over line drives and pop-ups. tRA takes into consideration all the factors that we think pitchers have some element of control over.

HR/FB usually stabilizes around 10-11% for starters and 9-10% for relievers.

by Jeff Sullivan on May 14, 2009 9:59 AM PDT
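Since HR/FB tends to settle near a common rate, one way to use it is to compare a pitcher's home runs allowed with what his fly balls would produce at that rate. A minimal sketch, assuming a 10.5% rate (the middle of the starter range quoted above) and made-up season totals; it ignores park effects, which the earlier reply says matter.

```python
def expected_home_runs(fly_balls, hr_per_fb=0.105):
    """Home runs expected if the pitcher allowed the assumed typical HR/FB rate."""
    return fly_balls * hr_per_fb


def home_runs_above_expected(home_runs, fly_balls, hr_per_fb=0.105):
    """Positive means more HRs than his fly balls would typically produce."""
    return home_runs - expected_home_runs(fly_balls, hr_per_fb)


# Hypothetical starter: 18 HR allowed on 120 fly balls.
print(round(home_runs_above_expected(18, 120), 1))  # 5.4
```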

Can you not use multi-year BABIP numbers

to stretch tRA* over a multiple year sample?

Matt Murton status: Freed
Garrett Atkins status: Not Traded
Clint Hurdle status: Still Employed by the Rockies

by Andrew Martin on May 18, 2009 1:56 PM PDT
