Theo sounds like Jack in a confiding mood
I ran across this link to an interview with Theo Epstein, and I'm still getting used to this strange resulting feeling. Not so long ago I would have been frustrated by the contrast in flexibility and commonsense that it presented with our own asshat operation. Instead, the line of thinking feels vaguely familiar.
The big contrast for me is a surprising level of candor on Epstein's part. Some of the questions try to get at team sore spots or probe relationships with players, and his answers are forthright while still being respectful of everyone who has played for him. The amazing thing is that he's so candid when he has to deal with that breathless pack of Soxnuts.
Jack Z has generally struck me as being pretty careful not to tip his hand too much in operations. I certainly respect that, but I hope over time he grants us a little better window into his mindset. Imagine getting a peek at his thought process in working Yuni out the door!
Added note: I'm curious about the proprietary fielding evaluation system they talk about. I assume it's more than simply handling the public data in a different way. It's conceivable that they'd have dedicated personnel or equipment for recording batted-ball types and zones. It'd be great to be able to compare the systems. Who else besides Ellsbury would turn up contrasting results?
Comments
I’m curious about the proprietary fielding evaluation system they talk about
Carmine you mean? I hear she’s hot.
by Eyeball Kid on Feb 26, 2010 2:29 PM PST
She's got a great adam's apple
Juan Carlos Perez, please start hitting.
by marcello on Feb 26, 2010 2:34 PM PST
Wasn't Carmine the drummer in Cactus?
That’s right, I just made a Cactus reference for no reason I can explain.
I like using semi-colons; they make me feel smart.
As discussed at the Book Blog, it seems odd that their system
could project him as above average when every other defensive metric has him between below average and ghastly. This isn’t one of those situations where Ichiro would pop up as crappy on one, but great in the others. There’s a great deal of consistency regarding Mr. Ellsbury.
I haven't read the discussion at the Book Blog, but...
can it really be that odd? Ellsbury was a +16.5 in 2008 and then a -18.6 in 2009. Just seems really fishy, especially when the numbers don’t line up at all with the scouting reports.
Stop The Wave!
by ConorGlassey on Feb 26, 2010 5:33 PM PST
It's not inherently fishy
People are just grabbing a tiny sample and running with it
by Graham MacAree on Feb 26, 2010 7:14 PM PST
Just how tiny
is a single season of fielding data? It’s a semi-rhetorical question, though the analysis of how quickly fielding data stabilizes is readily available. What have I heard, a minimum of three years to form a firm judgment?
My intuition stubbornly tells me that fielding data would be more toward the stable end of the spectrum. Is the dividing line between exceptional and awful so thin that a player can bounce back and forth between them without raising questions about measurement?
I confess I’m feeling philosophical about the line between greatness and mediocrity. I just watched the Tyson documentary and am still marveling at how bad the dude was until suddenly he wasn’t.
They're not really bouncing back and forth between exceptional and awful, that's just the statistical noise from your sample being so small
Like a hitter isn’t really going from being the worst hitter of all time to one of the best because he struck out in his first major league at bat and hit a homerun in his second.
A “season” is not really an acceptable benchmark for UZR because there are fewer data points for that same stretch of time versus, say, hitting. So just like you can’t judge a player based off of two at bats, you can’t really extract a great deal of meaning from a single season of UZR, at least without regressing it towards average according to the size of your sample.
by OlSalty on Feb 27, 2010 1:25 AM PST
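As a rough illustration of regressing a single season toward average, here is a minimal sketch assuming a simple shrinkage rule: keep a fraction n / (n + k) of the observed value, where n is the number of chances. The constant `k` (a hypothetical 1,000 chances here) is an illustrative assumption, not a published UZR parameter.

```python
def regress_to_mean(observed_runs: float, chances: int,
                    k: float = 1000.0, league_mean: float = 0.0) -> float:
    """Shrink an observed fielding value toward the league mean.

    The observed value keeps weight chances / (chances + k), so small
    samples are pulled strongly toward average and larger samples less so.
    k is a hypothetical stabilization constant chosen for illustration.
    """
    if chances < 0:
        raise ValueError("chances must be non-negative")
    weight = chances / (chances + k)
    return league_mean + weight * (observed_runs - league_mean)


# The same -18.6 observed over a hypothetical 400 chances vs. 1,200 chances.
one_season = regress_to_mean(-18.6, 400)
three_seasons = regress_to_mean(-18.6, 1200)
print(f"one season: {one_season:.1f}, three seasons: {three_seasons:.1f}")
```

The more chances behind a number, the more of it survives the shrinkage, which is why a single season of UZR gets pulled hard toward zero.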
Statistical noise comes from different places
I understand that small samples can create misleading numbers. My point is related to what constitutes an adequate sample for assessing fielding: the cut-off for an adequate sample isn’t solely a matter of the number of data points, but also how varied those points are. Ellsbury had 362 putouts and assists in CF last year out of I-don’t-know-how-many total opportunities. Fangraphs says his expected outs was 393, implying that he sucked more than most CF on ~30 plays. That’s plenty of data points to begin to draw conclusions unless the data are quite noisy. We know that UZR bounces around so much that we only start to feel confident when we have three years’ worth of data, but it just doesn’t seem that it should take that long.
I’m saying that my intuition is that fielding is a skill that would reveal itself in a full season of proper data. There would naturally be some variation in how often the ball falls just a foot too far to reach, but overall it seems the skill would reveal itself fairly quickly. I’d think the data noise must stem greatly from the measurement. Field F/X is going to be a feast.
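To put the ~30-play shortfall mentioned above in context, here is a rough check of how much an outs total can swing from chance alone, assuming each opportunity is an independent trial with the same out probability. The 500-opportunity count is a hypothetical stand-in (the true count isn’t given), and the comment’s figures are taken at face value.

```python
import math


def chance_sd_of_outs(opportunities: int, out_prob: float) -> float:
    """Standard deviation of outs made if every opportunity were an
    independent trial with the same out probability (a binomial model)."""
    return math.sqrt(opportunities * out_prob * (1.0 - out_prob))


# Hypothetical: 500 opportunities (the true count isn't given above),
# with the 393 expected outs and 362 actual plays quoted in the comment.
opportunities = 500
expected_outs = 393
actual_outs = 362

out_prob = expected_outs / opportunities
sd = chance_sd_of_outs(opportunities, out_prob)
shortfall_in_sds = (actual_outs - expected_outs) / sd
print(f"chance-only SD: {sd:.1f} plays; shortfall: {shortfall_in_sds:.1f} SDs")
```

Under these simplified assumptions, chance alone moves the total by only about 9 plays, and if out probabilities differ from play to play, the chance-only spread gets smaller, not larger. That points back toward the measurement side raised in the comment, with the caveat that putouts and assists aren’t exactly the same thing as the plays a zone system counts.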
How they instruct and train the UZR field people would be interesting to know.
Data collectors, stringers? I’m at a loss. Anyway. Their standards and training, how they limit bias on the input end and create some level of consistency between teams/games/leagues.
This is kind of extreme, but what happens if the people watching the plays are fans? For instance, Raul Ibanez had a specific, well-known reputation for defense in Seattle, with a very vocal, insistent fan base promoting that image. He signs in Philly and it’s a popular signing, maybe more for his bat, the way Gold Glove awards seem to go to the better hitters at a position. So if the stringers are part of the local fan base, will that influence how they score a play?
I don’t even know how they collect the data, so this is all conjecture. But I do like the idea of the Dewan +/- system; from what I’ve read, every play is centrally reviewed. So if there is bias, then perhaps that mitigates the effect, or at least makes it consistent.
The sample size issue is a lot more complicated than just the number of putouts
I don’t think this will work without explaining exactly how UZR works, and I’m probably going to screw this up, but this is a very basic explanation from my understanding of it:
Basically, UZR breaks the playing field into 64 different zones and assigns a run value to a hit in each zone. For example, if a hit to a particular part of the field often results in extra bases, it is assigned a higher run value than a part of the field that often results in singles, etc.
For each year, the system records the number of hits in each zone, the average run value of hits in each zone, and the number of outs recorded in that zone for each fielding position. (Some zones overlap between multiple positions, so outs are recorded by individual position).
For each player and each position, they then record the number of hits vs. outs in each zone while that player was on the field. Some zones are on the fringes of a position’s reach, and plays made or not made in these zones generally tell you a lot about a particular player’s range, and also have higher run values.
The reason UZR data generally takes 3 years to stabilize, from my understanding, comes from the fact that individual players sometimes don’t experience a whole lot of balls hit to those fringey zones that really help determine his range over the course of a single year. There’s some luck involved in which zones they have a chance to make a play at in a given year and exactly where in that zone the ball was hit, and that can mean you get data that’s not accurate enough to draw conclusions from.
There’s probably more involved with it than that, but that would be my guess where the small sample sizes come into play from a single year of data. Once you get to three years, you have a much bigger number of plays made in each zone to work with for each individual player, and you can draw firmer conclusions about their ability.
by OlSalty on Feb 27, 2010 10:02 PM PST
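The zone bookkeeping OlSalty describes can be sketched roughly as follows. This is a simplified illustration, not UZR’s actual implementation: the zone names, out rates, and run values are hypothetical, and real UZR adds adjustments (batted-ball type, park, handedness) that are left out here.

```python
from dataclasses import dataclass


@dataclass
class ZoneStats:
    league_out_rate: float  # share of balls in this zone turned into outs league-wide
    run_value: float        # approximate run cost of a hit here versus an out (hypothetical)
    balls: int              # balls hit into the zone while the player was on the field
    outs: int               # outs the player recorded on those balls


def zone_runs_saved(zones: dict[str, ZoneStats]) -> float:
    """Credit each zone with (actual outs - expected outs) times the
    zone's run value, then sum across zones."""
    total = 0.0
    for z in zones.values():
        expected_outs = z.league_out_rate * z.balls
        total += (z.outs - expected_outs) * z.run_value
    return total


# Hypothetical fringe and routine zones for one center fielder.
player = {
    "shallow_gap": ZoneStats(league_out_rate=0.40, run_value=0.75, balls=20, outs=6),
    "deep_gap": ZoneStats(league_out_rate=0.25, run_value=1.10, balls=12, outs=5),
    "straightaway": ZoneStats(league_out_rate=0.95, run_value=0.55, balls=150, outs=143),
}
print(f"runs saved: {zone_runs_saved(player):+.2f}")
```

With only 12 balls in the hypothetical deep-gap zone, two extra catches swing the total by more than two runs, which is the fringe-zone sample problem described above.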
Also I should say that this explanation wasn't just in reply to your post
It might be possible to (somewhat) infer defensive performance through a multitude of measures, including scouting reports, cross-referencing multiple defensive metrics, etc., that could let you arrive at a conclusion earlier than 3 years of UZR data would. But using UZR alone, you can’t really; that was the point I meant to make.
It's roughly equivalent to a month and a half of batting
by Graham MacAree on Feb 27, 2010 11:14 AM PST
Which is why everyone keeps telling us to use 3 years of fielding data
Since that’s roughly equivalent to a year’s worth of batting data.
And we know that we need on the order of 500 PA (or roughly a full season) for batting stats like OPS to be reliable (and we need more than a season for BA and BABIP)
by wandergeist on Feb 27, 2010 11:55 AM PST
Your mention of BABIP points out some of the randomness for me
Fielding doesn’t have the TTO components that batting does, and they serve to stabilize batting stats considerably. Thinking of fielding as, say, FABIP calls to mind the way a player’s BABIP will fluctuate more between seasons than overall batting stats will. Regardless, batting outcomes on balls in play still strike me as more random than fielding measurement, which looks to take account of where in the zone the ball is and how it gets there.
Can you elaborate on this?
It doesn’t seem to be the case if we’re saying a fielding opportunity equals an at bat. That 393 expected outs must come on 500 or more fielding plays. I’m not saying that should be the basis for comparison, but I’m curious what is considered the basis.
It's to do with how fast the sample stabilises
Take your data and split it randomly in half. You can predict the values of one half with the values from the other half, right? The question is how well. For most useful batting stats, we get to about 75% accuracy for our split pair with a season (600 PA) of data. For UZR, we need about three seasons of full-time play.
by Graham MacAree on Feb 27, 2010 9:37 PM PST
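Graham’s split-half test can be illustrated with a small simulation. The talent spread (`talent_sd`), base out rate, and chance counts below are hypothetical, and a correlation coefficient stands in for his “accuracy”; the only point is that the two halves agree more as the number of chances grows.

```python
import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def split_half_correlation(n_players: int, n_chances: int, talent_sd: float,
                           base_rate: float, seed: int = 0) -> float:
    """Simulate players with different true out rates, split each player's
    chances randomly into two halves, and correlate the halves' out rates."""
    rng = random.Random(seed)
    half_a, half_b = [], []
    for _ in range(n_players):
        # Each player's true skill, clamped to a valid probability.
        true_rate = min(max(rng.gauss(base_rate, talent_sd), 0.0), 1.0)
        outcomes = [rng.random() < true_rate for _ in range(n_chances)]
        rng.shuffle(outcomes)  # random split into halves
        mid = n_chances // 2
        half_a.append(sum(outcomes[:mid]) / mid)
        half_b.append(sum(outcomes[mid:]) / (n_chances - mid))
    return pearson(half_a, half_b)


# Hypothetical talent spread and base rate; only the chance count changes.
for chances in (300, 900, 1800):
    r = split_half_correlation(400, chances, talent_sd=0.02, base_rate=0.8)
    print(f"{chances} chances: split-half r = {r:.2f}")
```

In a real analysis the halves would come from actual play-by-play records rather than simulated outcomes.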
The large plus value came in RF, not CF, but yes, that's a good point
However, irrespective of how much ‘weight’ you put on the UZR value, it would just be weird if EVERY measure interprets the data one way, and another interprets it another way. Forget sample size issues, this is a data quality issue. Either the ball’s in zone or it’s not; either you caught it, or you didn’t. We’ve seen plenty of these issues – again, there were huge differences between systems regarding Ichiro (and Grady Sizemore) in CF a few years ago.
In this case though, EVERY SYSTEM you want to look at says that Ellsbury made fewer plays than average. Even if you say that has no – zero – relation to his true-talent defensive ability, isn’t that sort of odd? How are they slicing and dicing the zones to get a different result compared to UZR+PMR+RZR+TZ? Maybe they’re doing it better, and every other metric ‘missed’ on Ellsbury this year. But wouldn’t that be an interesting result, and wouldn’t you want to know why?
I like the fact that our FO is tight-lipped.
It’s been suggested that a major reason Philadelphia came to us about trading Cliff Lee was that our front office was known not to leak information. That way the media, other teams, and Cliff Lee himself would likely never catch wind of the discussions if the talks were to break down.
Besides, there are other ways to get a window into Z’s mindset. If you come to the bi-annual USSM events, you get a lot of insight into the thought processes of Jack, Tony (wow, I just picked up on the 24 motif) and company.
No wonder we're so good
Our team is run by vintage CTU
Only in our front office people actually listen to Jack.
Everything makes sense now!
by huskies2010 on Feb 27, 2010 10:31 AM PST
This interview
It makes me wonder how many members of the Red Sox media actually understand what Theo Epstein is doing.
You mean how the follow up questions do not actually follow up what Epstein just said?
Are there things he still needs to work on in center field? Absolutely. Reading the ball off the bat and breaking in on certain balls, that’s something he is going to get better at over the years. He’s already improved going back on balls and playing balls that are close to the wall. He’s going to make those improvements. The only downside I saw to the Cameron deal was delaying temporarily Jacoby’s development, but taking the big-picture approach, I think he’s going to end up and be a really good one and already is.
Is the biggest difference between Cameron and Ellsbury the throwing arm?
by John Morgan on Feb 26, 2010 5:58 PM PST
More good stuff.
We were subpar defensively last year any way you look at it. If you want to watch the team from a scouting standpoint, we had definite holes defensively that affected our pitching staff, especially on the left side of the infield with health, with Mike Lowell coming back off the surgery not able to have his normal stellar defensive performance. We had clear problems at shortstop all last year. A few too many balls were falling in the outfield as well. By the numbers, we were the third-worst defensive team in baseball last year.
What would those numbers be, because fielding percentage you were third best?
…
You look back last year and we were the third-best offense in baseball last year, we were the third-best pitching staff and we were third-worst defense. So, if there is a quick way to fix the team and get us back to balance and elite performance in all areas it was the defense. It’s not an easy thing to fix. You can’t let one guy go and bring one guy in, then you’ve just upgraded one position. With the way things turned out, it wasn’t our only goal going in, but we were happy it worked out this way, we were able to turn over four or five different positions and make us a better defensive club. We don’t think we’ve taken that much away from our offense.
You’re also talking about fixing a club that won 95 games.
I doubt that it's possible to create a much better metric than UZR given the current data
So I’m betting that if the Red Sox indeed have a much better defensive metric, they have more detailed batted ball data.
But I'd guess you're at a point of rapidly diminishing returns with regard to zone
so I’d guess it’d have to be batspeed and angle data. More than anything, I just want to know what they have.
I don’t care if Ellsbury is awesome or historically awful – if there’s a way to get a result that NO zone-based rating gets, that’s fascinating, and I’d just like to know if it comes from a hit f/x-based system, or a totally different way of carving up the zones.
Hit f/x isn't great for defense I don't think
It helps a little bit, but it’s not a ton better than what BIS and stuff offers.
I guess that the Sox could be charting the hang time, angle off the bat and location of each ball manually (hell, if I was some college intern for a baseball team, I would gladly spend time doing that), which would allow them to model the complete trajectory of the ball pretty accurately. They could then also gather fielder positioning before the ball is put in play, and develop a UZR-type model using the fielder’s starting position and the quality of the batted ball. That would take out a bunch of the noise in UZR, I think, but like you said, there may be diminishing returns.
by vivaelpujols on Feb 28, 2010 12:36 AM PST
I think you're right, but velocity off bat would seem to make a lot of sense
That is, if there are a lot of balls that get lumped into the ‘fly ball’ bucket that are more difficult to convert to outs, well, that would be one reason why they’d come up with a different answer here.
Things like angle off bat shouldn’t matter as much (I’d guess), because that’d manifest itself in whatever zone the ball fell into. I guess my point is, the angle and hang-time are more nuanced data than we get from UZR, but it’s velocity that really matters – that is, it’s velocity that gets us something more fine-grained than ‘fly ball’ or ‘line drive.’ Given the difference in the RVs of those batted ball types, that seems like a plausible source of the difference between ‘Carmine’ and ‘Every other defensive metric we have access to.’
That makes sense
And Hit f/x data was released to the teams this year IIRC.
by vivaelpujols on Feb 28, 2010 10:41 PM PST
Right.
My guess is (and this could be 100% wrong) that something like angle would help you fine-tune your zones (esp. if you had spin data), but you’d need to do more than that to come up with a result like the one they apparently came up with. It seems like it has to be a batted-ball type issue, but who knows. If it’s not, and this is just a zone issue, then UZR, PMR, etc. would ALL appear to have some holes. I find this incredible, but if true, it’d obviously be an important issue.
It's also possible that Ellsbury is just one of those rare guys who gets really unlucky via UZR
A more finely tuned zone rating system could conceivably find that on a significant number of his plays this year, the expected out% was lower than it appeared based on normal batted ball data zones. You’d expect a few players each year to get really unlucky or lucky via UZR, and it’s possible that the same player also got unlucky with the other fielding stats as well.
That would be pretty odd though, and I tend to agree that they might have something other than a zone rating system (although I have no idea what that could be).
by vivaelpujols on Feb 28, 2010 11:11 PM PST
Adding hang time and horizontal speed on ground balls with a stop watch would be big.
More precise location recording would help.
And start locations of the fielders would help.
Beyond the Boxscore
Agreed
I’m impressed with the thinking that he reveals here, whether he’s right about Ellsbury or not. I’d feel safe in predicting that they at least have different batted ball data. In the end “better” will be proven out in this example only if Carmine’s take on 2009 Ellsbury projects more accurately on his future than the other ones that coalesce at the other end of the spectrum.
On a tangent: I still struggle a bit with the idea of fielding regressing to the mean. It just feels less random than batting and pitching outcomes to me.
Was Yuniesky Betancourt
As amazing defensively as we all thought when he first got called up? Or were we misled by a small sample?
by wandergeist on Feb 27, 2010 11:58 AM PST
Well, wasn't he a very good defender, at least to the eye?
And then just got all fat?
by vivaelpujols on Feb 27, 2010 8:45 PM PST
He was good at making hard plays
He was still making some of those right up until he was traded.
The problem was the routine plays — the easy balls he didn’t get to, the ones he threw into the stands (remember all the throws past first?) I think the spectacular plays caused us to overlook some of the day-to-day flaws, at least at first.
by wandergeist on Feb 27, 2010 11:47 PM PST
I wonder about that, but that's not what I saw
In Tacoma and Seattle in 2005/6. I just didn’t see as many horrible throws (and his FP was better in 2005 than it was going forward), and even if his FP had been terrible, it was more than made up for with amazing range. It was his range that left him in later years, however many throwing errors he made.
That isn't what you saw because it's completely untrue
Betancourt in 2005 was something approaching perfection at shortstop.
by Graham MacAree on Feb 28, 2010 9:03 AM PST
Okie doke. I'm trying to accommodate my own view
along with the view of a couple of MLB scouts in 2006 who thought Betancourt was basically Ozzie Smith, but maybe better.
The problem is that UZR never viewed him as an exceptionally rangy SS. At the time, we could shrug it off as a sample size problem. But with the benefit of hindsight, I start to wonder. Again, the guy I saw in Tacoma seemed like exactly what you describe: perfection at shortstop. Combined with the mediocrities in LF at the time (including Shin-Soo Choo), I swear Betancourt played SS and LF simultaneously. But while we could always shrug off his poor UZRs to sample size weirdness, his career arc gives one pause. He’s never posted a positive range rating…. ever. We all agree that the metric is describing reality now – what do we do with his meh ratings in 2005/6? I’m perfectly comfortable with the answer ‘ignore them’ because that preserves whatever validity I’d attach to my own meager scouting reports on the guy from 2005. If any conflicting data is ‘completely untrue’ then I’m more than OK with that.
You know I disagree with this.
He was good at first and then degraded to awful from my point of view. It is really hard for people to separate their view of Yuni from the fact that he followed Mike Morse in 2005 who was absolutely terrible at SS. I feel that your perspective was skewed just like it was for many others.
He was. This was apparent visually, from scouting reports, everything.
He had everything going for him…..
In one of the extremely few opportunities I’ve had to talk to MLB scouts, the subject of Yuni has come up, and they all have seen him as the best or amongst the best defensive SS they’ve ever seen… if they saw him in 2005.
Huh. Ok
So I guess we can just chalk it up to a fat guy in a body kept lean by rationed Cuban rice and beans, and the perils of setting loose that kind of pent up privation on an all-you-can-eat American buffet. Perhaps the Cubs can arrange to ship Silva off to play with La Habana for a while. (Heh, maybe Cuba can find a role in the new world order as America’s fat camp. The Biggest Loser enforced by AK-47s. Gitmo for fatsoes).
Anyway, I guess my point (before I picked a bad example) was that there are players who fool the eye, at least over relatively short periods of time. Jeter always looks really graceful making plays, which sometimes disguises the fact that he’s only making a spectacular-looking play because he lacks the range that would let a rangier shortstop make it look routine. Though he seemed to improve on that lately, so I’m not sure that’s even a particularly good example.
This video is proof that just about every smart GM is dealing with this crap.
Do you think anyone at that table walked away thinking “Wow, he changed my mind on Jason Bay”? I bet the guy who said “VORP” spent the next hour talking about the power of hustle and the magic of a David Ortiz hug.
Torjazz
Regarding UZR, which I’m sure Graham will go into in much more detail later, here is basically how it works. Each batted ball is put into a “bucket” based on its estimated difficulty in terms of out%. UZR draws on batted ball data from BIS to put those balls in the buckets, and that data includes information on velocity, location and angle of the ball. UZR also makes adjustments for things like batter handedness, ballpark, etc. Once each ball is put in a bucket, you compare the estimated out% for that ball with what actually happened.
So say you have a soft line drive to shallow center field. UZR says that ball should be caught 80% of the time, but the fielder misses it. So he is -.8 plays on that play. You sum up all plays in a similar fashion and then convert it to runs. So, say, after 200 opps, the player is -15 plays. Each play is worth about .8 runs on average, so he’d have a -12 UZR.
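The per-play bookkeeping described above fits in a few lines. This is a sketch of the commenter’s simplified description, not UZR’s actual code; the flat 0.8 runs per play is his rough average, whereas the real system’s run values vary by zone and batted-ball type.

```python
RUNS_PER_PLAY = 0.8  # rough average run value per play, from the comment above


def plays_above_average(plays: list[tuple[float, bool]]) -> float:
    """Sum credit over plays given as (expected out probability, out made?).

    A made out earns 1 - p (the share of fielders who would have missed it);
    a miss costs p (the share who would have caught it).
    """
    return sum((1.0 if made else 0.0) - p for p, made in plays)


# The soft liner to shallow center: caught 80% of the time, but missed.
liner = plays_above_average([(0.8, False)])
# A season that sums to -15 plays, converted to runs.
season_runs = -15 * RUNS_PER_PLAY
print(f"liner: {liner:+.1f} plays; season: {season_runs:+.1f} runs")
```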
The sources of error with UZR come from two places:
1) measurement error
2) sample size error
The first one is pretty easy to understand. Going back to the example of the soft shallow liner to center, what if it had a lot of topspin and thus a lot less hang time? The average fielder might then catch it only 20% of the time, so UZR would be unfairly debiting that player an extra .6 plays (roughly .5 runs) on that play. Given the not-too-detailed quality of the batted ball data, there are a lot of plays like that over the course of a season. For most players it should even out; however, even if the sources of batted ball error aren’t biased at all, a significant number of players will end up getting lucky or unlucky over the course of the season. Assuming the error follows a normal distribution, you’d expect maybe 5% of players to be overrated or underrated by as much as 10 runs even over a full season.
The other source of error is sample size error. Basically, if a guy ends up making a diving catch on a 10% out play, that’s obviously out of the norm: 9 times out of 10 (maybe 8 if he is really good), he’ll miss that ball. It works the other way around as well. Of course, you would also expect this to even out; however, some players will just over- or underperform their true abilities even over a full season (or 3 or 4). It happens with offense and pitching all of the time, so why not with defense?
Of course, the problem is that those two sources of error may be large for some players, even over a full season, and worse, they may combine so that a small percentage of players perform far better or worse than their true talent level AND also get lucky or unlucky in terms of how UZR measures them. That’s probably the reason you see guys like Brad Hawpe (-47 UZR/150 a year ago or something) or Nyjer Morgan (+40-something UZR/150 last year).
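A rough simulation can show how those two error sources spread out the ratings of players who are all truly average. Everything here is a hypothetical assumption: each ball’s true out probability is drawn uniformly between 0 and 1, the system’s estimate of it is off by Gaussian noise (`measurement_sd`, a stand-in for batted-ball measurement error), and each fielder gets 400 chances.

```python
import random
import statistics

RUNS_PER_PLAY = 0.8  # rough conversion used earlier in the thread


def simulate_average_fielders(n_fielders: int, chances: int,
                              measurement_sd: float, seed: int = 1) -> list[float]:
    """Return one season's rating, in runs, for each of n_fielders exactly
    average fielders, so every nonzero rating is error.

    Sample error: whether each ball is caught is random.
    Measurement error: the system's estimate of each ball's out probability
    is off by Gaussian noise with standard deviation measurement_sd.
    """
    rng = random.Random(seed)
    ratings = []
    for _ in range(n_fielders):
        plays = 0.0
        for _ in range(chances):
            true_p = rng.random()  # true difficulty, uniform on [0, 1) by assumption
            est_p = min(max(true_p + rng.gauss(0.0, measurement_sd), 0.0), 1.0)
            made = rng.random() < true_p  # an average fielder converts at the true rate
            plays += (1.0 if made else 0.0) - est_p  # credited against the estimate
        ratings.append(plays * RUNS_PER_PLAY)
    return ratings


for noise in (0.0, 0.15):
    ratings = simulate_average_fielders(500, 400, noise)
    beyond_10 = sum(abs(r) >= 10 for r in ratings) / len(ratings)
    print(f"measurement SD {noise}: rating SD {statistics.pstdev(ratings):.1f} runs, "
          f"{beyond_10:.1%} at +/-10 runs or more")
```

Turning up `measurement_sd` widens the spread even though nobody’s skill changed. The exact percentages depend entirely on these made-up inputs, so treat the output as a shape, not an estimate of real UZR error.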
So when you see that Jacoby Ellsbury had a -18 UZR or whatever, and you know that both scouts and fans think highly of him, it’s likely some combination of: A) he is a worse defender than those guys think, B) he performed below his true talent level for that season, and C) he was unlucky with his UZR score.
Of course, it’s possible that it’s all A, and he really is a -18 defender; however, that’s very unlikely for a whole host of reasons. It’s also possible that he was only B and C, and really is a very good defender who had a down year and got really unlucky. It’s also possible that he is a very good defender who had a good year and got REALLY unlucky by UZR – that’s also probably very unlikely.
To interpret UZR, you have to understand that it’s a very volatile stat for the aforementioned reasons, and you can’t take any one season of it very seriously. After 3 or 4 seasons, much of that stuff will even out for most players (unless there is bias, which is a whole other issue that I don’t want to get into now), and a player’s UZR is a much better measure of his defensive skill.
by vivaelpujols on Feb 27, 2010 9:22 PM PST
Holy tl;dr Batman!
Let me grab a cup of coffee before I sit down to read your novel.
Is that the light at the end of the tunnel, or the headlights of an oncoming train?
You didn't like that comment? Well reasoned, clear and concise can be very enjoyable to read.
Some topics just don’t fit into sound bites, might just be me though.
I was half-kidding
It’s a great post, once I actually sat down to read it. (Yes, I did get my cup of coffee first.)
Is that the light at the end of the tunnel, or the headlights of an oncoming train?
There were a couple comments in this post I had to read through several times.
Kind of a low key thread, but it touched on several UZR/defensive metric questions I’ve been pondering. Pretty nice.
And that's a rec from me.
You don’t get that kind of thoughtful, digestible quantity in most posts.
There is no such thing as innocence, only degrees of guilt.
Theo's verbal tic seems to be "I'll be honest" or something close.
Hard work never killed nobody, but I won't take my chances.
The ultimate way of measuring defensive efficiency is ...
… to record the game using high speed video. You then look at where a player is at the point of impact of ball and bat, and you see how much ground the player covers in various directions from his position to field the ball.
You then make charts showing the territory covered by a player as a function of the time after impact. From that you can deduce who has quick reactions but is slow afoot, who reacts slowly but covers territory if a ball is not hit hard but is heading towards a hole, who is better coming in, etc.
I can’t believe there aren’t teams doing some of this already. Perhaps not every game, and perhaps not every player. And some of it will take a while to build effective sample sizes. But it won’t take too many web gem types of plays to begin to establish where the boundaries are on the ground that a guy can cover as a function of time. And it will start to get easy to see that a guy is getting a web gem now for a play that he routinely made three years ago.
There must be some teams out there that are starting to do this. Or there is some stat service that is starting to provide this. The technology is easily available and the application is so obvious that it’s got to be happening. Of course, the teams that are doing this are not going to let on that they are doing this.
by Steve Nelson on Mar 1, 2010 9:45 AM PST
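The charting Steve describes could start with something like the sketch below, which fits a simple straight-line model per direction: feet covered grows at a constant speed once the fielder reacts. The data format, direction labels, and the linear model itself are hypothetical assumptions; real tracking would need acceleration, route efficiency, and many more plays.

```python
from collections import defaultdict
from collections.abc import Iterable


def fit_coverage(plays: Iterable[tuple[str, float, float]]) -> dict[str, tuple[float, float]]:
    """Fit feet_covered = speed * (seconds - reaction) separately for each direction.

    plays: (direction, seconds after contact when the fielder reached the ball,
            feet covered from his starting spot). Assumes max-effort plays.
    Returns {direction: (speed in ft/s, reaction time in s)} via least squares.
    """
    by_direction: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for direction, seconds, feet in plays:
        by_direction[direction].append((seconds, feet))

    fits = {}
    for direction, points in by_direction.items():
        n = len(points)
        mean_t = sum(t for t, _ in points) / n
        mean_f = sum(f for _, f in points) / n
        sxx = sum((t - mean_t) ** 2 for t, _ in points)
        if n < 2 or sxx == 0:
            continue  # not enough spread in timing to fit a line
        sxy = sum((t - mean_t) * (f - mean_f) for t, f in points)
        speed = sxy / sxx                    # slope: feet per second
        intercept = mean_f - speed * mean_t  # equals -speed * reaction
        if speed <= 0:
            continue
        fits[direction] = (speed, -intercept / speed)
    return fits


# Hypothetical tracked plays for one outfielder.
tracked = [
    ("in", 1.5, 25.0), ("in", 2.5, 50.0), ("in", 3.5, 75.0),
    ("back", 2.0, 24.0), ("back", 3.0, 44.0), ("back", 4.0, 64.0),
]
for direction, (speed, reaction) in fit_coverage(tracked).items():
    print(f"{direction}: {speed:.1f} ft/s after a {reaction:.2f} s first step")
```

Separating speed from reaction time is what would distinguish the “quick reactions, slow afoot” fielder from the slow starter who covers ground later.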
