
Small Sample Sizes and Player Defense: OAA Might Just Be Okay

The game is evolving, and a longstanding trope of sabermetrics is beginning to fall

Los Angeles Dodgers v Seattle Mariners Photo by Abbie Parr/Getty Images

“Don’t use defensive statistics in a small sample size, and even then, use them with caution.” This has been a cornerstone of the sabermetrics movement for years, and it’s been a merited statement. For some time now, though, there have been hints that front offices have much more sophisticated tools to evaluate defense than are publicly available, and over the last three years, a formidable new statistic has slowly but surely leaked into the public sphere: Baseball Savant’s Outs Above Average (OAA).

OAA is a better source for defensive evaluations (outside of an outfielder’s arm, which unfortunately isn’t included at this stage) because its ratings are derived from a much clearer and more accurate source than the most common sabermetric defensive metrics, DRS and UZR. For outfielders, OAA takes how far an outfielder has to go and how much time he has to get there to put a percentage catch likelihood on a ball. Because of the way outfielders and batted balls are tracked, that catch likelihood is generated relative to all comparable catch opportunities in the Statcast era.
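Conceptually, that catch likelihood is just an empirical rate over comparable opportunities. Here’s a minimal Python sketch of the idea, using a toy dataset and made-up bin widths; Statcast’s actual model isn’t public in this form, so treat everything here as illustrative:

```python
# Hedged sketch: estimate catch probability the way Savant describes it,
# by comparing a play to historical plays with similar distance to cover
# and hang time. Bin widths and sample data are invented for illustration.

def catch_probability(distance_ft, hang_time_s, history, dist_bin=10, time_bin=0.5):
    """Share of comparable historical opportunities that were caught.

    `history` is a list of (distance_ft, hang_time_s, was_caught) tuples.
    """
    comparable = [
        caught
        for d, t, caught in history
        if abs(d - distance_ft) < dist_bin and abs(t - hang_time_s) < time_bin
    ]
    if not comparable:
        return None  # no similar plays on record
    return sum(comparable) / len(comparable)

# Six invented opportunities; the last is too far away to be comparable.
history = [
    (62, 3.9, True), (58, 4.1, True), (65, 3.8, False),
    (60, 4.0, True), (63, 3.7, False), (90, 3.0, False),
]
print(catch_probability(60, 4.0, history))  # 3 of 5 comparable plays caught -> 0.6
```

With millions of tracked plays instead of six, that same rate becomes the stable “catch percentage” shown on a Savant play.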

For infielders, it understandably needs more factors, so it adds in how far the fielder is from the base where the play is to be made, as well as the batter’s average sprint speed. “Expected” and “Actual” Catch Percentage become Expected and Actual Success Rate, and for infielders OAA can actually measure arm, indirectly, as a component of a fielder’s overall rating. With both infielders and outfielders, OAA breaks down by direction, so you can see that while Andrelton Simmons is very, very good, what made him a little worse than Javier Baez last year was balls hit to his right, where Baez was good and Andrelton was just okay. (I would kill to see Derek Jeter’s OAA breakdown, but let’s be honest, we all know how that would go. To the right.)
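The directional breakdown can be pictured as a simple aggregation. This sketch assumes Savant’s published credit rule (an out earns one minus the estimated success rate; a miss costs the estimated rate) and uses invented plays and direction labels:

```python
# Illustrative sketch of the directional OAA breakdown described above.
# Credit rule per Savant's description: a converted play is worth
# (1 - estimated success rate), a missed play costs the estimated rate.
# The play data below is made up for the example.

from collections import defaultdict

def oaa_by_direction(plays):
    """plays: iterable of (direction, est_success_rate, converted) tuples."""
    totals = defaultdict(float)
    for direction, est, converted in plays:
        totals[direction] += (1 - est) if converted else -est
    return dict(totals)

plays = [
    ("right", 0.40, True),   # tough play made: +0.60
    ("right", 0.85, False),  # routine play missed: -0.85
    ("left", 0.30, True),    # solid play made: +0.70
]
print(oaa_by_direction(plays))
```

Summing those per-direction totals gives you the kind of split that separates a Simmons from a Baez on balls to the right.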

This type of analysis is actually exactly what UZR attempts to do:

How does UZR determine how much credit, positive or negative, to award a fielder on each batted ball? First it goes through 6 years of batted ball data and determines how often each type and location of batted ball is fielded by each defensive position, making adjustments for the speed of the ball, and the handedness, speed, and power of the batter. Later on, further adjustments are made, such as the outs and base runners, and various park adjustments, like the size and configuration of the OF, the speed of the infield, and the speed of batted balls in general, as influenced by temperature, altitude, and the ground ball percentage of the pitcher (e.g. ground ball pitchers allow easier to field ground balls and harder to field air balls). For example, UZR might find that from 2004-2009, of all hard-hit line drives hit by a LH batter with above-average power to a certain location in an average OF, 15% are fielded by the CF’er, 10% by the LF, and 75% fall for a hit. Remember, those would be average numbers across all MLB parks.

So UZR is trying to do the same basic thing, but with a much noisier and rougher set of data. That’s why it takes so long to stabilize: you need a huge set of data to filter out the noise and get something reasonably reliable. Then you have to factor in that part of UZR’s generation is literally human: the adjustments for situation and batter. While well-intentioned, and an important step, that introduces further noise. Contrast that with Statcast, which looks at the actual distance that has to be covered, the actual hang time, and how often balls hit with that profile were caught or fell for hits. The “catch percentage” you see for a play on a player’s Savant page, like this one for Kyle Lewis, is generated from a giant data set.
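To see why the bucketed approach is coarser, here’s a toy rendition of the logic the UZR Primer describes, reusing its 15%/10%/75% example numbers. The bucket key and the hit-debit rule are illustrative simplifications, not UZR’s actual formula, which also splits debits among fielders and converts plays to runs:

```python
# Toy version of UZR-style bucketing: every batted ball falls into a
# discrete bucket, and fielders are credited against the league-average
# out rate for that bucket. Rates reuse the primer's example; the bucket
# key is an invented simplification.

league_rates = {
    # (ball type, batter hand, power, zone) -> out rate by position
    ("hard_liner", "L", "above_avg", "zone_X"): {"CF": 0.15, "LF": 0.10},
}

def uzr_credit(bucket, fielded_by):
    """Plays-above-average credit for one batted ball (UZR converts to runs)."""
    rates = league_rates[bucket]
    if fielded_by in rates:
        # made a play that this position makes only rates[fielded_by] of the time
        return 1 - rates[fielded_by]
    # ball fell for a hit; here the whole debit is returned in one lump,
    # whereas real UZR apportions it among the responsible fielders
    return -sum(rates.values())

bucket = ("hard_liner", "L", "above_avg", "zone_X")
print(uzr_credit(bucket, "CF"))  # 0.85: CF catches a ball caught 15% of the time
```

Every play in a bucket gets the same baseline regardless of where the fielder actually stood, which is exactly the noise Statcast’s play-by-play tracking avoids.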

Kyle Lewis Responsible Play Results for 2020
It’s also fun to watch a particular player through the year, because it gets fairly easy to tell which play you saw two nights ago is which dot on the chart.

Those aren’t just given star difficulties, either: if you hover over a dot, you can see the percentage difficulty, the distance needed, and the hang time. You can also see every out or hit allowed on a responsible play for a fielder, helping you see where they’ve excelled, failed, or been inconsistent:

Kyle Lewis Range on Responsible Plays, 2020 Savant

And DRS? DRS uses Baseball Info Solutions (BIS) data, which is hand-recorded. That was a valuable tool in past eras but is completely obsolete for defensive positioning in the Statcast era, when you can go to a player’s Savant fielding page and see his positioning, or see the positioning of every CF in baseball in 2020. Not only that, but 2020 is the first season in which DRS properly incorporates and calculates shifts. In 2019, over 40% of balls in play featured a shift, which is obviously a massive and glaring hole for a system to have ignored. Even now, DRS doesn’t factor in infield positioning on pop-ups or fly balls, meaning, oh, I don’t know, a rangy first baseman is going to have defensive value that DRS simply can’t capture. While DRS has made changes to try to capture more data, it is simply an outdated way of tracking the same thing Statcast does.

No hand-recording necessary: at this point we can automatically track player position with a combination of radar and cameras, a gigantic leap forward in accuracy and a system that was further upgraded for the 2020 season with new abilities (tracking player pose, for example, and tracking pitch spin directly instead of inferring it from other data). As with any new technology, we should keep an eye out for data that doesn’t make sense or other inconsistencies, but over time, Statcast’s tracking systems will be far more trustworthy than those that preceded them.

CF Positioning, 2020
I like to imagine every one of these CF were on the field at the same time

Again, if you go to Savant’s positioning pages, you can hover over every one of those dots to see who it is and how they’re positioned, down to the foot, along with the league-average position for a CF. Back on the player pages, you can also see a player’s expected catch% vs. his actual catch%, telling you how many more or fewer balls he’s caught than expected; the OAA leaderboards compare all sorts of things, including direction (sadly we don’t have a 2020 leaderboard yet). There’s a learning curve to being able to glance at an OAA and have some sense of context for whether it’s good or bad beyond positive good, negative bad, so perusing the leaderboards as well as the pages of some good or bad players is a good way to find some neat nuggets and to orient yourself properly.
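That expected-vs-actual comparison boils down to a single sum: a fielder’s OAA is his outs made minus his expected outs over all responsible plays. A hedged sketch, with invented catch probabilities:

```python
# OAA as the sum over a fielder's opportunities of
# (1 if caught else 0) minus the catch probability of each play.
# The opportunities below are invented for illustration.

def outs_above_average(opportunities):
    """opportunities: list of (catch_probability, was_caught) pairs."""
    return sum((1 if caught else 0) - prob for prob, caught in opportunities)

opps = [(0.95, True), (0.50, True), (0.20, False), (0.75, False)]
print(round(outs_above_average(opps), 2))  # 2 outs vs 2.4 expected -> -0.4
```

Note that missing a 75% play hurts far more than missing a 20% play helps, which is why OAA rewards reliability on makeable balls, not just highlight catches.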

Keep in mind that it’s a counting stat, and scale it by looking up innings played at a given position (still most easily found at FanGraphs) to get a sense of how a player’s partial season might translate to a whole season. For example, Daniel Vogelbach (RIP) has a passable career -3 OAA, which seems at least OK until you realize he’s only played about 600 innings there in the Statcast era, meaning over a full year he’d be around -7 OAA, which is miserable.
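The pro-rating is simple arithmetic; the 1,300-inning figure below is a rough assumption for a full season at one position, not an official constant:

```python
# Scale a counting stat like OAA from partial-season innings to a
# full-season pace. 1,300 innings is an assumed full-season workload.

def full_season_oaa(oaa, innings, full_season_innings=1300):
    return oaa * full_season_innings / innings

print(full_season_oaa(-3, 600))  # -6.5: about -7 OAA over a full year
```

The same scaling works for any of OAA’s directional components, as long as you remember it’s a pace, not a projection.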

Statcast and OAA are pretty new tools, and we’re still learning what they mean and how to interpret their data, but based on what we can see of the data and how it is generated, OAA is the best tool there is to measure most defensive components and is far more likely to be accurate even in small samples.

Sources/Additional Reading:

  1. OAA
  2. Catch Probability
  3. UZR Primer
  4. DRS Primer
  5. Positioning Leaderboard
  6. Kyle Lewis Fielding Page