I think this is the last major one I'll be doing!
Prerequisites for Understanding: The Isolation Problem, Linear Weights, Base Runs, Value, Regression, Correlation, Park Effects, Environment, Data.
Defence. The holy grail. The new front line in sabermetrics. Defence is currently baseball's most undervalued commodity. How did we get from ignoring defence entirely to fetishising it? The answer, of course, lies in the attitude of sabermetricians. The belief that not being able to measure something means that it is irrelevant is fairly pervasive amongst analysts, and frankly it's been responsible for some colossal cock-ups. Defence is the best example: for a long time it was regarded in sabermetric circles as entirely irrelevant compared to batting. The truth is that in the majors, players' defensive talent has a comparable spread to their batting talent, and we're only just starting to get to grips with this.
There's no denying that defence is very difficult to assess. The players and managers certainly don't do a spectacular job of it, as evidenced by the Gold Glove awards. We're also pretty prone to giving players too much credit for making an easy play look spectacular, and not nearly enough for making a spectacular play look easy. Fielding percentage rates an immobile fielder with excellent hands more highly than one with average hands who can get everywhere on the field, so let's discard it as failing the smell test as a measure of overall fielding (it does have some merit in looking at hands and throwing accuracy, though). Ideally, we want to measure a defender in a way that gives him credit for making plays - which is very different to debiting him for screwing up in the middle of plays.
How do we measure such a thing? Measuring total plays made seems like a good start, but it suffers from some quite severe influences beyond the player's control: pitching, other defenders, luck, etc. What would be better, perhaps, is a means of measuring the difficulty of every play independent of the actual defenders involved. Enter our ball in play data, generally provided by STATS or Baseball Info Solutions. These data sets give us a good read on how hard a ball was hit, as well as a rough idea of its trajectory. Suddenly, we have the ability to compare similar plays across the whole game. And that's a big deal.
We can derive difficulty simply by looking at how often a ball of a certain flight path is turned into an out. With enough data, sample size is no longer a concern. We can also determine how many runs a play is worth - we will, naturally, care more about a play where failure to convert will result in extra bases, as compared to one which might simply result in a foul ball. Computing the weights should be familiar to everyone by now; just pick one of linear weights or Base Runs and go with it. We now have a system in place that measures the difficulty and run value of defensive plays, which means that we should have a good measurement of a player's value with the glove.
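To make the bookkeeping concrete, here is a minimal sketch of the idea rather than any published metric's actual implementation. The bucket labels, the tuple layout and the run values are all made up for illustration; a real system would take its buckets from the ball in play data and its run values from linear weights or Base Runs.

```python
from collections import defaultdict


def bucket_out_rates(plays):
    """Share of balls in each bucket that were turned into outs.

    Each play is a tuple (bucket, fielder, out_made, run_value).
    This layout is a hypothetical one chosen for the sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [outs, chances]
    for bucket, _fielder, out_made, _run_value in plays:
        totals[bucket][0] += int(out_made)
        totals[bucket][1] += 1
    return {b: outs / chances for b, (outs, chances) in totals.items()}


def runs_saved(plays):
    """Runs saved above average per fielder.

    For every chance the fielder is credited (1 - out rate) for making the
    play and debited (out rate) for missing it, scaled by the assumed run
    value of converting that ball into an out.
    """
    rates = bucket_out_rates(plays)
    saved = defaultdict(float)
    for bucket, fielder, out_made, run_value in plays:
        saved[fielder] += (int(out_made) - rates[bucket]) * run_value
    return dict(saved)
```

For example, given `[("gb_hole", "A", True, 0.75), ("gb_hole", "B", False, 0.75)]`, the bucket's out rate is 0.5, so fielder A comes out at +0.375 runs and fielder B at -0.375. Note that the sketch leaves each fielder's own plays in the baseline, which a more careful version might want to exclude.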
Indeed, we have a good estimate of defensive ability. Two problems stand in our way though. The first is data quality. Some plays, even in the same 'bucket', are innately harder than others, and our current data isn't granular enough to track this. We're relying on human stringers watching games on replay, and while we trust them, we shouldn't ever trust entirely in human scorers. The second is the rather large assumption that defenders all start from the same position. Clearly, this is nonsense.
Consider stolen base attempts. Say that the batter is right handed. When the runner at first breaks, what happens to the shape of the infield? The second baseman moves towards second to make a play on the runner, and a huge hole opens up on the right side. If the batter sends an easy ground ball towards the second baseman's usual spot, the fielder will have absolutely no chance of converting the play, and will get punished accordingly. Good positioning (typically communicated from the bench) can have a similar effect in the other direction, perhaps inflating defensive values beyond the actual skill of the player.
To top this all off, defensive performance appears to be fairly unstable. We all know that players at the plate have ups and downs (.300 hitters don't get .3 hits per at-bat, after all), but it appears that fielding is even streakier than batting. This introduces some weird 'errors', and as a result it is imperative to use multiple years of fielding data before even trying to make an estimate as to a player's ability. In other words, regress heavily, or you may come to some bad conclusions.
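As a rough illustration of what 'regress heavily' means in practice, here is a sketch of the usual shrinkage towards the league mean. The function name and the `ballast` parameter are my own labels, and the right amount of ballast for fielding is an assumption you would have to estimate from the data; nothing here claims a specific value.

```python
def regress_to_mean(observed_rate, chances, ballast, league_rate=0.0):
    """Shrink an observed rate towards the league rate.

    observed_rate: e.g. runs saved per chance over the sample
    chances:       number of chances in the sample
    ballast:       hypothetical number of league-average chances to add;
                   the bigger it is, the heavier the regression
    """
    if chances < 0 or ballast < 0 or chances + ballast == 0:
        raise ValueError("chances and ballast must be non-negative, not both zero")
    weight = chances / (chances + ballast)  # how much we trust the sample
    return league_rate + weight * (observed_rate - league_rate)
```

With as many chances as ballast, the estimate lands halfway between the observed rate and the league rate; with far fewer chances, it barely moves off the league rate. That is why pooling multiple years matters: more chances means more weight on what the player actually did.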
Have I mentioned that catcher defence appears to be particularly hard to quantify? And that first base defence may also not be completely included in the model above, due to scoops, etc? Or that although we park adjust, it's unclear as to how effective that is? I should probably mention those things.
Regardless, there's clearly room for improvement. Our best hope right now is an as-yet unimplemented fourth generation tool popularly called 'Field f/x'. This system tracks the ball as it moves around the diamond, and, critically, it tracks the players as well. We can see how far an outfielder's range is, or how fast a third baseman reacts to a scorcher up the line. It will allow for more accurate measurements and perhaps even optimise in-game strategy (in the form of shifts). Field f/x will someday be a very, very big deal. The downside, apart from it not actually existing yet? The fans will probably never see fresh data. The information will probably be simply too valuable to release publicly.
So, while we hold out slim hope of having the Answer dropped into our lap, we'll have to make do with what we have. It is imperative to remember the limitations of our current setup and look elsewhere in our evaluation of fielding, though. Tango surveys the fans, using a 'wisdom of crowds' setup - extremely useful as a complementary tool. Scouting reports are useful but not perfect. We even have different defensive metrics based on the same basic concept that don't agree with one another. At the end of the day, we should be making use of everything. There's information lying in there, and the more sources, the better. Just remember to convert everything into runs before you do!
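One simple way to 'make use of everything', once each source has been expressed in runs, is a weighted average. This is only a sketch: the weights are hypothetical stand-ins for how much you trust each source, and converting a fan survey or a scouting grade into runs is a separate step that isn't shown here.

```python
def combine_estimates(estimates):
    """Weighted average of several defensive estimates, all in runs.

    estimates: list of (runs, weight) pairs, where each weight is a
    hypothetical, non-negative measure of trust in that source.
    """
    total_weight = sum(weight for _runs, weight in estimates)
    if total_weight <= 0:
        raise ValueError("at least one estimate needs a positive weight")
    return sum(runs * weight for runs, weight in estimates) / total_weight
```

For instance, `combine_estimates([(10.0, 1.0), (4.0, 2.0)])` gives 6.0: the second source is trusted twice as much, so the blend sits closer to it.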