Only a few left! Yay!
Prerequisites for Understanding: The Isolation Problem, Linear Weights, Base Runs, Replacement Level, The Run-Win Conversion, Value, Regression, Correlation, Park Effects, Environment, WPA and LI, Data.
Gone are the days when the triple crown defined greatness. Instead, we're faced with a dizzying array of statistics, from our old friends batting average and home runs to linear-weights based systems to some frankly impenetrable figures. The ideal, of course, is to weight what a batter does according to what it's actually worth in terms of wins and losses, so our goal is to extract real meaning from the myriad numbers at our disposal.
Usually, we would immediately begin isolating a player's contributions from those of his peers in order to get a better read on said player's actual abilities. However, there's a strong belief in many baseball circles that clutch hitting is a sustainable skill, so let's detour into some murky territory before we set off on our quest to eliminate any interference from other sources. First of all, we have a pretty handy definition of clutch in the form of Win Probability. Our putative clutch hitters will do better in high-leverage situations than our average hitters will, by definition. They should also be contributing less in low-leverage situations, otherwise they're not clutch hitters so much as good hitters. So taking the differences between their actual WPA contributions and their expected contributions given their overall stat line, we have a metric for clutchness, and we can check to see how stable it is.
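Here is a minimal sketch of that clutch metric, under one loudly stated assumption: a batter's expected WPA is taken to be his average context-neutral value per plate appearance (WPA divided by leverage index) spread over the total leverage he actually faced. That is only one way to build an expectation from an overall line. The function name and the sample plate appearances are hypothetical.

```python
def clutch_score(plate_appearances):
    """Actual WPA minus expected WPA for a list of (leverage_index, wpa) pairs.

    Assumption (not from any official definition): expected WPA is the
    batter's mean context-neutral value (wpa / leverage_index) multiplied
    by the total leverage he faced. Leverage index must be positive.
    """
    if not plate_appearances:
        return 0.0
    actual = sum(wpa for _, wpa in plate_appearances)
    neutral = [wpa / li for li, wpa in plate_appearances]
    mean_neutral = sum(neutral) / len(neutral)
    total_leverage = sum(li for li, _ in plate_appearances)
    expected = mean_neutral * total_leverage
    return actual - expected


# Made-up plate appearances: (leverage index, WPA).
steady = [(2.0, 0.04), (0.5, 0.01), (1.0, 0.02)]  # same neutral value everywhere
rises = [(2.0, 0.10), (0.5, -0.01), (1.0, 0.02)]  # best work in high leverage
print(clutch_score(steady), clutch_score(rises))
```

A hitter who is equally good in every situation scores roughly zero here, which matches the point above: he is a good hitter, not a clutch one.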
Short answer: It isn't.
Long answer: Clutch hitting does show up as a skill, but it is far, far, far more likely (5x or so) to be random than skill-based. It just doesn't correlate well, whether you're looking at split-season values or year-to-year. And after all, if a hitter is so good in the clutch, why doesn't he try that hard all the time?
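The stability check can be sketched as a split-half correlation: score each hitter's clutchness in two halves of a season and see whether the halves agree. The numbers below are invented purely to show the mechanics and say nothing about real players; `pearson` is a hand-rolled helper, not a library call.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical clutch scores (in wins) for the same five hitters,
# first half versus second half of a season.
first_half = [0.4, -0.2, 0.1, -0.5, 0.3]
second_half = [-0.1, 0.3, 0.2, -0.2, -0.3]
print(pearson(first_half, second_half))
```

A coefficient near zero across a large sample of hitters is what "doesn't correlate well" looks like in practice; the same helper works for year-to-year pairs.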
So let's accept that we can ignore context in measuring our hitters. The clear next step is to convert events on the field to runs, using linear weights or Base Runs. I'm partial to Base Runs, myself, but linear weights are slightly easier to implement, and for the most part the two give highly similar results. We now have a good idea of how many runs the average single, double, triple, or even error is worth, so we can sum up a player's hitting output in runs, compare it to average, and convert it to a win value. Of course, we should remember to adjust for park and league as well.
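As a rough sketch of the linear weights route: multiply each event count by its run value, add everything up, and divide by a runs-per-win figure. The weights below are illustrative round numbers rather than values from this series, the ten-runs-per-win conversion is the usual rule of thumb, and park and league adjustments are left out to keep the example short.

```python
# Illustrative run values relative to average. Real weights depend on the
# season and run environment; these are assumptions for the example only.
LINEAR_WEIGHTS = {
    "1B": 0.47,
    "2B": 0.77,
    "3B": 1.04,
    "HR": 1.40,
    "BB": 0.31,
    "OUT": -0.27,
}
RUNS_PER_WIN = 10.0  # rule-of-thumb conversion, assumed here


def batting_runs(events, weights=LINEAR_WEIGHTS):
    """Runs above average for a dict of event counts, e.g. {"1B": 120}."""
    return sum(weights[event] * count for event, count in events.items())


def runs_to_wins(runs, runs_per_win=RUNS_PER_WIN):
    """Convert runs above average to wins above average."""
    return runs / runs_per_win


# Hypothetical season line.
season = {"1B": 110, "2B": 35, "3B": 3, "HR": 25, "BB": 70, "OUT": 400}
runs = batting_runs(season)
print(runs, runs_to_wins(runs))
```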
Park-adjusted offensive statistics based on linear weights are essentially the cutting edge of evaluating hitters. But are we actually done? I would argue that we're not quite there yet. We've only looked at what happens at the end of a play. It might result in an out, or a double, or a home run, but there's another factor that comes into play between the bat striking the ball and the actual outcome of a play: the defence. In fact, we know this is a big factor - we hear about 'robbed' hits at least once a game, but the assumption is that a batter's defensive 'luck' evens out over the course of the year. There's no reason for this to be true though, so what can we do to mitigate the fact that even the most sweetly struck ball can find itself nestled in a fielder's glove?
Well, we can use the same techniques as we applied when looking at pitchers. With third generation data, we have a pretty good idea of what trajectory class any given batted ball falls into (standard caveats about the reliability of these data apply, of course), which means that applying linear weights or BsR figures based on batted balls to batters leaves you with... the wrong answer. Completely and totally. When we look at pitchers, we can safely assume that they face roughly the same calibre of batters over the course of a season, which justifies splitting information up by standard BIP data. Hitters are much, much quirkier, and they put their own stamp on things. Some are fast. Some are slow. Some hit the ball harder than others. Some hit the ball much harder than others. The run value of a line drive, for example, has a rather extreme range when comparing Albert Pujols to Miguel Cairo. It simply doesn't make sense to use league-wide linear weights on individual batters. A better way of doing things may be to simply generate linear weights on BIP data for a batter's recent career, regress those, and apply them to his batting line. This should give us an idea of what a batter has done without defence getting overly involved. Of course, hit f/x will be helpful as well, especially on line drives and fly balls.
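The "generate, regress, apply" idea might look something like the sketch below. It uses a simple shrinkage formula that pulls a batter's own run value for each batted-ball type toward the league value, with the pull governed by a constant `k`. The choice of `k`, the function names, and every number in the example are assumptions for illustration, not measured values.

```python
def regress_to_league(observed, league, n, k):
    """Shrink a batter's observed run value toward the league value.

    n is the batter's sample size for this batted-ball type; k is a
    hypothetical constant controlling how hard we regress (bigger k means
    we trust the league value more).
    """
    return (n * observed + k * league) / (n + k)


def defence_neutral_runs(bip_counts, batter_values, batter_samples, league_values, k):
    """Apply regressed, batter-specific run values to this season's batted balls."""
    total = 0.0
    for bip_type, count in bip_counts.items():
        value = regress_to_league(
            batter_values[bip_type],
            league_values[bip_type],
            batter_samples[bip_type],
            k,
        )
        total += count * value
    return total


# All figures below are made up.
league_values = {"LD": 0.35, "FB": 0.05, "GB": -0.10}
batter_values = {"LD": 0.50, "FB": 0.20, "GB": -0.12}  # from his recent career
batter_samples = {"LD": 300, "FB": 500, "GB": 600}     # career batted balls by type
this_season = {"LD": 100, "FB": 160, "GB": 190}        # this year's batted balls
print(defence_neutral_runs(this_season, batter_values, batter_samples, league_values, k=200))
```

Because the run values come from the batter's multi-year profile rather than from what fielders happened to do with this year's batted balls, a few robbed line drives no longer move his total much.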
One element of batting statistics that I've totally neglected thus far is the scale to put our measurements on. There are lots to choose from. Batting average is familiar, as are on-base percentage and OPS. They all suffer from not really being very meaningful in terms of runs scored, but we don't have an obvious scale to turn to - R/9 is out for individuals because they have teammates, so the question is very much up in the air.
I also haven't touched on baserunning, which is a neat topic in its own right. I probably won't be able to do it justice here, but it's one of the few times we should be including leverage index in our calculations - you can always choose how aggressive to be depending on the situation. In essence, we want to measure baserunning based on a combination of stealing and advancing on other plays. Stolen bases and times caught stealing are easy to evaluate using run modellers (in fact, certain batting statistics sometimes embed them with hits, walks, and outs), but advancement is an entirely different kettle of fish. In essence, we take the chances for a runner to advance on a single, double, flyout, or groundout, find the league average extra bases and outs generated per chance, and compare it to what our baserunner actually did. This neglects to take into account any luck which might affect the ease of advancing on a play, but it's certainly better than nothing.
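The advancement comparison can be sketched as follows: for each type of play, take the runner's chances, work out what a league-average runner would have produced in extra bases and outs over those chances, and subtract. The data layout, function name, and all rates are hypothetical, and the sketch deliberately stops at bases and outs rather than converting to runs.

```python
def advancement_above_average(runner, league_rates):
    """Extra bases and outs relative to a league-average runner.

    runner:       {play_type: (chances, extra_bases, outs)}
    league_rates: {play_type: (extra_bases_per_chance, outs_per_chance)}
    Returns (extra_bases_above_average, outs_above_average).
    """
    bases_above = 0.0
    outs_above = 0.0
    for play_type, (chances, extra_bases, outs) in runner.items():
        league_bases, league_outs = league_rates[play_type]
        bases_above += extra_bases - chances * league_bases
        outs_above += outs - chances * league_outs
    return bases_above, outs_above


# Made-up league rates and one runner's season.
league_rates = {"single": (0.30, 0.02), "double": (0.40, 0.03), "flyout": (0.15, 0.01)}
runner = {"single": (50, 20, 1), "double": (20, 10, 1), "flyout": (30, 6, 0)}
print(advancement_above_average(runner, league_rates))
```

A positive first number with a second number near zero describes a runner who takes extra bases without running into extra outs.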
In any case, these are the avenues I'd be pursuing in order to evaluate total offensive performance. It's worth bearing in mind that we're by now very, very good at evaluating hitting, and improvements to our current top-of-the-line metrics (tAV/EqA, wOBA) are only going to result in very marginal advances in our understanding of talent level, so don't stress out if we never push much further into the weird world of extracting pure offensive value from the offence/defence dynamic.