Baseball statistics are a powerful tool for player analysis, but quite a few conventional ones (W-L, ERA, AVG, etc) are actually fairly useless for evaluating or projecting a player. One of the more advanced websites around, Baseball Prospectus, uses a batting stat called VORP (Value Over Replacement Player, basically how many runs a major leaguer's worth over some dude in AAA whom you've never heard of and will play for peanuts). Unfortunately, I only have it in book form, so it's hard to actually use. Hrm.
And so, armed with only Excel and my trusty SQL database covering the past 20 years of stats, I went on a great undertaking to recreate the stat, which would allow me to actually utilise it (I needed -something- to do during the final innings of that fiasco in Baltimore on Friday, after all).
Since this is a counting stat, I used plate appearances (PA) as a baseline.
PA: At Bats (AB) + Walks (BB) + Hit by Pitch (HBP) + Sacrifice Fly (SF) + Sacrifice Bunt (SH)
Then I decided to use a Runs Created metric to determine total player value.
A: Hits (H) + BB - Caught Stealing (CS) + HBP - Double Plays (GIDP)
B: Total Bases (TB) + (.24 * (BB - Intentional BB + HBP)) + (.62 * Stolen Bases (SB)) + (.5 * (SH + SF)) - (.03 * Strikeouts (K))
C: AB + BB + HBP + SH + SF
RC: (A * B) / C
A little complicated, perhaps, but it's one of the most accurate models of how many runs each player contributes to his team per year, taking into account basic hitting, baserunning, etc.
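For anyone who'd rather see this as code than as a spreadsheet, here's a rough Python sketch of the PA and Runs Created formulas above. It assumes the three pieces are combined the usual Runs Created way, RC = A * B / C. The `BattingLine` class, its field names, and the sample stat line are all made up for illustration; they aren't from my database.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """One player's season totals (hypothetical container, names are my own)."""
    ab: int
    h: int
    tb: int
    bb: int
    ibb: int
    hbp: int
    sf: int
    sh: int
    sb: int
    cs: int
    gidp: int
    k: int


def plate_appearances(s: BattingLine) -> int:
    # PA = AB + BB + HBP + SF + SH
    return s.ab + s.bb + s.hbp + s.sf + s.sh


def runs_created(s: BattingLine) -> float:
    # A: times on base, minus outs made on the bases
    a = s.h + s.bb - s.cs + s.hbp - s.gidp
    # B: advancement, using the weights from the text
    b = (s.tb
         + 0.24 * (s.bb - s.ibb + s.hbp)
         + 0.62 * s.sb
         + 0.5 * (s.sh + s.sf)
         - 0.03 * s.k)
    # C: opportunities (same terms as PA)
    c = s.ab + s.bb + s.hbp + s.sh + s.sf
    # Assumed combination: RC = A * B / C; guard against empty lines
    return a * b / c if c else 0.0


# Invented stat line, purely to exercise the functions
sample = BattingLine(ab=500, h=150, tb=250, bb=50, ibb=5, hbp=5,
                     sf=5, sh=0, sb=10, cs=5, gidp=10, k=100)
sample_pa = plate_appearances(sample)
sample_rc = runs_created(sample)
print(sample_pa, round(sample_rc, 1))
```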
So we tabulate these for every non-pitcher for 2005, break them up by position most played (an approximation I hate to make due to the fact I don't have actual play-by-play data; I'd rather have been able to split up an individual player's stats by position too), and stick them in Excel.
Then we total up all the RC for a position and divide it by PA. This gets you an average major leaguer at a certain position for a year (unsurprisingly, the more demanding defensive positions - C, SS, etc have a lower RC/PA than the outfield corners and 1st). But 'average major leaguer' isn't what we're looking to compare someone against. So I looked at an average AAA player instead. Or pretended to, anyway.
Various studies (I believe this involves another set of advanced metrics which I don't understand, and thus won't explain) show that a random AAA catcher will hit at about 85% of the ML average if brought up to the big leagues. That number's around 75% for the power positions - 1B, 3B, LF, RF - and 80% for the rest.
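The positional baseline step can be sketched like this: total RC and PA by position, divide, then scale by the replacement percentages quoted above. The function name, the `(position, rc, pa)` row layout, and the sample rows are assumptions for the example, not real 2005 numbers.

```python
from collections import defaultdict

# Fraction of ML-average production expected from a AAA call-up,
# per the percentages in the text; anything not listed gets 80%.
REPLACEMENT_FRACTION = {"C": 0.85, "1B": 0.75, "3B": 0.75, "LF": 0.75, "RF": 0.75}
DEFAULT_FRACTION = 0.80


def replacement_rc_per_pa(rows):
    """rows: iterable of (position, rc, pa) tuples, one per player.

    Returns {position: replacement-level RC per PA}.
    """
    rc_total = defaultdict(float)
    pa_total = defaultdict(int)
    for pos, rc, pa in rows:
        rc_total[pos] += rc
        pa_total[pos] += pa

    rates = {}
    for pos, pa in pa_total.items():
        if pa == 0:
            continue  # no plate appearances at this position, nothing to average
        league_avg = rc_total[pos] / pa
        rates[pos] = league_avg * REPLACEMENT_FRACTION.get(pos, DEFAULT_FRACTION)
    return rates


# Invented rows, just to show the shape of the data
rows = [("C", 50.0, 500), ("C", 30.0, 300), ("1B", 100.0, 600), ("SS", 60.0, 600)]
rates = replacement_rc_per_pa(rows)
```

Note that this sums RC and PA before dividing, so a full-time starter counts for more than a bench guy with 40 PA, which is what "total up all the RC for a position and divide it by PA" amounts to.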
Now before we actually compare players to their theoretical replacements, we have to adjust them all to the same difficulty level. This is done by using park factors. A ballpark like Safeco Field in Seattle will suppress a player's offensive stats (think about what Ichiro!'s 2004 would have been like if he played home games elsewhere), and others, such as Coors Field up in Denver, will provide a large boost to them. This is due to a variety of conditions - altitude, air moisture, prevailing wind, foul territory, turf type, etc. So I dug up all the park factors for major league stadiums, and divided each player's RC by [(1+Park Factor)/2] to give them all an even footing. Another approximation I had to make here was for traded players - I had no way to account for stadium switches, so if someone was traded, I just pretend they're in their original stadium the whole year, essentially screwing over Preston Wilson, who was traded from Colorado (1.104 PF) to Washington (0.941 PF) mid-season.
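As a quick sketch of the park adjustment, here is the division by [(1+Park Factor)/2] in Python, run on the two park factors mentioned above. The function name and the raw RC of 100 are invented for the example. My reading of the averaging with 1 is that only about half of a player's games are at home, but treat that as an assumption.

```python
def park_adjusted_rc(rc: float, park_factor: float) -> float:
    # Divide by the average of the home park factor and a neutral 1.0
    return rc / ((1.0 + park_factor) / 2.0)


# Same made-up raw RC of 100 in two very different home parks
colorado = park_adjusted_rc(100.0, 1.104)    # hitter's park: RC gets marked down
washington = park_adjusted_rc(100.0, 0.941)  # pitcher's park: RC gets marked up
print(round(colorado, 2), round(washington, 2))
```

So the same raw 100 RC is worth about 8 more adjusted runs in Washington than in Colorado, which is why pinning a traded player to his original stadium for the whole year can skew his number.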
So then what we do is take a player's park-adjusted RC, divide by PA, then subtract the replacement-level RC/PA for his position. Once you have that, multiply by PA again to get an RCORP, which is Runs Created over Replacement Player. Say it out loud! It's really fun. ARRR-CORE!
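The last step, as a minimal Python sketch. The function name and all the input numbers are hypothetical, picked only to make the arithmetic easy to follow.

```python
def rcorp(adjusted_rc: float, pa: int, replacement_rate: float) -> float:
    """Runs Created over Replacement Player.

    adjusted_rc      -- park-adjusted Runs Created
    pa               -- plate appearances
    replacement_rate -- replacement-level RC per PA at the player's position
    """
    if pa == 0:
        return 0.0
    # (player RC/PA - replacement RC/PA), scaled back up by playing time
    return (adjusted_rc / pa - replacement_rate) * pa


# Invented example: 95 adjusted RC in 600 PA against a 0.10 RC/PA replacement
example = rcorp(95.0, 600, 0.10)
print(round(example, 1))
```

Algebraically this is the same as adjusted RC minus (replacement RC/PA times PA): the runs a replacement-level hitter would have produced in the same playing time get subtracted off.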
My numbers for 2005 match VORP to within 5%, despite the rather crude approximations I had to make. I'm pretty pleased with the results.
Note that this will completely ignore defense, but that's ok, because everyone else's batting stats do as well.