We're more or less done with the highest-level statistical theory, but there are some areas of baseball itself that we have to touch on before we can really begin to unravel things.
Prerequisites for understanding: None
Prerequisites for derivation: Database
The Environment Changes the Game
Context matters. We know this instinctively (the crusade to have every single statistic from the steroid era discounted comes to mind), but what we don't have an intuitive grasp of is how much it matters. What's the difference between playing at Coors Field and playing at Fenway Park? How about playing in A-ball vs. the Japanese leagues, or the majors in the 1930s? This turns out to be a difficult question to answer, so let's lay out the scope of the question now and leave the solution for a later date.
There are two broad categories into which environmental effects fall. There's a difference in talent, which is much easier to explain. The Major Leagues (specifically the AL, due to the presence of the Yankees and Red Sox) feature the best baseball players in the world. Short-season rookie ball does not. Doing well in the majors is much more impressive than doing well further down the chain. Simple, right? Worse talent in general means more chance for a player to look good by comparison.
There's also an environmental effect caused by changes to the game of baseball itself. Baseball does not play equally between stadia, leagues, or even months. The Japanese Leagues (NPB) are, for some reason, much easier to hit home runs in than we'd expect given the difference in talent between NPB and the majors. We all know about park dimensions, humidity, wind, and temperature playing a role in what happens on the field, and that Petco Park in San Diego is going to play radically differently to the Ballpark in Arlington. What's often less clear is how the game changes year by year, and it's important to note that there's often a big difference between statistics from one era and the next (I should say that there's a big difference in the correct interpretation of the statistics).
The effect of the environment on actual play means that we have a hard job unpacking talent level and value from the raw statistics alone. Thus there's a big need for league factors, park factors, and a good understanding of the environment as a whole before we draw any conclusions about player value. The derivation of park and league factors is, frankly, a huge pain that we don't need to go into. The take-away point from this post should be that environmental effects are something we have to try to strip out when we're looking at raw data. Without taking the environment into account, you assign a lot of credit or debit to a player who doesn't deserve it.
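Deriving a good park factor is the painful part; applying one is simple arithmetic. Here's a rough sketch in Python of what "stripping out" the park might look like. Everything in it is an assumption for illustration: the factor is a deliberately naive one (runs per game in a team's home games divided by runs per game in its road games), the function names are made up, and so are the numbers. Real park factors involve multi-year samples, regression, and a lot more care.

```python
def naive_park_factor(home_runs, home_games, road_runs, road_games):
    """Naive run park factor (an illustrative assumption, not a real method).

    home_runs / road_runs are total runs scored by BOTH teams in the
    club's home games and road games respectively. A result above 1.0
    means the home park inflates scoring; below 1.0 means it suppresses it.
    """
    return (home_runs / home_games) / (road_runs / road_games)


def neutralize(raw_stat, park_factor, home_share=0.5):
    """Scale a raw season total to a neutral park.

    A player only plays home_share of his games in his home park, so the
    factor is blended with 1.0 (neutral) for the rest of the schedule.
    """
    effective_factor = home_share * park_factor + (1.0 - home_share)
    return raw_stat / effective_factor


# Hypothetical team: 900 total runs in 81 home games, 720 in 81 road games.
pf = naive_park_factor(900, 81, 720, 81)

# Hypothetical hitter on that team who created 100 runs over the season.
adjusted = neutralize(100, pf)

print(f"park factor: {pf:.2f}")
print(f"raw runs: 100, park-adjusted runs: {adjusted:.1f}")
```

With these invented inputs the park inflates scoring by about a quarter, and the hitter's 100 raw runs shrink to roughly 89 once half his games are discounted for it. That gap is credit he'd otherwise get for his address rather than his bat.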
The run/win conversion; hitting, fielding, and pitching metrics.