Black Boxes and Stat Daemons

Jeff's note: I'm a little too...well it's Saturday and I'm disoriented, but I've been told this is another bit of required reading, so I'm giving it a bump. Go Graham go!

Famed science fiction author Arthur C. Clarke's Third Law states, "Any sufficiently advanced technology is indistinguishable from magic." Applied to the advanced baseball metrics we create, it could probably be paraphrased for the average reader (one unfamiliar with deep statistical musings) as, "Any sufficiently advanced baseball statistic is indistinguishable from a load of computer-generated bollocks."

To be fair, it is tricky to fault them for such a mentality. For most people, applying maths to baseball is neither easy nor particularly enjoyable - it takes a scientifically inclined mind to want to bother with this sort of thing. The stats crowd are, in essence, waving computer printouts and clamouring for attention in a space normally reserved for thoughts no deeper than 'Willie Bloomquist is such a gamer'. If I weren't so fond of statistical analysis myself, I could see how this would be annoying (I used to think Willie was a good young prospect, after all). However, just because most people do not understand advanced statistical stuff doesn't mean they're dumb, or that they're irredeemable. Sometimes it's simply because we're not doing a good enough job of explaining ourselves.

To clarify with an example, one of the favourite targets of the traditional crowd is VORP (or it was a few years ago. Whatever). Why? Well, first, it sounds stupid, but that's neither here nor there. The real reason that VORP causes so much dissension is that it's actually quite difficult to understand cold. For those who aren't aware, VORP is a proprietary tool developed by Keith Woolner that essentially combines the concepts of runs created, replacement level, and positional adjustment (there's a great post on all this up on AN right now). Those are three very important ideas in modern analysis, and if you're aware of them and the basics of how they work then VORP makes some degree of intuitive sense, even if you can't see the equations behind it. If you're not, VORP looks like a box where information is fed in, processed, and spat back out. It could be the random scribblings of stats daemons living in computers for all anyone is aware. For those of you who prefer visuals (and for the sake of making this post look longer):

Black box syndrome is a huge problem. If something new is not explained clearly, concisely and transparently, chances are it's not going to be understood by a huge part of the baseball-loving community. This is fine if you're not interested in reaching out to the folks who don't subscribe to sabremetric thinking, but there's a chunk of people open to new thoughts, who are completely capable of following a logical argument to its conclusion. By not providing them with any way of getting there, a black box stat is completely preventing these people from getting involved and interested in the conversation, and this simply fuels the disconnect and hostility between the two camps.

What's the solution? When doing analysis, we should spell out everything we do. Do we need the exact equations? No. Do we need to make it clear where we're making a positional adjustment, or what exactly we're talking about when we say 'replacement level'? Yes. Every single baseball stat (let's ignore Win Shares) is derived logically, and by not presenting data within a logical framework... well, the data without their scaffold are indistinguishable from complete nonsense, and when arguing against a point held dearly by the traditional lot, that's what it will be dismissed as.
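For the code-minded, here's a rough sketch of what "spelling it out" might look like for a generic value-over-replacement number. To be clear, this is not VORP's actual formula (that one is proprietary). The Hitter class, the value_over_replacement function, and the replacement-level rates are all invented purely for illustration. The point is only that each intermediate step gets a name and shows up in the output, so a reader can see where the positional adjustment happens.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    position: str
    runs_created: float  # offensive output, however you choose to estimate it
    outs_made: int       # a simple proxy for playing time


# Hypothetical replacement-level rates (runs per out) by position.
# These numbers are made up for the example, not taken from any real source.
REPLACEMENT_RUNS_PER_OUT = {"SS": 0.13, "1B": 0.17}


def value_over_replacement(hitter: Hitter) -> dict:
    """Return every intermediate step, not just the final number."""
    # Positional adjustment: the bar is set by the player's own position.
    replacement_rate = REPLACEMENT_RUNS_PER_OUT[hitter.position]
    # What a freely available stand-in would produce in the same playing time.
    replacement_runs = replacement_rate * hitter.outs_made
    return {
        "player": hitter.name,
        "position": hitter.position,
        "runs_created": hitter.runs_created,
        "replacement_rate": replacement_rate,
        "replacement_runs": replacement_runs,
        "value_over_replacement": hitter.runs_created - replacement_runs,
    }


if __name__ == "__main__":
    example = Hitter("Example Shortstop", "SS", runs_created=80.0, outs_made=400)
    for step, value in value_over_replacement(example).items():
        print(f"{step}: {value}")
```

Returning the whole dictionary rather than a single figure is the design choice that matters here: the reader gets the scaffold along with the answer.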

There's also a need for increased information accessibility. Whenever a suitably advanced topic comes up in one of my classes at university, it's reviewed, or a reference is given as a means of learning it if one is behind. The same really should go for analytical posts. Jeff links to the win probability explanation in every game recap. Dave Cameron's article on pitching evaluation is an absolutely brilliant resource. It's a pain to have to reference things, yes, but if we're trying to educate people it's sort of on us to ensure that they have everything they need to understand what the hell is going on.

Will we win every battle? Of course not. But by doing more to make research accessible, we can only help the cause (do we have a cause?). We can and should make sabremetrics look a lot more like science, and a lot less like magic.