# Fun With Numbers, Part 2: Dork Edition

There are no conclusions to be found in this post - only a metric shitload of coefficients of determination (r-squared values). I've been linear regressing my ass off for the past few days, partly for fun and partly to see if I could find anything interesting, and rather than try to interpret all the results, I've decided to just dump them on you with nary an explanation of why I'm even posting this. Away we go.

A few things:

IND: Independent variable (x axis)
DEP: Dependent variable (y axis)
R^2: R-squared; strength of correlation
CNTC: Frequency with which opposing batters make contact when swinging
1st: Percentage of first-pitch strikes
StL%: Percentage of strikes looking
StS: Percentage of strikes swinging
StF: Percentage of strikes hit foul
StI: Percentage of strikes hit in play
SoC: Percentage of strikeouts that are called
adjGB%: GB% / (GB% + FB%); excludes line drives
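
For anyone who wants to see the adjGB% definition as arithmetic, here is a minimal sketch. The function name `adj_gb_pct` and the sample pitcher are made up for illustration; only the formula GB% / (GB% + FB%) comes from the glossary above.

```python
def adj_gb_pct(gb_pct: float, fb_pct: float) -> float:
    """Ground-ball share of batted balls once line drives are set aside.

    Implements adjGB% = GB% / (GB% + FB%). Inputs can be percentages
    or fractions, as long as both use the same scale.
    """
    total = gb_pct + fb_pct
    if total == 0:
        raise ValueError("GB% and FB% cannot both be zero")
    return gb_pct / total


# Hypothetical pitcher: 45% grounders, 35% fly balls, 20% line drives.
print(adj_gb_pct(45.0, 35.0))  # 0.5625
```

So a pitcher with a raw GB% of 45 comes out at an adjGB% of about 56 once the line drives are dropped from the denominator.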

For the first table, I looked at every pitcher in baseball with at least 100 innings (127 at the time). For the second table, I looked at every pitcher with at least 100 innings thrown in both 2006 and 2007 (94 of them at the time). This isn't the ideal way to define the sample pool, but since I was doing this manually, I kind of had to limit myself, lest I fall over dead on my laptop.
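
The two sample pools could be built along these lines if the innings totals were sitting in a dictionary instead of being pulled by hand. This is a rough sketch: the pitcher names and innings totals are invented, `qualified` is a hypothetical helper, and using 2007 as the season for the first table is an assumption on my part.

```python
MIN_IP = 100.0  # innings cutoff used for both tables


def qualified(innings_by_pitcher, min_ip=MIN_IP):
    """Return the set of pitchers at or above the innings cutoff."""
    return {name for name, ip in innings_by_pitcher.items() if ip >= min_ip}


# Invented innings totals, purely for illustration.
ip_2006 = {"Pitcher A": 210.0, "Pitcher B": 95.0, "Pitcher C": 150.0}
ip_2007 = {"Pitcher A": 190.0, "Pitcher B": 180.0,
           "Pitcher C": 101.0, "Pitcher D": 120.0}

# Table 1 pool: everyone over the cutoff in a single season (assumed 2007).
table1_pool = qualified(ip_2007)

# Table 2 pool: only pitchers over the cutoff in BOTH seasons.
table2_pool = qualified(ip_2006) & qualified(ip_2007)

print(sorted(table1_pool))
print(sorted(table2_pool))
```

The set intersection is why the second pool is smaller than the first: a pitcher has to clear the bar twice.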

Table 1:

Table 2:

R^2 values range from 0 to 1, with a higher number indicating a stronger relationship. A value of ~0.1 is roughly the lower threshold of significance - in other words, the majority of the values you see above indicate no real correlation at all.
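
If you want to check any of these numbers yourself, here is a minimal sketch of how R^2 falls out of a one-variable linear regression. It is not necessarily how the tables were produced (those were done by hand), and the function name `r_squared` and the sample data are made up. For a single independent variable with an intercept, R^2 is just the square of the Pearson correlation between IND and DEP.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line predicting ys from xs.

    For simple (one-variable) linear regression with an intercept,
    this equals the squared Pearson correlation coefficient.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # spread of IND
    syy = sum((y - mean_y) ** 2 for y in ys)  # spread of DEP
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Made-up data: a perfect line, then a noisier relationship.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.64
```

Note that R^2 throws away the sign, so it tells you how tightly two stats move together but not whether they move in the same direction.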