Last week, I took a look at on-base percentage and the recent historical performance of the Mariners. The results were pretty interesting. Since 2000, the first full year the Mariners played in Safeco Field, their team OBP and the number of runs they scored have tracked each other very closely. The r-squared value for the data set was .92, which means that 92% of the variation in runs scored is accounted for by OBP.
As the staff was discussing this post, Colin asked a very good question, "How important is OBP in other low run-scoring environments?" In other words, is OBP’s importance unique to the Mariners and Safeco Field or is it a common phenomenon in other pitcher’s parks? There were also some questions raised about the correlation between slugging percentage and run scoring in the comments of that post. I’m going to revisit the data I pulled to try and address both of these questions.
I’ll start with Colin’s question since that’s where I started my research. To determine each ballpark’s scoring environment, I used Baseball-Reference’s park factors instead of FanGraphs’ because B-Ref uses a three-year model to reduce some of the noise in the data. Park adjustments are an inexact science, but I needed some way to sort the low and high scoring environments.
Then, I calculated the r-squared value between runs scored and both OBP and SLG for every team in baseball since 2000. I was mindful of teams that opened new stadiums in the past decade and a half, so teams like the Marlins and Twins have a very limited data set.
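For anyone who wants to run the same kind of calculation, here is a minimal sketch of an r-squared computation for one team. It assumes r-squared means the squared Pearson correlation between a season-by-season stat and runs scored, which is the usual reading but isn't spelled out above. The function name and the sample numbers are hypothetical placeholders, not values from my spreadsheet.

```python
from math import fsum


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = fsum(xs) / len(xs)
    mean_y = fsum(ys) / len(ys)
    # Sums of squares and cross-products around the means.
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when one series is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical placeholder seasons (NOT real team data): team OBP and runs.
hypothetical_obp = [0.310, 0.325, 0.340, 0.318]
hypothetical_runs = [650, 700, 780, 690]

print(f"OBP r^2: {r_squared(hypothetical_obp, hypothetical_runs):.4f}")
```

Swapping in a team's actual season-by-season OBP (or SLG) and runs scored would give numbers comparable to the ones in the table, provided the same definition of r-squared was used.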
| Team | Park Factor | OBP r^2 | SLG r^2 |
| --- | --- | --- | --- |
| | 93 | 0.73924 | 0.883246 |
| | 94 | 0.861497 | 0.689995 |
| Mariners | 94 | 0.9221 | 0.853736 |
| | 95 | 0.564267 | 0.71465 |
| | 95 | 0.775587 | 0.948184 |
| | 96 | 0.788571 | 0.820937 |
| | 96 | 0.904778 | 0.930802 |
| | 97 | 0.896253 | 0.903161 |
| | 98 | 0.58447 | 0.74668 |
| | 98 | 0.735991 | 0.568412 |
| | 98 | 0.818815 | 0.925863 |
| Marlins | 98 | 0.994155 | 0.849766 |
| | 98 | 0.961959 | 0.900396 |
| | 99 | 0.583147 | 0.670698 |
| | 100 | 0.766613 | 0.875064 |
| MLB Average | 100 | 0.68145 | 0.765199 |
| | 100 | 0.868733 | 0.945677 |
| | 100 | 0.942475 | 0.94134 |
| | 100 | 0.895096 | 0.691042 |
| | 100 | 0.923074 | 0.867839 |
| | 101 | 0.540899 | 0.916055 |
| | 102 | 0.729568 | 0.714335 |
| | 103 | 0.768896 | 0.704644 |
| Twins | 103 | 0.658838 | 0.807595 |
| | 103 | 0.83033 | 0.770079 |
| | 103 | 0.602267 | 0.539909 |
| | 104 | 0.365224 | 0.806918 |
| | 105 | 0.584272 | 0.892815 |
| | 106 | 0.888109 | 0.912933 |
| | 106 | 0.811045 | 0.907689 |
| | 118 | 0.845351 | 0.628166 |
I’ll be honest, the data was all over the place and it wasn’t what I was expecting. Not only was the data scattershot, but I also found an interesting trend throughout Major League Baseball: over the last five years, league-wide, the correlation between OBP and run scoring has dropped to just .5177 while the correlation between SLG and run scoring has stayed relatively stable (.7275). That’s a pretty incredible swing and reflects the shifting run scoring environment throughout baseball. As pitchers become more and more effective, the value of a home run or double has grown. With the league-wide strikeout rate at historically high levels, getting on base just isn't as valuable anymore.
| Environment | OBP r^2 | SLG r^2 |
| --- | --- | --- |
| Low Scoring | 0.777275 | 0.80598 |
| Neutral | 0.829497 | 0.858374 |
| High Scoring | 0.70839 | 0.768508 |
Based on the data above, I split all 30 teams into three groups of ten based on their run-scoring environment and averaged the r-squared values within each group. As you can see, there isn’t much we can glean from this subset of data either. Neutral parks have the highest correlation between run scoring and both OBP and SLG. High run-scoring environments have the largest difference between the two r-squared scores, but the gap is not large enough to be significant.
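If you want to check the grouping yourself, the sketch below rebuilds the three environment averages from the team table. It assumes each group is simply ten consecutive teams in the table's park-factor order (with the MLB Average row left out) and that each figure is a plain mean of the per-team r-squared values. That grouping is inferred from the numbers lining up; in particular, where the cut falls among the teams tied at a park factor of 98 depends on the table's row order. The names `TEAM_ROWS` and `group_means` are made up for this example.

```python
from statistics import mean

# (park_factor, obp_r2, slg_r2) for the 30 teams, in the order they appear
# in the team table. The "MLB Average" row is deliberately excluded.
TEAM_ROWS = [
    (93, 0.73924, 0.883246), (94, 0.861497, 0.689995),
    (94, 0.9221, 0.853736), (95, 0.564267, 0.71465),
    (95, 0.775587, 0.948184), (96, 0.788571, 0.820937),
    (96, 0.904778, 0.930802), (97, 0.896253, 0.903161),
    (98, 0.58447, 0.74668), (98, 0.735991, 0.568412),
    (98, 0.818815, 0.925863), (98, 0.994155, 0.849766),
    (98, 0.961959, 0.900396), (99, 0.583147, 0.670698),
    (100, 0.766613, 0.875064), (100, 0.868733, 0.945677),
    (100, 0.942475, 0.94134), (100, 0.895096, 0.691042),
    (100, 0.923074, 0.867839), (101, 0.540899, 0.916055),
    (102, 0.729568, 0.714335), (103, 0.768896, 0.704644),
    (103, 0.658838, 0.807595), (103, 0.83033, 0.770079),
    (103, 0.602267, 0.539909), (104, 0.365224, 0.806918),
    (105, 0.584272, 0.892815), (106, 0.888109, 0.912933),
    (106, 0.811045, 0.907689), (118, 0.845351, 0.628166),
]


def group_means(rows, group_size=10):
    """Average OBP and SLG r^2 over consecutive chunks of rows (assumed grouping)."""
    rows = list(rows)  # keep the table's order; ties are not re-sorted
    means = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        means.append((mean(r[1] for r in chunk), mean(r[2] for r in chunk)))
    return means


LABELS = ["Low Scoring", "Neutral", "High Scoring"]
for label, (obp, slg) in zip(LABELS, group_means(TEAM_ROWS)):
    print(f"{label}: OBP r^2 {obp:.6f}, SLG r^2 {slg:.6f}")
```

Run as written, this should reproduce the figures in the environment table to six decimal places, which is what suggests the ten-per-group split in the first place.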
I think we can safely say that there is no correlation between low run-scoring environments and the importance of OBP. Sorry, Colin. But I don’t think this exercise was a useless effort either. Part of research is making hypotheses and then discovering that they have no basis in reality. Most of the time, those failed inquiries never see the light of day, but I think this data was particularly instructive, if only to correct our assumptions about run scoring in today’s game.
Here’s a link to the data I pulled—it’s pretty messy and not very well organized but if you want to go in and play with it yourself, be my guest. Maybe you’ll find something that I couldn’t see.