# Exploring on-base percentage and low run-scoring environments

Just how important is on-base percentage for run scoring, league-wide?

Last week, I took a look at on-base percentage and the recent historical performance of the Mariners. The results were pretty interesting. Since 2000, the first full year the Mariners played in Safeco Field, the correlation between their team OBP and the number of runs they score has been very strong. The r-squared value for the data set was .92, which means that 92% of the variability in runs scored is accounted for by OBP.

As the staff was discussing this post, Colin asked a very good question: "How important is OBP in other low run-scoring environments?" In other words, is OBP’s importance unique to the Mariners and Safeco Field, or is it a common phenomenon in other pitchers’ parks? There were also some questions raised in the comments of that post about the correlation between slugging percentage and run scoring. I’m going to revisit the data I pulled to try to address both of these questions.

I’ll start with Colin’s question since that’s where I started my research. To determine each ballpark’s scoring environment, I used Baseball-Reference’s park factors instead of FanGraphs’ because B-Ref uses a three-year model to reduce some of the noise in the data. Park adjustments are an inexact science, but I needed some way to sort the low and high scoring environments.

Then, I calculated the r-squared value of both OBP and SLG against runs scored for every team in baseball since 2000. I was mindful of teams that opened new stadiums in the past decade and a half, so teams like the Marlins and Twins have a very limited data set.
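For anyone who wants to try the same calculation, here is a minimal sketch of one way to get an r-squared value from a team's season-by-season numbers. It assumes a simple one-variable linear fit, where r-squared is just the squared Pearson correlation; the post doesn't spell out the exact method used, so treat this as an illustration. The function name `r_squared` and the season values at the bottom are placeholders I made up to exercise the code, not real team data.

```python
from math import fsum


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length series.

    For a one-variable linear fit, this equals the regression's r-squared.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Placeholder values, invented purely to show the call; not real seasons.
example_obp = [0.310, 0.320, 0.330, 0.340, 0.350]
example_runs = [640, 690, 700, 760, 790]
print(round(r_squared(example_obp, example_runs), 3))
```

Swap in a team's actual OBP (or SLG) and runs scored for each season to get numbers comparable to the ones in the table.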

| Team | Park Factor | OBP r^2 | SLG r^2 |
| --- | --- | --- | --- |
| Padres | 93 | 0.73924 | 0.883246 |
| Angels | 94 | 0.861497 | 0.689995 |
| Mariners | 94 | 0.9221 | 0.853736 |
| Mets | 95 | 0.564267 | 0.71465 |
| Braves | 95 | 0.775587 | 0.948184 |
| Dodgers | 96 | 0.788571 | 0.820937 |
| White Sox | 96 | 0.904778 | 0.930802 |
| Giants | 97 | 0.896253 | 0.903161 |
| Pirates | 98 | 0.58447 | 0.74668 |
| Rays | 98 | 0.735991 | 0.568412 |
| Tigers | 98 | 0.818815 | 0.925863 |
| Marlins | 98 | 0.994155 | 0.849766 |
| Yankees | 98 | 0.961959 | 0.900396 |
| Blue Jays | 99 | 0.583147 | 0.670698 |
| Athletics | 100 | 0.766613 | 0.875064 |
| MLB Average | 100 | 0.68145 | 0.765199 |
| Astros | 100 | 0.868733 | 0.945677 |
| Phillies | 100 | 0.942475 | 0.94134 |
| Cubs | 100 | 0.895096 | 0.691042 |
| Reds | 100 | 0.923074 | 0.867839 |
| Brewers | 101 | 0.540899 | 0.916055 |
| Cardinals | 102 | 0.729568 | 0.714335 |
| Royals | 103 | 0.768896 | 0.704644 |
| Twins | 103 | 0.658838 | 0.807595 |
| Diamondbacks | 103 | 0.83033 | 0.770079 |
| Orioles | 103 | 0.602267 | 0.539909 |
| Nationals | 104 | 0.365224 | 0.806918 |
| Rangers | 105 | 0.584272 | 0.892815 |
| Indians | 106 | 0.888109 | 0.912933 |
| Red Sox | 106 | 0.811045 | 0.907689 |
| Rockies | 118 | 0.845351 | 0.628166 |

I’ll be honest, the data was all over the place and it wasn’t what I was expecting. Not only was the data scattershot, but I also found an interesting trend throughout Major League Baseball: over the last five years, league-wide, the correlation between OBP and run scoring has dropped to just .5177 while the correlation between SLG and run scoring has stayed relatively stable (.7275). That’s a pretty incredible swing and reflects the shifting run-scoring environment throughout baseball. As pitchers become more and more effective, the value of a home run or double has grown. With the league-wide strikeout rate at historically high levels, getting on base just isn't as valuable anymore.

| Environment | OBP r^2 | SLG r^2 |
| --- | --- | --- |
| Low Scoring | 0.777275 | 0.80598 |
| Neutral | 0.829497 | 0.858374 |
| High Scoring | 0.70839 | 0.768508 |

Based on the data above, I split all 30 teams into three groups based on their run-scoring environment. As you can see, there isn’t much we can glean from this subset of data either. Neutral parks have the highest correlation between run scoring and both OBP and SLG. High run-scoring environments have the largest difference between the two r-squared scores, but that gap isn't large enough to be significant.
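If you want to check the group numbers yourself, here is a rough sketch that rebuilds them from the per-team table. The post doesn't state the park factor cutoffs, so this assumes the teams were simply sorted by park factor and split into three groups of ten in the order listed, with the MLB Average row left out; under that assumption the averages line up with the grouped figures to the precision shown. The names `TEAMS` and `group_means` are mine.

```python
# (team, park factor, OBP r^2, SLG r^2), in the per-team table's order.
TEAMS = [
    ("Padres", 93, 0.73924, 0.883246), ("Angels", 94, 0.861497, 0.689995),
    ("Mariners", 94, 0.9221, 0.853736), ("Mets", 95, 0.564267, 0.71465),
    ("Braves", 95, 0.775587, 0.948184), ("Dodgers", 96, 0.788571, 0.820937),
    ("White Sox", 96, 0.904778, 0.930802), ("Giants", 97, 0.896253, 0.903161),
    ("Pirates", 98, 0.58447, 0.74668), ("Rays", 98, 0.735991, 0.568412),
    ("Tigers", 98, 0.818815, 0.925863), ("Marlins", 98, 0.994155, 0.849766),
    ("Yankees", 98, 0.961959, 0.900396), ("Blue Jays", 99, 0.583147, 0.670698),
    ("Athletics", 100, 0.766613, 0.875064), ("Astros", 100, 0.868733, 0.945677),
    ("Phillies", 100, 0.942475, 0.94134), ("Cubs", 100, 0.895096, 0.691042),
    ("Reds", 100, 0.923074, 0.867839), ("Brewers", 101, 0.540899, 0.916055),
    ("Cardinals", 102, 0.729568, 0.714335), ("Royals", 103, 0.768896, 0.704644),
    ("Twins", 103, 0.658838, 0.807595), ("Diamondbacks", 103, 0.83033, 0.770079),
    ("Orioles", 103, 0.602267, 0.539909), ("Nationals", 104, 0.365224, 0.806918),
    ("Rangers", 105, 0.584272, 0.892815), ("Indians", 106, 0.888109, 0.912933),
    ("Red Sox", 106, 0.811045, 0.907689), ("Rockies", 118, 0.845351, 0.628166),
]


def group_means(teams, size=10):
    """Average OBP and SLG r^2 over consecutive groups of `size` teams.

    Assumes `teams` is already ordered from lowest to highest park factor.
    """
    labels = ["Low Scoring", "Neutral", "High Scoring"]
    means = {}
    for i, label in enumerate(labels):
        chunk = teams[i * size:(i + 1) * size]
        obp = sum(t[2] for t in chunk) / len(chunk)
        slg = sum(t[3] for t in chunk) / len(chunk)
        means[label] = (obp, slg)
    return means


for label, (obp, slg) in group_means(TEAMS).items():
    print(f"{label}: OBP r^2 = {obp:.6f}, SLG r^2 = {slg:.6f}")
```

One side effect of splitting by position rather than by a park factor cutoff is that teams with the same park factor can land in different groups: the Rays (98) fall in the low-scoring group while the Tigers, Marlins, and Yankees (also 98) fall in the neutral one.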

I think we can safely say that there is no correlation between low run-scoring environments and the importance of OBP. Sorry, Colin. But I don’t think this exercise was a useless effort either. Part of research is making hypotheses and then discovering that they have no basis in reality. Most of the time, those failed inquiries never see the light of day, but I think this data was particularly instructive, if only to correct our assumptions about run scoring in today’s game.

Here’s a link to the data I pulled—it’s pretty messy and not very well organized but if you want to go in and play with it yourself, be my guest. Maybe you’ll find something that I couldn’t see.