Exploring on-base percentage and low run-scoring environments

Just how important is on-base percentage for run scoring, league-wide?

Jennifer Nicholson-USA TODAY Sports

Last week, I took a look at on-base percentage and the Mariners’ recent historical performance. The results were pretty interesting. Since 2000, the first full year the Mariners played in Safeco Field, their team OBP and the number of runs they score have tracked each other almost perfectly. The r-squared value for the data set was .92, which means that 92% of the variation in their run scoring is accounted for by OBP.

As the staff was discussing this post, Colin asked a very good question: "How important is OBP in other low run-scoring environments?" In other words, is OBP’s importance unique to the Mariners and Safeco Field, or is it a common phenomenon in other pitchers’ parks? There were also some questions raised in the comments of that post about the correlation between slugging percentage and run scoring. I’m going to revisit the data I pulled to try to address both of these questions.

I’ll start with Colin’s question since that’s where I started my research. To determine each ballpark’s scoring environment, I used Baseball-Reference’s park factors instead of FanGraphs’ because B-Ref uses a three-year model to reduce some of the noise in the data. Park adjustments are an inexact science, but I needed some way to sort the low and high scoring environments.

Then, I calculated the r-squared value against runs scored for both OBP and SLG for every team in baseball since 2000. I was mindful of teams that opened new stadiums in the past decade and a half, so teams like the Marlins and Twins have a very limited data set.

Team            Park Factor    OBP r^2     SLG r^2
Padres          93             0.73924     0.883246
Angels          94             0.861497    0.689995
Mariners        94             0.9221      0.853736
Mets            95             0.564267    0.71465
Braves          95             0.775587    0.948184
Dodgers         96             0.788571    0.820937
White Sox       96             0.904778    0.930802
Giants          97             0.896253    0.903161
Pirates         98             0.58447     0.74668
Rays            98             0.735991    0.568412
Tigers          98             0.818815    0.925863
Marlins         98             0.994155    0.849766
Yankees         98             0.961959    0.900396
Blue Jays       99             0.583147    0.670698
Athletics       100            0.766613    0.875064
MLB Average     100            0.68145     0.765199
Astros          100            0.868733    0.945677
Phillies        100            0.942475    0.94134
Cubs            100            0.895096    0.691042
Reds            100            0.923074    0.867839
Brewers         101            0.540899    0.916055
Cardinals       102            0.729568    0.714335
Royals          103            0.768896    0.704644
Twins           103            0.658838    0.807595
Diamondbacks    103            0.83033     0.770079
Orioles         103            0.602267    0.539909
Nationals       104            0.365224    0.806918
Rangers         105            0.584272    0.892815
Indians         106            0.888109    0.912933
Red Sox         106            0.811045    0.907689
Rockies         118            0.845351    0.628166
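
For anyone who wants to run the same kind of calculation, here is a minimal sketch of how an r-squared value between a team's season OBP and its season run totals could be computed. It assumes a simple one-variable linear fit, where r-squared is just the square of the Pearson correlation. The function name `r_squared` and the sample numbers are made up for illustration; they are not the data behind the table.

```python
from math import fsum


def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical season-by-season figures for one team (NOT real data).
obp = [0.360, 0.340, 0.330, 0.325, 0.310]
runs = [900, 810, 750, 700, 620]

print(round(r_squared(obp, runs), 3))
```

Swapping the OBP list for a list of slugging percentages gives the SLG figure the same way. With only a handful of seasons, as for the teams in newer ballparks, a value computed like this can swing a lot from one added season to the next.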

I’ll be honest: the data was all over the place, and it wasn’t what I was expecting. Beyond the scattershot results, I found an interesting trend throughout Major League Baseball. Over the last five years, league-wide, the correlation between OBP and run scoring has dropped to just .5177, while the correlation between SLG and run scoring has stayed relatively stable (.7275). That’s a pretty incredible swing, and it reflects the shifting run-scoring environment throughout baseball. As pitchers become more and more effective, the value of a home run or double has grown. With the league-wide strikeout rate at historically high levels, getting on base just isn’t as valuable anymore.

Environment     OBP r^2     SLG r^2
Low Scoring     0.777275    0.80598
Neutral         0.829497    0.858374
High Scoring    0.70839     0.768508

Using the team data above, I split all 30 teams into three groups by run-scoring environment. As you can see, there isn’t much we can glean from this subset of data either. Neutral parks have the highest correlation between run scoring and both OBP and SLG. High run-scoring environments have the largest difference between the two r-squared values, but it isn’t large enough to be significant.
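
The group figures can be rebuilt from the per-team numbers. The sketch below assumes the split is simply the ten lowest, the middle ten, and the ten highest teams in park-factor order, with the MLB Average row left out. The post doesn't spell out the cutoffs, so that is an assumption, though it does reproduce the published averages. The names `TEAMS` and `group_means` are just for illustration.

```python
# (team, park factor, OBP r^2, SLG r^2), in the park-factor order of the team table.
# The MLB Average row is excluded, leaving 30 teams.
TEAMS = [
    ("Padres", 93, 0.73924, 0.883246),
    ("Angels", 94, 0.861497, 0.689995),
    ("Mariners", 94, 0.9221, 0.853736),
    ("Mets", 95, 0.564267, 0.71465),
    ("Braves", 95, 0.775587, 0.948184),
    ("Dodgers", 96, 0.788571, 0.820937),
    ("White Sox", 96, 0.904778, 0.930802),
    ("Giants", 97, 0.896253, 0.903161),
    ("Pirates", 98, 0.58447, 0.74668),
    ("Rays", 98, 0.735991, 0.568412),
    ("Tigers", 98, 0.818815, 0.925863),
    ("Marlins", 98, 0.994155, 0.849766),
    ("Yankees", 98, 0.961959, 0.900396),
    ("Blue Jays", 99, 0.583147, 0.670698),
    ("Athletics", 100, 0.766613, 0.875064),
    ("Astros", 100, 0.868733, 0.945677),
    ("Phillies", 100, 0.942475, 0.94134),
    ("Cubs", 100, 0.895096, 0.691042),
    ("Reds", 100, 0.923074, 0.867839),
    ("Brewers", 101, 0.540899, 0.916055),
    ("Cardinals", 102, 0.729568, 0.714335),
    ("Royals", 103, 0.768896, 0.704644),
    ("Twins", 103, 0.658838, 0.807595),
    ("Diamondbacks", 103, 0.83033, 0.770079),
    ("Orioles", 103, 0.602267, 0.539909),
    ("Nationals", 104, 0.365224, 0.806918),
    ("Rangers", 105, 0.584272, 0.892815),
    ("Indians", 106, 0.888109, 0.912933),
    ("Red Sox", 106, 0.811045, 0.907689),
    ("Rockies", 118, 0.845351, 0.628166),
]


def group_means(rows, size=10):
    """Average the OBP and SLG r^2 columns over consecutive blocks of `size` rows."""
    means = []
    for start in range(0, len(rows), size):
        chunk = rows[start:start + size]
        obp_mean = sum(row[2] for row in chunk) / len(chunk)
        slg_mean = sum(row[3] for row in chunk) / len(chunk)
        means.append((obp_mean, slg_mean))
    return means


labels = ("Low Scoring", "Neutral", "High Scoring")
for label, (obp_mean, slg_mean) in zip(labels, group_means(TEAMS)):
    print(f"{label:<13} {obp_mean:.6f} {slg_mean:.6f}")
```

One side effect of splitting by rank is that teams with the same park factor can land in different groups: the Pirates and Rays (98) fall in the low-scoring block while the Tigers, Marlins, and Yankees (also 98) count as neutral.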

I think we can safely say that there is no correlation between low run-scoring environments and the importance of OBP. Sorry, Colin. But I don’t think this exercise was a useless effort either. Part of research is making hypotheses and then discovering that they have no basis in reality. Most of the time, those failed inquiries never see the light of day, but I think this data was particularly instructive, if only to correct our assumptions about run scoring in today’s game.

Here’s a link to the data I pulled—it’s pretty messy and not very well organized but if you want to go in and play with it yourself, be my guest. Maybe you’ll find something that I couldn’t see.