FanPost

Baseruns applied to pitchers using batted ball data

First off, this is not a completely novel or creative idea. It has been tossed around here and there for a while. Graham and I exchanged a few comments about it a while back, which ended with me saying it would be cool and Graham saying he didn't have the time to do it. From there I expected it to fade away and not be thought about again in the near future.

The basic idea is that Baseruns is a very good runs estimator. Some would probably consider it the best for a variety of reasons. The most important part is that it can account for basically any run environment. For example, if we use it to analyze a single game where one of the teams gets one hit and it's a home run, then Baseruns predicts 1 run instead of the 1.4 runs linear weights predicts. If you want to read more about it, I suggest looking at The Book Blog, or ask if you want more details.
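For anyone who wants to see the arithmetic, here is a minimal Python sketch of the one-hit game. It assumes one commonly cited simple form of Baseruns, A * B / (B + C) + D; the coefficients in the B term and the function name `base_runs` are assumptions for illustration, not necessarily the exact version used in the spreadsheet.

```python
def base_runs(singles, doubles, triples, hr, bb, outs):
    """Estimate runs with a simple Baseruns form: A * B / (B + C) + D.

    The B coefficients are one commonly cited set (an assumption here).
    """
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    a = hits + bb - hr  # baserunners, not counting the home run hitters
    b = 1.02 * (1.4 * total_bases - 0.6 * hits - 3 * hr + 0.1 * bb)  # advancement
    c = outs            # outs (AB - H in the usual notation)
    d = hr              # home runs always score
    if b + c == 0:
        return float(d)  # no advancement and no outs: only home runs count
    return a * b / (b + c) + d


# One hit all game, and it is a home run: there are no baserunners (A = 0),
# so the estimate is exactly the one run that scored.
one_hit_game = base_runs(singles=0, doubles=0, triples=0, hr=1, bb=0, outs=27)
print(one_hit_game)  # 1.0
```

Because A is zero when the only hit is a home run, the A * B / (B + C) term vanishes and only the home run itself counts, which is why the estimate lands on exactly 1 run.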

The major problem with Baseruns as a run estimator is that it is difficult to apply to individual players because it is a team runs estimator. A great player on an average team is a great player living in an average offensive environment. If you take that great player's stats and enter them into a Baseruns equation, you will overestimate his production because Baseruns thinks he is a great player playing in a great offensive environment. Likewise, a poor hitter will be undervalued by Baseruns. One way to use Baseruns to value a hitter is to calculate the team's production with him and then calculate the team's production without him; the difference is his contribution. Linear weights is the more commonly used offensive metric because you can calculate the production of a hitter independent of the run environment (team) he plays in. Instead, linear weights assumes that the hitter plays in an average offensive environment, which is fair and makes a lot of sense for a hitter.
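The with-and-without approach can be sketched in a few lines of Python. This is one possible reading of it, not a standard implementation: the rest of the lineup is scaled up to cover the hitter's outs so both teams are compared over the same number of outs. The Baseruns coefficients are one commonly cited set, and the stat lines (`team_line`, `star`) are invented numbers used only for illustration.

```python
def base_runs(singles, doubles, triples, hr, bb, outs):
    """Simple Baseruns form A * B / (B + C) + D (coefficients are an assumption)."""
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    a = hits + bb - hr
    b = 1.02 * (1.4 * total_bases - 0.6 * hits - 3 * hr + 0.1 * bb)
    if b + outs == 0:
        return float(hr)
    return a * b / (b + outs) + hr


def runs_added(team, player):
    """Team Baseruns with the player minus team Baseruns without him.

    Assumption: his teammates absorb his playing time at their own rates,
    so the 'without' line is scaled back up to the team's total outs.
    """
    rest = {key: team[key] - player[key] for key in team}
    scale = team["outs"] / rest["outs"]
    rest_full_season = {key: value * scale for key, value in rest.items()}
    return base_runs(**team) - base_runs(**rest_full_season)


# Invented season lines, for illustration only.
team_line = {"singles": 1000, "doubles": 280, "triples": 30,
             "hr": 160, "bb": 550, "outs": 4100}
star = {"singles": 100, "doubles": 40, "triples": 2,
        "hr": 40, "bb": 100, "outs": 380}

print(runs_added(team_line, star))
```

A hitter who produces at exactly the team's rates comes out at zero under this setup, so the result reads as runs above what his teammates would have done with the same outs.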

The great thing for Baseruns is that pitchers play in a run environment determined mostly by themselves (as well as by defense, park and competition level). Felix Hernandez has a run scoring environment that is much different from Livan Hernandez's, and this changes the run value of different outcomes. I will be ignoring the effect of the defense and park for now, although these are important. This means that we can take the stats from a pitcher's performance and input them into a Baseruns equation, and it will spit out the expected runs while accurately taking into account the value of strikeouts, walks, and every hit type in that specific run environment. This is something that pretty much none of the metrics out there do (FIP, xFIP, tRA, etc). For example, Felix's home runs are less harmful than average because he allows fewer baserunners than average, and Livan's strikeouts are worth more than average since he is more likely to have runners on base. Overall this is a pretty small effect because almost all pitchers have roughly similar run environments, except for a few exceptionally good and exceptionally bad pitchers. To sum up this paragraph, pitchers' performance (runs) is not linearly related to our measured variables (K, BB, etc). Baseruns takes this into account and FIP, tRA, etc don't. Since most pitchers are actually very similar, with small absolute differences in ability, these non-linearities only show themselves at the extremes.
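To make the non-linearity concrete, here is a rough Python sketch that asks Baseruns what one extra home run, or one extra out, is worth for two different pitchers. The two stat lines (`stingy` and `hittable`) are invented for illustration and are not Felix's or Livan's actual numbers, and the Baseruns coefficients are one commonly cited set rather than a confirmed match for any particular implementation.

```python
def base_runs(singles, doubles, triples, hr, bb, outs):
    """Simple Baseruns form A * B / (B + C) + D (coefficients are an assumption)."""
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    a = hits + bb - hr
    b = 1.02 * (1.4 * total_bases - 0.6 * hits - 3 * hr + 0.1 * bb)
    if b + outs == 0:
        return float(hr)
    return a * b / (b + outs) + hr


def marginal_value(line, event):
    """Change in estimated runs from adding one more `event` to a stat line."""
    bumped = dict(line)
    bumped[event] += 1
    return base_runs(**bumped) - base_runs(**line)


# Invented season lines: one pitcher who allows few baserunners, one who allows many.
stingy = {"singles": 130, "doubles": 30, "triples": 3,
          "hr": 12, "bb": 45, "outs": 700}
hittable = {"singles": 170, "doubles": 45, "triples": 4,
            "hr": 30, "bb": 80, "outs": 600}

for name, line in (("stingy", stingy), ("hittable", hittable)):
    print(name,
          "extra HR:", round(marginal_value(line, "hr"), 3),
          "extra out:", round(marginal_value(line, "outs"), 3))
```

With these made-up lines, the extra home run costs the hittable pitcher more runs, and the extra out saves him more runs, than it does the stingy pitcher. A linear estimator would assign the same value to both.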

Interestingly, I stumbled across a thread over at Tango's blog where he was discussing BP's new pitching metric, SIERA. There, I found that Patriot (a well-known saber blogger) had recently done something pretty similar to what I was thinking about, using some data (Comment #28 - Colin Wyers) that gave 1B, 2B, etc probabilities for each batted ball type. Patriot was trying to recreate SIERA using this sort of data, so his aims were a little different from mine.

I was interested in using this data to convert a pitcher's batted ball profile (FB, GB, etc), along with his BB and K rates, into expected runs allowed using the Baseruns equation. The data posted by Colin Wyers gave me all the batted ball outcomes I needed. With this data I could take a pitcher who gave up 10 groundballs, 10 flyballs, 2 popups and 5 line drives and estimate how many singles, doubles, triples and home runs he would give up in an average park in front of an average defense. The next step is to take the projected number of 1B, 2B, 3B and HR, along with the number of K and BB the pitcher allowed, and plug all this info into the Baseruns equation. Baseruns then estimates the number of runs allowed by the pitcher based on his batted ball profile, independent of park or defense. (Not completely independent, but it's close. More work could fix the bias.)
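The two steps described above can be sketched in Python as follows. The outcome rates in `PLACEHOLDER_RATES` are invented placeholders, not the values Colin Wyers posted, and the Baseruns coefficients are one commonly cited set; the function names and the K and BB totals in the example are likewise hypothetical.

```python
# Invented placeholder probabilities of each hit type per batted ball type.
# Whatever probability is left over is treated as an out.
PLACEHOLDER_RATES = {
    "GB": {"singles": 0.22, "doubles": 0.02, "triples": 0.00, "hr": 0.00},
    "FB": {"singles": 0.05, "doubles": 0.06, "triples": 0.01, "hr": 0.10},
    "PU": {"singles": 0.01, "doubles": 0.00, "triples": 0.00, "hr": 0.00},
    "LD": {"singles": 0.50, "doubles": 0.18, "triples": 0.02, "hr": 0.02},
}


def expected_line(batted_balls, k, bb, rates=PLACEHOLDER_RATES):
    """Step 1: turn a batted ball profile plus K and BB into an expected stat line."""
    line = {"singles": 0.0, "doubles": 0.0, "triples": 0.0, "hr": 0.0}
    outs = float(k)  # every strikeout is an out
    for ball_type, count in batted_balls.items():
        hit_probability = 0.0
        for outcome, probability in rates[ball_type].items():
            line[outcome] += count * probability
            hit_probability += probability
        outs += count * (1.0 - hit_probability)  # batted balls that are not hits
    line["bb"] = float(bb)
    line["outs"] = outs
    return line


def base_runs(singles, doubles, triples, hr, bb, outs):
    """Step 2: simple Baseruns form A * B / (B + C) + D (coefficients assumed)."""
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    a = hits + bb - hr
    b = 1.02 * (1.4 * total_bases - 0.6 * hits - 3 * hr + 0.1 * bb)
    if b + outs == 0:
        return float(hr)
    return a * b / (b + outs) + hr


# The profile from the text, with hypothetical K and BB totals added.
profile = {"GB": 10, "FB": 10, "PU": 2, "LD": 5}
line = expected_line(profile, k=6, bb=3)
print(line)
print(base_runs(**line))
```

Expected hits come out as fractions rather than whole numbers, which is fine because Baseruns works on totals, not individual events.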

I think this is pretty interesting, and we will probably see this sort of thing pop up in some form. However, the differences between using Baseruns and a linear run estimator turn out to be pretty small. Graham has talked about trying to use Baseruns with tRA as a way to improve it, but really the improvement would be small, and it's tough to motivate coding and gathering all the data to implement it. It would still be cool, though.

A couple of small notes about my implementation. The values Colin Wyers gave were from a few years ago and appear to overpredict hits and home runs, probably because offense has declined since then. To take care of this I just applied a fudge factor that pushes the hit and home run totals down in line with league performance last year. The fudge factor isn't ideal, but I haven't taken the time to master play-by-play databases well enough to calculate this sort of stuff for myself. I am posting the spreadsheet right here if people are interested. I intend to post a couple of things building on this, but I wanted to get this out there first to hear any thoughts. If we think this sort of thing is valid and works, then I'll throw up a post with some more analysis.
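A fudge factor of this kind can be sketched in Python as a simple ratio of actual to predicted league totals. This is only a guess at the general shape, not the spreadsheet's actual calculation: using separate factors for non-home-run hits and for home runs, and turning the removed hits into outs, are both assumptions, and all the totals below are invented.

```python
HIT_KEYS = ("singles", "doubles", "triples")


def fudge_factor(predicted_total, actual_total):
    """Ratio that rescales a predicted league total to the actual league total."""
    return actual_total / predicted_total


def apply_fudge(line, hit_factor, hr_factor):
    """Scale expected hits and home runs; hits that are removed become outs."""
    adjusted = dict(line)
    for key in HIT_KEYS:
        adjusted[key] = line[key] * hit_factor
    adjusted["hr"] = line["hr"] * hr_factor
    removed = sum(line[key] - adjusted[key] for key in HIT_KEYS + ("hr",))
    adjusted["outs"] = line["outs"] + removed  # batted balls are conserved
    return adjusted


# Invented league totals, for illustration only.
hit_factor = fudge_factor(predicted_total=1050.0, actual_total=1000.0)
hr_factor = fudge_factor(predicted_total=120.0, actual_total=100.0)

# Invented expected line for one pitcher before the adjustment.
raw_line = {"singles": 140.0, "doubles": 40.0, "triples": 4.0,
            "hr": 20.0, "bb": 60.0, "outs": 600.0}
print(apply_fudge(raw_line, hit_factor, hr_factor))
```

Walks are left alone here because they do not come from the batted ball rates.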

The spreadsheet (2009 data), including leaderboards (pitchers with 50+ IP) and all the data, can be downloaded here (I hope you can download this). There are lots of numbers and I haven't explained many of the details, so let me know if you have questions about what is going on.