LL Exclusive: Introducing Reformatted Batted Ball Data

one of the good ones - Otto Greule Jr

Ever wished you could get batted ball data in a format better suited for en masse analysis than Jeff Zimmerman's leaderboards or Fangraphs' spray charts? Well...

Those of you who read the Off-Topic threads know that I'm a freshman at the Olin College of Engineering. Olin is a tiny school just west of Boston with a very unusual curriculum. There aren't really conventional lectures here; instead, classes focus on project-based learning. In other schools' intro CS courses, you might attend a two-hour lecture about text scraping from the internet. In Olin's intro CS course, the professor says "you have one week to write a program that scrapes text from the internet and does something cool with it. If you need help, ask me."

I mention this because ten days ago my intro CS professor told me "you have one week to write a program that scrapes text from the internet and does something cool with it. If you need help, ask me." Having recently spent about ten years harvesting batted ball data from Baseball Heat Maps for this article on OPPO%, I knew exactly what I was going to do.

Three paragraphs is enough time spent burying the lede, don't you think?

Click here to download a .zip archive of .csv spreadsheets containing xy positions, difficulties, and outcomes of every batted ball fielded by every Fangraphs-designated center fielder in the last two years. The data is from Fangraphs' Spray Chart pages, which are wonderful but sadly don't really allow for quantitative analysis. I used a Python script that I wrote myself, which relies on some functions from Python's csv and Pattern libraries.
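If you'd rather not open the spreadsheets one at a time, here is a minimal sketch of one way to pull every .csv in the archive into Python. It is not the script I used to build the data. It assumes each file starts with a header row, and `load_batted_balls` is just a name I made up for illustration.

```python
import csv
import io
import zipfile


def load_batted_balls(zip_source):
    """Read every .csv in a zip archive.

    zip_source may be a path or a file-like object.
    Returns {member name: list of row dicts}, with every value left as a string.
    Assumes (hypothetically) that each .csv begins with a header row.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything in the archive that isn't a spreadsheet
            with archive.open(name) as raw:
                # Zip members are opened as bytes; wrap them so csv sees text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables
```

Every value comes back as a string, so convert the position columns to numbers before doing any math with them.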

Now, this data isn't yet perfect. There are still some kinks in the code I need to work out. To see some of what I mean, compare Fangraphs' spray chart for Michael Saunders
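
[Image: Fangraphs' spray chart for Michael Saunders. Source: FanGraphs]

to the one that my program outputs:

[Image: the spray chart for Michael Saunders as output by my program]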

Mine is missing the "impossible catches", four of the "difficult" catches are in the wrong place, and one more simply doesn't exist. Whoops. I'm going to get cracking on fixing this up soon, but until I do, it's probably best not to do any analysis that relies very heavily on the exact positions of data points in the "missed catches" columns.

That said, there's a lot of cool work that we can do here. For example, it's possible to use this data to create an extremely simplified version of John Dewan's Plus/Minus system, one of the components of DRS. The idea behind Plus/Minus is to credit fielders for the catches that they make that other fielders don't while penalizing them for balls they miss that other fielders catch. My system loosely replicates Plus/Minus by multiplying the number of batted balls in each of a player's difficulty categories (as defined by Fangraphs' Inside Edge data) by the average difficulty of a catch in that category. Put another way, a catch is worth one minus the midpoint of the category's catch rate, and a miss costs that midpoint. In this manner, a fielder is credited:

+.05 for a catch 90-100% of players make
+.25 for a catch 60-90% of players make
+.5 for a catch 40-60% of players make
+.75 for a catch 10-40% of players make
+.95 for a catch 1-10% of players make
-.95 for a missed catch 90-100% of players make
-.75 for a missed catch 60-90% of players make
-.5 for a missed catch 40-60% of players make
-.25 for a missed catch 10-40% of players make
-.05 for a missed catch 1-10% of players make
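
For anyone who wants to replicate this, the credit table above can be sketched in a few lines of Python. This is a rough illustration rather than my actual script. The category labels ("90-100", "under-10", and so on) and the function names are made up for the example, and the rate version simply divides by total opportunities.

```python
# Credit for a made catch, keyed by the share of players who make that play.
# The labels are hypothetical; the values come straight from the list above.
CATCH_CREDIT = {
    "90-100": 0.05,
    "60-90": 0.25,
    "40-60": 0.50,
    "10-40": 0.75,
    "under-10": 0.95,
}

# Penalty for a missed catch in the same categories.
MISS_PENALTY = {
    "90-100": -0.95,
    "60-90": -0.75,
    "40-60": -0.50,
    "10-40": -0.25,
    "under-10": -0.05,
}


def plus_minus(made, missed):
    """Total credit, given {category: count} dicts for catches made and missed."""
    total = sum(CATCH_CREDIT[cat] * n for cat, n in made.items())
    total += sum(MISS_PENALTY[cat] * n for cat, n in missed.items())
    return total


def plus_minus_rate(made, missed):
    """Credit per opportunity; returns 0.0 for a fielder with no chances."""
    opportunities = sum(made.values()) + sum(missed.values())
    if opportunities == 0:
        return 0.0
    return plus_minus(made, missed) / opportunities
```

A fielder with 100 routine catches, one catch in the hardest category, and one routine miss would pass `made={"90-100": 100, "under-10": 1}` and `missed={"90-100": 1}`.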

Run the numbers, divide by opportunities to convert to a rate stat, and what do you get?

[Image: chart comparing the simplified Plus/Minus rate with UZR/150]

...an R^2 of .25 with UZR/150. OK, so that's not super amazing. But you know what? Considering all of the current issues with this data set, I will take 0.25. The %difficulty ratings, as far as I know, are entirely subjective and come from Inside Edge. The batted ball data is from MLBAM, which means it marks where a ball was fielded and not where it landed. My program doesn't include "impossible to make" catches, which is why all of the Plus/Minus ratings are skewed .045 to the right. Also, it's flat-out missing some data points. And it still gets an R^2 of .25, which, as we covered last week, means there's a moderate correlation. Not bad at all.
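
If you want to check an R^2 like this yourself, here is a minimal sketch. For a simple one-variable linear fit, R^2 is just the square of the Pearson correlation between the two lists, so no stats library is needed. The function name is my own, and the two inputs would be your per-player Plus/Minus rates and the matching UZR/150 values, in the same player order.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the lists has no variation")
    return (sxy * sxy) / (sxx * syy)
```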

As I've said above, this data isn't perfect. But keep an eye on Lookout Landing: over the next few weeks, I'll be rolling out new code that'll improve data quality and expand the scope of this project beyond just center fielders. I'm excited.

In the meantime, download that .zip and play around with the numbers for a while. See if you can come to any interesting conclusions. Maybe certain center fielders play deeper than others? Maybe you can approximate range by finding the radius of a CF's fielded balls in play (or the radius of their closest miss)? There's all sorts of cool work to be done here.
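
As a starting point for the range idea, here is one rough way to compute a radius from a list of (x, y) positions of fielded balls. It's only a sketch under some big assumptions: the function names are made up, the units are whatever the spray chart coordinates happen to be, and by default it uses the average fielded position as a stand-in for where the center fielder sets up, which is a guess and not something the data actually records.

```python
import math


def centroid(points):
    """Average (x, y) of a non-empty list of (x, y) pairs."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def fielding_radius(points, origin=None):
    """Largest distance from origin to any fielded ball.

    If origin is None, the centroid of the points is used as a crude
    proxy for the fielder's starting position (an assumption).
    """
    if origin is None:
        origin = centroid(points)
    ox, oy = origin
    return max(math.hypot(x - ox, y - oy) for x, y in points)
```

The same function works for the "closest miss" idea if you swap `max` for `min` and feed it the missed balls with a fixed `origin`.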

I look forward to helping facilitate it.
