

The Right Stuff: An Introduction to Pitch Arsenal Scores 3.0

A new way to measure a pitcher’s raw stuff.

If you buy something from an SB Nation link, Vox Media may earn a commission. See our ethics statement.

Houston Astros v Seattle Mariners Photo by Lindsey Wasson/Getty Images

Watching James Paxton strike out sixteen batters last May, a week before spinning the first no-hitter of his career, was pure joy. His fastball was unhittable and his curveball was buckling knees left and right. It was simply a masterclass in raw stuff. We often evaluate our pitching prospects by scouting their raw stuff: their fastball velocity, the sharpness of their slider, and the drop of their curveball. But as soon as those pitchers reach the major leagues, we start evaluating them based on their outcomes. Strikeout rates and walk rates and batted ball profiles become the standard language by which we gauge success.

But can we measure stuff? Can we assess the raw characteristics of the individual pitches in a pitcher’s arsenal—their velocity, movement, and command—and then compare those characteristics against their peers?

Since 2015, I’ve been including Pitch Arsenal Scores in my series previews, but they’ve been mainly based on the whiff rate and batted ball rates generated by a pitch—outcome-based evaluation. Others have conducted similar research into arsenal scores; Eno Sarris was one of the first, back in 2014, but many others have carried on his work.

This offseason, Aaron Sauceda of CBS Fantasy Sports built on that foundation with his ACES arsenal score. Last year I had tinkered with building a similar arsenal score based on pitch characteristics, but I felt it was missing a big piece of the pie. Sauceda’s ACES score filled that gap by including a component that measures the command of a pitch. Using the Command+ metric introduced by STATS last year, he was able to integrate velocity, movement, and command into one metric of raw stuff. Command+ is a huge step forward for measuring command because it takes intent into account. It tries to answer the question, “Did the pitcher do what he wanted to do with the pitch?”

Unfortunately for us common folk, STATS’s data feed is paywalled, so we don’t have easy access to Command+. But Sauceda’s work inspired me to finish the revised arsenal scores I had started last year. I just had to find a command metric that was usable for our purposes. Baseball Prospectus has developed two command metrics, Called Strike Probability and CMD, but they’re not broken down by individual pitch type.

A few years ago, Bill Petti and Jeff Zimmerman introduced Edge% and Heart%. They were trying to determine how often pitchers were able to pitch on the edges of the strike zone while avoiding the heart of the plate. It’s a crude proxy for command since it doesn’t account for intent and it penalizes pitchers for throwing outside of the zone too much. But the zone data is publicly available (I’m pulling data from Baseball Savant) and it’s easily calculated.

Pitch Arsenal Scores 3.0 = Stuff+

With a command metric in hand, I was able to finish my new Pitch Arsenal Scores (renamed Stuff+). I pulled PITCHf/x data from the Baseball Prospectus leaderboards and zone and spin rate data from Baseball Savant. My sample includes starting pitchers who threw at least 20 innings, and only individual pitches thrown at least 100 times. Within each pitch type, I calculated percentile ranks for each component (velocity, horizontal movement, vertical movement, spin rate, and command), combined them into a weighted total using the weights below, and finally indexed the result to produce a final value:

Velocity (45%) + Movement (25%) + Spin Rate (5%) + Command (25%) = Stuff+
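To make the two steps above concrete (percentile-ranking each component within a pitch type, then taking the weighted total), here is a minimal Python sketch. The function names and the tie rule (a pitch’s percentile is the share of other same-type pitches it beats, with ties counting half) are my own assumptions for illustration, and the single “movement” percentile is assumed to be already blended from its horizontal and vertical parts.

```python
from typing import Dict, List

# Component weights from the formula above; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "velocity": 0.45,
    "movement": 0.25,
    "spin_rate": 0.05,
    "command": 0.25,
}


def percentile_rank(value: float, peers: List[float]) -> float:
    """Percentile (0-100) of `value` among same-type pitches in `peers`.

    Assumption: `value` is itself one of `peers`; one copy of it is removed
    so the best pitch scores 100 and the worst scores 0. Ties count half.
    """
    others = list(peers)
    others.remove(value)  # raises ValueError if value is not in peers
    if not others:
        return 50.0
    wins = sum(1.0 for p in others if value > p)
    ties = sum(0.5 for p in others if value == p)
    return 100.0 * (wins + ties) / len(others)


def weighted_total(percentiles: Dict[str, float]) -> float:
    """Combine the four component percentiles into one weighted score."""
    return sum(WEIGHTS[name] * percentiles[name] for name in WEIGHTS)
```

A pitch that sits at the 50th percentile in every component lands at a weighted total of 50, which is what makes the later indexing step (average maps to 100) straightforward.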

Stuff+ works like wRC+ or FIP-, where 100 is average and each point above or below that is one percent better or worse than average. So a changeup that has a Stuff+ value of 118 is 18% better than the league average changeup and a slider that has a Stuff+ value of 89 is 11% worse than an average slider.
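The indexing step itself isn’t spelled out above, so the sketch below assumes the simplest version: divide a pitch’s weighted score by the league average score for that pitch type and multiply by 100. The helper `describe` (a hypothetical name) then turns a plus value back into the “X% better or worse” reading used in the examples.

```python
from statistics import mean
from typing import List


def index_to_plus(score: float, league_scores: List[float]) -> float:
    """Scale a raw score so the pitch-type average maps to 100.

    Assumption: plain ratio-to-average indexing, similar in spirit to
    wRC+ but without any park or league adjustments.
    """
    return 100.0 * score / mean(league_scores)


def describe(plus: float) -> str:
    """Translate a plus value into a 'better/worse than average' phrase."""
    diff = round(plus - 100)
    if diff == 0:
        return "league average"
    if diff > 0:
        return f"{diff}% better than average"
    return f"{-diff}% worse than average"
```

With these helpers, `describe(118)` reads as “18% better than average” and `describe(89)` as “11% worse than average”, matching the changeup and slider examples.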

Assumptions

Velocity
I assumed that more velocity is always better for every pitch type except changeups. For changeups, I calculated the velocity differential between the pitcher’s fastball and their changeup and assumed that a greater differential was better.
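Here is a small sketch of the velocity input described above: raw velocity for most pitches, and the fastball-minus-changeup gap for changeups, so that “bigger is better” holds in both cases before percentile ranking. Which fastball serves as the reference (four-seam or sinker) is an assumption left to the caller.

```python
from typing import Optional


def velocity_input(pitch_type: str, velo: float,
                   fastball_velo: Optional[float] = None) -> float:
    """Return the number that gets percentile-ranked for velocity.

    Non-changeups: raw average velocity (faster is better).
    Changeups: fastball velocity minus changeup velocity (a bigger gap
    is better). The caller chooses the reference fastball.
    """
    if pitch_type == "changeup":
        if fastball_velo is None:
            raise ValueError("changeups need the pitcher's fastball velocity")
        return fastball_velo - velo
    return velo
```

For example, taking Marco Gonzales’s 90.7 mph sinker as his reference fastball (an assumption) and his 84.2 mph changeup from the preview table later in this piece, the changeup’s velocity input would be a 6.5 mph gap.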

Movement
There’s ample research showing that vertical movement is far more important to the success of a pitch than horizontal movement, so in my calculations I weighted vertical movement twice as heavily as horizontal movement. For four-seam fastballs, we know that more “rise” leads to higher effectiveness. For all other pitch types, more “drop” is the desired characteristic.
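One way to express those two rules is sketched below: flip the sign of vertical movement for everything except four-seamers so that a higher number is always better, then blend the vertical and horizontal percentiles 2:1. The sign convention (positive means “rise” relative to a spinless pitch, as in PITCHf/x) and blending at the percentile level are assumptions.

```python
def vertical_input(pitch_type: str, vertical_break: float) -> float:
    """Orient vertical movement so that a larger value is always better.

    Assumes positive values mean rise and negative values mean drop.
    """
    if pitch_type == "four-seam":
        return vertical_break   # more rise is better
    return -vertical_break      # more drop is better for all other pitches


def movement_percentile(vertical_pct: float, horizontal_pct: float) -> float:
    """Blend movement percentiles, weighting vertical twice as heavily."""
    return (2.0 * vertical_pct + horizontal_pct) / 3.0
```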

For horizontal movement, I calculated z-scores (the number of standard deviations a pitch sits from the average for its type) and then took the absolute value of that score. Since pitches move differently horizontally depending on the pitcher’s handedness and pitch type, we want to credit both arm-side and glove-side movement, while also assuming that merely average horizontal movement is worse than more movement in either direction.
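A minimal sketch of the absolute z-score step, computed within one pitch type. Using the population standard deviation rather than the sample one is my assumption; either choice ranks pitches identically.

```python
from statistics import mean, pstdev
from typing import List


def horizontal_scores(horizontal_breaks: List[float]) -> List[float]:
    """Absolute z-scores of horizontal movement within one pitch type.

    The absolute value rewards large movement in either direction (arm
    side or glove side) and makes league-average movement score lowest.
    """
    mu = mean(horizontal_breaks)
    sigma = pstdev(horizontal_breaks)
    if sigma == 0:
        return [0.0 for _ in horizontal_breaks]
    return [abs((x - mu) / sigma) for x in horizontal_breaks]
```

Two pitches that break the same distance in opposite directions receive the same score, which is the behavior the paragraph above asks for.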

Spin Rate
Because spin rate is already so highly correlated with velocity and movement, I’ve given it only a small weight in the overall value of a pitch. But I wanted to include some sort of spin component because there are pitchers who are more or less efficient with their spin. Based on the research conducted by Jonah Pemstein, we know that higher spin rates usually generate higher whiff rates. That’s not the case for generating favorable batted balls, however. For our purposes, I ignored the effects spin rate has on different batted ball types and simply focused on high-spin, high-whiff-rate pitches.

Command
For my command component, I took the Edge% and Heart% for each pitch based on the detailed strike zone from Baseball Savant. Statcast’s edge region straddles the rulebook edge of the strike zone, so half of it sits inside the zone and half outside. The final command component is simply Edge% minus Heart%, which estimates how well a pitcher can locate a given pitch on the edge of the strike zone while avoiding the middle of the plate.
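The command component reduces to a simple difference of two rates. The sketch below assumes each pitch has already been tagged with a zone label; the labels "edge" and "heart" are hypothetical placeholders, not Baseball Savant’s field values.

```python
from collections import Counter
from typing import Iterable


def command_score(zones: Iterable[str]) -> float:
    """Edge% minus Heart% for one pitch type, in percentage points.

    `zones` holds one label per pitch thrown. Pitches labeled anything
    other than "edge" or "heart" count toward the total but neither rate.
    """
    counts = Counter(zones)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pitches to score")
    edge_pct = 100.0 * counts["edge"] / total
    heart_pct = 100.0 * counts["heart"] / total
    return edge_pct - heart_pct
```

A pitch thrown 45% of the time on the edge and 20% of the time over the heart would score 25 before being percentile-ranked against its peers.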

Overall Stuff+
Once we have Stuff+ values for each pitch type in a pitcher’s arsenal, we can calculate an overall stuff value for a pitcher by weighting each pitch by how frequently it’s thrown.
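The usage-weighted average can be sketched in a few lines. As a check, plugging in the usage rates and per-pitch Stuff+ values from the first Marco Gonzales preview table later in this piece gives roughly 96.2, which rounds to the 96 overall listed for him.

```python
from typing import Dict, Tuple


def overall_stuff_plus(arsenal: Dict[str, Tuple[float, float]]) -> float:
    """Usage-weighted average of per-pitch Stuff+.

    `arsenal` maps pitch type -> (usage share, Stuff+). Shares are
    normalized, so percentages and fractions both work.
    """
    total_usage = sum(usage for usage, _ in arsenal.values())
    return sum(usage * plus for usage, plus in arsenal.values()) / total_usage


gonzales = {
    "sinker": (32.5, 80),
    "cutter": (22.2, 86),
    "changeup": (23.0, 101),
    "curveball": (22.3, 125),
}
print(round(overall_stuff_plus(gonzales)))  # 96
```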

Results

I won’t spoil all the results at once (mostly because the resulting table would simply be too large), but I will give you a glimpse of the top 15 overall Stuff+ scores:

Top-15 Overall Stuff+ (2018)

Pitcher Stuff+ Four-seam Sinker Cutter Changeup Splitter Curveball Slider
Garrett Richards 159 174 149 146
Blake Snell 146 171 109 123 132
Jameson Taillon 146 155 157 98 140 133
Gerrit Cole 146 162 86 160 124
Luis Severino 146 147 125 152
Justin Verlander 145 161 134 122
Zack Wheeler 143 150 156 95 143
Jacob deGrom 143 173 119 82 118 147
Charlie Morton 138 141 149 109 140 135
Michael Fulmer 135 168 118 124 134
Nathan Eovaldi 135 140 144 132 136
Tyler Glasnow 133 127 166
Domingo German 133 125 169 98 138
James Paxton 133 146 126 103
Luis Castillo 132 117 142 172 86

There are a few surprises on this list, but for the most part it aligns with what you might expect. Each of these pitchers is working with at least two or three elite pitches in their repertoire, and all of them feature an excellent fastball of some sort paired with a plus secondary offering or two.

Utility and Correlation

With Stuff+ values calculated for each pitch type and overall values calculated for each pitcher, I tested whether the new metric is useful in describing a pitcher’s outcomes. There is a meaningful, if small, relationship between a pitcher’s overall Stuff+ and their FIP (r² = .247).

The relationship is similar for other ERA estimators like xFIP (r² = .255) and SIERA (r² = .229). While it’s not perfect, it does compare favorably to Sauceda’s ACES score. And as an ERA estimator? Its relationship with ERA itself (r² = .168) compares favorably to those of other ERA estimators, too.
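For readers who want to run the same kind of check, the r² of a one-variable linear fit equals the squared Pearson correlation, sketched below with no outside libraries. Squaring discards the sign, so r² alone doesn’t tell you that higher Stuff+ goes with lower FIP; the sign of the covariance term does.

```python
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sign = direction
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```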

Next Steps

Tomorrow, I’ll examine the leaders for each pitch type and discuss some of the limitations and shortcomings they reveal. I’ll also take a look at the Mariners pitching staff and see how their pitch arsenals stack up against their peers. Update: Read Part 2 in this series.

Feel free to comment with any questions or areas of improvement. I’d love to continue honing Stuff+ to make it as useful as it can be.

Example Usage

In the next few weeks, I’ll be using these new arsenal scores in the AL West previews that started this week, and you’ll see Stuff+ used regularly in the series previews once the season starts.

Here’s an example of what you might see in a series preview:

LHP Marco Gonzales

Pitch Type | Frequency | Velocity (mph) | Stuff+ | Whiff+ | BIP+
Sinker | 32.5% | 90.7 | 80 | 135 | 110
Cutter | 22.2% | 87.6 | 86 | 91 | 85
Changeup | 23.0% | 84.2 | 101 | 76 | 83
Curveball | 22.3% | 78.6 | 125 | 79 | 109

For each pitch, I’ll provide a frequency, average velocity, the pitch’s Stuff+ value and two additional results-based metrics, Whiff+ and BIP+.

Whiff+

Whiff+ is simply the whiff-per-swing rate of a given pitch set against the league average whiff-per-swing rate of that pitch type. It’s important to compare within pitch type categories because breaking balls generate far higher whiff rates than fastballs. Whiff+ works like Stuff+ (and wRC+, etc.), where 100 is average and each point above or below that is one percent better or worse than average.
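As a quick sketch, Whiff+ is a ratio of two rates scaled to 100. The league rate passed in must be the league whiff-per-swing rate for the same pitch type; the inputs in the usage note are made-up numbers for illustration only.

```python
def whiff_plus(whiffs: int, swings: int, league_whiff_rate: float) -> float:
    """Whiff-per-swing rate indexed to the league rate for that pitch type."""
    if swings == 0:
        raise ValueError("no swings recorded")
    return 100.0 * (whiffs / swings) / league_whiff_rate
```

With hypothetical inputs of 27 whiffs on 100 swings against a 20% league rate for the pitch type, the result is about 135.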

BIP+

BIP+ is a metric I developed that indicates how often a given pitch induces favorable batted balls (ground balls and popups) when it’s put into play. Ground balls are weighted twice as much as popups in my calculations. And like Whiff+, the batted ball data is set against the league average for the specific pitch type, and each point above or below 100 is one percent better or worse than average.
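The exact BIP+ formula isn’t given above, so this sketch assumes the 2:1 weighting enters as (2 × ground balls + popups) per ball in play, with the league figure for the same pitch type computed the same way before indexing.

```python
def favorable_rate(ground_balls: int, popups: int, balls_in_play: int) -> float:
    """Weighted favorable batted-ball rate; grounders count double.

    Assumption: the weighting is (2 * GB + PU) / BIP.
    """
    if balls_in_play == 0:
        raise ValueError("no balls in play")
    return (2 * ground_balls + popups) / balls_in_play


def bip_plus(pitch_rate: float, league_rate: float) -> float:
    """Index a pitch's favorable rate to the league rate for its type."""
    return 100.0 * pitch_rate / league_rate
```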


For a deeper dive into a pitcher’s repertoire, you might see something like the table below:

Marco Gonzales - 96 Overall Stuff+

Pitch Type | Stuff+ | Velocity | Horizontal Movement | Vertical Movement | Spin Rate | Control
Sinker | 81 | 26 | 19 | 12 | 85 | 84
Cutter | 86 | 35 | 15 | 9 | 62 | 86
Changeup | 101 | 25 | 92 | 42 | 96 | 78
Curveball | 125 | 48 | 32 | 75 | 46 | 92

This table exposes all the components that get calculated into Stuff+: velocity, movement, spin rate, and control. These components are displayed as percentile ranks, so 100 is the highest ranked value, 50 is approximately average, and 0 is the lowest ranked value. Getting a quick glance at all four of these components should give us a pretty good idea of what makes a pitch effective. For Marco Gonzales above, it’s clear that his excellent control of all four of his pitches helps him overcome the low velocity in his repertoire.