
Why did home runs surge in baseball? Statistics provides twist on hot topic

July 22, 2018

Around the middle of the 2015 season, something odd started happening in Major League Baseball (MLB): Home runs surged. They surged again in 2016, from the previous year's 4,909 to 5,610, and then again in 2017 to an all-time high of 6,105.

What was going on? For a stats-mad sport, the mystery was irresistible. There was the theory of the "Juiced Ball." Some subtle, possibly unintentional change in the manufacturing process had given balls just enough extra bounce to change history. Then there was the batter approach theory, which speculated that just a little bit more of an uppercut swing--perhaps in part due to defensive shifts--was giving the ball extra lift. Maybe batters were just cranking it as hard as they could and going for home runs given this shift to stronger defensive tactics?

And then there was a massive investigation requested by the MLB commissioner, who asked 10 scientists to find out what was going on. They tested a lot of balls and concluded it was a case of reduced drag combined with the launch angle of the ball coming off the bat.

But Jason Wilson, a statistician at Biola University in Southern California, has a different explanation. The poorer the pitch, the easier it is to whack a home run, and when Wilson broke pitches down into measurable components and tracked their quality over time, he found that pitching had gotten worse between 2015 and 2017. Wilson called this measure "Quality of Pitch" (QOP).

The idea for measuring pitch quality began in 2010, with Jarvis Greiner, one of Wilson's students. Greiner combined an interest in statistics with being a film major and a pitcher on the college baseball team. "He had the idea that we could quantify the quality of a curve ball," says Wilson, "and for his class project, he videotaped curve balls against tape measures. The data turned out to be great, and we ended up publishing it as an academic paper. Then his father, Wayne Greiner, who works for a sports distribution company and is absolutely passionate about baseball stats, asked, 'Could this be scaled up to analyze all kinds of pitches in the MLB?' Thanks to the introduction of cameras in stadiums in 2008, we had access to tons of PITCHf/x data, and--yes--our original model did generalize quite nicely."

With Greiner senior, Wilson refined the QOP statistic. At its simplest, QOP describes how difficult a pitch would be to hit on a scale of zero to 10. "The first thing we did [was] break a pitch down into six components," says Wilson. "The first component is rise on the pitch. If there's any rise, that's a tell that it's probably a curve ball, and that counts against the quality of the pitch.

"Then there's the distance until the ball starts to break and go down. The farther out, the better. Third is the total vertical break; again, the more break, the better. Fourth is the horizontal break, and the more break horizontally, the better. We also incorporate velocity, so the faster the pitch, the better. And the final component is location, the strike zone. The corner's the best spot, the middle is bad, and if you are far outside the strike zone, well that's obviously bad, too. We combine all these into a single number, which is the QOP value."

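To make the idea of folding six components into one number concrete, here is a toy sketch in Python. It is not Wilson and Greiner's QOP model: the article does not give their formula, so the `Pitch` fields, the units, the `location_score` input, and every weight below are hypothetical, chosen only so that each component pushes the score in the direction Wilson describes (rise counts against the pitch; a later break point, more vertical and horizontal break, more velocity, and a better location count for it).

```python
from dataclasses import dataclass, replace


@dataclass
class Pitch:
    # All fields and units are illustrative assumptions, not the QOP inputs.
    rise_in: float              # upward movement after release, inches
    break_point_ft: float       # distance travelled before the ball starts to drop, feet
    vertical_break_in: float    # total vertical break, inches
    horizontal_break_in: float  # total horizontal break, inches
    velocity_mph: float         # pitch speed, miles per hour
    location_score: float       # 0.0 (middle or far outside) to 1.0 (corner of the zone)


# Hypothetical weights: only their signs follow the article's description.
HYPOTHETICAL_WEIGHTS = {
    "rise_in": -0.30,
    "break_point_ft": 0.05,
    "vertical_break_in": 0.15,
    "horizontal_break_in": 0.10,
    "velocity_mph": 0.03,
    "location_score": 2.00,
}


def toy_pitch_score(pitch: Pitch) -> float:
    """Weighted sum of the six components, clamped to the 0-to-10 scale."""
    raw = sum(weight * getattr(pitch, name)
              for name, weight in HYPOTHETICAL_WEIGHTS.items())
    return max(0.0, min(10.0, raw))


# A made-up pitch, then the same pitch with less vertical break.
baseline = Pitch(rise_in=0.0, break_point_ft=30.0, vertical_break_in=20.0,
                 horizontal_break_in=5.0, velocity_mph=80.0, location_score=0.8)
flatter = replace(baseline, vertical_break_in=10.0)

print(toy_pitch_score(baseline) > toy_pitch_score(flatter))
```

In this toy version, reducing vertical break while holding everything else fixed lowers the score, which is the direction of the effect the article goes on to describe.
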
Wilson and Greiner then began to model what happened on the field between 2016 and 2017. Of the six components of QOP, vertical break was the most important predictive variable, and it had dropped sharply. In practice, after looking at more than 700,000 pitches per season, they found that pitches were being thrown more directly at the batter than before. They were higher in the zone, and there was less variation in where they crossed.

Wilson is quick to add that with more than 700 pitchers per season, a single factor cannot explain the entire surge. But the drop in vertical break makes sense if you think about it as a way of combating the batter's upward swing--pitching higher up would make it harder to pull off a home run.

Of course, Wilson's analysis shows that if this was indeed a pitching strategy, it didn't work. QOP, says Wilson, can explain between two and four percent of the change in the home run number (113 to 226 home runs) based on pitching, which turns out to be 23 percent to 46 percent of the home run increase between 2016 and 2017.
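
The percentages can be checked against the season totals quoted earlier in this release (5,610 home runs in 2016 and 6,105 in 2017). The short Python sketch below does only that arithmetic; the variable and function names are ours. The last two lines compare the same 113 and 226 home runs with the 2016 total, which is one possible reading of the "two to four percent" figure and is our assumption, not something the release states.

```python
HR_2016 = 5610  # season total quoted in the release
HR_2017 = 6105  # season total quoted in the release
ATTRIBUTED_LOW, ATTRIBUTED_HIGH = 113, 226  # home runs attributed to pitching


def share(part: int, whole: int) -> float:
    """Return part as a fraction of whole."""
    return part / whole


increase = HR_2017 - HR_2016
low_share = share(ATTRIBUTED_LOW, increase)
high_share = share(ATTRIBUTED_HIGH, increase)

print(f"Increase from 2016 to 2017: {increase} home runs")
print(f"Share of the increase: {low_share:.0%} to {high_share:.0%}")

# Assumed reading: the same counts relative to the 2016 season total.
print(f"Relative to the 2016 total: "
      f"{share(ATTRIBUTED_LOW, HR_2016):.1%} to {share(ATTRIBUTED_HIGH, HR_2016):.1%}")
```

The increase works out to 495 home runs, so 113 and 226 correspond to roughly 23 and 46 percent of it, matching the figures Wilson gives.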

The big news for 2018? Home runs are down--and if you look at the data through Wilson's model, the quality of the pitching is up.
-end-
Talk details:

The Home Run Spike of MLB 2017: Drop in Quality of Pitch (QOP) Is a Missing Factor
Tuesday, July 31, 2018
3:05-3:50 p.m.
http://ww2.amstat.org/meetings/jsm/2018/onlineprogram/AbstractDetails.cfm?abstractid=332642

This talk will build on research published here: https://www.fangraphs.com/tht/explaining-the-mlb-home-run-record-of-2017-with-quality-of-pitch/. For details, contact Jason Wilson.

Email: jason.wilson@biola.edu

Webpage: http://www.biola.edu/directory/people/jason-wilson
Phone: (951) 743-2172

About JSM 2018

JSM 2018 is the largest gathering of statisticians and data scientists in the world, taking place July 28-August 2, 2018, in Vancouver. Occurring annually since 1974, JSM is a joint effort of the American Statistical Association, International Biometric Society (ENAR and WNAR), Institute of Mathematical Statistics, Statistical Society of Canada, International Chinese Statistical Association, International Indian Statistical Association, Korean International Statistical Society, International Society for Bayesian Analysis, Royal Statistical Society and International Statistical Institute. JSM activities include oral presentations, panel sessions, poster presentations, professional development courses, an exhibit hall, a career service, society and section business meetings, committee meetings, social activities and networking opportunities. http://ww2.amstat.org/meetings/jsm/2018/index.cfm

About the American Statistical Association

The ASA is the world's largest community of statisticians and the oldest continuously operating professional science society in the United States. Its members serve in industry, government and academia in more than 90 countries, advancing research and promoting sound statistical practice to inform public policy and improve human welfare. For additional information, please visit the ASA website at http://www.amstat.org.

American Statistical Association
