
Why did home runs surge in baseball? Statistics provides twist on hot topic

July 22, 2018

Around the middle of the 2015 season, something odd started happening in Major League Baseball (MLB): Home runs surged. They surged again in 2016, from the previous year's 4,909 to 5,610, and then again in 2017 to an all-time high of 6,105.

What was going on? For a stats-mad sport, the mystery was irresistible. There was the theory of the "Juiced Ball." Some subtle, possibly unintentional change in the manufacturing process had given balls just enough extra bounce to change history. Then there was the batter approach theory, which speculated that just a little bit more of an uppercut swing--perhaps in part due to defensive shifts--was giving the ball extra lift. Maybe batters were just cranking it as hard as they could and going for home runs given this shift to stronger defensive tactics?

And then there was a massive investigation requested by the MLB commissioner, who asked 10 scientists to find out what was going on. They tested a lot of balls and concluded it was a case of reduced drag combined with the launch angle of the ball coming off the bat.

But Jason Wilson, a statistician at Biola University in Southern California, has a different explanation. The poorer the pitch, the easier it is to whack a home run--and the quality of pitching got worse between 2015 and 2017, a decline that shows up when you break a pitch down into measurable components and track pitching quality over time. Wilson called this measure "Quality of Pitch" (QOP).

The idea for measuring pitch quality began in 2010, with Jarvis Greiner, one of Wilson's students. Greiner combined an interest in statistics with being a film major and a pitcher on the college baseball team. "He had the idea that we could quantify the quality of a curve ball," says Wilson, "and for his class project, he videotaped curve balls against tape measures. The data turned out to be great, and we ended up publishing it as an academic paper. Then his father, Wayne Greiner, who works for a sports distribution company and is absolutely passionate about baseball stats, asked, 'Could this be scaled up to analyze all kinds of pitches in the MLB?' Thanks to the introduction of cameras in stadiums in 2008, we had access to tons of PITCHf/x data, and--yes--our original model did generalize quite nicely."

With Greiner senior, Wilson refined the QOP statistic. At its simplest, QOP describes how difficult a pitch would be to hit on a scale of zero to 10. "The first thing we did [was] break a pitch down into six components," says Wilson. "The first component is rise on the pitch. If there's any rise, that's a tell that it's probably a curve ball, and that counts against the quality of the pitch.

"Then there's the distance until the ball starts to break and go down. The farther out, the better. Third is the total vertical break; again, the more break, the better. Fourth is the horizontal break, and the more break horizontally, the better. We also incorporate velocity, so the faster the pitch, the better. And the final component is location, the strike zone. The corner's the best spot, the middle is bad, and if you are far outside the strike zone, well that's obviously bad, too. We combine all these into a single number, which is the QOP value."

Wilson and Greiner then began to model what happened on the field between 2016 and 2017. Of the six components of the QOP, vertical break was the most important predictive variable--and it had dropped sharply. In practice, after looking at more than 700,000 pitches per season, they found that balls were being pitched more directly at the batter than before. They were higher in the zone, and there was less variation in where they crossed.

Wilson is quick to add that with more than 700 pitchers per season, a single factor cannot explain the entire surge. But the drop in vertical break makes sense if you think about it as a way of combating the batter's upward swing--pitching higher up would make it harder to pull off a home run.

Of course, Wilson's analysis shows that if this was indeed a pitching strategy, it didn't work. QOP, says Wilson, can explain between two and four percent of the change in the home run number (113 to 226 home runs) based on pitching, which turns out to be 23 percent to 46 percent of the home run increase between 2016 and 2017.
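The percentages in this paragraph can be checked against the season totals quoted earlier in the article. The short Python sketch below is only an illustration and is not part of Wilson's analysis. It assumes the "two to four percent" figure is measured against the 2016 season total, which the article does not state explicitly, and the function name is hypothetical.

```python
# Season home run totals as quoted in the article.
HOME_RUNS = {2015: 4909, 2016: 5610, 2017: 6105}

# Range of home runs the article attributes to the drop in pitch quality.
ATTRIBUTED_LOW, ATTRIBUTED_HIGH = 113, 226


def share(part: float, whole: float) -> float:
    """Return part / whole as a percentage (hypothetical helper)."""
    return 100.0 * part / whole


# Increase in home runs from 2016 to 2017.
increase = HOME_RUNS[2017] - HOME_RUNS[2016]

# Assumption: "two to four percent" is relative to the 2016 total.
pct_of_total_low = share(ATTRIBUTED_LOW, HOME_RUNS[2016])
pct_of_total_high = share(ATTRIBUTED_HIGH, HOME_RUNS[2016])

# Share of the 2016-to-2017 increase covered by the attributed home runs.
pct_of_increase_low = share(ATTRIBUTED_LOW, increase)
pct_of_increase_high = share(ATTRIBUTED_HIGH, increase)

print(f"Increase 2016 to 2017: {increase} home runs")
print(f"Share of 2016 total: {pct_of_total_low:.1f}% to {pct_of_total_high:.1f}%")
print(f"Share of increase: {pct_of_increase_low:.1f}% to {pct_of_increase_high:.1f}%")
```

Under that assumption, the rounded results agree with the two to four percent and 23 to 46 percent figures quoted above.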

The big news for 2018? Home runs are down--and if you look at the data through Wilson's model, the quality of the pitching is up.

Talk details:

The Home Run Spike of MLB 2017: Drop in Quality of Pitch (QOP) Is a Missing Factor
Tuesday, July 31, 2018
3:05-3:50 p.m.

This talk will build on research published here: For details, contact Jason Wilson.


Phone: (951) 743-2172

About JSM 2018

JSM 2018 is the largest gathering of statisticians and data scientists in the world, taking place July 28-August 2, 2018, in Vancouver. Occurring annually since 1974, JSM is a joint effort of the American Statistical Association, International Biometric Society (ENAR and WNAR), Institute of Mathematical Statistics, Statistical Society of Canada, International Chinese Statistical Association, International Indian Statistical Association, Korean International Statistical Society, International Society for Bayesian Analysis, Royal Statistical Society and International Statistical Institute. JSM activities include oral presentations, panel sessions, poster presentations, professional development courses, an exhibit hall, a career service, society and section business meetings, committee meetings, social activities and networking opportunities.

About the American Statistical Association

The ASA is the world's largest community of statisticians and the oldest continuously operating professional science society in the United States. Its members serve in industry, government and academia in more than 90 countries, advancing research and promoting sound statistical practice to inform public policy and improve human welfare. For additional information, please visit the ASA website at

American Statistical Association
