
Alarming error common in survey analyses

July 23, 2018

It is difficult to overstate the importance of survey data: They tell us who we are and--in the hands of policymakers--what to do.

It had long been apparent to Brady West, an expert on survey methodology at the University of Michigan, Ann Arbor, that the benefits of survey data coexisted with a lack of training in how to interpret them correctly, especially when it came to secondary analyses--researchers reanalyzing survey data that had been collected by a previous study.

"In my consulting work for organizations and businesses, people would come in and say, 'Well, here's my estimate of how often something occurs in a population,' such as the rate of a disease or the preferences for a political party. And they'd want to know how to interpret that. I would respond, 'Have you accounted for weighting in the survey data you're using--or, did you account for the sample design?' And I would say, probably 90 percent of the time, they'd look at me and have no idea what I was talking about. They had never learned about the fundamental principles of working with survey data in their standard Intro to Stats classes."

As a survey methodologist, West wondered whether his experience was indicative of a systemic problem. There wasn't much in the academic literature to answer the question, so he and his colleagues, Joseph Sakshaug and Guy Aurelien, sampled 250 papers, reports and presentations--all available online, all conducting secondary analyses of survey data--to see if these analytic errors were, indeed, common.

"It was quite shocking," says West. "Only about half of these analyses claimed to account for weighting, the impact of sample designs on variance estimates was widely misunderstood and there was no sign of improvement in these problems over time." But possibly worst of all, these problems were just as prevalent in the peer-reviewed literature in their sample as they were in technical reports and conference presentations. "That's what was really most shocking to me," says West. "The peer-review process was not catching these errors."

An alarming example of what can happen when you compute an estimate but ignore the survey weighting can be found in the 2010 National Survey of College Graduates (NSCG). "This is a large national survey of college graduates, and they literally say in their documentation that they're oversampling individuals with science and engineering degrees," says West. "If you take account of the weighting, which corrects for this oversampling, about 30 percent of people are getting science and engineering degrees; if you forget about the weighting, you extrapolate the oversample to the entire population, and suddenly 55 percent of people have science and engineering degrees."
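The arithmetic behind that contrast can be sketched in a few lines of Python. This is a minimal illustration, not NSCG data: the population and sample counts are invented so that the results mirror the 30 percent and 55 percent figures quoted above, and the function names are hypothetical. Each respondent's weight is assumed to be the number of people in the population that the respondent represents.

```python
# Hypothetical illustration: the counts below are invented to mirror the
# 30 percent vs. 55 percent contrast described above; they are not NSCG data.

def unweighted_share(records):
    """Share of respondents with the flag set, ignoring survey weights."""
    return sum(flag for flag, _ in records) / len(records)


def weighted_share(records):
    """Share with the flag set, counting each respondent by its weight."""
    total_weight = sum(weight for _, weight in records)
    flagged_weight = sum(weight for flag, weight in records if flag)
    return flagged_weight / total_weight


# Assumed population: 300,000 science and engineering (S&E) graduates
# and 700,000 other graduates.
# Assumed sample: 550 S&E graduates (oversampled) and 450 others.
se_weight = 300_000 / 550      # people represented by each S&E respondent
other_weight = 700_000 / 450   # people represented by each other respondent

# Each record is (has_se_degree, survey_weight).
sample = [(1, se_weight)] * 550 + [(0, other_weight)] * 450

print(f"unweighted share: {unweighted_share(sample):.2f}")
print(f"weighted share:   {weighted_share(sample):.2f}")
```

Under these assumed counts, the unweighted share is 0.55, which simply reflects the oversample, while the weighted share recovers the assumed population value of 0.30.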

Ironically, better sampling of under-studied populations may be exacerbating the problem. "There's a lot of interest in under-represented populations, such as Hispanics," says West. "So, a lot of national surveys oversample these groups and others to create a big enough sample for researchers to adequately study. But when Average Joe Researcher grabs all the data--not just the data from the subpopulation they're interested in, but everybody, whites, African Americans, and Hispanics--and then they try to analyze all that data collectively, that's when oversampling can have a horrible effect on the overall picture if that feature of the sample design is not accounted for correctly in estimation."

There are many easy-to-use software tools that can account for the sampling and weighting complexities of survey data, but the fact that they are not being used speaks to the underlying problem.
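Weights affect the precision of an estimate as well as its value. As a rough illustration, the sketch below computes Kish's approximate effective sample size, (sum of weights) squared divided by the sum of squared weights. The weights are hypothetical, and the function name is an assumption made for this example. The approximation reflects only unequal weighting; it ignores stratification and clustering, so it is not a substitute for the design-based variance estimation that dedicated survey software provides.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical weights: 550 oversampled respondents with small weights
# and 450 other respondents with larger weights.
weights = [300_000 / 550] * 550 + [700_000 / 450] * 450

n_eff = kish_effective_sample_size(weights)
design_effect = len(weights) / n_eff  # approximate variance inflation

print(f"respondents:           {len(weights)}")
print(f"effective sample size: {n_eff:.0f}")
print(f"approx. design effect: {design_effect:.2f}")
```

With equal weights the effective sample size equals the number of respondents; the more the weights vary, the further it falls below that number, which is one reason standard errors computed as if the data were a simple random sample can be misleading.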

"This problem originates in the fact that the people publishing these articles just aren't told about any of this in their training," says West. "We've known about the importance of survey weighting for nearly a century--but somehow how to deal with weighted survey data hasn't penetrated the statistics classes that researchers take at the undergraduate or graduate level. We spend a fortune on doing national surveys--and who knows how much misinterpreting that data is costing us."

To solve that problem, West is helping design a MOOC (massive open online course) specialization at the University of Michigan that introduces statistics using the Python programming language. Weighting and correct survey analyses will be taught in the very first course of that specialization. "We're really focusing on making sure that before you jump into any analyses of survey data, you have a really firm understanding of how the data were collected and where they came from."
JSM talk:

Study link:

For further details, contact: Brady West


Tel: (734) 223-9793


About JSM 2018

JSM 2018 is the largest gathering of statisticians and data scientists in the world, taking place July 28-August 2, 2018, in Vancouver. Occurring annually since 1974, JSM is a joint effort of the American Statistical Association, International Biometric Society (ENAR and WNAR), Institute of Mathematical Statistics, Statistical Society of Canada, International Chinese Statistical Association, International Indian Statistical Association, Korean International Statistical Society, International Society for Bayesian Analysis, Royal Statistical Society and International Statistical Institute. JSM activities include oral presentations, panel sessions, poster presentations, professional development courses, an exhibit hall, a career service, society and section business meetings, committee meetings, social activities and networking opportunities.

About the American Statistical Association

The ASA is the world's largest community of statisticians and the oldest continuously operating professional science society in the United States. Its members serve in industry, government and academia in more than 90 countries, advancing research and promoting sound statistical practice to inform public policy and improve human welfare. For additional information, please visit the ASA website at

American Statistical Association
