Report proposes standards for sharing data and code used in computational studies

December 08, 2016

CHAMPAIGN, Ill. -- Reporting new research results involves detailed descriptions of methods and materials used in an experiment. But when a study uses computers to analyze data, create models or simulate things that can't be tested in a lab, how can other researchers see what steps were taken or potentially reproduce results?

A new report by prominent leaders in computational methods and reproducibility lays out recommendations for ways researchers, institutions, agencies and journal publishers can work together to standardize sharing of data sets and software code. The paper "Enhancing reproducibility for computational methods" appears in the journal Science.

"We have a real issue in disclosure and reporting standards for research that involves computation -- which is basically all research today," said Victoria Stodden, a University of Illinois professor of information science and the lead author of the paper. "The standards for putting enough information out there with your findings so that other researchers in the area are able to understand and potentially replicate your work came from before we used computers."

"It is becoming increasingly accepted for researchers to value open data standards as an essential part of modern scholarship, but it is nearly impossible to reproduce results from original data without the authors' code," said Marcia McNutt, the president of the National Academy of Sciences and a co-corresponding author of the study. "This policy forum makes recommendations to enable practical and useful code sharing."

Sharing complete computational methods -- data, code, parameters and the specific steps taken to arrive at the results -- is difficult for researchers because there are no standards or guides to refer to, Stodden said. It's an extra step for busy researchers to incorporate into their reporting routine, and even if someone wants to share their data or code, there are questions of how to format and document it, where to store it and how to make it accessible.
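As a rough illustration of what disclosing data, code, parameters and steps together could look like in practice, the sketch below bundles them into a single machine-readable record. It is not taken from the report: the function name `build_manifest`, the field names and every example value are hypothetical, and a real project would follow whatever format its journal or repository asks for.

```python
import hashlib
import json
import platform


def build_manifest(data_bytes, parameters, steps):
    """Bundle the items a reader would need to rerun an analysis.

    All field names here are illustrative assumptions, not a standard.
    """
    return {
        # A checksum lets other researchers confirm they hold the same data.
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        # Settings that influence the result, recorded as they were used.
        "parameters": parameters,
        # The ordered steps taken to arrive at the results.
        "steps": steps,
        # The software environment the analysis ran in.
        "python_version": platform.python_version(),
    }


# Hypothetical example values, invented for illustration only.
manifest = build_manifest(
    data_bytes=b"id,value\n1,2.5\n2,3.1\n",
    parameters={"outlier_threshold": 3.0, "normalize": "z-score"},
    steps=["load data", "drop outliers", "normalize", "fit model"],
)

# Serialize to JSON so the record can be stored next to the data and code.
print(json.dumps(manifest, indent=2, sort_keys=True))
```

A record like this could be deposited alongside the dataset and code so that the format, documentation and storage questions are answered in one place.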

The report makes seven specific recommendations, such as documenting digital objects and making them retrievable, using open licensing, placing links to datasets and workflows in scientific articles, and running reproducibility checks before publication in a scholarly journal. The authors hope that disclosing computational methods will allow other researchers not only to verify and reproduce results, but also to build upon studies that have been done, such as by performing different analyses with a dataset or using an established workflow with new data.

"Things like how you prepped your data -- what you did with outliers or how you normalized variables, all the things that are standard in data analysis -- can make a big impact on results," Stodden said. "Some researchers make code and data accessible on point of principle, so it's possible. But it takes time. We know it's hard, but in this report we're trying to say in a very productive and positive way that data, code and workflows need to be part of what gets disclosed as a scientific finding."
-end-
Editor's notes:

To contact Victoria Stodden, call 217-300-3173; email: vcs@illinois.edu.

The paper "Enhancing reproducibility for computational methods" is available from scipak@aaas.org.

See a video of Stodden explaining the challenges and importance of computational reporting.

University of Illinois at Urbana-Champaign
