Why keep the raw data?

December 07, 2016

The increasingly popular subject of raw diffraction data deposition is examined in a Topical Review in IUCrJ [Kroon-Batenburg, Helliwell, McMahon & Terwilliger (2017). IUCrJ, 4, doi:10.1107/S2052252516018315]. Building on the 2015 workshop organised by the IUCr Diffraction Data Deposition Working Group (DDDWG), the authors bring the story up to date with accounts of new subject-specific and institutional data repositories, and of growing policy pressures on research data management such as the European Open Science initiative.

The article is, however, more than just a workshop report or a survey of evolving policy. It seeks to inform the cost-benefit arguments over diffraction data deposition with examples from real front-line research. For example, Kroon-Batenburg and Helliwell have collaborated on studies of protein binding of the chemotherapeutic agent cisplatin, and have made all 34 of their raw data sets available through the University of Manchester Data Library. Some of these data sets have since been reanalysed, resulting in a fresh understanding of cisplatin-lysozyme models.

The prospect of extracting further information from archived primary data sets in this way (either through the insights of fresh pairs of eyes or through subsequent improvements in analysis software) has implications for structural databases: it supports the idea of continuously improving published studies, such as macromolecular structure models (an approach long championed by Terwilliger).

These considerations are important not only in macromolecular structure determination. One of the greatest challenges in reusing any raw data set is the need for complete associated metadata, so that the data can later be interpreted and fully evaluated.

Various IUCr Commissions are actively publishing their summaries of the essential metadata that need to be captured alongside all experimental data sets. These initiatives and their relationship to the IUCr's standard for data characterization (CIF, the Crystallographic Information Framework) are reviewed within the article. Again, practical pointers are given to essential metadata that need to be captured alongside diffraction data sets.

While there are encouraging signs that the scientific community is taking a more informed interest in data management and its scientific potential, fresh challenges are being thrown up by the latest generation of instrumentation, which can generate vast amounts of data at very high rates. It may not be possible to archive or even thoroughly analyse all the data that are being produced. However, this article will help to supply a deep understanding of the reasons why society should invest effort and resources into extracting the greatest possible value from the data deluge, in crystallography as in any science.
-end-


International Union of Crystallography
