Why keep the raw data?

December 07, 2016

The increasingly popular subject of raw diffraction data deposition is examined in a Topical Review in IUCrJ [Kroon-Batenburg, Helliwell, McMahon & Terwilliger (2017). IUCrJ, 4, doi:10.1107/S2052252516018315]. Building on the 2015 workshop organised by the IUCr Diffraction Data Deposition Working Group (DDDWG), the authors bring the story up to date with accounts of new subject-specific and institutional data repositories, and of growing policy pressures on research data management such as the European Open Science initiative.

The article is, however, more than a workshop report or a survey of evolving policy. It seeks to inform the cost-benefit arguments over diffraction data deposition with examples from real front-line research. For example, Kroon-Batenburg and Helliwell have collaborated on studies of how proteins bind the chemotherapeutic agent cisplatin, and have made all 34 of their raw data sets available through the University of Manchester Data Library. Some of these data sets have since been reanalysed, yielding fresh understanding of cisplatin-lysozyme models.

The prospect of extracting further information from archived primary data sets in this way (whether through the insights of fresh pairs of eyes or through later improvements in analysis software) has implications for structural databases. It supports the idea, long championed by Terwilliger, that studies such as macromolecular structure models can be continuously improved.

These considerations matter beyond macromolecular structure determination. One of the greatest challenges to reusing raw data is the need for complete metadata to accompany each data set, so that it can later be interpreted and fully evaluated.

Various IUCr Commissions are actively publishing summaries of the essential metadata that should be captured alongside experimental data sets. The article reviews these initiatives and their relationship to the IUCr's standard for data characterization, CIF (the Crystallographic Information Framework). It also gives practical pointers to the metadata that should be recorded with diffraction data sets in particular.

While there are encouraging signs that the scientific community is taking a more informed interest in data management and its scientific potential, fresh challenges are being thrown up by the latest generation of instrumentation, which can generate vast amounts of data at an extraordinary rate. It may not be possible to archive, or even thoroughly analyse, all the data being produced. Nevertheless, this article sets out clearly why society should invest effort and resources in extracting the greatest possible value from the data deluge, in crystallography as in any science.
International Union of Crystallography