Why keep the raw data?

December 07, 2016

The increasingly popular subject of raw diffraction data deposition is examined in a Topical Review in IUCrJ [Kroon-Batenburg, Helliwell, McMahon & Terwilliger (2017). IUCrJ, 4, doi:10.1107/S2052252516018315]. Building on the 2015 workshop organised by the IUCr Diffraction Data Deposition Working Group (DDDWG), the authors bring the story up to date with accounts of new subject-specific and institutional data repositories, and of growing policy pressures on research data management such as the European Open Science initiative.

The article is, however, more than a workshop report or a survey of evolving policy. It seeks to inform the cost-benefit debate over diffraction data deposition with examples from front-line research. For example, Kroon-Batenburg and Helliwell have collaborated on studies of how proteins bind the chemotherapeutic agent cisplatin, and have made all 34 of their raw data sets available through the University of Manchester Data Library. Some of these data sets have since been reanalysed, yielding fresh insights into cisplatin-lysozyme models.

The prospect of extracting further information from archived primary data sets in this way, whether through the insights of fresh pairs of eyes or through later improvements in analysis software, has implications for structural databases. It supports the idea, long championed by Terwilliger, that published studies such as macromolecular structure models can be continuously improved.

These considerations matter beyond macromolecular structure determination. One of the greatest challenges to reusing raw data is the need for complete metadata to accompany each data set, so that it can later be interpreted and fully evaluated.

Various IUCr Commissions are publishing summaries of the essential metadata that must be captured alongside experimental data sets. The article reviews these initiatives and their relationship to the IUCr's standard for data characterization, CIF (the Crystallographic Information Framework). It also gives practical guidance on the essential metadata to record with diffraction data sets.

While there are encouraging signs that the scientific community is taking a more informed interest in data management and its scientific potential, the latest generation of instrumentation, capable of generating vast amounts of data at an unprecedented rate, is raising fresh challenges. It may not be possible to archive, or even thoroughly analyse, all the data being produced. Nevertheless, the article sets out in depth why society should invest effort and resources in extracting the greatest possible value from this data deluge, in crystallography as in any science.

International Union of Crystallography
