Massive, computer-analyzed geological database reveals chemistry of ancient ocean

March 30, 2017

MADISON, Wisconsin - A study that used a new digital library and machine reading system to suck the factual marrow from millions of geologic publications dating back decades has unraveled a longstanding mystery of ancient life: Why did easy-to-see and once-common structures called stromatolites essentially cease forming over the long arc of Earth history?

Stromatolites are contorted layers of sediment formed by microbes, and they are often found in limestone and other ancient sedimentary rocks deposited beneath oceans.

"Geologists have known for a long time that stromatolites were abundant in shallow marine environments during the Precambrian, before the emergence of multi-cellular life" more than 560 million years ago, says Jon Husson, a post-doctoral researcher and co-author of a study now online in the journal Geology. "But stromatolites are rare in the ocean today."

The new study measures the slide in stromatolite prevalence based on descriptions of rocks sifted from more than 3 million scientific publications.

"Paleontologists have largely attributed the decline in stromatolites to the evolution of animals, starting some 560 million years ago," says Shanan Peters, a professor of geoscience at University of Wisconsin-Madison and study first author. "Many multi-cellular animals, like snails, eat microbes. The evolution of these big microbe-grazing animals hit 'reset' on the stromatolites' world. Or so the story has gone."

The new study found a weak correlation between stromatolite occurrence and the diversity of animals, but a stronger link to seawater chemistry.

"The best predictor of stromatolite prevalence, both before and after the evolution of animals, is the abundance of dolomite in shallow marine sediments," says Husson. Dolomite is a high-magnesium variety of carbonate, the type of sediment that forms limestone. Dolomite is harder to make than low-magnesium carbonate and it forms today in only a narrow range of marine environments.

When the ocean water is super-saturated with carbonate, "that can make it easier for things like stromatolites to form," says Husson. "In Lake Tanganyika [Africa], there are stromatolites forming today, even though there are animals everywhere, snails and fish. The lake is super-saturated with carbonate, and it's begging to be precipitated. The microbes come along and help it to precipitate, and the result is an abundance of stromatolites." Elevated carbonate saturation can also help the formation of dolomite, thereby driving the correlation with stromatolites found in this study.

Measuring the prevalence of stromatolites across all of Earth history is difficult because counting stromatolite occurrences alone is not sufficient. You must also know how many rocks could have preserved stromatolites but do not.
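Written as a formula (the notation here is ours, not the study's), the quantity of interest for a time interval t is a ratio rather than a raw count, where N_strom(t) is the number of shallow marine rock units of that age with stromatolites and N_shallow(t) is the number of all shallow marine units of that age that could have hosted them:

```latex
p(t) = \frac{N_{\mathrm{strom}}(t)}{N_{\mathrm{shallow}}(t)}
```

If N_shallow(t) doubles while N_strom(t) stays fixed, p(t) halves, which is why a count of stromatolites by itself cannot show whether they became rarer.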

The big innovation of this study is the interplay of a new type of digital library and machine reading system called GeoDeepDive with a geological database called Macrostrat. Both were spearheaded by Peters at UW-Madison.

GeoDeepDive is a digital library built on high throughput computing technology that can "read" millions of papers and siphon off specific information. To date, the GeoDeepDive library contains more than 3 million scientific publications from all scientific disciplines; some 10,000 new published papers are added daily.

Macrostrat is a database describing the known geological properties of North America's upper crust, at different times and depths.

The massive computing capacity at UW-Madison's Center for High Throughput Computing and HTCondor system, the brainchild of UW-Madison computer scientist Miron Livny, powers GeoDeepDive. Combining the digital library with the geological database allowed the researchers to estimate, at different time periods, the percentage of shallow marine rocks that actually have stromatolites.
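The join described above can be sketched as follows, assuming each rock unit is a record with an id, a depositional environment and an age, and that the machine-reading step has already produced the set of unit ids linked to stromatolite mentions. The field names and the fixed-width time bins are hypothetical; the article does not describe Macrostrat's actual schema or the binning the authors used.

```python
from collections import defaultdict


def prevalence_by_bin(units, stromatolite_unit_ids, bin_width_myr=50):
    """Estimate the fraction of shallow-marine rock units with stromatolites
    in each time bin (hypothetical helper and field names).

    units: iterable of dicts with keys "unit_id", "environment" and
        "age_ma" (age in millions of years).
    stromatolite_unit_ids: set of unit ids linked to stromatolite mentions.
    Returns {bin_start_ma: fraction}; a bin covers
    [bin_start_ma, bin_start_ma + bin_width_myr).
    """
    totals = defaultdict(int)      # all shallow-marine units per bin
    with_strom = defaultdict(int)  # those linked to stromatolites
    for unit in units:
        if unit["environment"] != "shallow marine":
            continue  # only shallow-marine units form the denominator
        bin_start = int(unit["age_ma"] // bin_width_myr) * bin_width_myr
        totals[bin_start] += 1
        if unit["unit_id"] in stromatolite_unit_ids:
            with_strom[bin_start] += 1
    return {b: with_strom[b] / totals[b] for b in sorted(totals)}
```

Keeping the denominator restricted to shallow marine units mirrors the article's point that prevalence must be measured against the rocks that could have stromatolites, not against all rocks.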

The study began in the summer of 2015, when the third author, Julia Wilcots, a Madison native who was then an undergraduate at Princeton, asked Peters for a summer project. "In my typical fashion I gave Julia a few options," Peters says. "She picked stromatolites, so I said, 'Okay, go do it!' With minimal help from us, she developed a working application to discover and extract every mention of stromatolites from our library."

Among 10,200 papers that mentioned stromatolites, "our program was able to extract 1,013 with a name of a rock unit, which enabled us to link stromatolite occurrences to Macrostrat," says Husson.
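The article does not detail how Wilcots's application works. As a minimal sketch of the general idea, assuming plain document text and a list of known rock-unit names (for example, exported from a stratigraphic database), one can keep only sentences that mention stromatolites together with a unit name. This is simple keyword co-occurrence, not GeoDeepDive's actual pipeline, and the function name and unit names are hypothetical.

```python
import re


def find_stromatolite_units(text, unit_names):
    """Return (unit_name, sentence) pairs for sentences that mention
    stromatolites together with a known rock-unit name (hypothetical helper).

    text: plain text of one document.
    unit_names: iterable of rock-unit names to look for.
    """
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    # Matches "stromatolite", "stromatolites", "stromatolitic", any case.
    mention = re.compile(r"\bstromatolit", re.IGNORECASE)
    hits = []
    for sentence in sentences:
        if not mention.search(sentence):
            continue
        for name in unit_names:
            # Whole-word match of the unit name inside the same sentence.
            if re.search(r"\b" + re.escape(name) + r"\b", sentence):
                hits.append((name, sentence))
    return hits
```

Requiring a unit name in the same sentence is what allows a mention to be linked to a database record; papers that mention stromatolites without naming a unit would yield no hits, consistent with only a fraction of the papers being usable.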

Wilcots did not have to travel to see stromatolites, Peters says. "In Madison, we are sitting on top of rocks recording one of the biggest rises in stromatolite abundance - at least during the age of animals."

Scientists long ago observed that stromatolites started a long decline just before the start of the Cambrian period, but the cause of that decline remained a "fundamental question of paleobiology," Husson says. "Stromatolites are the oldest fossils that are visible to the naked eye. If you look at rock that is a billion years old, the chance for seeing evidence of life equals the chance of seeing stromatolites."

Beyond answering a fundamental question of Earth's history, the new study "allows us to do the kind of analyses that scientists used to only dream about," Peters says. "'If we could just compile all the published information on... anything!'"

"Doing this study without GeoDeepDive would be all but impossible," Peters adds. "Reading thousands of papers to pick out references to stromatolites, and then linking them to a certain rock unit and geologic period, would take an entire career, even with Google Scholar. Here we got started with a talented undergrad working on a summer project. GeoDeepDive has greatly lowered the barrier to compiling literature data in order to answer many questions."

Another beauty of the big data, machine-reading approach is the baked-in capability for replication and improvement. "Now that this study has been done, we can run the stromatolite application again and again. We can refine the searches, and they will evaluate the new data that is being published all the time," Peters says. "So a rerun could make a better study, with minimal effort."

For centuries, "geologists have transferred hard-to-get information from the field to hard-to-get information in the literature," Peters says. "To achieve a broad-scale synthesis, you have to survey all of the published knowledge. There are new discoveries waiting in the scientific literature, if you can see the big picture and get all the data into one place."

David Tenenbaum, 608-265-8549,

University of Wisconsin-Madison
