Flood of genome data hinders efforts to ID bacteria

October 30, 2018

HOUSTON - (Oct. 30, 2018) - There are many ways to slice and dice genomic data to identify a species of bacteria, or at least find its close relatives. But fast genome-sequencing techniques have flooded public databases in a biased fashion, leaving them with lots of genomic data about some species and not enough about others, according to a Rice University computer scientist.

Todd Treangen and his colleagues tested taxonomic classification methods that match genomic sequences from bacteria of interest with those recorded in large databases to identify species. In the process, they charted a path toward improved accuracy and sensitivity.

Treangen is senior author of a study published this month in Genome Biology that demonstrates how changes over time in a widely used federal database, the National Center for Biotechnology Information's RefSeq, have influenced the accuracy of metagenomic classification methods.

A primary concern for Treangen, an expert in metagenomics -- the study of genetic material from environmental samples -- is maintaining the ability to quickly identify bacteria that pose a threat to public health.

Big data is uniquely positioned to do this -- but there's so much of it. At present, he said, low-cost and high-throughput DNA shotgun sequencing machines, which read short DNA sequences from collections of microorganisms, have resulted in the doubling of genomic data in RefSeq every two to three years.

"I initially thought more data is always better for these methods," said Treangen, who joined Rice this year from the University of Maryland Institute for Advanced Computer Studies. "You would expect that there would be no penalty, because database growth is good." However, the researchers found that bacterial data in RefSeq has an outsized effect at the species level of the taxonomic hierarchy, which is growing at a breakneck pace.

That's a problem for researchers who combine two common techniques to identify what they find. One is called k-mer-based classification, which identifies short DNA sequences from all the organisms in a bacterial sample via exact matches.
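The exact-match idea can be illustrated with a minimal Python sketch. It is not the code of Kraken, Bracken or any tool in the study: the function names, the toy sequences and the choice of k = 5 are all assumptions made for illustration, and real classifiers use much longer k-mers and far more compact indexes.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference labels whose sequence contains it."""
    index = defaultdict(set)
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return index


def matching_labels(read, index, k):
    """Count, per reference label, how many k-mers of the read match exactly."""
    hits = defaultdict(int)
    for kmer in kmers(read, k):
        # .get() avoids adding unseen k-mers to the defaultdict index
        for label in index.get(kmer, ()):
            hits[label] += 1
    return dict(hits)


# Toy "database": hypothetical sequences invented for this example.
references = {
    "species_A": "ACGTACGTGGA",
    "species_B": "TTGACCGTGGA",
}
index = build_index(references, k=5)

unique_hits = matching_labels("ACGTACG", index, k=5)  # k-mers found only in species_A
shared_hits = matching_labels("CGTGGA", index, k=5)   # k-mers found in both species
```

In this toy case the first read matches only one reference, while the second matches both equally. Reads of the second kind are the ones that need a tie-breaking rule.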

"Most of the methods that have made the problem computationally feasible rely on k-mers, which are exact matches of length 'k,' or a key in to the microbes contained in the database," he said. "If a sequenced read perfectly matches something in the database, the intuition is that you can say what that is with great precision and shortcut more expensive computational approaches."

A commonly used technique with k-mer-based classification is lowest common ancestor (LCA) assignment, he said. LCA compares samples to sequences that share a match, assigning them if necessary to a higher level in the taxonomy, such as a genus rather than a species. But this may not be specific enough for researchers trying to pin down a pathogen, he said.
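A minimal sketch of LCA assignment follows, assuming the taxonomy is stored as a simple child-to-parent map. The function names and the heavily simplified taxonomy are illustrative assumptions, not the data structures used by the tools in the study.

```python
def lineage(taxon, parent):
    """Return the path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Return the deepest taxon shared by the lineages of all given taxa."""
    taxa = list(taxa)
    if not taxa:
        return None
    # Taxa that appear in every lineage
    common = set(lineage(taxa[0], parent))
    for taxon in taxa[1:]:
        common &= set(lineage(taxon, parent))
    # Walking upward from the first taxon, the first shared node is the lowest one
    for node in lineage(taxa[0], parent):
        if node in common:
            return node
    return None


# Simplified child -> parent map (illustrative; most ranks are omitted).
parent = {
    "Bacillus anthracis": "Bacillus",
    "Bacillus cereus": "Bacillus",
    "Bacillus": "Bacillaceae",
    "Bacillaceae": "root",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "root",
}

species_call = lowest_common_ancestor(["Bacillus anthracis"], parent)
genus_call = lowest_common_ancestor(["Bacillus anthracis", "Bacillus cereus"], parent)
root_call = lowest_common_ancestor(["Bacillus anthracis", "Salmonella enterica"], parent)
```

A sequence that matches a single species keeps its species-level label. Once the database also holds a close relative sharing the same sequence, the label moves up to the genus. That is how adding more genomes can make the answer less specific.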

In fact, the study found a k-mer-based classification tool called Bracken, which uses Bayesian statistics to infer the best match for a sequence, helped mitigate the imbalance. Even so, it struggled to identify genomes with close relatives, but not perfect matches, in the database.

Treangen said well-funded research into particular pathogens is a necessity and has greatly aided rapid-outbreak detection and tracking, but it ultimately biases public databases like RefSeq.

"For instance, there's an immense bias toward foodborne pathogens," he said. "Society wants to know a lot about Salmonella, and rightfully so. The FDA, and specifically GenomeTrakr, have aided in the sequencing of thousands of relevant pathogens and have added them directly to the reference database."

However, he said that skews the reference database toward particular genera and families of microbes in a way that affects the accuracy and sensitivity of fast taxonomic-classification tools like Kraken that use k-mer and LCA-based approaches.

Treangen said the best recent example of a false positive identification is a study that initially reported evidence of anthrax bacteria in New York City's subways. The study, based on sequenced genomes from samples, was later revised to reflect mismatches that falsely identified the sequences as Bacillus anthracis.

While a focus on public health is a key priority, Treangen said novel techniques able to cope with database growth and noise, coupled with an increased breadth of sequenced genomes, are needed for continued improvements in the field. "For example, microorganisms from the soil and ocean are severely under-sampled," he said. "There remain a lot of microbes that we need to continue to sequence to better fill out public databases, and that will ultimately help our ability to accurately classify microbes from complex samples."
-end-
Daniel Nasko, a postdoctoral researcher at the University of Maryland, is lead author of the study. Co-authors are staff scientist Sergey Koren and investigator Adam Phillippy of the National Human Genome Research Institute, Bethesda, Md. Treangen is an assistant professor of computer science.

The research was supported by the Division of Intramural Research of the National Human Genome Research Institute, part of the National Institutes of Health, and the Intelligence Advanced Research Projects Activity via the Army Research Office.

Read the abstract at https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1554-6

This news release can be found online at http://news.rice.edu/2018/10/30/flood-of-genome-data-hinders-efforts-to-id-bacteria/

Follow Rice News and Media Relations via Twitter @RiceUNews.

Related materials:

Todd Treangen: https://sites.google.com/view/treangen/home

Rice Department of Computer Science: https://csweb.rice.edu

George R. Brown School of Engineering: https://engineering.rice.edu

Image for download:

http://news.rice.edu/files/2018/10/1102_ANCESTOR-1-web-1sx65j2.jpg

A study led by Rice University computer scientist Todd Treangen demonstrates that recent growth in genomic databases has a negative effect on attempts to identify microbes from metagenomic samples. (Credit: Courtesy of Todd Treangen)

Located on a 300-acre forested campus in Houston, Rice University is consistently ranked among the nation's top 20 universities by U.S. News & World Report. Rice has highly respected schools of Architecture, Business, Continuing Studies, Engineering, Humanities, Music, Natural Sciences and Social Sciences and is home to the Baker Institute for Public Policy. With 3,970 undergraduates and 2,934 graduate students, Rice's undergraduate student-to-faculty ratio is just under 6-to-1. Its residential college system builds close-knit communities and lifelong friendships, just one reason why Rice is ranked No. 1 for lots of race/class interaction and No. 2 for quality of life by the Princeton Review. Rice is also rated as a best value among private universities by Kiplinger's Personal Finance. To read "What they're saying about Rice," go to http://tinyurl.com/RiceUniversityoverview.

David Ruth 713-348-6327 david@rice.edu

Mike Williams 713-348-6728 mikewilliams@rice.edu

Rice University
