International competition benchmarks metagenomics software

October 02, 2017

Communities of bacteria live everywhere: inside our bodies, on our bodies and all around us. The human gut alone contains hundreds of species of bacteria that help digest food and provide nutrients, but can also make us sick. To learn more about these groups of bacteria and how they impact our lives, scientists need to study them. But this task poses challenges, because taking the bacteria into the laboratory is either impossible or would disrupt the biological processes the scientists wish to study.

To bypass these difficulties, scientists have turned to the field of metagenomics. In metagenomics, researchers use algorithms to piece together DNA from an environmental sample to determine the type and role of bacteria present. Unlike established fields such as chemistry, where researchers evaluate their results against a set of known standards, metagenomics is a relatively young field that lacks such benchmarks.

Mihai Pop, a professor of computer science at the University of Maryland with a joint appointment in the University of Maryland Institute for Advanced Computer Studies, recently helped judge an international challenge called the Critical Assessment of Metagenome Interpretation (CAMI), which benchmarked metagenomics software. The results were published in the journal Nature Methods on October 2, 2017.

"There's no one algorithm that we can say is the best at everything," said Pop, who is also co-director of the Center for Health-related Informatics and Bioimaging at UMD. "What we found was that one tool does better in one context, but another does better in another context. It is important for researchers to know that they need to choose software based on the specific questions they are trying to answer."

The study's results were not surprising to Pop, because of the many challenges metagenomics software developers face. First, DNA analysis is challenging in metagenomics because the recovered DNA often comes from the field, not a tightly controlled laboratory environment. In addition, DNA from many organisms, some of which may not have known genomes, mingles together in a sample, making it difficult to correctly assemble, or piece together, individual genomes. Moreover, DNA degrades in harsh environments.

"I like to think of metagenomics as a new type of microscope," Pop said. "In the old days, you would use a microscope to study bacteria. Now we have a much more powerful microscope, which is DNA sequencing coupled with advanced algorithms. Metagenomics holds the promise of helping us understand what bacteria do in the world. But first we need to tune that microscope."

CAMI's leader invited Pop to help evaluate the submissions by challenge participants because of his expertise in genome and metagenome assembly. In 2009, Pop helped publish Bowtie, one of the most commonly used software packages for aligning short DNA sequences to reference genomes. More recently, he collaborated with the University of Maryland School of Medicine to analyze hundreds of thousands of gene sequences as part of the largest, most comprehensive study of childhood diarrheal diseases ever conducted in developing countries.

"We uncovered new, unknown bacteria that cause diarrheal diseases, and we also found interactions between bacteria that might worsen or improve illness," Pop said. "I feel that's one of the most impactful projects I've done using metagenomics."

For the competition, CAMI researchers combined approximately 700 microbial genomes and 600 viral genomes with other DNA sources and simulated how such a collection of DNA might appear in the field. The participants' task was to reconstruct and analyze the genomes of the simulated DNA pool.

CAMI researchers scored the participants' submissions in three areas: how well they assembled the fragmented genomes; how well they "binned," or organized, DNA fragments into related groups to determine the families of organisms in the mixture; and how well they "profiled," or reconstructed, the identity and relative abundance of the organisms present in the mixture. Pop contributed metrics and software for evaluating the submitted assembled genomes.

Nineteen teams submitted 215 entries using six genome assemblers, nine binners and 10 profilers to tackle this challenge.

The results showed that for assembly, algorithms that pieced together a genome using different lengths of smaller DNA fragments outperformed those that used DNA fragments of a fixed length. However, no assembler did well at separating distinct but closely related genomes.

For the binning task, the researchers found tradeoffs between how accurately the software programs identified the group to which a particular DNA fragment belonged and how many DNA fragments the software assigned to any group. This result suggests that researchers need to choose their binning software based on whether accuracy or coverage is more important. In addition, the performance of all binning algorithms decreased when samples included multiple related genomes.
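As a rough illustration of this tradeoff, the Python sketch below scores two made-up binners on two numbers: the share of assigned fragments placed in the correct group (accuracy) and the share of all fragments that received any assignment (coverage). The function name, data layout and toy labels are assumptions made for this example; they are not CAMI's evaluation code or its official metrics.

```python
from typing import Dict, Optional, Tuple


def binning_scores(truth: Dict[str, str],
                   predicted: Dict[str, Optional[str]]) -> Tuple[float, float]:
    """Return (accuracy among assigned fragments, fraction of fragments assigned).

    truth maps fragment id -> true group; predicted maps fragment id -> group,
    or None when the binner declined to assign the fragment.
    """
    if not truth:
        return 0.0, 0.0
    # Keep only fragments the binner actually placed in a group.
    assigned = {frag: label for frag, label in predicted.items()
                if label is not None and frag in truth}
    coverage = len(assigned) / len(truth)
    if not assigned:
        return 0.0, coverage
    correct = sum(1 for frag, label in assigned.items() if truth[frag] == label)
    return correct / len(assigned), coverage


# Hypothetical toy data: four fragments from two organisms, A and B.
truth = {"f1": "A", "f2": "A", "f3": "B", "f4": "B"}
cautious = {"f1": "A", "f2": None, "f3": "B", "f4": None}  # assigns only when sure
greedy = {"f1": "A", "f2": "B", "f3": "B", "f4": "B"}      # assigns everything

cautious_scores = binning_scores(truth, cautious)  # (1.0, 0.5)
greedy_scores = binning_scores(truth, greedy)      # (0.75, 1.0)
```

In this toy case the cautious binner is always right but leaves half the fragments unassigned, while the greedy binner covers every fragment at the cost of one wrong call.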

In profiling, software tended to do one of two things well: recover the relative abundance of bacteria in the sample, or detect organisms even at very low quantities. However, the algorithms that excelled at detection identified the wrong organism more often.
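To make the two profiling goals concrete, the Python sketch below compares two invented profiles against a known mixture. It uses the summed absolute difference in relative abundance as an error measure and simple counts of found, missed and wrongly reported organisms for detection. These measures, the function names and the numbers are illustrative assumptions, not the metrics or data used in the CAMI study.

```python
from typing import Dict


def abundance_error(truth: Dict[str, float], predicted: Dict[str, float]) -> float:
    """Sum of absolute differences in relative abundance over all organisms."""
    organisms = set(truth) | set(predicted)
    return sum(abs(truth.get(o, 0.0) - predicted.get(o, 0.0)) for o in organisms)


def detection_counts(truth: Dict[str, float], predicted: Dict[str, float]) -> Dict[str, int]:
    """Count organisms correctly found, missed, and wrongly reported."""
    present = {o for o, a in truth.items() if a > 0}
    called = {o for o, a in predicted.items() if a > 0}
    return {
        "found": len(present & called),
        "missed": len(present - called),
        "wrong": len(called - present),
    }


# Hypothetical mixture: organism C is present at a very low quantity.
truth = {"A": 0.70, "B": 0.29, "C": 0.01}
# Profiler X estimates abundance well but misses the rare organism.
profiler_x = {"A": 0.70, "B": 0.30}
# Profiler Y detects the rare organism but also reports D, which is absent.
profiler_y = {"A": 0.50, "B": 0.25, "C": 0.125, "D": 0.125}

for name, profile in (("X", profiler_x), ("Y", profiler_y)):
    print(name, round(abundance_error(truth, profile), 3),
          detection_counts(truth, profile))
```

Here profiler X has the smaller abundance error but misses organism C, while profiler Y finds every organism present and reports one that is not there.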

Going forward, Pop said the CAMI group will continue to run new challenges with different data sets and new evaluations aimed at more specific aspects of software performance. Pop is excited to see scientists use the benchmarks to address research questions in the laboratory and the clinic.

"The field of metagenomics needs standards to ensure that results are correct, well validated and follow best practices," Pop said. "For instance, if a doctor is going to stage an intervention based on results from metagenomic software, it's essential that those results be correct. Our work provides a roadmap for choosing appropriate software."

This work was led by Alice McHardy of the Department for Computational Biology of Infection Research at the Helmholtz Centre for Infection Research and the Braunschweig Integrated Centre of Systems Biology in Braunschweig, Germany.

This work was supported by an Engineering and Physical Sciences Research Council Grant (Award No. EP/K032208/1), a U.S. Department of Energy contract (Award No. DE-AC02-05CH11231) and the Cluster of Excellence on Plant Sciences program funded by the Deutsche Forschungsgemeinschaft. The content of this article does not necessarily reflect the views of these organizations.

The research paper, "Critical Assessment of Metagenome Interpretation - a benchmark of computational metagenomics software," Alice McHardy et al., was published in the journal Nature Methods on October 2, 2017.

Media Relations Contact:

Irene Ying

University of Maryland
College of Computer, Mathematical, and Natural Sciences
2300 Symons Hall
College Park, MD 20742

About the College of Computer, Mathematical, and Natural Sciences

The College of Computer, Mathematical, and Natural Sciences at the University of Maryland educates more than 7,000 future scientific leaders in its undergraduate and graduate programs each year. The college's 10 departments and more than a dozen interdisciplinary research centers foster scientific discovery with annual sponsored research funding exceeding $150 million.
