Scientists complete DNA sequencing and analysis of multiple fruit fly genomes

November 07, 2007

In one of the first large-scale comparisons of multiple animal genomes, scientists at the Broad Institute of MIT and Harvard, the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, and many collaborating institutions have analyzed the genomes of twelve species of the fruit fly Drosophila to gain insights into the evolution of genes and genomes and to discern the functional elements encoded in animal DNA. The work appears in the November 7 issue of Nature and in more than 40 accompanying papers in Genome Research and other journals. Comparing the genomes of multiple related species, fly or otherwise, not only reveals new insights into species evolution and identifies thousands of novel genes and other functional elements, but also provides a powerful tool for unraveling genome function that may help researchers unlock the secrets of our own genome.

In these papers, the international consortium reported the genomes of ten newly sequenced Drosophila species, some very closely related and others less so, and their comparison to two previously sequenced flies, including Drosophila melanogaster, one of the most powerful model organisms for the study of animal biology and evolution. The availability of the many Drosophila genomes has yielded many new insights into genome function and aided the study of how genomes have changed across evolutionary time.

"Having the sequences of many closely related species allows us to study the evolutionary forces that have shaped the fruit fly's family tree, and to discover the working parts of the fly genome in a systematic way," said Manolis Kellis, associate member of the Broad Institute, assistant professor in MIT's CSAIL, and one of the consortium's project leaders.

On one hand, the researchers studied the differences across species to help elucidate how evolution has shaped fly biology over millions of years. Their analysis revealed that while many attributes of Drosophila genomes are conserved across multiple species, each species has novel features not seen in any other. Only 77 percent of the approximately 13,700 protein-coding genes in D. melanogaster are shared with all of the other 11 species. In addition, the genes involved in interactions with the environment and in reproduction showed signs of adaptive evolution, meaning that they likely provided some survival advantage to the organism.

On the other hand, the researchers studied the similarities of the different species to help define the functional parts of the fly genome. The parts of a genome that are unchanged (conserved) are those that have been kept by evolution, and are thus likely to play crucial roles. Genome comparison can therefore reveal which regions of the genome are functional, based on the degree to which evolution has conserved them.

"Focusing on the conserved part of the genome is a great way to discover what has been maintained by evolution," said Kellis. "Moreover, by looking more closely at the subtle patterns of mutation within conserved regions, we can predict the functional roles they play."

Indeed, at the level of DNA, several combinations of letters, or nucleotides, may encode the same function, in the way that a storyteller can use different combinations of words to tell the same tale. For example, four different nucleotide combinations - GTT, GTC, GTA, and GTG - all encode the same protein building block, or amino acid. Thus, a change in the third letter would leave the amino acid unchanged, one example of how DNA changes can be tolerated while still preserving the function of the corresponding protein.
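The codon example above can be checked with a short sketch. The table below is deliberately partial: it holds only the four codons named in the text (all of which encode the amino acid valine) plus two neighboring codons for contrast. The function names are hypothetical and chosen only for illustration.

```python
# Partial genetic-code table. The four GT* codons are the ones named in the
# text and all encode valine. The other two entries are included only to show
# that changing the first or second letter alters the amino acid.
# A complete table would have 64 entries.
CODON_TO_AMINO_ACID = {
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
    "GCT": "Ala",  # second letter changed relative to GTT
    "ATT": "Ile",  # first letter changed relative to GTT
}


def is_synonymous(codon_a, codon_b):
    """Return True if both codons encode the same amino acid."""
    return CODON_TO_AMINO_ACID[codon_a] == CODON_TO_AMINO_ACID[codon_b]


def third_position_variants(codon):
    """List the three codons that differ from `codon` only in the third letter."""
    return [codon[:2] + base for base in "ACGT" if base != codon[2]]


# Every third-letter change to GTT still encodes valine ...
print(all(is_synonymous("GTT", v) for v in third_position_variants("GTT")))
# ... whereas changing the first or second letter does not.
print(is_synonymous("GTT", "GCT"), is_synonymous("GTT", "ATT"))
```

Valine is a convenient case because every third-letter change is silent. That is not true of all amino acids, which is one reason a full analysis needs the complete table.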

Through these kinds of random mutations, evolution explores the space of possible nucleotide combinations that preserve function. This exploration produces unique patterns of genomic change, described by the researchers as "evolutionary signatures," that are specific to the function of that region of DNA. Protein-coding genes, for example, show frequent substitutions at every third nucleotide, because one amino acid can be encoded by several nucleotide triplets. In contrast, some genes that don't encode proteins -- so-called RNA genes -- show changes that preserve the overall structure of RNA while tolerating changes in the genes' DNA sequence.
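One simple way to picture the protein-coding signature is to count where two aligned sequences differ, grouped by position within the codon. The sketch below is only a toy illustration of that idea, not the consortium's method. It assumes the two sequences are already aligned without gaps and start in the reading frame, and the sequences themselves are made up for the example.

```python
def substitutions_by_codon_position(seq_a, seq_b):
    """Count mismatches between two aligned sequences at codon positions 1, 2 and 3.

    Assumes equal-length, gap-free sequences whose first base is the first
    base of a codon (both assumptions are simplifications for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = [0, 0, 0]
    for index, (base_a, base_b) in enumerate(zip(seq_a, seq_b)):
        if base_a != base_b:
            counts[index % 3] += 1
    return counts


# Hypothetical sequences, invented for illustration only.
reference = "GTTGCTAAAGGT"
coding_like = "GTCGCAAAGGGA"      # differs only at every third letter
non_coding_like = "CTTGATAAAGCT"  # differs at first and second letters

print(substitutions_by_codon_position(reference, coding_like))      # [0, 0, 4]
print(substitutions_by_codon_position(reference, non_coding_like))  # [1, 2, 0]
```

A strong excess of changes in the third slot, as in the first comparison, is the kind of periodic pattern the text describes for protein-coding regions. Real analyses work across many species at once and use statistical models rather than raw counts.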

Like codebreakers turning their knowledge of biology into computational algorithms, Kellis and his colleagues identified evolutionary signatures associated with a variety of roles in the genome: protein-coding genes, non-coding RNAs, microRNAs, and regulatory motifs. Each function has its own distinct signature, defined by the changes that can be tolerated while still preserving that function.

The researchers then used these evolutionary signatures to systematically identify the functional elements encoded in the fly genome, leading to the discovery of hundreds of novel functional elements and many new insights into animal biology.

The work led to the discovery of 1,193 new sequences that encode proteins, the flagging of 414 regions that were mistakenly labeled as protein-coding genes, and corrections to hundreds of previously annotated protein-coding genes. This allowed the researchers to revise the catalog of protein-coding genes for Drosophila melanogaster, with updates affecting 10 percent of all genes. The revision was confirmed through manual curation by scientists at the FlyBase consortium and through large-scale experimental validation led by the Berkeley Drosophila Genome Project.

In addition, the researchers identified hundreds of new RNA genes and structures, new microRNA genes, and new DNA sequences involved in the control of gene expression during embryonic development and in response to environmental changes. The twelve genomes also allowed the prediction of very small regulatory targets in the genome, which can help piece together the first regulatory network for an animal genome without having to perform intensive and expensive experiments.

The work also led to many surprises. For example, the researchers found many protein-coding genes that defy the traditional rules of how the DNA code gets translated into protein. Some 150 genes apparently bypass signals that would normally halt translation, and other genes encode multiple proteins in a single RNA transcript. Other findings include surprising evidence that a single microRNA gene locus can produce up to four functional microRNAs, each with distinct functions.

The team's analysis marks the first time that such a diverse range of evolutionary signatures has been applied to identify the functional elements of a genome in a comprehensive way. "By comparing many closely related genomes, we were able to discover things we never thought were possible using one genome sequence alone," said Kellis. One intriguing possibility is that evolutionary signatures may even identify novel, as yet unknown classes of functions. For example, although the fruit fly has been intensely studied for over a century, microRNAs were only discovered in the last decade, and are now known to play a central role in development. Many other classes of as yet unknown functional elements may be hidden in the fly genome, and recognition of their common evolutionary properties may help lead to their discovery.

The study of the 12 flies has immediate implications for the discovery of functional elements in the human genome. "We are now using similar methods to analyze 32 mammalian genomes, in order to help understand the human genome," Kellis explained. "We should be able to apply the methodology of evolutionary signatures to any group of closely related species."

The Broad Institute of MIT and Harvard was one of several sequencing centers to participate in the work, in addition to Agencourt Bioscience Corporation, the Washington University Genome Sequencing Center, and the J. Craig Venter Institute. The Broad Institute Sequencing Platform, led by Jennifer Baldwin and Robert Nicol and consisting of over 150 researchers, and the Broad Institute Whole Genome Assembly Team led by David Jaffe were major contributors to these efforts and are co-authors of the work.

Papers cited:

Drosophila 12 Genomes Consortium. (2007) Evolution of genes and genomes in the Drosophila phylogeny. Nature DOI:10.1038/nature06341

Stark et al. (2007) Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature DOI:10.1038/nature06340

Lin et al. (2007) Revisiting the protein-coding gene catalog of Drosophila melanogaster using twelve fly genomes. Genome Research DOI:10.1101/gr.6679507

Stark et al. (2007) Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Research DOI:10.1101/gr.6593807

Kheradpour et al. (2007) Reliable prediction of regulator targets using 12 Drosophila genomes. Genome Research DOI:10.1101/gr.7090407

Rasmussen, Kellis. (2007) Accurate gene-tree reconstruction by learning gene- and species-specific substitution rates across multiple complete genomes. Genome Research DOI:10.1101/gr.7105007

Ruby et al. (2007) Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Research DOI:10.1101/gr.6597907

About the Broad Institute of MIT and Harvard

The Broad Institute of MIT and Harvard was founded in 2003 to bring the power of genomics to biomedicine. It pursues this mission by empowering creative scientists to construct new and robust tools for genomic medicine, to make them accessible to the global scientific community, and to apply them to the understanding and treatment of disease.

The Institute is a research collaboration that involves faculty, professional staff and students from throughout the MIT and Harvard academic and medical communities. It is jointly governed by the two universities.

Organized around Scientific Programs and Scientific Platforms, the unique structure of the Broad Institute enables scientists to collaborate on transformative projects across many scientific and medical disciplines.

