NIST-led research de-mystifies origins of 'junk' DNA

March 25, 2004

A debate over the origins of what is sometimes called "junk" DNA has been settled by research involving scientists at the Center for Advanced Research in Biotechnology (CARB) and a collaborator, who developed rigorous proof that these mysterious sections were added to DNA "late" in the evolution of life on earth--after the formation of modern-sized genes, which contain instructions for making proteins.

A biologist with the Commerce Department's National Institute of Standards and Technology (NIST) led the research team, which reported its findings in the March 10 online edition of Molecular Biology and Evolution. The results are based on a systematic, statistically rigorous analysis of publicly available genetic data carried out with bioinformatics software developed at CARB.

In humans, there is so much apparent "junk" DNA (sections of the genome with no known function) that it takes up more space than the functional parts. Much of this junk consists of "introns," which appear as interruptions plopped down in the middle of genes. Discovered in the 1970s, introns mystify scientists but are readily accounted for by cells: when the cellular machinery transcribes a gene in preparation for making a protein, introns are simply spliced out of the transcript.

Research from the CARB group appears to resolve a debate over the "early versus late" timing of the appearance of introns. Since introns were discovered in 1978, scientists have debated whether genes were born split (the "introns-early" view), or whether they became split after eukaryotic cells (the ones that gave rise to animals and their relatives) diverged from bacteria roughly 2 billion years ago (the "introns-late" view). Bacterial genomes lack introns. Although the study did not attempt to propose a function for introns, or determine whether they are beneficial or harmful, the results appear to rule out the "introns-early" view.

The CARB analysis shows that the probability of a modern intron's presence in an ancestral gene common to the genes studied is roughly 1 percent, indicating that the vast majority of today's introns appeared subsequent to the origin of the genes. This conclusion is supported by the findings regarding placement patterns for introns within genes. It long has been observed that, in the sequences of nitrogen-containing compounds that make up our DNA genomes, introns prefer some sites more than others. The CARB study indicates that these preferences are side effects of late-stage intron gain, rather than side effects of intron-mediated gene formation.

The CARB results are based on an analysis of carefully processed data for 10 families of protein-coding genes in animals, plants, fungi and their relatives (see sidebar for details of the method used). A variety of statistical modeling, theoretical, and automated analytical approaches were used; while most were conventional, their combined application to the study of introns was novel. The CARB study also is unique in using an evolutionary model as the basis for inferring the presence of ancestral introns. The research was made possible in part by the increasing availability, over the past decade, of massive amounts of genetic sequence data.

The lead researcher is Arlin B. Stoltzfus of NIST; collaborators include Wei-Gang Qiu, formerly of CARB and the University of Maryland and now at Hunter College in New York City, and Nick Schisler, currently at Furman University, Greenville, S.C.

CARB is a cooperative venture of NIST and the University of Maryland Biotechnology Institute.

Background Information:

CARB's Approach to Understanding the Origins of 'Junk' DNA

Scientists long have compared the sequences of chemical compounds in different proteins, genes and entire genomes to derive clues about structure and function. The most sophisticated comparative methods are evolutionary and rely on matching similar sequences from different organisms, inferring family trees to determine relationships, and reconstructing changes that must have occurred to create biologically relevant differences.

This type of analysis is usually done with one sequence family at a time. The Center for Advanced Research in Biotechnology (CARB), a cooperative venture of the Commerce Department's National Institute of Standards and Technology (NIST) and the University of Maryland Biotechnology Institute, developed software to automate the analysis of dozens--and perhaps hundreds, eventually--of sequence families at a time. The automated methods also assess the reliability of all the information, so that conclusions are based on the most reliable parts of the analysis.

The CARB method has two parts. The first part consists of a combination of manual and automated processing of gene data from public databases. The data are clustered into families through matching of similar sequences, first in pairs and then in groups. Then family trees are developed indicating how the genes are related to each other. A file is developed for each family that includes data on sequence matches, intron locations, family trees and reliability measures.
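The "first in pairs and then in groups" step can be illustrated with a small sketch. The article does not say which clustering algorithm CARB used, so the code below assumes simple single-linkage grouping with a union-find structure: any two genes joined by a chain of pairwise matches land in the same family. The function name `cluster_families` and the toy gene names are hypothetical.

```python
def cluster_families(genes, pair_hits):
    """Group genes into families from pairwise sequence matches.

    genes: iterable of gene identifiers.
    pair_hits: iterable of (gene_a, gene_b) pairs judged similar.
    Assumption: single-linkage grouping, not necessarily CARB's method.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the family representative,
        # shortening the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Merge the families of every matched pair.
    for a, b in pair_hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their representative.
    families = {}
    for g in parent:
        families.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in families.values())


# Toy input: five hypothetical genes and three pairwise matches.
example_families = cluster_families(
    ["actin_1", "actin_2", "actin_3", "tubulin_1", "tubulin_2"],
    [("actin_1", "actin_2"), ("actin_2", "actin_3"), ("tubulin_1", "tubulin_2")],
)
```

Single linkage is the most permissive choice; a real pipeline would likely add a similarity threshold and the reliability checks the article mentions.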

These datasets then are loaded into the second part of the system, which is fully automated. It consists of a relational database combined with software that computes probabilities for introns being present in ancestral genes using a method developed at CARB. Each gene is assigned to a kingdom (plants, animals, fungi and others), and a matrix of intron presence/absence data is determined for each family based on the sequence alignments. This matrix, along with the family tree, is used to estimate ancestral states of introns, as well as rates of intron loss and gain. Additional software is used for analysis and visualization of results.
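To make the matrix-plus-tree idea concrete, here is a minimal sketch. It is not the CARB method, whose details the article does not give. It assumes a generic two-state model (intron absent = 0, present = 1) with constant gain and loss rates, and uses the standard pruning algorithm to compute the probability that an intron was present at the root of a family tree. All function names, rates, branch lengths and gene names are illustrative assumptions.

```python
import math


def presence_matrix(intron_sites):
    """Build a 0/1 matrix: one row per gene, one column per aligned intron position."""
    positions = sorted(set().union(*intron_sites.values()))
    matrix = {gene: [1 if p in sites else 0 for p in positions]
              for gene, sites in intron_sites.items()}
    return positions, matrix


def transition(gain, loss, t):
    """Transition probabilities p[from][to] over branch length t (two-state Markov model)."""
    total = gain + loss
    change = 1.0 - math.exp(-total * t)
    p01 = gain / total * change   # absent -> present
    p10 = loss / total * change   # present -> absent
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def partial_likelihood(node, states, gain, loss):
    """Likelihood of the observed leaf states below `node`, for each state at `node`.

    A node is either a leaf name (str) or a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if states[node] == s else 0.0 for s in (0, 1)]
    like = [1.0, 1.0]
    for child, t in node:
        p = transition(gain, loss, t)
        below = partial_likelihood(child, states, gain, loss)
        for s in (0, 1):
            like[s] *= p[s][0] * below[0] + p[s][1] * below[1]
    return like


def root_presence_probability(tree, states, gain, loss, prior_present):
    """Posterior probability that the intron was present in the ancestral gene."""
    like = partial_likelihood(tree, states, gain, loss)
    numerator = prior_present * like[1]
    return numerator / (numerator + (1.0 - prior_present) * like[0])


# Hypothetical family: three genes, intron positions in alignment coordinates.
positions, matrix = presence_matrix(
    {"animal_1": {10, 42}, "plant_1": {42}, "fungus_1": set()}
)
# Hypothetical tree with made-up branch lengths.
tree = [("animal_1", 0.3), ([("plant_1", 0.2), ("fungus_1", 0.2)], 0.1)]
for column, position in enumerate(positions):
    states = {gene: row[column] for gene, row in matrix.items()}
    prob = root_presence_probability(tree, states, gain=0.1, loss=1.0, prior_present=0.5)
    print(position, round(prob, 3))
```

In this sketch the gain rate, loss rate and root prior are supplied by hand. According to the article, the CARB system estimates the loss and gain rates from the data along with the ancestral states.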

The CARB study analyzed data for 10 families of protein-coding genes in multi-celled organisms, encompassing 1,868 introns at 488 different positions.

National Institute of Standards and Technology (NIST)
