New computer program detects overlooked gene segments: Previous estimates of human gene number too low

November 28, 2001

In order to study genes for a wide variety of research, diagnostic, or therapeutic purposes, scientists use computer programs that analyze DNA sequences. These programs indicate where pieces of genes are located within what is frequently a vast and complex genetic landscape. Although conventional programs detect many parts of genes with ease, they fail when it comes to detecting two important elements--the very first pieces of genes, and the nearby "on" switches of genes called promoters.

Researchers in the bioinformatics program at Cold Spring Harbor Laboratory have now developed a computer program that is especially good at finding these first segments and "on" switches of genes. The program is tailored toward detecting these features in the human genome sequence, but it will also be useful for annotating other mammalian genomes.

The program--called "First Exon Finder" or "FirstEF"--was developed by Michael Zhang and his colleagues. A paper describing the program is published in the December issue of Nature Genetics.

"FirstEF is the first program that can readily and accurately detect a class of gene segments that has previously been extraordinarily difficult to find," says Zhang. "It's like looking for buried treasure."

The gene segments Zhang is referring to occur at the very beginning of genes, and are called "non-coding first exons." Because they do not encode protein segments, non-coding first exons are undetectable by conventional computer programs that rely on protein coding patterns found in DNA.

Instead, FirstEF recognizes five other DNA "signatures" that betray the presence and location of first exons in genes. The biological basis of some of these telltale genetic signatures is unknown, says Zhang. "But they are real, and perhaps someday biology will explain why they are there." One such signature is the frequency with which two building blocks of DNA, C and G, occur next to each other.
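To make the C-next-to-G signature concrete, here is a minimal sketch that counts how often "CG" occurs in a DNA string and compares that count with what the individual C and G counts would predict. This is not FirstEF's actual model, which the article does not describe. The function name `cpg_stats` and the observed/expected ratio are assumptions chosen for illustration; the ratio is a measure commonly used to describe CG-rich regions.

```python
def cpg_stats(seq):
    """Return simple CG-dinucleotide statistics for a DNA string.

    Illustrative only: an assumed, simplified measure, not the
    statistic that FirstEF itself computes.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return {"cpg_count": 0, "gc_content": 0.0, "obs_exp": 0.0}

    c = seq.count("C")
    g = seq.count("G")
    # "CG" cannot overlap with itself, so str.count finds every occurrence.
    cpg = seq.count("CG")

    # Expected number of CG pairs if C and G were placed independently.
    expected = c * g / n
    obs_exp = cpg / expected if expected else 0.0

    return {
        "cpg_count": cpg,
        "gc_content": (c + g) / n,
        "obs_exp": obs_exp,
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, not real genomic data.
    for s in ("ACGT", "CGCGCG", "ATATAT"):
        print(s, cpg_stats(s))
```

A ratio well above what is typical for the surrounding sequence suggests a CG-rich stretch. A real gene-finding program would combine a signal like this with several others rather than rely on it alone.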

Although they do not encode protein, non-coding first exons are essential components of gene structure and function. Consequently, the ability to detect them is crucial for scientists who study genes for a wide variety of biological and biomedical applications.

"The results Michael Zhang is getting with FirstEF are very exciting," says James Kent, a graduate student at the University of California at Santa Cruz. Kent's own computer program called "GigAssembler" caused a sensation in the world of genome research when he used it to generate the first and only publicly-available assembly of the human genome sequence in June of last year. Kent hopes to add a FirstEF "track" to the Human Genome Browser he has created (available at

When Zhang used FirstEF to analyze the DNA sequences of human chromosomes 21 and 22, he found that the program correctly pinpointed the location of 90 percent of known first exons on those chromosomes. According to Zhang, FirstEF was nearly twice as sensitive as a program available from DoubleTwist, Inc. and Genomatix Software GmbH called "PromoterInspector." Zhang was joined in this study by postdoctoral researchers Ramana Davuluri (now on the faculty at Ohio State University) and Ivo Grosse.

Later, Zhang and his colleagues used FirstEF to analyze the entire human genome. They identified some 68,000 first exons. This result does not necessarily mean that there are 68,000 or so human genes, because a single gene can use alternative first exons. Moreover, the total number of genes in an organism's genome depends on other, subtle definitions of what constitutes a gene. Nevertheless, Zhang believes there are 50,000 to 60,000 human genes and that previous estimates of 30,000 to 40,000 human genes are too low.

One bonus of the way FirstEF operates is that it identifies not only first exons of genes, but also the "on" switches of genes called "promoters."

"A significant bottleneck in current DNA research is finding the promoters of genes. Because gene promoters and first exons are related, FirstEF kills two birds with one stone," says Zhang.

Cold Spring Harbor Laboratory
