Software Helps Decipher Genomes Of Higher Organisms

December 01, 1998

A software program that has been successfully annotating the genes of common bacteria since 1992 is now capable of finding genes in higher organisms. It is particularly useful for finding human genes in anonymous human DNA sequences.

Understanding the genomes of key microorganisms may improve understanding of human genetics, because some genes in lower organisms correspond to human genes. Scientists can also design new drugs based on knowledge of disease-causing bacteria.

The original software program, called GeneMark, uses probabilistic mathematical models to predict the locations of genes on a strand of DNA. GeneMark was developed by Dr. Mark Borodovsky, a professor of biology at the Georgia Institute of Technology. It has become the world's most-used software program for deciphering bacterial DNA and has proven itself 98 percent accurate.

Borodovsky's latest development uses GeneMark.hmm, a refined version of the original program, as its base to make more sophisticated predictions for the genomes of eukaryotic, or higher, organisms.

"Deciphering bacterial DNA is simpler than deciphering human DNA since its genes run continuously, without gaps," Borodovsky explained. "The genes of human DNA may be divided into pieces, called exons, with non-coding genetic material between the exons. These spacers in the genes, called introns, were hard to detect by a computer algorithm. Also, eukaryotic DNA is much longer, with an average gene size of 10,000 nucleotides."

Therefore, the predictions of where eukaryotic genes lie on a strand of DNA must include predictions of the boundaries between the exons, which contain the genetic information, and introns, which are the non-coding regions.

To create a computer program to achieve this, Borodovsky employed a class of probabilistic mathematical models called hidden Markov models, or HMMs. A recent grant from the National Institutes of Health (NIH) is funding the incorporation of HMMs into GeneMark, making the program responsive to the boundaries between exons and introns.
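The idea of labeling each base as exon or intron can be illustrated with a toy two-state hidden Markov model decoded with the Viterbi algorithm. This is only a minimal sketch and not GeneMark.hmm itself: the state names, every probability, and the example sequence are invented for illustration, and a real eukaryotic gene finder would model far more detail, such as splice sites and reading frames.

```python
import math

STATES = ("exon", "intron")

# All probabilities below are invented for illustration only.
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
# Assumption: the toy "exon" state prefers G/C and the "intron" state prefers A/T.
EMIT = {
    "exon": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "intron": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string."""
    # Work in log space so long sequences do not underflow.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best previous state from which to enter state s.
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][base])
            )
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


labels = viterbi("GCGCGCGCATATATATAT")
```

In this toy example the decoder places a single exon-to-intron boundary where the base composition changes, which is the kind of boundary prediction described above.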

Borodovsky developed GeneMark.hmm with Dr. Alexander Lukashin, a researcher who works in the lab. A test of the program demonstrated its "state-of-the-art accuracy," said Borodovsky, meaning that, when tested against current means of finding eukaryotic genes, GeneMark.hmm performed at least as well as the best current methods.

"GeneMark.hmm is more than a static software program or product," Borodovsky noted. "It is rather an approach for DNA sequence analysis that is under continuous development."

It is already being used to annotate parts of the genomes of five eukaryotic organisms, including humans, nematodes, fruit flies and a plant in the mustard family.

Borodovsky will present his latest results at the International Workshop on Genomic Sequence Analysis on Dec. 1-4 at the Isaac Newton Institute for Mathematical Sciences at the University of Cambridge in England.

GeneMark.hmm will fill a need, as evidenced by early demand from scientists, Borodovsky said. Even before information about GeneMark.hmm had been published in a scientific journal, almost 30 researchers expressed interest to one of Borodovsky's graduate students, John Besemer, who gave a poster presentation on GeneMark.hmm at a recent conference on the eukaryotic organism Chlamydomonas reinhardtii.

Meanwhile, Borodovsky has recorded his research in predicting gene coding regions in a chapter of the new book "Organization of the Prokaryotic Genome," soon to be published by the American Society for Microbiology. The chapter is called "Statistical Predictions in Genuine Coding Regions."

Borodovsky, a Russian emigre, conceived the idea for GeneMark while still living in Russia in the 1980s. He envisioned a software program based on Markov models to manage the vast amounts of genetic information scientists were churning out.

The Russian mathematician Andrey Markov introduced his models early in the 20th century. Borodovsky believed Markov models could characterize genes by the frequency of certain combinations of bases in known genes, as opposed to non-genes. These probabilistic models could therefore be applied to DNA sequences to predict where genes lie.
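A rough sketch of this idea: estimate how often each base follows another in known coding and non-coding sequences, then score a new fragment by the log-odds of the two models. The code below is a simplified illustration only. It assumes a first-order Markov chain, made-up training strings, and hypothetical function names (`train_markov`, `log_odds`), and it is not the model GeneMark actually uses.

```python
import math

BASES = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order probabilities P(next base | current base)."""
    # Pseudocounts keep unseen base pairs from getting zero probability.
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in BASES:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in BASES}
    return probs


def log_odds(seq, coding, noncoding):
    """Sum of log(P_coding / P_noncoding) over adjacent base pairs.

    A positive score means the fragment looks more like the coding examples.
    """
    return sum(
        math.log(coding[a][b] / noncoding[a][b]) for a, b in zip(seq, seq[1:])
    )


# Toy, invented training data (not real genes).
coding_examples = ["ATGGCGGCGCTGGCG", "ATGGCCGCGGCGCTG"]
noncoding_examples = ["ATATATTTAATATTA", "TTTATAATATATTAA"]

coding_model = train_markov(coding_examples)
noncoding_model = train_markov(noncoding_examples)

score_gc = log_odds("GCGGCGCTG", coding_model, noncoding_model)  # positive
score_at = log_odds("ATATTAATA", coding_model, noncoding_model)  # negative
```

With these toy inputs, the GC-rich fragment scores above zero and the AT-rich fragment scores below zero, mirroring the contrast between genes and non-genes described above.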

When scientists sequence DNA, they are left with strings of nucleotides that, to make sense, must be separated into genes and non-coding regions and then translated into proteins.
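The translation step can be sketched in a few lines. The example below uses the standard genetic code and assumes the input is already a gene's coding sequence, read in frame from its first base. The function name `translate` and the sample sequence are illustrative choices, not part of GeneMark.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented example: start codon, two more codons, then a stop codon.
protein = translate("ATGGCCAAATGA")  # "MAK"
```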

Since 1992, researchers from around the world have sent their sequenced DNA fragments via e-mail to Georgia Tech's GeneMark e-mail server, which predicts the locations of genes. After mapping gene locations, the computer program compares each newly predicted protein sequence to known ones in a database. This comparison helps determine the protein's function. The protein analysis is done in collaboration with the National Center for Biotechnology Information at the NIH.

GeneMark has proven itself a powerful tool for finding bacterial genes, in particular. Researchers at The Institute for Genomic Research have used GeneMark to annotate the complete genomes of numerous common bacteria.

GeneMark Genesis, the refined version of GeneMark, which Borodovsky developed with graduate student William Hayes, was used to find genes in the genomes of the bacteria Methanococcus jannaschii and Helicobacter pylori. There were no experimentally studied segments of M. jannaschii available to train the Markov models upon which gene prediction in GeneMark is based. So the new program "learned Markov models from anonymous sequences based on the grammar of the genetic code," Borodovsky said.

Borodovsky's work is at the forefront of a new interdisciplinary field called bioinformatics, which uses mathematical methods and computers to answer many important biological questions. Bioinformatics can also help discover genes and design new drugs. Borodovsky is spearheading development of Georgia Tech's new master's degree program in bioinformatics, the first such program in the United States.
Georgia Institute of Technology
430 Tenth Street, N.W., Suite N-112
Atlanta, Georgia 30318 USA

John Toon (404-894-6986); FAX: (404-894-1826) or
Jane Sanders (404-894-2214) (770-975-1014)

Dr. Mark Borodovsky (404-894-8432)

WRITER: Jane M. Sanders