Computer generates comparative gene maps

December 20, 2000

ITHACA, N.Y. -- Comparing the genomes of two related species of a plant or animal often helps to locate important genes that have been identified in one species but not in another, and can provide clues about how both species evolved from a common ancestor. But making these "comparative gene maps" has been a slow, painstaking process, something biologists do by hand over weeks, months or years, using data collected in "wet labs" and analyzed with software designed to interpret only one map at a time.

Now, Cornell University researchers have come up with a way to do the comparison step in a few hours on a computer. In early tests, a computer-generated comparison of the genomes of rice and maize (corn) closely matched a similar map made by hand, and even suggested some relationships that had not shown up in the handmade map.

Debra Goldberg, Cornell graduate student in applied mathematics, developed the new method in collaboration with Susan McCouch, Cornell professor of plant breeding, and Jon Kleinberg, Cornell assistant professor of computer science. Goldberg described their work at the Gene Order Dynamics, Comparative Maps and Multigene Families (DCAF) workshop held in September in Sainte-Adèle, Quebec, and will present a later version at the Plant and Animal Genome IX conference in San Diego in January. Their paper, "Algorithms for Constructing Comparative Maps," appears in Comparative Genomics (David Sankoff and Joseph H. Nadeau, Eds., Kluwer Academic Publishers, 2000). A software implementation of the new method soon will be available to geneticists.

"The point of this isn't just to compare rice and corn, but to be able to do it with any two species," Goldberg says. "Ideally we'd like to be able to find new evolutionary pathways."

Every so often, as reproductive cells divide, genes and segments of chromosomes get shuffled around. One chromosome meets another and pieces of DNA are moved or swapped. If those particular cells then happen to be involved in reproduction, the new arrangement is passed on to the next generation and may spread through the population. It doesn't happen very often, but over evolutionary time scales many such events show up. Related species descended from a common ancestor have many genes in common, but they occur in different arrangements. A strand of DNA that used to be on chromosome 2 in some common ancestor ends up on chromosome 10, in between two pieces that used to belong to ancestral chromosomes 3 and 5. The relocated genes often continue to do the same jobs, and often several genes move together, retaining their ancestral order along a segment of DNA.

By comparing genomes, scientists can trace the evolutionary paths, and there are immediate practical applications. If it's known that genes A and B are near each other in the rice genome, and the location of gene A in maize also is known, then a comparative map could help locate gene B in maize. In plant breeding, such a discovery could help to breed corn with better disease resistance or improved nutritional value. In medicine, clues from the genome of the mouse are being used to help find genes associated with human diseases.
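The gene A and gene B reasoning can be illustrated with a minimal Python sketch. It is not taken from the researchers' software: the function name candidate_region, the max_gap cutoff and the map positions are all hypothetical, and the sketch assumes, crudely, that map distances carry over unchanged from one species to the other.

```python
def candidate_region(base_pos, target_pos, known, query, max_gap=5.0):
    """Suggest a target-genome interval in which to look for `query`.

    base_pos / target_pos map gene name -> (chromosome, map position).
    Assumption (illustrative only): if `known` and `query` lie within
    `max_gap` map units on the same base chromosome, they are also
    within that distance of each other in the target genome.
    """
    base_chrom, base_known = base_pos[known]
    query_chrom, base_query = base_pos[query]
    gap = abs(base_query - base_known)
    if base_chrom != query_chrom or gap > max_gap:
        return None  # not close enough in the base genome to infer anything
    target_chrom, target_known = target_pos[known]
    # Search on either side of the known gene, since orientation is unknown.
    return target_chrom, max(0.0, target_known - gap), target_known + gap


# Hypothetical positions: A and B are neighbors in rice, only A is mapped in maize.
rice = {"A": ("1", 10.0), "B": ("1", 12.5)}
maize = {"A": ("3", 40.0)}
print(candidate_region(rice, maize, known="A", query="B"))
```

A real comparative map would replace the fixed-distance assumption with the matched segments themselves, which is the problem the rest of this release describes.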

The idea of comparative mapping is to align genes in the order they are found along the chromosomes of the first or "base" species with those found in the same order on a single chromosome of the second or "target" species. The raw data consists of ordered lists of the genes and gene markers of both species that have been identified in "wet lab" experiments.

At the simplest level, a computer could look at each gene or marker of the base species, find where it is (on which arm of which chromosome) on the target genome, and label it accordingly. But geneticists want to step back to get a larger view, identifying segments of the base genome that contain arrays of genes that also are found together on the target genome. The catch is what McCouch calls "noise" in the data: the target genome can contain long arrays of genes that look like those on the base genome except that there are a few extra genes here and there that come from somewhere else in the genome. How does the computer decide whether or not to ignore the out-of-place genes? When are two similar linear arrays of genes close enough to be called a match?
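The "simplest level" labeling described above might look like the following Python sketch. The marker names, the target_location lookup table and both function names are invented for illustration; real data would also record chromosome arms.

```python
from itertools import groupby


def label_markers(base_order, target_location):
    """Label each base-species marker with its target chromosome.

    Markers with no known target location are labeled None.
    """
    return [(marker, target_location.get(marker)) for marker in base_order]


def collapse_runs(labeled):
    """Collapse consecutive identical labels into (label, run_length) pairs."""
    only_labels = [label for _, label in labeled]
    return [(label, len(list(group))) for label, group in groupby(only_labels)]


# Hypothetical data: five markers in base order, one of them out of place.
base_order = ["m1", "m2", "m3", "m4", "m5"]
target_location = {"m1": "1", "m2": "1", "m3": "7", "m4": "1", "m5": "1"}
print(collapse_runs(label_markers(base_order, target_location)))
```

With this naive approach the single stray marker splits what a biologist would probably read as one segment into three runs, which is the "noise" problem in miniature.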

In early stages of the work, Goldberg applied constraints, called "penalties," both for out-of-place genes and for breaks between segments. The computer was directed to minimize both the number of segments it created and the number of out-of-place genes in each segment. The approach was promising, but when applied to a comparison between rice and maize it still didn't produce a map close enough to one made by hand, Goldberg says. Among other things, the computer often introduced too few breaks where a small part of one sequence appeared in the middle of another.
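One way to picture the penalty idea is a small dynamic program, sketched here in Python. This is an illustrative reconstruction under stated assumptions, not the published algorithm: the function name and the penalty values are hypothetical, each break between segments costs break_penalty, and each gene whose own label differs from its segment's label costs mismatch_penalty.

```python
def segment_with_penalties(labels, break_penalty=2.0, mismatch_penalty=1.0):
    """Assign a segment label to every gene, minimizing
    break_penalty * (number of breaks) + mismatch_penalty * (out-of-place genes).
    """
    if not labels:
        return []
    candidates = sorted(set(labels))

    def miss(i, seg):
        return 0.0 if labels[i] == seg else mismatch_penalty

    # cost[seg] = cheapest way to label genes 0..i with gene i in segment `seg`
    cost = {seg: miss(0, seg) for seg in candidates}
    back = []  # back[i-1][seg] = segment label chosen for gene i-1
    for i in range(1, len(labels)):
        best_prev = min(candidates, key=lambda seg: cost[seg])
        new_cost, choice = {}, {}
        for seg in candidates:
            stay = cost[seg]                          # continue the same segment
            switch = cost[best_prev] + break_penalty  # start a new segment here
            if stay <= switch:
                new_cost[seg], choice[seg] = stay + miss(i, seg), seg
            else:
                new_cost[seg], choice[seg] = switch + miss(i, seg), best_prev
        back.append(choice)
        cost = new_cost

    # Trace the cheapest labeling backward from the last gene.
    seg = min(candidates, key=lambda s: cost[s])
    assigned = [seg]
    for choice in reversed(back):
        seg = choice[seg]
        assigned.append(seg)
    assigned.reverse()
    return assigned


# A lone out-of-place gene is absorbed into the surrounding segment.
print(segment_with_penalties(["1", "1", "7", "1", "1"]))
```

Under these hypothetical penalties a short run such as three consecutive "7" genes is also absorbed, because two breaks (cost 4) cost more than three mismatches (cost 3). That is the kind of "too few breaks" behavior the paragraph above describes.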

So, Goldberg added a procedure that remembers the labels of genes as it goes along, making decisions about what sequences go together on the basis of an overall trend rather than considering just one gene at a time. Based on the sequence it remembered, the computer was allowed to reduce the penalties for breaks between segments. In other words, if a small but meaningful sequence of out-of-place genes appeared in the middle of another matching sequence, it would be marked as a separate segment. But if just a few out-of-place genes turned up and didn't have a meaningful relationship, the overall sequence still would be listed as a single segment.

In computer-science terms, the label for each gene is pushed onto a stack in memory, and popped back off when it gets to be too unlikely. This procedure, the researchers say in their paper, draws on computer methods for parsing sentences in natural-language processing, in which a program remembers words until the end of a sentence and only then decides what the sentence means.
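The push-and-pop idea can be shown with a toy Python sketch. It is only loosely inspired by the description above and should not be read as the method in the paper: the function name is invented, and "too unlikely" is approximated by a hypothetical min_run threshold, so that an interrupting run shorter than min_run is treated as noise when the enclosing label resumes.

```python
def stack_segment(labels, min_run=2):
    """Relabel isolated out-of-place genes while keeping longer inserted runs.

    Each stack entry is [label, indices of genes seen with that label].
    When a label already on the stack reappears, everything pushed after
    it is popped: short runs are folded into the resuming segment, and
    runs of at least `min_run` genes are kept as separate segments.
    """
    assigned = list(labels)
    stack = []
    for i, label in enumerate(labels):
        # Search from the top of the stack for an open entry with this label.
        depth = next(
            (d for d in range(len(stack) - 1, -1, -1) if stack[d][0] == label),
            None,
        )
        if depth is None:
            stack.append([label, [i]])  # unseen label: remember it
            continue
        while len(stack) - 1 > depth:
            _, indices = stack.pop()
            if len(indices) < min_run:  # too short to be meaningful
                for j in indices:
                    assigned[j] = label
        stack[depth][1].append(i)
    return assigned


genes = ["1", "1", "7", "7", "7", "1", "3", "1"]
print(stack_segment(genes))
```

In this example the three-gene "7" run would survive as its own segment inside the "1" segment, while the lone "3" would be folded into it, because the decision is deferred until the enclosing label reappears.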

Each chromosome in a living organism consists of two adjacent arms, and the algorithm also was modified to give special consideration to related orders of genes that appear on different arms of the same chromosome. In some cases biologists know which chromosome a gene is on, but not on which arm, so special consideration also was given to those "ambiguous" genes.

The researchers tested their computer method by comparing a computer-generated comparative map of rice and maize with a handmade map prepared in 1999 by William A. Wilson (a postdoctoral fellow in the Department of Plant Breeding at Cornell and now in private industry), and several colleagues at Cornell and Iowa State University. The computer mapping done by Goldberg was based on Wilson's original data. The results, the researchers say, were remarkably similar, although in their paper they note some minor differences. They also point out that handmade maps usually are made with reference to additional information that biologists hold in their memories, such as the order of genes along the chromosomes of other related species.

The computer also found a small "footprint" of an ancestral chromosome in maize that did not turn up in the handmade map, McCouch says. This will be investigated further in the lab, she says.

Besides rice and maize, the algorithm has been tested on a comparison between the mouse and human genomes. "It appears to work well in both cases," McCouch says. "It is certainly our intention to present this algorithm as a replacement for the construction of hand-crafted comparative maps."

Cornell University
