Scientists report first complete genome sequence of a plant

December 12, 2000

An international effort to sequence the entire genome of the plant species Arabidopsis thaliana is now complete. This first-ever complete genome sequence from a plant has many implications for biology, medicine, agriculture, and the environment because it will enable detailed studies of the entire genetic structure of plants to be carried out. Such studies will yield a great deal of new information about the gene products that are involved in many aspects of plant growth and development, and how these gene products carry out their functions.

Despite its status as a diminutive relative of the mustard plant, Arabidopsis thaliana is a powerful tool in plant molecular biology and genetics. The short generation time and relatively compact genome of Arabidopsis (a flowering plant) make it an ideal model system for understanding numerous features of plant biology, including ones that are of great pharmaceutical or agricultural value.

The sequencing studies, reported in the December 14, 2000, issue of the journal Nature, provide new information about chromosome structure, evolution, and gene organization in plants. Among the many new genes discovered were several involved in disease resistance and intracellular signaling, as well as homologs of a number of human disease genes. Perhaps the most surprising result of these studies, and related studies published last year (see below), is the extent to which vast chromosomal regions have been duplicated in the Arabidopsis genome. In fact, the new study indicates that the evolution of Arabidopsis involved a whole-genome duplication, followed by gene loss and additional, extensive local gene duplications.

The Arabidopsis genome was found to contain 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of the fruit fly Drosophila and the soil nematode worm C. elegans--two other multicellular organisms whose genomes have been completely sequenced. However, Arabidopsis has many plant-specific families of proteins (e.g. transcription factors) and lacks several kinds of proteins common to vertebrates, Drosophila, and C. elegans (e.g. the signaling pathway proteins Wingless/Wnt, Hedgehog, Notch/lin12, JAK/STAT, TGF-beta/SMADs).

"We are several years ahead of schedule," says Cold Spring Harbor Laboratory scientist W. Richard McCombie, referring to the progress that the international Arabidopsis Genome Initiative has made toward its goal of completing the sequencing project. "Throughout this endeavor, all of the groups involved have worked hard to share information, and that has made all the difference."

The Arabidopsis genome contains approximately 125 million base pairs of DNA (125 Mb) distributed among five chromosomes. One year ago, a U.S. consortium led by McCombie reported the DNA sequence of chromosome 4 in collaboration with The European Union Arabidopsis Genome Sequencing Consortium led by Michael Bevan of the John Innes Centre (Norwich, UK). Cold Spring Harbor Laboratory scientist Robert Martienssen was instrumental in organizing the international sequencing effort at its outset in 1996, and played a major role in interpreting the chromosome 4 results (see section entitled "Plant Biology at Cold Spring Harbor Laboratory" below).

"The completion of the Arabidopsis genome sequence has profound implications for human health as well as plant biology and agriculture," says Martienssen. In addition to McCombie and Martienssen, Ellson Chen of Perkin Elmer Biosystems based in Foster City, California, and Richard Wilson of the Washington University Medical School Genome Sequencing Center in St. Louis, Missouri, were principal investigators in the U.S. consortium that reported the chromosome 4 results last year.

A team of scientists at The Institute for Genomic Research (TIGR) in Rockville, Maryland, led by J. Craig Venter determined the DNA sequence of Arabidopsis chromosome 2, which was reported in Nature last year together with the chromosome 4 results. The complete sequences of chromosome 2 (19 Mb) and chromosome 4 (17 Mb) represented roughly one-third of the plant's genome.

Today, the Arabidopsis Genome Initiative announces that it has completed the DNA sequence of the remaining chromosomes, which represent two-thirds of the entire genome. The principal teams in the new report are:

Chromosome 1
TIGR; Stanford Genome Technology Center; Plant Sciences Institute, University of Pennsylvania; Plant Gene Expression Center, UC Berkeley
Chromosome 3
European Union Arabidopsis Genome Sequencing Consortium; TIGR; Kazusa DNA Research Institute
Chromosome 5
Kazusa DNA Research Institute; The Cold Spring Harbor and Washington University in St. Louis Sequencing Consortium; European Union Arabidopsis Genome Sequencing Consortium

The major supporters of the U.S. sequencing effort were the National Science Foundation, the U.S. Department of Agriculture, and the U.S. Department of Energy.

The potential function of approximately 70 percent of the 25,498 genes of Arabidopsis can be predicted based on their similarity to other genes of known function in Arabidopsis or other organisms. However, the functions of the remaining 30 percent of Arabidopsis genes are unknown, and only 9 percent of Arabidopsis genes have been characterized experimentally. Future studies of the Arabidopsis genome and the proteins it encodes (particularly those with no known function) will be greatly facilitated by combining the new DNA sequence information with a multitude of existing genetic and molecular biological strategies and resources that are available to Arabidopsis researchers (for example, see the "gene trap" transposable element strategy described in the section entitled "Plant Biology at Cold Spring Harbor Laboratory" below).

McCombie says that the pace of the Arabidopsis sequencing project was accelerated by a first-of-its-kind effort to use high-throughput "whole-genome random BAC fingerprint analysis" to map a large eukaryotic genome in its entirety and provide an ordered set of DNA clones for sequencing (BAC, bacterial artificial chromosome). This analysis of the Arabidopsis genome was completed by Wilson, Marco Marra, and their colleagues at the Washington University Medical School Genome Sequencing Center with assistance from McCombie, Martienssen, and Larry Parnell of Cold Spring Harbor Laboratory. The random BAC fingerprinting technique has rapidly become the method of choice for mapping and sequencing the comparatively large genomes of other eukaryotic organisms, including humans. The human genome contains an estimated 3.2 billion base pairs of DNA, roughly 25 times more than Arabidopsis.


Cold Spring Harbor Laboratory
Cold Spring Harbor Laboratory is a private, non-profit basic research and educational institution. Under the leadership of Dr. Bruce Stillman, a member of the National Academy of Sciences and a Fellow of the Royal Society (London), some 260 scientists conduct research in cancer, neurobiology, and plant genetics. Its other areas of research expertise include molecular and cellular biology, genetics, structural biology, and bioinformatics.

Plant Biology at Cold Spring Harbor Laboratory
Plant biology has a long and rich history at Cold Spring Harbor Laboratory. In 1908, George Shull found that by cross-pollinating corn plants, he could consistently produce higher yielding progeny. His theory of "hybrid vigor," based on research Shull performed at CSHL, has become widely known and has found many applications in agriculture and genetics. Barbara McClintock studied maize (corn) genetics here for fifty years, beginning in 1942. At Cold Spring Harbor Laboratory, McClintock discovered "controlling elements," which she found can switch other genes on and off as a consequence of their movement within the genome. In 1983, McClintock was awarded the Nobel Prize for her discoveries concerning controlling elements, later known as transposable elements, transposons, or "jumping genes."

In 1992, CSHL plant biologist Rob Martienssen and his colleagues published a "gene-trap" system for Arabidopsis based on McClintock's transposable elements. Using this system, Martienssen created several thousand mutant strains of Arabidopsis, an invaluable resource for the study of gene expression during plant development. As a result of their collaborative effort to sequence and characterize these transposon insertions in the Arabidopsis genome, Martienssen and another CSHL scientist, W. Richard McCombie, were founding members of the Arabidopsis Genome Initiative--a global consortium established in 1996 to sequence the entire genome of Arabidopsis.

In 1999, Martienssen and McCombie reported the first complete DNA sequence of a plant chromosome -- chromosome 4 -- from Arabidopsis, in collaboration with The European Union Arabidopsis Genome Sequencing Consortium, led by Michael Bevan of the John Innes Centre. Simultaneously, scientists at TIGR completed the sequence of Arabidopsis chromosome 2.

The Plant Genomics Center at CSHL's new Genome Research Center will support continuing studies of plant genome structure and function by Martienssen, McCombie and their CSHL colleagues. Information from these studies will be made widely available to individuals, scientists and industries with agricultural and environmental interests via databases maintained at the Center. This project is supported by the National Science Foundation.

For more information, visit the Laboratory's website or call the Department of Public Affairs at 516-367-8455. Some of the material in this news release is identical to information included in a previous news release dated December 15, 1999 ("Scientists Report First Complete Sequence of Plant Chromosomes"), issued when the complete sequences of Arabidopsis chromosomes 2 and 4 were published in the December 16, 1999 issue of Nature. A copy of that news release is available on request.

Cold Spring Harbor Laboratory
