Software locates sugarcane genes of interest

May 15, 2019

Plants have larger and more complex genomes than all animals, be they mammals, birds, reptiles or amphibians. Fishes are the exception to the rule.

Human DNA consists of some 3.2 billion base pairs spread out over 23 pairs of chromosomes, for a total of 46 chromosomes. The genome of wheat (Triticum aestivum), however, comprises 17 billion base pairs divided into 21 pairs of chromosomes (a total of 42). The genome of sugarcane (Saccharum spp.) contains 10 billion base pairs in 100-130 chromosomes.

The sugarcane grown today is a hybrid (S. hybridum) cross-bred from two species, S. officinarum - the original sugarcane domesticated in India 3,000 years ago - and S. spontaneum.

"The sugarcane genome has become a giant. It's very hard to work with it using current genomic methods. Deciphering it requires a huge amount of computing power. It's difficult even with state-of-the-art computers in processing terms, and they're expensive. In sum, this is a challenge for bioinformatics," said Marcelo Falsarella Carazzolle, bioinformatics coordinator in the Genomics and Bioenergy Laboratory (LGE) at the University of Campinas's Biology Institute (IB-UNICAMP) in São Paulo State, Brazil.

"For years, laboratories in various parts of the world have tried and failed to map the sugarcane genome. The first successful endeavor was completed only a few months ago by a consortium of researchers in several countries, including Brazil," Carazzolle said.

The strategy deployed by the consortium involved large-scale computing and heavy investment to sequence the whole genome, i.e., all 10 billion base pairs.

In an article published in the journal DNA Research, Carazzolle and colleagues present a different strategy that is much less costly and time consuming. This technique is designed to map specific portions of the genomes of polyploid plants.

Some of the research underpinning this innovation was performed for a PhD thesis by Karina Yanagui de Almeida and for a postdoctoral project by Juliana José. Both are biologists at IB-UNICAMP and were supervised by Professor Gonçalo Amarante Guimarães Pereira. Brazil's National Council for Scientific and Technological Development (CNPq) also provided funding.

"We developed the software necessary to reconstruct these complex genomes and applied it to sugarcane. We weren't trying to assemble the whole genome. Previous studies set out to reconstruct the plant's entire DNA, but our strategy consisted of focusing on small portions corresponding to about 1%-2%, exactly where the genes of interest for plant breeders are located," Carazzolle explained.
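To put the 1%-2% figure in perspective, the short calculation below applies it to the 10 billion base pairs quoted earlier for sugarcane. It is only a back-of-the-envelope sketch: the function name `targeted_span_bp` is illustrative, and the resulting range is derived from the article's round numbers rather than reported by the researchers.

```python
# Sugarcane genome size as quoted in the article (about 10 billion base pairs).
GENOME_SIZE_BP = 10_000_000_000


def targeted_span_bp(genome_size_bp: int, percent: int) -> int:
    """Return the number of base pairs covered by a whole-number percentage."""
    return genome_size_bp * percent // 100


low = targeted_span_bp(GENOME_SIZE_BP, 1)   # 1% of the genome
high = targeted_span_bp(GENOME_SIZE_BP, 2)  # 2% of the genome
print(f"{low:,} to {high:,} base pairs")    # 100,000,000 to 200,000,000 base pairs
```

Under these assumptions, the targeted gene space amounts to roughly 100 million to 200 million base pairs instead of 10 billion.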

This strategy cut costs by at least two orders of magnitude compared with the tens of millions of dollars it would take to map the whole genome. When the project was completed, the consortium had not yet published its results, so the Brazilian geneticists had to use publicly available data - such as the genomes of sorghum, rice and corn, which are related to sugarcane to a greater or lesser extent - to locate the areas they wanted to decipher in the analogous regions of the sugarcane genome.

Selection by analogy was possible because all grasses have a common ancestor that existed more than 50 million years ago. In other words, after all this time, the DNA of any grass today - sugarcane, wheat, sorghum, rice, corn, etc. - still preserves the original core structure, alongside the billions of mutations that have occurred over the eons.

Gene assembler

The outcome of the research conducted at IB-UNICAMP was a software package called Polyploid Gene Assembler (PGA). "PGA represents a novel strategy for assembly of a genetic space based on complex genomes using low-coverage DNA sequencing," Carazzolle said.

Although PGA requires less computing power than processing a polyploid's whole genome, a very large system is still required to run the program in a timely manner. In this case, the researchers used the computer cluster belonging to the Center for Computing in Engineering & Science (CCES), one of the Research, Innovation and Dissemination Centers (RIDCs) funded by the São Paulo Research Foundation (FAPESP). Carazzolle is the principal investigator for bioinformatics at CCES.

"The project required the use of CCES's high-performance computers with plenty of memory," Carazzolle said.

They loaded PGA with known gene loci from public genome databases, deploying assembly strategies to construct high-quality genome sequences for the species investigated, and validated the procedure with wheat (Triticum aestivum), a hexaploid species, using barley (Hordeum vulgare) as a reference. More than 90% of the genes were identified, as well as several new genes.
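The article does not describe PGA's internals, so the following is not the researchers' implementation. It is a toy sketch of the general idea behind reference-guided approaches: known gene sequences from a related species are used to pick out the sequencing reads worth assembling. All names here (`kmers`, `recruit_reads`, `gene_a`) and the sequences themselves are made up for illustration, and real tools rely on proper sequence alignment rather than exact k-mer matching.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(
    reference_genes: Dict[str, str],
    reads: List[str],
    k: int = 8,
    min_shared: int = 2,
) -> Dict[str, List[str]]:
    """Assign each read to every reference gene it shares >= min_shared k-mers with."""
    index = {name: kmers(seq, k) for name, seq in reference_genes.items()}
    recruited: Dict[str, List[str]] = {name: [] for name in reference_genes}
    for read in reads:
        read_kmers = kmers(read, k)
        for name, gene_kmers in index.items():
            if len(read_kmers & gene_kmers) >= min_shared:
                recruited[name].append(read)
    return recruited


# Hypothetical reference gene (a stand-in for a known locus in a related species).
reference = {"gene_a": "ATGGCGTACGTTAGCCTAGGCTAAGCTTGA"}

# Two hypothetical reads: one overlapping the gene, one unrelated.
reads = [reference["gene_a"][5:25], "C" * 20]

recruited = recruit_reads(reference, reads)
print({name: len(hits) for name, hits in recruited.items()})  # {'gene_a': 1}
```

Only the read that overlaps the reference gene is kept. In a real pipeline the recruited reads for each locus would then be passed to an assembler, which is where most of the computational effort goes.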

In addition, they used PGA to assemble the genes of the grass species S. spontaneum. It belongs to the same genus as traditional sugarcane (S. officinarum) and is part of the parental lineage of the hybrid sugarcane varieties widely grown today (S. hybridum).

"We identified a total of 39,234 genes, 60.4% of which were clustered into known grass gene families. Thirty-seven gene families were expanded when compared with other grasses. Three stood out for the number of gene copies potentially involved in initial development and stress response," Carazzolle said.
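As a quick check on the scale of these figures, the percentage can be converted into approximate gene counts. The counts below are derived from the quoted numbers by simple rounding and are not reported in the article.

```python
TOTAL_GENES = 39_234        # genes identified in S. spontaneum, as quoted
CLUSTERED_PERCENT = 60.4    # share clustered into known grass gene families

# Approximate number of genes in known families, rounded to the nearest gene.
clustered = round(TOTAL_GENES * CLUSTERED_PERCENT / 100)
unclustered = TOTAL_GENES - clustered

print(clustered, unclustered)  # 23697 15537
```

In other words, roughly 23,700 genes fall into known grass gene families, leaving about 15,500 outside them.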

"Our findings for the genome of S. spontaneum highlighted for the first time the molecular basis of certain significant characteristics, such as high productivity and resistance to biotic and abiotic stress. These results can be used in future functional and genetic studies. They will also support the development of new sugarcane varieties.

"Using PGA, we provided a high-quality assembly of gene regions in T. aestivum and S. spontaneum, proving that PGA can be more efficient than conventional strategies applied to complex genomes and using low-coverage DNA sequencing. PGA's low memory requirement in comparison with the conventional assembly strategy is also an advantage."

Carazzolle stressed that even with significant advances in sequencing technology, the assembly of complex genomes still represents a bottleneck, owing mainly to polyploidy and high heterozygosity. New bioinformatics approaches, he added, can help overcome these constraints, especially in the case of the whole genomes of closely related organisms, for which reference-guided assembly methods can be used.

About São Paulo Research Foundation (FAPESP)

The São Paulo Research Foundation (FAPESP) is a public institution with the mission of supporting scientific research in all fields of knowledge by awarding scholarships, fellowships and grants to investigators linked with higher education and research institutions in the State of São Paulo, Brazil. FAPESP is aware that the very best research can only be done by working with the best researchers internationally. Therefore, it has established partnerships with funding agencies, higher education institutions, private companies and research organizations in other countries known for the quality of their research, and has been encouraging scientists funded by its grants to further develop their international collaboration. The FAPESP news agency reports on the scientific breakthroughs FAPESP helps achieve through its many programs, awards and research centers.

Fundação de Amparo à Pesquisa do Estado de São Paulo
