Software locates sugarcane genes of interest

May 15, 2019

Plants have larger and more complex genomes than all animals, be they mammals, birds, reptiles or amphibians. Fishes are the exception to the rule.

Human DNA consists of some 3.2 billion base pairs spread out over 23 pairs of chromosomes, for a total of 46 chromosomes. The genome of wheat (Triticum aestivum), however, comprises 17 billion base pairs divided into 21 pairs of chromosomes (a total of 42). The genome of sugarcane (Saccharum spp.) contains 10 billion base pairs in 100-130 chromosomes.

The sugarcane grown today is a hybrid (S. hybridum) cross-bred from two species, S. officinarum - the original sugarcane domesticated in India 3,000 years ago - and S. spontaneum.

"The sugarcane genome has become a giant. It's very hard to work with it using current genomic methods. Deciphering it requires a huge amount of computing power. It's difficult even with state-of-the-art computers in processing terms, and they're expensive. In sum, this is a challenge for bioinformatics," said Marcelo Falsarella Carazzolle, bioinformatics coordinator in the Genomics and Bioenergy Laboratory (LGE) at the University of Campinas's Biology Institute (IB-UNICAMP) in São Paulo State, Brazil.

"For years, laboratories in various parts of the world have tried and failed to map the sugarcane genome. The first successful endeavor was completed only a few months ago by a consortium of researchers in several countries, including Brazil," Carazzolle said.

The strategy deployed by the consortium involved large-scale computing and heavy investment to sequence the whole genome, i.e., all 10 billion base pairs.

In an article published in the journal DNA Research, Carazzolle and colleagues present a different strategy that is much less costly and time-consuming. This technique is designed to map specific portions of the genomes of polyploid plants.

Some of the research underpinning this innovation was performed for a PhD thesis by Karina Yanagui de Almeida and for a postdoctoral project by Juliana José. Both are biologists at IB-UNICAMP and were supervised by Professor Gonçalo Amarante Guimarães Pereira. Brazil's National Council for Scientific and Technological Development (CNPq) also provided funding.

"We developed the software necessary to reconstruct these complex genomes and applied it to sugarcane. We weren't trying to assemble the whole genome. Previous studies set out to reconstruct the plant's entire DNA, but our strategy consisted of focusing on small portions corresponding to about 1%-2%, exactly where the genes of interest for plant breeders are located," Carazzolle explained.
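To give a sense of scale, the short calculation below converts the 1%-2% target quoted above into base pairs, using the 10 billion base pair figure the article gives for sugarcane. It is only an illustrative sketch: the function name and the use of whole-number percentages are assumptions made for this example, not part of the researchers' software.

```python
# Genome size quoted in the article for sugarcane (Saccharum spp.).
GENOME_SIZE_BP = 10_000_000_000

def targeted_span_bp(genome_size_bp: int, percent: int) -> int:
    """Return the number of base pairs covered by a whole-number percentage
    of the genome. Hypothetical helper, used only for this illustration."""
    return genome_size_bp * percent // 100

low = targeted_span_bp(GENOME_SIZE_BP, 1)   # lower bound of the 1%-2% target
high = targeted_span_bp(GENOME_SIZE_BP, 2)  # upper bound of the 1%-2% target

print(f"Targeted gene space: {low:,} to {high:,} base pairs")
```

On these figures, the targeted regions amount to roughly 100 million to 200 million base pairs, rather than the full 10 billion.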

This strategy cut the cost by at least two orders of magnitude compared with the tens of millions of dollars it would take to map the whole genome. When the project was completed, the consortium had not yet published its results, so the Brazilian geneticists had to use publicly available data - such as the genomes of sorghum, rice and corn, which are related to sugarcane to a greater or lesser extent - to locate the areas they wanted to decipher in the analogous regions of the sugarcane genome.

Selection by analogy was possible because all grasses have a common ancestor that existed more than 50 million years ago. In other words, after all this time, the DNA of any grass today - sugarcane, wheat, sorghum, rice, corn, etc. - still preserves the original core structure, alongside the billions of mutations that have occurred over the eons.

Gene assembler

The outcome of the research conducted at IB-UNICAMP was a software package called Polyploid Gene Assembler (PGA). "PGA represents a novel strategy for assembly of a genetic space based on complex genomes using low-coverage DNA sequencing," Carazzolle said.

Although PGA requires less computing power than the massive processing of a polyploid's whole genome, a very large system is still required to run the program in a timely manner. In this case, the researchers used the computer cluster belonging to the Center for Computing in Engineering & Science (CCES), one of the Research, Innovation and Dissemination Centers (RIDCs) funded by the São Paulo Research Foundation (FAPESP). Carazzolle is the principal investigator for bioinformatics at CCES.

"The project required the use of CCES's high-performance computers with plenty of memory," Carazzolle said.

They loaded PGA with known gene loci from public genome databases, deploying assembly strategies to construct high-quality genome sequences for the species investigated, and validated the procedure with wheat (Triticum aestivum), a hexaploid species, using barley (Hordeum vulgare) as a reference. More than 90% of the genes were identified, as well as several new genes.
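The general idea of using known genes from a related species to pick out the relevant sequencing reads can be illustrated with a toy example. The sketch below is not PGA's algorithm, which the article does not describe in detail. It simply assumes a simplified scheme in which a read is kept if it shares enough short subsequences (k-mers) with a reference gene. The sequences, function names and thresholds are all invented for illustration.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def recruit_reads(reference_gene: str, reads: list[str],
                  k: int = 5, min_shared: int = 2) -> list[str]:
    """Keep reads that share at least min_shared k-mers with the reference gene.

    Hypothetical, simplified stand-in for reference-guided read selection;
    real tools rely on sequence aligners and far longer sequences.
    """
    ref_kmers = kmers(reference_gene, k)
    return [r for r in reads if len(kmers(r, k) & ref_kmers) >= min_shared]

# Made-up reference gene (e.g. from a related grass) and made-up reads.
reference_gene = "ATGGCGTACGTTAGCCTGA"
reads = [
    "GCGTACGTTA",  # matches the reference exactly
    "TTTTTTTTTT",  # unrelated sequence
    "ACGTTAGCAT",  # similar to the reference but diverged at the end
]

recruited = recruit_reads(reference_gene, reads)
print(recruited)
```

In this toy case the first and third reads are retained and the unrelated one is discarded. Only the retained reads would then go on to an assembly step, which is what keeps the problem much smaller than assembling the whole genome.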

In addition, they used PGA to assemble the genes of the grass species S. spontaneum. Grouped in the same genus as traditional sugarcane (S. officinarum), S. spontaneum is part of the parental lineage of the hybrid sugarcane varieties widely grown today (S. hybridum).

"We identified a total of 39,234 genes, 60.4% of which were clustered into known grass gene families. Thirty-seven gene families were expanded when compared with other grasses. Three stood out for the number of gene copies potentially involved in initial development and stress response," Carazzolle said.

"Our findings for the genome of S. spontaneum highlighted for the first time the molecular basis of certain significant characteristics, such as high productivity and resistance to biotic and abiotic stress. These results can be used in future functional and genetic studies. They will also support the development of new sugarcane varieties.

"Using PGA, we provided a high-quality assembly of gene regions in T. aestivum and S. spontaneum, proving that PGA can be more efficient than conventional strategies applied to complex genomes and using low-coverage DNA sequencing. PGA's low memory requirement in comparison with the conventional assembly strategy is also an advantage."

Carazzolle stressed that even with significant advances in sequencing technology, the assembly of complex genomes still represents a bottleneck, owing mainly to polyploidy and high heterozygosity. New bioinformatics approaches, he added, can help overcome these constraints, especially in the case of the whole genomes of closely related organisms, for which reference-guided assembly methods can be used.

About São Paulo Research Foundation (FAPESP)

The São Paulo Research Foundation (FAPESP) is a public institution with the mission of supporting scientific research in all fields of knowledge by awarding scholarships, fellowships and grants to investigators linked with higher education and research institutions in the State of São Paulo, Brazil. FAPESP is aware that the very best research can only be done by working with the best researchers internationally. Therefore, it has established partnerships with funding agencies, higher education, private companies, and research organizations in other countries known for the quality of their research and has been encouraging scientists funded by its grants to further develop their international collaboration.

Fundação de Amparo à Pesquisa do Estado de São Paulo
