Machine learning reveals unexpected genetic roots of cancers, autism and other disorders

December 18, 2014

In the decade since the human genome was sequenced in 2003, scientists and doctors have struggled to answer an all-consuming question: Which DNA mutations cause disease?

A new computational technique developed at the University of Toronto may now be able to tell us.

A Canadian research team led by professor Brendan Frey has developed the first method for 'ranking' genetic mutations based on how living cells 'read' DNA, revealing how likely any given alteration is to cause disease. They used their method to discover unexpected genetic determinants of autism, hereditary cancers and spinal muscular atrophy, a leading genetic cause of infant mortality.

Their findings appear in today's issue of the leading journal Science.

Think of the human genome as a mysterious text, made up of three billion letters. "Over the past decade, a huge amount of effort has been invested into searching for mutations in the genome that cause disease, without a rational approach to understanding why they cause disease," says Frey, also a senior fellow at the Canadian Institute for Advanced Research. "This is because scientists didn't have the means to understand the text of the genome and how mutations in it can change the meaning of that text." Biologist Eric Lander of the Massachusetts Institute of Technology captured this puzzle in his famous quote: "Genome. Bought the book. Hard to read."

What was Frey's approach? We know that certain sections of the text, called exons, describe the proteins that are the building blocks of all living cells. What wasn't appreciated until recently is that other sections, called introns, contain instructions for how to cut and paste exons together, determining which proteins will be produced. This 'splicing' process is a crucial step in the cell's process of converting DNA into proteins, and its disruption is known to contribute to many diseases.

Most research into the genetic roots of disease has focused on mutations within exons, but increasingly scientists are finding that diseases can't be explained by these mutations. Frey's team took a completely different approach, examining changes to text that provides instructions for splicing, most of which is in introns.

Frey's team used a new technology called 'deep learning' to teach a computer system to scan a piece of DNA, read the genetic instructions that specify how to splice together sections that code for proteins, and determine which proteins will be produced.

Unlike other machine learning methods, deep learning can make sense of incredibly complex relationships, such as those found in living systems in biology and medicine. "The success of our project relied crucially on using the latest deep learning methods to analyze the most advanced experimental biology data," says Frey, whose team included members from the University of Toronto's Faculty of Applied Science & Engineering, Faculty of Medicine and the Terrence Donnelly Centre for Cellular and Biomolecular Research, as well as Microsoft Research and the Cold Spring Harbor Laboratory. "My collaborators and our graduate students and postdoctoral fellows are world-leading experts in these areas."

Once they had taught their system how to read the text of the genome, Frey's team used it to search for mutations that cause splicing to go wrong. They found that their method correctly predicted 94 percent of the genetic culprits behind well-studied diseases such as spinal muscular atrophy and colorectal cancer, but more importantly, made accurate predictions for mutations that had never been seen before.
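As a rough illustration of what ranking mutations by their effect on splicing could look like, here is a minimal Python sketch. It is not the team's method: the scoring function toy_splicing_score is a hypothetical stand-in for a trained deep learning model (it only measures the share of G and C letters), and the names apply_substitution and rank_mutations are invented for this example. The sketch assumes single-letter substitutions and ranks each one by how far it moves the score away from that of the unmutated sequence.

```python
from typing import Callable, List, Tuple

Mutation = Tuple[int, str]  # (0-based position, replacement letter)


def toy_splicing_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: fraction of G/C letters."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def apply_substitution(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with the letter at pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def rank_mutations(
    reference: str,
    mutations: List[Mutation],
    score: Callable[[str], float] = toy_splicing_score,
) -> List[Tuple[Mutation, float]]:
    """Sort mutations by absolute change in score, largest change first."""
    ref_score = score(reference)
    scored = [
        (m, abs(score(apply_substitution(reference, m[0], m[1])) - ref_score))
        for m in mutations
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for mutation, change in rank_mutations("AAAA", [(0, "A"), (1, "G")]):
        print(mutation, change)
```

In a real system the scoring function would be a model trained on experimental splicing data; only the compare-and-sort step is shown here.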

They then launched a huge effort to tackle a condition with complex genetic underpinnings: autism spectrum disorder. "With autism there are only a few dozen genes definitely known to be involved and these account for a small proportion of individuals with this condition," says Frey.

In collaboration with Dr. Stephen Scherer, senior scientist and director of The Centre for Applied Genomics at SickKids and the University of Toronto McLaughlin Centre, Frey's team compared mutations found in the whole genome sequences of children with autism with those found in controls. Following the traditional approach of studying protein-coding regions, they found no differences. However, when they used their deep learning system to rank mutations according to how much they change splicing, surprising patterns appeared.

"When we ranked mutations using our method, striking patterns emerged, revealing 39 novel genes having a potential role in autism susceptibility," Frey says.

And autism is just the beginning: this mutation indexing method is ready to be applied to any number of diseases, and even to non-disease traits that differ between individuals.

Dr. Juan Valcárcel Juárez, a researcher with the Center for Genomic Regulation in Barcelona, Spain, who was not involved in this research, says: "In a way it is like having a language translator: it allows you to understand another language, even if full command of that language will require that you also study the underlying grammar. The work provides important information for personalized medicine, clearly a key component of future therapies."
-end-
Brendan Frey is a professor in The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, with cross-appointments to the Donnelly Centre for Cellular and Biomolecular Research, Department of Molecular Genetics, Department of Computer Science, and McLaughlin Centre at the University of Toronto. He is a senior fellow of the Canadian Institute for Advanced Research, and a member of the Technical Advisory Board of Microsoft Research.

This research was supported by the Canadian Institute for Advanced Research (CIFAR), the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), the McLaughlin Centre, Autism Speaks, Genome Canada and the Autism Training Program.

University of Toronto Faculty of Applied Science & Engineering
