New UW model helps zero in on harmful genetic mutations

October 22, 2015

Between any two people, there are likely to be at least 10 million differences in the genetic sequence that makes up their DNA.

Most of these differences don't alter the way cells behave or cause health problems. But some genetic variations greatly increase the likelihood that a person will develop cancer, diabetes, colorblindness or a host of other diseases.

Despite rapid advances in our ability to map an individual's genome -- the precise coding that makes up his or her genes -- we know much less about which mutations or anomalies actually cause disease.

Now, a new model and publicly available Web tool developed by University of Washington researchers can more accurately and quantitatively predict which genetic mutations significantly change how genes splice and may warrant increased attention from disease researchers and drug developers.

The model -- the first to train a machine learning algorithm on vast amounts of genetic data created with synthetic biology techniques -- is outlined in a paper to be published in the Oct. 22 issue of Cell.

"Some people have variations in a particular gene, but what you really want to know is whether those matter or not," said lead author Alexander Rosenberg, a UW electrical engineering doctoral student. "This model can help you narrow down the universe -- hugely -- of the mutations that might be most likely to cause disease."

In particular, the model predicts how these genetic sequence variations affect alternative splicing -- a critical process that enables a single gene to create many different forms of proteins by including or excluding snippets of RNA.

"This is an avenue that's unexplored to a large extent," said Rosenberg. "It's fairly easy to look at how mutations affect proteins directly, but people have not been able to look at how mutations affect proteins through splicing."

For example, a scientist studying the genetic underpinnings of lung cancer or depression or a particular birth defect could type the most commonly shared DNA sequence in a particular gene into the Web tool, as well as multiple variations. The model will tell the scientist which mutations cause outsized differences in how the gene splices -- which could be a sign of trouble -- and which have little or no effect.
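The comparison described above can be sketched in a few lines of Python. This is only an illustration of the workflow (score a reference sequence, score each variant, rank by the size of the change), not the UW model or its Web tool. The scoring function, the motif weights and the example sequences are all hypothetical stand-ins invented for the sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical motif weights, invented for illustration only.
# A real splicing model would learn its parameters from data.
TOY_MOTIF_WEIGHTS: Dict[str, float] = {
    "GTAAGT": 2.0,
    "CAGG": 0.5,
    "TTTT": -0.75,
}


def toy_splice_score(seq: str) -> float:
    """Stand-in for a trained model: sum the weights of motifs found in seq."""
    score = 0.0
    for motif, weight in TOY_MOTIF_WEIGHTS.items():
        # Count every (possibly overlapping) occurrence of the motif.
        count = sum(
            1
            for i in range(len(seq) - len(motif) + 1)
            if seq[i:i + len(motif)] == motif
        )
        score += weight * count
    return score


def rank_variants(
    reference: str,
    variants: Dict[str, str],
    score_fn: Callable[[str], float] = toy_splice_score,
) -> List[Tuple[str, float]]:
    """Return (variant name, score change vs. reference), largest change first."""
    ref_score = score_fn(reference)
    deltas = [(name, score_fn(seq) - ref_score) for name, seq in variants.items()]
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    reference = "ACCAGGTAAGTCCTA"
    variants = {
        "variant_a": "ACCAGGTAAGTCCTG",  # change outside any toy motif
        "variant_b": "ACCAGGTCAGTCCTA",  # change inside the GTAAGT toy motif
    }
    for name, delta in rank_variants(reference, variants):
        print(f"{name}: change in toy score = {delta:+.2f}")
```

In this toy setup, the variant that breaks a heavily weighted motif rises to the top of the list, while the variant that touches no motif shows no change. A researcher would follow up on the former and could deprioritize the latter.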

The researcher would still need to investigate whether a particular genetic sequence causes harmful changes, but the online tool can help rule out the many variations that aren't likely to be of interest to health researchers. To validate the model's predictive powers, the UW team tested it on a handful of well-understood mutations such as those in the BRCA2 gene that have been linked to breast and ovarian cancer.

Compared to previously published models, the UW approach is roughly three times more accurate at predicting the extent to which a mutation will cause genetic material to be included or excluded in the protein-making process -- which can change how those proteins function and cause biological processes to go awry.

That's because the UW team used a new approach that combines synthetic biology and machine learning techniques to create the model.

Machine learning algorithms -- which enable computers to infer rules and "learn" from vast amounts of data -- become more accurate the more data they're exposed to. But the human genome has only roughly 25,000 genes that create proteins, which limits how much naturally occurring data is available for training.

Using common molecular biology techniques, the UW team created a library of over 2 million synthetic "mini-genes" by including random DNA sequences. Then they determined how each random sequence element affected where genes spliced and what types of RNA were produced -- which ultimately determines which proteins get made.

That larger library of synthetic data essentially teaches the model to become smarter, said senior author Georg Seelig, a UW assistant professor of electrical engineering and of computer science & engineering.

"Our algorithm works super well because it was trained on these synthetic datasets. And the reason it works so well is because that synthetic dataset is orders of magnitude larger than the training set you get from the actual human genome," said Seelig.

"It is remarkable that a model trained entirely on synthetic data can outperform models trained directly on the human genome on the task of predicting the impact of mutations in people," he said.
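The general idea of learning sequence rules from a large library of random sequences can be illustrated with a toy simulation. The sketch below is not the team's algorithm or data. It assumes a made-up "planted" motif, a simulated measurement with noise, and a very simple estimator (the average readout of sequences containing each short subsequence, minus the overall average). With enough random sequences, the planted motif stands out clearly.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

PLANTED_MOTIF = "GGTA"  # hypothetical motif, chosen only for this demo


def random_sequence(rng: random.Random, length: int = 20) -> str:
    """Draw a random DNA sequence of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def simulated_readout(seq: str, rng: random.Random) -> float:
    """Fake measurement: the planted motif raises the signal; noise is added."""
    signal = 1.0 if PLANTED_MOTIF in seq else 0.0
    return signal + rng.gauss(0.0, 0.1)


def build_library(n: int, seed: int = 0) -> List[Tuple[str, float]]:
    """Create n random sequences, each paired with a simulated readout."""
    rng = random.Random(seed)
    library = []
    for _ in range(n):
        seq = random_sequence(rng)
        library.append((seq, simulated_readout(seq, rng)))
    return library


def estimate_kmer_effects(
    data: List[Tuple[str, float]], k: int = 4
) -> Dict[str, float]:
    """Mean readout of sequences containing each k-mer, minus the overall mean."""
    overall = sum(y for _, y in data) / len(data)
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for seq, y in data:
        # Use a set so each k-mer is counted once per sequence.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            totals[kmer] += y
            counts[kmer] += 1
    return {kmer: totals[kmer] / counts[kmer] - overall for kmer in totals}


if __name__ == "__main__":
    library = build_library(5000)
    effects = estimate_kmer_effects(library)
    best = max(effects, key=effects.get)
    print(f"k-mer with the largest estimated effect: {best}")
```

The estimator only works because the library is large: each four-letter subsequence appears in hundreds of random sequences, so its average effect can be separated from noise. That is the same intuition, on a much smaller scale, behind training on millions of synthetic sequences rather than on the limited number of natural examples.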

Next research steps include expanding the approach beyond alternative splicing to other processes that determine how genes are expressed.

In the meantime, by making the Web tool free and publicly available, the team hopes other scientists will use their alternative splicing model -- and ultimately make progress in narrowing down which natural genetic variations are most meaningful when it comes to health and disease.

"Other research groups and companies can use our model to rank the areas of interest to them," Seelig said. "We hope other people will take this further to more clinical applications."

Co-authors include former UW doctoral student Rupali P. Patwardhan and associate professor Jay Shendure in the UW Department of Genome Sciences.

For more information, contact Seelig at gseelig@uw.edu.

University of Washington
