CMU algorithm rapidly finds anomalies in gene expression data

November 27, 2019

PITTSBURGH--Computational biologists at Carnegie Mellon University have devised an algorithm to rapidly sort through mountains of gene expression data to find unexpected phenomena that might merit further study. What's more, the algorithm then re-examines its own output, looking for mistakes it has made and then correcting them.

This work by Carl Kingsford, a professor in CMU's Computational Biology Department, and Cong Ma, a Ph.D. student in computational biology, is the first attempt at automating the search for these anomalies in gene expression inferred by RNA sequencing, or RNA-seq, the leading method for inferring the activity level of genes.

As they report today in the journal Cell Systems, the researchers have already detected 88 anomalies (unexpectedly high or low levels of expression of regions within genes) that are both common and not previously known in two widely used RNA-seq libraries.

"We don't yet know why we're seeing those 88 weird patterns," Kingsford said, noting that they could be a subject of further investigation.

Though an organism's genetic makeup is static, the activity level, or expression, of genes varies greatly over time. Gene expression analysis has thus become a major tool for biological research, as well as for diagnosing and monitoring cancers.

Anomalies can be important clues for researchers, but until now finding them has been a painstaking, manual process, sometimes called "sequence gazing." Finding one anomaly might require examining 200,000 transcript sequences -- sequences of RNA that encode information from the gene's DNA, Kingsford said. Most researchers therefore zero in on regions of genes that they think are important, largely ignoring the vast majority of potential anomalies.

The algorithm developed by Ma and Kingsford automates the search for anomalies, enabling researchers to consider all of the transcript sequences, not just those regions where they expect to see anomalies. This technology could uncover many new phenomena, such as the 88 previously unknown common anomalies found in the multi-tissue RNA-seq libraries.

But Ma noted that identifying anomalies is often not clear-cut. Some RNA-seq "reads," for instance, are common to multiple genes and transcripts and sometimes get mapped to the wrong one. If that occurs, a genetic region might appear more or less active than expected. So the algorithm re-examines any anomalies it detects and checks whether they disappear when the RNA-seq reads are redistributed between the genes.
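
The re-examination step can be illustrated with a toy sketch. This is not the researchers' published method, only a simplified illustration of the idea under stated assumptions: each region has a known expected read count, a region counts as anomalous when it deviates from that expectation by more than a fixed fraction, and ambiguous reads are shared between exactly two regions. The names Region, is_anomalous, resolved_by_reassignment and the tolerance parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Toy model of one transcript region (hypothetical, for illustration only)."""
    unique_reads: int      # reads that map only to this region
    shared_reads: int      # ambiguous reads currently assigned to this region
    expected_reads: float  # read count predicted for this region

    @property
    def observed_reads(self) -> int:
        return self.unique_reads + self.shared_reads


def is_anomalous(observed: float, expected: float, tolerance: float = 0.3) -> bool:
    """Flag a region whose observed count is off by more than tolerance * expected."""
    return abs(observed - expected) > tolerance * expected


def resolved_by_reassignment(a: Region, b: Region, tolerance: float = 0.3) -> bool:
    """Return True if some split of the ambiguous reads leaves neither region anomalous."""
    pool = a.shared_reads + b.shared_reads
    for to_a in range(pool + 1):
        observed_a = a.unique_reads + to_a
        observed_b = b.unique_reads + (pool - to_a)
        if (not is_anomalous(observed_a, a.expected_reads, tolerance)
                and not is_anomalous(observed_b, b.expected_reads, tolerance)):
            return True
    return False


# Case 1: both regions look anomalous, but only because shared reads were misassigned.
a = Region(unique_reads=40, shared_reads=10, expected_reads=100.0)
b = Region(unique_reads=50, shared_reads=90, expected_reads=100.0)
print(resolved_by_reassignment(a, b))

# Case 2: region c is over-covered by unambiguous reads alone, so no split can fix it.
c = Region(unique_reads=200, shared_reads=0, expected_reads=100.0)
d = Region(unique_reads=50, shared_reads=50, expected_reads=100.0)
print(resolved_by_reassignment(c, d))
```

In the first case the apparent anomalies vanish once the shared reads are split differently, so they would be treated as mapping artifacts. In the second case the anomaly survives every possible split and would remain a candidate for further study.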

"By correcting anomalies when possible, we reduce the number of falsely predicted instances of differential expression," Ma said.

The Gordon and Betty Moore Foundation, the National Science Foundation, the National Institutes of Health, the Shurl and Kay Curci Foundation, and the Pennsylvania Department of Health supported this research.

Carnegie Mellon University
