Improved RNA data visualization method gets to the bigger picture faster

February 14, 2019

Like going from a pinhole camera to a Polaroid, a significant mathematical update to the formula for a popular bioinformatics data visualization method will allow researchers to develop snapshots of single-cell gene expression not only several times faster but also at much higher resolution. Published in Nature Methods, this innovation by Yale mathematicians will reduce the rendering time of a million-point single-cell RNA-sequencing (scRNA-seq) data set from over three hours down to just fifteen minutes.

Scientists say the existing decade-old method, t-distributed Stochastic Neighbor Embedding (t-SNE), is great for representing, in two dimensions, patterns in RNA sequencing data gathered at the single-cell level (scRNA-seq data). "In this setting, t-SNE 'organizes' the cells by the genes they express and has been used to discover new cell types and cell states," said George Linderman, lead author and a Yale M.D.-Ph.D. student specializing in applied mathematics.

By computational standards, t-SNE is quite slow. Thus, researchers often "downsample" their scRNA-seq dataset -- take a smaller sample from the initial sample -- before applying t-SNE. However, downsampling is a poor compromise, as it makes it unlikely for t-SNE to capture rare cell populations, which are often what researchers most want to identify.
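To make the downsampling trade-off concrete, here is a minimal Python sketch of how likely a rare population is to vanish from a uniform random subsample. The population sizes are hypothetical numbers chosen for illustration, not figures from the study, and the function names are our own.

```python
def expected_rare_cells(total_cells, rare_cells, sample_size):
    """Expected number of rare cells in a uniform random subsample (no replacement)."""
    return sample_size * rare_cells / total_cells


def prob_rare_population_missed(total_cells, rare_cells, sample_size):
    """Probability that the subsample contains none of the rare cells.

    Hypergeometric zero-count probability:
    C(N - k, n) / C(N, n) = prod_{i=0}^{k-1} (N - n - i) / (N - i).
    """
    prob = 1.0
    for i in range(rare_cells):
        prob *= (total_cells - sample_size - i) / (total_cells - i)
    return prob


# Hypothetical example: 50 rare cells among one million, downsampled to 10,000.
N, k, n = 1_000_000, 50, 10_000
print(expected_rare_cells(N, k, n))
print(prob_rare_population_missed(N, k, n))
```

Under these assumed numbers the subsample holds half a rare cell on average and misses the population entirely more often than not, so no embedding of the subsample could show it as a cluster.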

More than 30 years ago, another team of Yale mathematicians developed the fast multipole method (FMM), a revolutionary numerical technique that sped up the calculation of long-range forces in the n-body problem. The researchers on this study recognized that the principles behind the FMM could also be applied to nonlinear dimensionality reduction problems, such as t-SNE, and accelerated t-SNE until it earned its new name: FIt-SNE, or fast interpolation-based t-SNE.
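The general idea of replacing all-pairs interactions with interpolation through a small set of nodes can be sketched in one dimension. This is a hedged illustration only, not the FIt-SNE implementation: the choice of Chebyshev nodes, the node count, the function names, and the random test data are all our own assumptions. It approximates sums of the Student-t kernel 1/(1 + d^2), the kernel t-SNE uses in the embedding space, so that the cost grows like N*p rather than N^2 for N points and p nodes.

```python
import math
import random


def kernel(a, b):
    """Student-t (Cauchy) kernel 1 / (1 + d^2) for two scalars."""
    return 1.0 / (1.0 + (a - b) ** 2)


def naive_kernel_sums(points, weights):
    """Direct O(N^2) evaluation of sum_j K(y_i, y_j) * q_j."""
    return [sum(kernel(yi, yj) * qj for yj, qj in zip(points, weights))
            for yi in points]


def lagrange_basis(y, nodes, bary):
    """Values of all Lagrange basis polynomials at y (barycentric formula)."""
    for l, t in enumerate(nodes):
        if y == t:  # y sits exactly on a node
            return [1.0 if m == l else 0.0 for m in range(len(nodes))]
    terms = [w / (y - t) for w, t in zip(bary, nodes)]
    total = sum(terms)
    return [term / total for term in terms]


def interpolated_kernel_sums(points, weights, num_nodes=30):
    """Approximate the same sums via Chebyshev nodes; points must lie in [-1, 1]."""
    p = num_nodes
    angles = [(2 * k + 1) * math.pi / (2 * p) for k in range(p)]
    nodes = [math.cos(a) for a in angles]
    # Barycentric weights for Chebyshev points of the first kind.
    bary = [(-1) ** k * math.sin(a) for k, a in enumerate(angles)]
    basis = [lagrange_basis(y, nodes, bary) for y in points]
    # Step 1, O(N p): spread each point's weight onto the nodes.
    node_weights = [sum(b[m] * q for b, q in zip(basis, weights)) for m in range(p)]
    # Step 2, O(p^2): evaluate node-to-node interactions.
    node_sums = [sum(kernel(nodes[l], nodes[m]) * node_weights[m] for m in range(p))
                 for l in range(p)]
    # Step 3, O(N p): interpolate the node values back to every point.
    return [sum(b[l] * node_sums[l] for l in range(p)) for b in basis]


random.seed(0)  # hypothetical test data
points = [random.uniform(-1.0, 1.0) for _ in range(500)]
weights = [random.uniform(0.0, 1.0) for _ in range(500)]
exact = naive_kernel_sums(points, weights)
approx = interpolated_kernel_sums(points, weights)
max_rel_error = max(abs(e - a) / e for e, a in zip(exact, approx))
print(max_rel_error)
```

The three steps touch each data point only a fixed number of times per node, which is why this style of method scales to far more points than the direct double loop.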

"Using our approach, researchers can not only analyze single cell RNA-sequencing data faster, but it also can be used to characterize rare cell subpopulations that cannot be detected if the data is subsampled prior to t-SNE," said Yuval Kluger, senior author and Yale professor of pathology. Additionally, the team used a heatmap-style visualization for its FIt-SNE results, which makes it easy for researchers to see the expression patterns of thousands of genes at the level of single cells simultaneously.

The researchers said 2019 couldn't be a better new year for t-SNE to get "FIt." In December 2018, Science Magazine named tracking development of embryos cell by cell -- impossible to accomplish without visualizations based on scRNA-seq data -- the Breakthrough of the Year. FIt-SNE will speed up further work in this field of developmental biology as well as in fields such as neuroscience and cancer research, where single-cell sequencing has become an invaluable tool for mapping the brain and understanding tumors, said the researchers.
Software for FIt-SNE and the heatmap-style visualization is available online.

Other authors include Manas Rachh, Jeremy G. Hoskins, and Stefan Steinerberger.

Authors on this study have received grants from the Air Force Office of Scientific Research, the Alfred P. Sloan Foundation, the National Institutes of Health, and/or the National Science Foundation.

Yale University
