The grammar of cell development branching time

May 29, 2019

One of the greatest achievements of science in recent years is the technology for obtaining information about thousands of individual cells extracted from an organism. This technology includes the so-called "omics" for individual cells (genomics, epigenomics, transcriptomics, proteomics), which give us the data about the genomes of thousands of single cells, the state and activity of various genes in them, as well as the presence of various proteins in these cells.

It is convenient to present the data about each cell as a point in a highly multidimensional space. As a result, using the new technology, scientists around the world obtain thousands of points (cells) in a space of enormous dimension.

Data analysis methods, including topological and geometric analysis, "topological grammars", the "principal graphs method" and "data approximation", are an important element of this new technology for acquiring data about living organisms, a field that is huge in terms of both investment and the number of players. Such data open up colossal and not yet fully realized opportunities for the development of biology and personalized medicine.

The idea of branching development time allows one to convert the resulting mountains of data into a more understandable, readable and interpretable form. We can imagine that each cell lies on some development trajectory. These trajectories can branch at the points where a developing cell chooses one of several possible futures. Geometrically, these development trajectories, together with their bifurcation points, represent the branching development time.

A new technology for extracting this branching time from the data was developed by a large international team of researchers, including 15 scientists from 6 countries: the USA, China, France, Italy, the UK and Russia.

Complex trees are constructed using elementary transformation grammars. At each step of the basic algorithm, the elementary transformation that gives the greatest gain in the quality of data approximation is chosen.
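The greedy step described above can be illustrated with a small sketch. This is not the STREAM or ElPiGraph implementation: the two toy rules ("add a node to a node" and "bisect an edge"), the placement of new nodes, and the use of mean squared distance to the nearest node as the approximation quality are simplifying assumptions, and all function names are hypothetical.

```python
import numpy as np

def approximation_error(data, nodes):
    """Mean squared distance from each data point to its nearest node."""
    d2 = ((data[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def candidates(data, nodes, edges):
    """Yield (nodes, edges) for every single application of the toy rules."""
    d2 = ((data[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    new = len(nodes)  # index the added node will receive
    # Rule 1 (assumed): attach a new node to node i, placed at the
    # farthest data point among those currently closest to node i.
    for i in range(len(nodes)):
        mine = np.flatnonzero(nearest == i)
        if mine.size == 0:
            continue
        far = mine[d2[mine, i].argmax()]
        yield np.vstack([nodes, data[far]]), edges + [(i, new)]
    # Rule 2 (assumed): bisect edge (i, j) with a new node at its midpoint.
    for k, (i, j) in enumerate(edges):
        mid = (nodes[i] + nodes[j]) / 2
        rest = edges[:k] + edges[k + 1:]
        yield np.vstack([nodes, mid]), rest + [(i, new), (new, j)]

def grow_tree(data, n_nodes):
    """Grow a tree greedily: at each step keep the transformation
    that yields the lowest approximation error."""
    nodes = data.mean(axis=0, keepdims=True)  # start from a single node
    edges = []
    while len(nodes) < n_nodes:
        nodes, edges = min(
            candidates(data, nodes, edges),
            key=lambda c: approximation_error(data, c[0]),
        )
    return nodes, edges
```

Because both toy rules only add nodes and keep the graph connected, the result is always a tree and the approximation error never increases from one step to the next. A production method would also move existing nodes and penalize stretching and bending of the graph, which this sketch omits.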

The method of topological grammars for processing complex data of a general nature was proposed as early as 2007 by Professor Alexander Gorban (Great Britain, currently supervising the implementation of a megagrant at the Lobachevsky State University of Nizhny Novgorod) and his former student Andrei Zinovyev (France, currently collaborating with the Lobachevsky University in the implementation of the megagrant).

"The concept of branching time (or, as it is often called, pseudo-time) arises in biology in the following way: cells and events that occur to them are placed along a certain graph (or, more formally, a one-dimensional continuum, since a graph is a discrete object). This branching continuum plays the same role in the analysis of developmental and differentiation events as linear time in other areas (a scale for event placement). No mystery or modification of physical time is involved. People have introduced this concept and many of them use it. It is convenient. The topology of this scale is extracted from data analysis. Next, the data is mapped on this scale," explains Alexander Gorban.

This method was developed further as part of a broad international collaboration organized by L. Pinello of Harvard University and was used to create STREAM, a specialized software product that builds the branching time of cell development from the "omics" data of single cells.
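To make the idea of placing cells on a branching scale concrete, here is a minimal sketch under stated assumptions. It is not STREAM's API: the tree (node positions and edges) is assumed to be already embedded in the data space, each cell is projected onto its nearest edge, and its branching time is taken to be the path length along the tree from a chosen root node. All names are hypothetical.

```python
from collections import deque
import numpy as np

def project_to_segment(p, a, b):
    """Return (squared distance, fraction s in [0, 1]) for the point of
    segment a->b that is closest to p."""
    ab = b - a
    denom = float(ab @ ab)
    s = 0.0 if denom == 0.0 else float(np.clip((p - a) @ ab / denom, 0.0, 1.0))
    closest = a + s * ab
    return float(((p - closest) ** 2).sum()), s

def node_distances_from_root(nodes, edges, root):
    """Path length along the tree from the root to every node."""
    adj = {i: [] for i in range(len(nodes))}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    dist = {root: 0.0}
    queue = deque([root])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in dist:
                dist[j] = dist[i] + float(np.linalg.norm(nodes[j] - nodes[i]))
                queue.append(j)
    return dist

def branching_time(data, nodes, edges, root=0):
    """Map each point to (index of its nearest edge, time along the tree)."""
    dist = node_distances_from_root(nodes, edges, root)
    result = []
    for p in data:
        best = None
        for k, (i, j) in enumerate(edges):
            d2, s = project_to_segment(p, nodes[i], nodes[j])
            if best is None or d2 < best[0]:
                length = float(np.linalg.norm(nodes[j] - nodes[i]))
                # Measure time from whichever end of the edge is nearer the root.
                if dist[i] <= dist[j]:
                    t = dist[i] + s * length
                else:
                    t = dist[j] + (1.0 - s) * length
                best = (d2, k, t)
        result.append((best[1], best[2]))
    return result
```

On a Y-shaped tree, two cells can receive the same time value while lying on different edges, which is exactly what distinguishes a branching scale from ordinary linear time.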

"Just imagine, fairly recently, we learned with great delight and a sense of wonder about the decoding of the human genome. The proposed new technology makes it possible to determine the status and activity of genes and other important data simultaneously for tens of thousands of cells taken from the body. This information will be determined for each of them individually rather than by giving some sort of average values. Thus, extremely important information will be provided about the development of an individual organism and the origination of various diseases such as cancer. However, one must be able to read, decipher and extract useful information from such data. We provide a tool for working with these data and extracting important information from them," continues Alexander Gorban.

"Both the graph and its embedding in the data space are built simultaneously, and then this embedding is used to map all the data. That is, a highly multidimensional space (with the dimension of hundreds of thousands) is reduced to a branching one-dimensional continuum," concludes Alexander Gorban.

An article describing the method and the first results of its application has been published in the journal Nature Communications:

H Chen, L Albergante, JY Hsu, CA Lareau, G Lo Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, and L Pinello, Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM, Nature Communications, volume 10, Article number: 1903 (2019).

STREAM software, its ElPiGraph compute core, and other project-related programs are freely available online.

Lobachevsky University
