
Machine-learning model provides detailed insight on proteins

March 12, 2019

A novel machine-learning 'toolbox' that can read and analyse the sequences of proteins has been described today in the open-access journal eLife.

The study demonstrates that, when trained to read sequence data, artificial neural networks called Restricted Boltzmann Machines (RBM) can provide a wealth of information on protein structure, function and evolutionary features. It is believed to be the first method that can extract this level of detail from sequence data alone.

Proteins are formed of sequences of molecules called amino acids, which determine a given protein's structural and functional properties. But understanding which parts of the sequences are responsible for which properties is challenging. "Answering this question could have significant implications for pharmaceutical development," explains co-author Jérôme Tubiana, former PhD student in the Physics Laboratory at l'École Normale Supérieure (ENS), Paris, France. "For example, it could help with the design of new proteins that have desired functions, or with predicting the future sequence evolution of proteins in living organisms, such as pathogens, and identifying appropriate drug targets."

To explore this question, Tubiana and his collaborators applied RBM to 20 protein 'families', groups of proteins that share a common evolutionary origin. The researchers presented detailed results for four protein families: two short protein domains called Kunitz and WW, one long chaperone protein called Hsp70, and synthetic lattice proteins for benchmarking.

They discovered that, after learning, the connections between the artificial neurons in the RBM are interpretable and relate to the protein's structure, function (such as activity) or phylogeny - the evolutionary relationships between protein sequences. Additionally, the team found that they could use RBM to design new protein sequences by composing and turning up or down the different artificial neural units at will.
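As a rough illustration of what "turning up or down" a hidden unit could mean, the sketch below builds a toy RBM over aligned sequences in Python with NumPy. It is not the authors' implementation: the weights are set by hand rather than learned, the hidden values are plain numbers chosen by the user, and every name in it (`hidden_inputs`, `visible_probs`, `sample_sequence`, the 21-symbol alphabet) is an assumption made for this example.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus an alignment gap symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"
Q = len(AMINO_ACIDS)

def encode(seq):
    """Map an aligned sequence string to an array of integer symbol indices."""
    return np.array([AMINO_ACIDS.index(c) for c in seq])

def hidden_inputs(v, weights):
    """Input received by each hidden unit from an encoded sequence v.

    weights has shape (n_hidden, n_sites, Q); the input to a unit is the
    sum over sites of its weight for the symbol observed at that site.
    """
    sites = np.arange(len(v))
    return weights[:, sites, v].sum(axis=1)

def visible_probs(h, weights, fields):
    """Per-site symbol probabilities given hidden values h; shape (n_sites, Q)."""
    logits = fields + np.tensordot(h, weights, axes=1)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def sample_sequence(h, weights, fields, rng):
    """Draw one sequence, site by site, conditioned on hidden values h."""
    probs = visible_probs(h, weights, fields)
    idx = [rng.choice(Q, p=row) for row in probs]
    return "".join(AMINO_ACIDS[i] for i in idx)

# Toy model: 3 hidden units, 5 sites, hand-set (not learned) parameters.
n_hidden, n_sites = 3, 5
weights = np.zeros((n_hidden, n_sites, Q))
fields = np.zeros((n_sites, Q))
# Hypothetical motif: hidden unit 0 favours tryptophan (W) at site 2.
weights[0, 2, AMINO_ACIDS.index("W")] = 5.0

rng = np.random.default_rng(0)
unit_off = np.array([0.0, 0.0, 0.0])
unit_on = np.array([2.0, 0.0, 0.0])
print(sample_sequence(unit_off, weights, fields, rng))
print(sample_sequence(unit_on, weights, fields, rng))
```

With hidden unit 0 switched off, every symbol at site 2 is equally likely; raising its value makes W at that site far more probable. In a trained model the weights would instead encode motifs learned from a protein family's sequences.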

"Our RBM model shows how machine-learning techniques can solve complex data recognition and draw conclusions from data in an interpretable way," says co-author Simona Cocco, CNRS Director of Research at the ENS Physics Laboratory. "This runs counter to the more complex, black-box models that are traditionally used in data science, as statistical analyses provided by these tools are largely uninterpretable. The interpretability of our method is a major benefit to scientists - it bears the promise of allowing them to generate proteins with desired functions in a controlled way."

"It will now be interesting to apply our model to proteins in pathogens," adds senior author Rémi Monasson, also CNRS Director of Research at the ENS Physics Laboratory, and Deputy Director of the Henri Poincaré Institute (CNRS/Sorbonne University), France. "Pathogens, particularly viruses, can often escape drugs through mutations that make treatments ineffective. Our method could be used to predict the mutational escape paths that are accessible to the functional protein from its current sequence, and help identify which combination of protein sites should be targeted by drugs to block all paths."

The paper 'Learning protein constitutive motifs from sequence data' can be freely accessed online. Contents, including text, figures and data, are free to reuse under a CC BY 4.0 license.

Authors Simona Cocco and Rémi Monasson are affiliated with the ENS Physics Laboratory (CNRS/ENS Paris/Sorbonne Université/Université Paris Diderot).

Media contact

Emily Packer, Senior Press Officer
01223 855373

About eLife

eLife aims to help scientists accelerate discovery by operating a platform for research communication that encourages and recognises the most responsible behaviours in science. We publish important research in all areas of the life and biomedical sciences, including Computational and Systems Biology and Physics of Living Systems, which is selected and evaluated by working scientists and made freely available online without delay. eLife also invests in innovation through open-source tool development to accelerate research communication and discovery. Our work is guided by the communities we serve. eLife is supported by the Howard Hughes Medical Institute, the Max Planck Society, the Wellcome Trust and the Knut and Alice Wallenberg Foundation.


