Computers close in on protein structure prediction

September 15, 2005

Computers can predict the detailed structure of small proteins nearly as well as experimental methods, at least some of the time, according to new studies by Howard Hughes Medical Institute researchers.

The findings, which were reported in the September 16, 2005, issue of the journal Science, provide a glimmer of hope that scientists eventually may be able to determine the structure of proteins from their genomic sequences, a problem that has seemed insurmountable.

"For more than 40 years, people have known the amino acid sequence of a protein specifies its three-dimensional structure, but no one has been able to translate the sequence into an accurate structure," said senior author David Baker, an HHMI researcher at the University of Washington. "The reason this research is exciting is that we're showing progress in predicting the structure from the sequence. It's not that the problem is solved, but that there is hope."

Proteins are biological machines, and scientists need to determine their structures to understand how the proteins work. Now, scientists determine structures exclusively by measuring the atomic characteristics of proteins in the lab. In contrast, "in this case, we never touched a test tube," Baker said. "We gave it to a computer and said, 'go.'"

In the study, a sophisticated computer program folded 17 short strings of amino acids into 100,000 possible variations. When the researchers compared the best predictions to the actual structures solved earlier by other scientists using experimental techniques, they had the same success rate as the best hitters in major league baseball.

"We achieved almost atomic resolution in structure prediction for about one-third of our benchmark set of small proteins," said first author Philip Bradley, a postdoctoral fellow in Baker's lab. "It is a real step forward to achieve structures that are in some way comparable to what you can get by experiments."

The encouraging results come from a refinement of a sophisticated computer modeling program called Rosetta, first developed several years ago in Baker's lab. The program works on the premise that proteins collapse into their lowest energy state, like a ball that rolls down a hill until it comes to rest on level ground. The energies of hundreds of thousands of possible shapes generated by the computer are computed, and the lowest energy shape is selected as the prediction.

The prediction process happens in two steps, Bradley said. The first stage uses an approximate model which allows rapid calculation of the energy and so can be carried out rapidly, while the second uses a very detailed model for which the energy calculations take much longer but are much more accurate. A large scale search through possible structures is carried out in the first stage, and promising locations are then explored in detail in the second stage.

The first stage takes advantage of the fact that all amino acids have identical sections, which form the protein backbone. The computer adds a fuzzy picture of the protruding side chains that give each amino acid its unique identity. The sequence of side chains ultimately gives each protein its characteristic shape by the environment and neighbors they prefer.

Then the computer randomly twists, loops, and bends each amino acid sequence into 100,000 different shapes based on the preferred location of the amino acids. Some amino acids tend to dive toward the watery world of the protein surface while others take cover inside the protein. The computer also accounts for the social habits of the 20 amino acids; some want to be close to each other and others like their distance.

In stage two, Rosetta replaces the fuzzy picture of the side chains with detailed, physically realistic models with all the atoms represented. From the positions of the atoms in the sidechains and the protein backbone, the computer then uses a detailed physical chemistry based force field which favors close packing of atoms and hydrogen bonding to more accurately compute the energy of the structure.

"What seems to be critical is the packing of the molecule," Baker said. "The protein fits together perfectly with no holes in the middle, and no atoms on top of each other. It's about as densely packed as it could be. It's like a three-dimensional jigsaw puzzle."

The researchers upped their odds of finding the right match by repeating the two-step process with 50 homologs of the proteins from other genomes, such as a mouse or fly. The protocol was first tested on a blind annual prediction test considered to be the highest standard for removing bias from protein structure prediction models.

"We can't compute the energies perfectly, but the biggest problem is the search through possible shapes," Baker said. "Where we were not getting the right answer on the computer, it was almost always the case that the actual structure had the lowest energy, so we would have succeeded if we had explored this part of the space."

In a related paper published in the August issue of the journal Proteins, Baker and his colleagues reported that similar approaches can be used to predict the structures of protein complexes. "For the first time, computational methods are able, for a subset of cases, to produce really accurate models," he said.

Baker compares the computer simulations of the proteins to the problem of trying to find the lowest point on the surface of the Earth for the first time. A simple way to find the lowest place on the planet is to send out as many explorers as possible. The more explorers there are the more likely one of them is to stumble onto the shoreline of the Dead Sea - the Earth's lowest point on land not covered by water. Each of the thousands of computer simulations is like one explorer.

Although the 33 percent success rate reported in the Science paper might be good enough to secure hall-of-fame status for a baseball player, Baker is quick to point out that it is not yet reliable enough for biology. Better models will depend on both smarter exploration strategies and more computer power. "If methods stayed where we are, we wouldn't solve the problem," Baker said. "On the other hand, we would do better with 10 times more computer time."

It takes less than one minute for a protein to fold into its correct shape in cells, but one oft-repeated estimate predicts it would take longer than the age of the universe for a computer to sample all the possible confirmations of a folded protein. Baker's lab already receives help from supercomputing centers in San Diego and Illinois.

More help will soon be on its way from many of the 5,000 freshman entering University of Washington this fall. Using software developed to assist the Search for Extraterrestrial Intelligence (SETI) project, the students can put their computers to work at night while they are sleeping to search the atomic landscape for the lowest energy structure of proteins.

To improve protein structure prediction further, Baker's group has also started a distributed computing project that they are hoping will be aided by members of the public. The project, called Rosetta@home, is a scientific research project that uses internet-connected computers to predict and design protein structures, and protein-protein and protein-ligand interactions. The goal is to develop methods that accurately predict and design protein structures and complexes, an endeavor that may ultimately help researchers develop cures for human diseases such as cancer, HIV/AIDS, and malaria. More information is available online at http://boinc.bakerlab.org/rosetta.
-end-


Howard Hughes Medical Institute

Related Amino Acids Articles from Brightsurf:

Igniting the synthetic transport of amino acids in living cells
Researchers from ICIQ's Ballester group and IRBBarcelona's Palacín group have published a paper in Chem showing how a synthetic carrier calix[4]pyrrole cavitand can transport amino acids across liposome and cell membranes bringing future therapies a step closer.

Microwaves are useful to combine amino acids with hetero-steroids
Aza-steroids are important class of compounds because of their numerous biological activities.

New study finds two amino acids are the Marie Kondo of molecular liquid phase separation
a team of biologists at the Advanced Science Research Center at The Graduate Center, CUNY (CUNY ASRC) have identified unique roles for the amino acids arginine and lysine in contributing to molecule liquid phase properties and their regulation.

Prediction of protein disorder from amino acid sequence
Structural disorder is vital for proteins' function in diverse biological processes.

A natural amino acid could be a novel treatment for polyglutamine diseases
Researchers from Osaka University, National Center of Neurology and Psychiatry, and Niigata University identified the amino acid arginine as a potential disease-modifying drug for polyglutamine diseases, including familial spinocerebellar ataxia and Huntington disease.

Alzheimer's: Can an amino acid help to restore memories?
Scientists at the Laboratoire des Maladies Neurodégénératives (CNRS/CEA/Université Paris-Saclay) and the Neurocentre Magendie (INSERM/Université de Bordeaux) have just shown that a metabolic pathway plays a determining role in Alzheimer's disease's memory problems.

New study indicates amino acid may be useful in treating ALS
A naturally occurring amino acid is gaining attention as a possible treatment for ALS following a new study published in the Journal of Neuropathology & Experimental Neurology.

Breaking up amino acids with radiation
A new experimental and theoretical study published in EPJ D has shown how the ions formed when electrons collide with one amino acid, glutamine, differ according to the energy of the colliding electrons.

To make amino acids, just add electricity
By finding the right combination of abundantly available starting materials and catalyst, Kyushu University researchers were able to synthesize amino acids with high efficiency through a reaction driven by electricity.

Nanopores can identify the amino acids in proteins, the first step to sequencing
While DNA sequencing is a useful tool for determining what's going on in a cell or a person's body, it only tells part of the story.

Read More: Amino Acids News and Amino Acids Current Events
Brightsurf.com is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to Amazon.com.