The AI-based program AlphaFold predicts a protein’s 3D structure with remarkable accuracy. However, it tends to reduce heterogeneous structures to a single dominant conformation, or shape, and overlooks experimental conditions that can alter local structure. Researchers at the Institute of Science and Technology Austria (ISTA) and international collaborators have now developed a way to guide AlphaFold with experimental data. Their approach, published in Nature Biotechnology, paves the way for improved future predictive models.
Our understanding of molecular structures is considerably influenced by X-ray crystallography—a technique that has served as structural biology’s central pillar for many decades. While crystal structures have made immense contributions to our biological, medical, and pharmaceutical knowledge, they have also encouraged the view that molecular structures are static.
In addition, crystal structures often resolve recognizable structural features in folded proteins, such as alpha helices, beta strands, and beta sheets. In contrast, the more flexible regions, such as loops, are often missing from the model and are shown as dashed lines connecting the visible parts of the molecule. The problem: these regions are not mere linkers devoid of biological significance.
Like other prediction models, AlphaFold, the AI tool that revolutionized structural biology and was recognized with the 2024 Nobel Prize in Chemistry, was trained on static crystal structures. These structures dominate protein datasets, accounting for around 85% of the Protein Data Bank (PDB). However, scientists have become increasingly aware that motions throughout a protein’s entire structure are essential for its function.
By guiding AlphaFold with experimental data, an international team of researchers now aims to model ensembles of structures that better reflect physical and biological reality.
“Proteins are highly dynamic molecules,” says Alex Bronstein, professor at the Institute of Science and Technology Austria (ISTA). “Being able to model this dynamism with experiment‑guided AlphaFold will highlight the functional importance of the motions of proteins.”
Bronstein led this work together with Ailie Marx from Tel-Hai University of Kiryat Shmona and MIGAL – Galilee Research Institute in Israel, ISTA professor Paul Schanda, and Sanketh Vedula, a postdoctoral researcher at Princeton University and the Broad Institute in the United States.
Developing a new graphical language
Advances in structural biology over the past century have led biologists to develop a visual vocabulary for static structures, including terms such as “helix” and “cartoon representation”. According to Bronstein and the team, this visual vocabulary limits our understanding of the dynamic nature of molecular structures. As such, they now aim to develop a new graphical language that conveys structural heterogeneity.
“Often, structural biologists have justified missing molecular densities in their crystallographic models by saying that the protein region in question was ‘flexible’. This one‑size‑fits‑all explanation no longer satisfies us. We want to tackle all the questions that crystallographers couldn’t answer in the past,” Bronstein says.
Tidying up protein databases
When predicting molecular structures from sequences, AlphaFold already incorporates some evolutionary information. For example, it uses structural relationships among residues that have co‑evolved. However, it applies this information rigidly across different proteins without accounting for the context of each protein’s overall sequence and structure.
The team’s approach addresses this limitation and reveals the full spectrum of structural information and dynamics that are imprinted into protein sequences at the evolutionary level.
“We are developing our model to capture subtler information about structure and dynamics than today’s leading protein databases, such as the PDB, can represent,” says Advaith Maddipatla, a PhD student in the Bronstein group and the first author of the study.
The goal is to enable experiment-guided AlphaFold to predict how heterogeneous a molecular structure will be, which in turn will generate a dataset for retraining the tool.
“Each newly modeled structure can improve subsequent predictions and bring experiment-guided AlphaFold closer to the physical reality than ever before,” Bronstein says. “With this approach, tidying up all structural information in the PDB should be possible within a couple of years.”
When structural ‘fuzziness’ is the desired outcome
Beyond predicting structures from sequences, the team aims to enable experiment-guided AlphaFold to reconstruct plausible structural ensembles from ‘noisy’ experimental data. This approach will help retrieve the missing information from these datasets.
“Unlike the static crystallographic structures used to train AlphaFold, the structural ‘blur’ is precisely the signal we are looking for,” says Bronstein.
The ISTA researchers are developing their tool to provide detailed insights into a protein’s dynamics and structural heterogeneity. In addition, they aim to determine how frequently such flexibility occurs in nature.
“Predicting this natural ‘fuzziness’ of proteins from sequences and noisy data could eventually boost the modeling of large protein complexes and advance inverse protein design,” Bronstein says. Crucial in bioengineering and drug discovery, inverse protein design—also known as inverse folding—is the process of designing sequences that fold into a specific 3D structure.
The scientists argue that the future of structural design must expand to designing ensembles of structures as a function of time, showing how structures change over milliseconds. “Some proteins go very quickly through functionally important conformations,” says Schanda. “We want our model to be able to visit all these conformations.”
Paving the way for ‘experimentally aware’ predictive models
An earlier version of this study, now published in Nature Biotechnology, was first presented at the 2025 International Conference on Machine Learning (ICML). This year, the team’s follow-up study, which focuses on optimizing the model’s inference time, has been accepted to ICML 2026. By introducing a general optimization framework for inference time, this second study addresses limitations in the reverse‑diffusion process, the technique used in generative machine learning models to reconstruct data from noise.
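For readers curious what "guiding" a reverse-diffusion process with data can look like, the following is a minimal toy sketch, not the team's method or AlphaFold's code. It assumes a one-dimensional coordinate whose "learned" prior is a standard Gaussian, so the score is known analytically, and a single hypothetical measurement `y`. The function name `guided_reverse_diffusion`, the noise schedule, and the `guidance` weight are all illustrative assumptions.

```python
import numpy as np

def guided_reverse_diffusion(y=None, guidance=0.0, n_samples=2000,
                             n_steps=200, seed=0):
    """Toy 1D DDPM-style sampler with an optional data-fidelity nudge.

    Assumption: the prior over the clean coordinate x0 is N(mu, s2), so the
    score of every noised marginal is available in closed form. A real model
    would use a trained network instead.
    """
    rng = np.random.default_rng(seed)
    mu, s2 = 0.0, 1.0                         # hypothetical prior N(mu, s2)
    betas = np.linspace(1e-4, 0.05, n_steps)  # illustrative noise schedule
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)                 # cumulative signal retention

    x = rng.standard_normal(n_samples)        # start from pure noise
    for t in range(n_steps - 1, -1, -1):
        # Analytic score of the noised prior at step t.
        var_t = abar[t] * s2 + (1.0 - abar[t])
        score = -(x - np.sqrt(abar[t]) * mu) / var_t
        if y is not None:
            # Tweedie estimate of the clean sample from the noisy one.
            x0_hat = (x + (1.0 - abar[t]) * score) / np.sqrt(abar[t])
            # Heuristic nudge: gradient of -0.5 * guidance * (y - x0_hat)^2
            # with respect to x0_hat (the Jacobian d x0_hat / d x is ignored).
            score = score + guidance * (y - x0_hat)
        # Standard DDPM ancestral step written in terms of the score.
        mean = (x + betas[t] * score) / np.sqrt(alphas[t])
        noise = rng.standard_normal(n_samples) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

if __name__ == "__main__":
    unguided = guided_reverse_diffusion()
    guided = guided_reverse_diffusion(y=2.0, guidance=1.0)
    print("unguided mean:", unguided.mean())
    print("guided mean:  ", guided.mean())
```

In this sketch the unguided samples stay centered on the prior, while the guided samples shift toward the measurement and still form a spread of values rather than a single point. That spread is the toy analogue of an ensemble that remains consistent with the data.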
In a third study, posted on the preprint server bioRxiv, the researchers applied their experiment‑guided AlphaFold model to uncover previously unmodeled conformations of the protein β2-microglobulin from crystallographic data. “This study clearly demonstrates that experiment-guided AlphaFold can uncover conformations that conventional structural refinement workflows miss,” says Maddipatla, the first author of all three studies.
“Experiment-guided AlphaFold paves the way for future predictive models that are ‘experimentally aware’ and capture the ensemble nature of protein structures. Ultimately, we want to turn our model from a proof of concept into a standard tool for biological research by integrating it into structural prediction frameworks,” he concludes.
Journal: Nature Biotechnology
Method of research: Experimental study
Subject of research: Not applicable
Article title: Experiment-guided AlphaFold3 resolves measurement-consistent protein ensembles
Article publication date: 29-Jun-2026