Mapping the molecules of life: expanding the quantum-mechanical foundation for biomolecular AI

04.21.26 | AI for Science


Machine learning force fields (MLFFs) are rapidly transforming molecular simulations by combining the accuracy of quantum mechanics with the speed of classical approaches. However, training reliable MLFFs for biological systems requires large, chemically diverse datasets computed with high-level quantum-mechanical methods. Until now, such datasets have primarily covered small organic molecules and protein fragments, leaving approximately 40% of cellular biomass, namely nucleic acids, lipids, and carbohydrates, without adequate quantum-mechanical reference data. This gap has limited the ability of AI models to simulate critical biological processes such as DNA dynamics, membrane behavior, and sugar recognition.

The Solution: To address this data gap, the researchers introduced QCell, a curated collection of 525,000 new quantum-mechanical calculations for biomolecular fragments spanning five categories: nucleic acids (DNA duplexes and RNA fragments), lipids (fatty acid clusters and cholesterol-containing assemblies), carbohydrates (disaccharides and glycosidic linkages), solvated ions and water clusters, and non-covalent molecular dimers. Fragments range from 2 to 402 atoms and were computed using the non-empirical hybrid PBE0 density functional with many-body dispersion interactions (PBE0+MBD(–NL)), a level of theory chosen for its accuracy and transferability without reliance on empirical fitting. The team generated structures through extensive conformational sampling using molecular dynamics simulations and conformer-generation tools, followed by careful fragment selection and pre-optimization. When combined with companion datasets (QCML, QM7-X, AQM, GEMS, and SPICE), QCell brings the total pool of consistently computed reference structures to over 41 million, spanning 82 chemical elements. Benchmark training of a state-of-the-art MLFF (SO3LR) on the combined data demonstrated force prediction errors below 1 kcal/mol/Å for most molecular classes, confirming both the internal consistency and practical utility of the dataset.
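The benchmark criterion above (force errors below 1 kcal/mol/Å per molecular class) can be illustrated with a small sketch. It assumes the error metric is a mean absolute error over force components, which the release does not specify; the function name `force_mae_by_class`, the class labels, and the toy numbers are hypothetical and are not taken from the QCell paper or the SO3LR code.

```python
from collections import defaultdict


def force_mae_by_class(samples):
    """Mean absolute force error per molecular class.

    samples: iterable of (class_name, predicted_forces, reference_forces),
    where each force array is a list of (fx, fy, fz) tuples in kcal/mol/Å.
    Assumption: MAE over all Cartesian components is the metric of interest.
    """
    abs_err_sum = defaultdict(float)
    n_components = defaultdict(int)
    for cls, pred, ref in samples:
        if len(pred) != len(ref):
            raise ValueError("predicted and reference forces must cover the same atoms")
        for p_atom, r_atom in zip(pred, ref):
            for p, r in zip(p_atom, r_atom):
                abs_err_sum[cls] += abs(p - r)
                n_components[cls] += 1
    return {cls: abs_err_sum[cls] / n_components[cls] for cls in abs_err_sum}


# Toy data (made up for illustration): one single-atom "fragment" per entry.
toy_samples = [
    ("lipid", [(1.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)]),
    ("lipid", [(0.0, 2.0, 0.0)], [(0.0, 1.0, 0.0)]),
    ("nucleic_acid", [(3.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]),
]

mae = force_mae_by_class(toy_samples)
# Flag which classes meet the 1 kcal/mol/Å threshold quoted in the text.
below_threshold = {cls: err < 1.0 for cls, err in mae.items()}
```

Grouping by class rather than averaging over the whole pool keeps a large, easy class from masking a poorly fitted one, which is presumably why the result is reported "for most molecular classes".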

The Future: Future research will leverage QCell to train general-purpose MLFFs for full-cell biomolecular simulations, including membrane dynamics, nucleic acid folding, and glycan–protein interactions. The consistent quantum-mechanical framework also opens the door to extending coverage to additional biomolecular species, post-translational modifications, and complex multi-component biological environments.

QCell fills a critical gap in the quantum-mechanical data landscape by providing high-quality reference calculations for the three major biomolecular classes beyond proteins: nucleic acids, lipids, and carbohydrates, along with biologically relevant ions and molecular dimers. Structural validation confirmed that the dataset captures the full conformational diversity expected for each molecular class, including canonical DNA helical forms, lipid packing arrangements, glycosidic torsional profiles, and experimentally consistent ion solvation structures. The dataset is freely available on Zenodo and all data-generation scripts are provided in a public GitHub repository.
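As a rough illustration of what a torsional profile is built from, the sketch below computes the dihedral angle defined by four atom positions using the standard atan2 formula. It is not the authors' validation code: the function name `dihedral_degrees` and the example coordinates are hypothetical, and which four atoms define a given glycosidic torsion is left to the caller.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, in [-180, 180]) for atoms p0-p1-p2-p3."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: a planar cis arrangement and a planar trans arrangement.
cis = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0))
trans = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 1.0, 0.0))
```

Collecting such angles over all sampled conformers of a linkage and histogramming them gives a torsional profile that can be compared against the expected conformational preferences.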

The Impact: This work provides a foundational resource for developing transferable machine learning force fields that can model the full spectrum of biomolecular interactions at quantum-mechanical accuracy. By enabling reliable simulations of nucleic acid dynamics, membrane processes, and carbohydrate recognition, QCell accelerates progress toward AI-driven molecular simulations of entire biological systems.

The research was recently published in the online edition of AI for Science, a journal published by IOP Publishing on behalf of the Dongguan Institute of Materials Science and Technology, CAS.

Reference: Adil Kabylda, Sergio Suárez-Dou, Nils Davoine, Florian N Brünig, Alexandre Tkatchenko. QCell: comprehensive quantum-mechanical dataset spanning diverse biomolecular fragments[J]. AI for Science, 2026, 2(2): 025003. DOI: 10.1088/3050-287X/ae5267

Journal: AI for Science

DOI: 10.1088/3050-287X/ae5267

Article title: QCell: comprehensive quantum-mechanical dataset spanning diverse biomolecular fragments

Article publication date: 17-Apr-2026

Contact Information

Yan He
Dongguan Institute of Materials Science and Technology, CAS
heyan@dimst.ac.cn

How to Cite This Article

APA:
AI for Science. (2026, April 21). Mapping the molecules of life: expanding the quantum-mechanical foundation for biomolecular AI. Brightsurf News. https://www.brightsurf.com/news/LKNOZNNL/mapping-the-molecules-of-life-expanding-the-quantum-mechanical-foundation-for-biomolecular-ai.html
MLA:
"Mapping the molecules of life: expanding the quantum-mechanical foundation for biomolecular AI." Brightsurf News, Apr. 21 2026, https://www.brightsurf.com/news/LKNOZNNL/mapping-the-molecules-of-life-expanding-the-quantum-mechanical-foundation-for-biomolecular-ai.html.