Millions of protein complexes added to AlphaFold Database shed light on how proteins interact

03.17.26 | European Molecular Biology Laboratory

A new collaboration between EMBL’s European Bioinformatics Institute (EMBL-EBI), Google DeepMind, NVIDIA, and Seoul National University has made millions of AI-predicted protein complex structures openly available through the AlphaFold Database. To maximise global health impact, the dataset prioritises proteins important for understanding human health and disease. This is the largest dataset of protein complex predictions currently available.

Proteins are the building blocks of life. They interact to create protein complexes which fulfil biological functions. By visualising protein interactions, scientists can uncover the molecular mechanisms that drive cell behaviour, identify what goes wrong when someone gets sick, and develop new drugs and therapies. Predicting the structure of protein complexes is extremely challenging because, in nature, proteins change shape and interact in many different ways.

“Science thrives on collaboration,” said Jo McEntyre, Interim Director of EMBL-EBI. “By making this foundational protein complex dataset openly available to the world, we’re inviting researchers to test, refine, and build on it to drive the next wave of biological discoveries.”

The latest AlphaFold Database update spans millions of homodimers – protein complexes formed of two identical proteins. It focuses on 20 of the most studied species, including humans, as well as the World Health Organization’s bacterial priority pathogens list. This approach aims to bring significant and immediate value for global health challenges.

“By expanding the AlphaFold Database to include protein complexes, we are addressing a critical need expressed by the scientific community,” said Anna Koivuniemi, Head of the Google DeepMind Impact Accelerator. “We hope that by lowering the barrier to these complex predictions, we can empower researchers everywhere to pursue the next wave of discoveries that could ultimately improve human health on a global scale.”

The collaboration builds on Google DeepMind’s AI system AlphaFold, which, since 2021, has accurately predicted the structure of millions of proteins. To democratise access to AlphaFold predictions, Google DeepMind and EMBL-EBI developed the AlphaFold Database, an open resource that anyone can access. The database has over 3.4 million users from 190 countries.

Through ongoing dialogue with the scientific community, a clear need emerged to expand the AlphaFold Database to include protein complexes. In response, EMBL-EBI, Google DeepMind, NVIDIA, and Seoul National University teamed up, contributing specialist expertise and resources, to calculate and integrate millions of protein complexes into the AlphaFold Database.

The collaboration brought together deep biological expertise and technical innovations. NVIDIA and the Steinegger Lab at Seoul National University developed the methodology, based on Google DeepMind’s AI system AlphaFold, including accelerations to multiple sequence alignment calculations and deep learning inference. NVIDIA provided cutting-edge AI infrastructure and scaled out inference pipelines to overcome limitations that historically made calculations at this scale challenging. EMBL-EBI enabled the collaboration by bringing the other parties together and contributing expertise in scientific and biodata management, as well as analysis. As a champion of open science, EMBL-EBI, together with Google DeepMind, integrated the new dataset into the AlphaFold Database.

“NVIDIA’s ambition is to consistently contribute orders-of-magnitude accelerations for fundamental digital biology workloads, enabling what was not possible before,” said Anthony Costa, NVIDIA Director of Digital Biology. “This release is a great example of how AI infrastructure and software can uniquely enable new scales of biological understanding.”

“By making predicted protein complexes accessible at an unprecedented scale, we are illuminating an unseen landscape of molecular interactions across the tree of life,” explained Martin Steinegger, Associate Professor at Seoul National University.

It takes a blend of AI-scale infrastructure and deep technical knowledge in accelerating complex workflows to generate AI predictions for protein complexes at this scale. The collaboration is centrally hosting data that would otherwise require around 17 million hours of GPU (graphics processing unit) computing to recreate.
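To put the 17 million GPU-hour figure in perspective, the following is a minimal sketch that converts it into GPU-years and into wall-clock time on a cluster. Only the 17 million hours comes from the release. The 1,000-GPU cluster size and the assumption of perfect linear scaling are hypothetical and chosen purely for illustration; they say nothing about the hardware the collaboration actually used.

```python
# Rough scale conversion for the "around 17 million hours of GPU computing"
# figure quoted in the release. The figure itself is rounded.
GPU_HOURS = 17_000_000          # from the release (approximate)
HOURS_PER_YEAR = 24 * 365       # non-leap year


def gpu_years(gpu_hours: float) -> float:
    """Time a single GPU would need, running non-stop, in years."""
    return gpu_hours / HOURS_PER_YEAR


def wall_clock_days(gpu_hours: float, n_gpus: int) -> float:
    """Wall-clock days on n_gpus GPUs, ASSUMING perfect linear scaling."""
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    return gpu_hours / n_gpus / 24


if __name__ == "__main__":
    # HYPOTHETICAL cluster size, not stated anywhere in the release.
    hypothetical_cluster = 1_000
    print(f"Single GPU: about {gpu_years(GPU_HOURS):,.0f} years")
    print(
        f"{hypothetical_cluster:,} GPUs: about "
        f"{wall_clock_days(GPU_HOURS, hypothetical_cluster):,.0f} days"
    )
```

Even under these idealised assumptions, recreating the dataset would occupy a large cluster for a long time, which is the practical case for computing it once and hosting it centrally.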

By making these calculations once and adding the information into the AlphaFold Database, this collaboration aims to help democratise access to protein complex predictions. It enables scientists everywhere to investigate how proteins interact in the vast protein universe, and accelerate discoveries that could lead to new medicines, new products, and a deeper understanding of life itself.

This is the first step in an ambition to add a wide range of protein complex structure predictions to the AlphaFold Database. The partnership has already calculated predictions for 30 million complexes. Of these, 1.7 million high-confidence homodimer predictions have been added to the AlphaFold Database. Another 18 million are lower-confidence homodimers, which are available as a list and for bulk download. The remainder are heterodimers, which are currently being analysed and assessed. More protein complex predictions will be calculated, and high-confidence predictions will be added to the AlphaFold Database in the coming months. The work is described in more detail in a preprint.
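The release does not state the number of heterodimers directly, but it follows from the figures given. The sketch below works through that arithmetic. The three input numbers are the rounded values quoted in the release, so the derived heterodimer count (about 10.3 million) and the percentage shares are approximations only. The function and variable names are illustrative.

```python
# Breakdown of the 30 million predicted complexes, in millions.
# Inputs are the rounded figures quoted in the release.
TOTAL_M = 30.0                    # all complexes predicted so far
HIGH_CONF_HOMODIMERS_M = 1.7      # added to the AlphaFold Database
LOWER_CONF_HOMODIMERS_M = 18.0    # available as a list and bulk download


def breakdown(total: float, high: float, lower: float) -> dict[str, tuple[float, float]]:
    """Return {category: (count_in_millions, percent_of_total)}.

    The heterodimer count is inferred as the remainder, which assumes the
    three categories are exhaustive and non-overlapping.
    """
    hetero = round(total - high - lower, 1)
    counts = {
        "high-confidence homodimers": high,
        "lower-confidence homodimers": lower,
        "heterodimers (inferred)": hetero,
    }
    return {name: (n, 100.0 * n / total) for name, n in counts.items()}


if __name__ == "__main__":
    result = breakdown(TOTAL_M, HIGH_CONF_HOMODIMERS_M, LOWER_CONF_HOMODIMERS_M)
    for name, (count, pct) in result.items():
        print(f"{name:30s} {count:5.1f} M  ({pct:4.1f}%)")
```

On these rounded inputs, fewer than 6% of the predictions calculated so far are high-confidence homodimers already in the database, which is consistent with the release describing this as a first step.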

“The human genome has just over 20,000 different proteins. Despite this relatively small genome, human beings display incredibly complex pathways, processes and regulation. Much of this complexity arises from the intermolecular interactions between proteins, and with small molecule ligands and DNA. Adding predicted protein-protein homodimeric interactions to the AlphaFold Database is a first step towards a comprehensive description of the human interactome, the basis by which human biology will be described and understood. This has relevance for the design of new therapeutics, understanding host-pathogen interactions, and more. Making these structures accessible to all allows every researcher around the world to build on these data, moving one step closer to predicting the biology of life,” said Dame Janet Thornton, Director Emeritus of EMBL-EBI.

EMBL’s European Bioinformatics Institute (EMBL-EBI)

EMBL’s European Bioinformatics Institute (EMBL-EBI) is a global leader in the storage, analysis, and dissemination of large biological datasets. We help scientists realise the potential of big data by enhancing their ability to exploit complex information, enabling responsible AI development, and making scientific outcomes available to the community, to make discoveries that benefit humankind.

We are at the forefront of computational biology research, with work spanning sequence analysis methods, multi-dimensional statistical analysis and data-driven biological discovery, from plant biology to mammalian development and disease.

We are part of the European Molecular Biology Laboratory (EMBL) and are located on the Wellcome Genome Campus, one of the world’s largest concentrations of scientific and technical expertise in genomics.

Google DeepMind

Google DeepMind is a world-leading AI research lab with British heritage and an international team, committed to building AI responsibly, delivering scientific breakthroughs, and creating products that improve billions of lives. The unit’s breakthroughs over the last decade include AlphaGo, the first computer program to defeat a Go world champion; Transformers, neural networks that underpin all modern language models; AlphaFold, an AI model that can accurately predict the structure and interactions of proteins, DNA, RNA, ligands and more; and Gemini, a family of versatile AI models built from the ground up for multimodality, seamlessly combining and understanding text, code, images, audio and video.

Seoul National University

Seoul National University is a leading research university in the Republic of Korea, dedicated to advancing knowledge through education, research, and public service.

With strengths across a broad range of disciplines, the University fosters interdisciplinary collaboration in fields including life sciences, engineering, and data-driven science.

Through global partnerships, Seoul National University contributes to scientific innovation and to addressing challenges that impact society worldwide.

Contact Information

Victoria Hatch
European Molecular Biology Laboratory
vhatch@ebi.ac.uk

How to Cite This Article

APA:
European Molecular Biology Laboratory. (2026, March 17). Millions of protein complexes added to AlphaFold Database shed light on how proteins interact. Brightsurf News. https://www.brightsurf.com/news/8J4OGQZL/millions-of-protein-complexes-added-to-alphafold-database-shed-light-on-how-proteins-interact.html
MLA:
"Millions of protein complexes added to AlphaFold Database shed light on how proteins interact." Brightsurf News, Mar. 17 2026, https://www.brightsurf.com/news/8J4OGQZL/millions-of-protein-complexes-added-to-alphafold-database-shed-light-on-how-proteins-interact.html.