Columbia-led team develops open-source framework to accelerate health AI research

05.29.26 | Columbia University Irving Medical Center

NEW YORK, NY -- A research team led by Columbia University has developed an open-source framework designed to streamline and accelerate artificial intelligence research using health data, addressing longstanding challenges in data standardization, reproducibility, and collaboration across institutions.

The framework, called MEDS (Medical Event Data Standard), introduces both a standardized data format and a growing ecosystem of interoperable tools intended to support the development and evaluation of machine learning models using clinical data.

A study describing the framework was published in NEJM AI.

The researchers say the framework could help reduce technical barriers that currently slow health AI research and make it difficult for scientists to reproduce findings or compare models across studies and institutions.

“MEDS is a simple way to make all different sources of electronic health record (EHR) data look the same to your code, regardless of what hospital or clinic or EHR software system the data came from,” says Matthew McDermott, PhD, assistant professor of biomedical informatics at Columbia University and study leader. “MEDS lets us share code that we can use to train models on many different sites of care without needing to share sensitive patient data — and often without needing to even do the more challenging step of fully ‘harmonizing’ the data into a consistent clinical vocabulary. This infrastructure will allow researchers to spend less time rebuilding pipelines and more time answering clinically meaningful questions.”

Standardizing health data for clinical AI research

Electronic health record data are often stored in institution-specific formats that require extensive preprocessing before they can be used for AI development. According to the study authors, these inconsistencies can create significant duplication of effort, limit collaboration, and hinder reproducibility.

MEDS addresses these issues by providing a lightweight, extensible standard for representing longitudinal clinical data in machine learning workflows. The framework also includes open-source tooling that supports data transformation, preprocessing, benchmarking, and model development.
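The idea of a shared, event-per-row representation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two site export layouts, the helper functions, and the four field names (subject_id, time, code, numeric_value) are hypothetical and are not quoted from the study or from the MEDS specification. The point is only that once each site's records are mapped into one layout, downstream code no longer needs to know where the data came from, even though the clinical codes themselves are left in their local vocabularies.

```python
from datetime import datetime

# Hypothetical raw exports from two sites; neither layout comes from the study.
site_a_rows = [
    {"patient": "A-17", "drawn_at": "2024-03-01 08:30", "test": "LAB//creatinine", "result": "1.4"},
]
site_b_rows = [
    {"mrn": 902, "ts": "2024-03-02T14:05:00", "event": "DX//I10", "value": None},
]


def to_event(subject_id, time, code, numeric_value=None):
    """Build one row in the assumed shared layout: one clinical event per row."""
    return {
        "subject_id": str(subject_id),
        "time": time,
        "code": code,  # left in the site's own vocabulary, not harmonized
        "numeric_value": None if numeric_value is None else float(numeric_value),
    }


def from_site_a(row):
    """Map the hypothetical site A layout into the shared layout."""
    time = datetime.strptime(row["drawn_at"], "%Y-%m-%d %H:%M")
    return to_event(row["patient"], time, row["test"], row["result"])


def from_site_b(row):
    """Map the hypothetical site B layout into the shared layout."""
    time = datetime.fromisoformat(row["ts"])
    return to_event(row["mrn"], time, row["event"], row["value"])


def count_events_per_subject(events):
    """Downstream code that only ever sees the shared layout."""
    counts = {}
    for event in events:
        counts[event["subject_id"]] = counts.get(event["subject_id"], 0) + 1
    return counts


events = [from_site_a(r) for r in site_a_rows] + [from_site_b(r) for r in site_b_rows]
events.sort(key=lambda e: (e["subject_id"], e["time"]))
print(count_events_per_subject(events))
```

Only the two small mapping functions are site-specific in this sketch; the counting function, standing in for a preprocessing or modeling step, runs unchanged on either site's data.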

The authors emphasize that MEDS was designed specifically for AI and machine learning applications, complementing rather than replacing existing clinical data standards.

The framework is intended to support a broad range of use cases in biomedical AI research, including predictive modeling, representation learning, multimodal modeling, and large-scale benchmarking studies. Because the ecosystem is open source, researchers across academia, healthcare, and industry can contribute tools and extensions.

“The big successes in AI have always been driven by the community coming together and being able to collaborate, often in a decentralized, open-source manner, on tools, model parts, and ultimately ecosystems that let us build larger models that scale to massive datasets,” McDermott says. “These impressive results in MEDS are just reflecting the benefits you get when the community can share tools or abstract common parts of their pipelines out into a shared library and use them across everyone's data.”

The study also highlights the importance of reproducibility and transparency in health AI development as machine learning models increasingly move toward clinical deployment.

The researchers say they hope MEDS will foster broader collaboration across institutions and accelerate innovation in clinical AI while promoting more transparent and reproducible science. Already, MEDS has been adopted across 21 institutions spanning 12 countries.

###

Columbia University Irving Medical Center (CUIMC) is a clinical, research, and educational campus located in New York City. Founded in 1928, CUIMC was one of the first academic medical centers established in the United States of America. CUIMC is home to four professional colleges and schools that provide global leadership in scientific research, health and medical education, and patient care: the Vagelos College of Physicians and Surgeons, the Mailman School of Public Health, the College of Dental Medicine, and the School of Nursing. For more information, please visit cuimc.columbia.edu.

Journal: NEJM AI
DOI: 10.1056/AIra2501253
Method of Research: Computational simulation/modeling
Article Title: MEDS — An Emerging Data Standard and Ecosystem for Health AI Research
Article Publication Date: 28-May-2026
COI Statement: No disclosures for Matthew McDermott.

Contact Information

Helen Garey
Columbia University Irving Medical Center
media@cumc.columbia.edu
