A large real-world clinical trial has found that a generative AI-powered tool used to support frontline clinicians was safe and improved the quality of clinical decision-making, but did not significantly change short-term patient outcomes.
The study, published today in Nature Medicine, is one of the first randomised controlled trials worldwide to test whether generative AI can improve patient-level outcomes, rather than just clinician performance or simulated cases.
The trial involved more than 9,600 patients attending 16 primary care clinics in Kenya and was delivered by experts at the University of Birmingham, supported by the National Institute for Health and Care Research (NIHR) Biomedical Research Centre: Birmingham.
Clinicians were randomly assigned to use an electronic medical record system with or without an integrated AI consult tool that provided real-time diagnostic and treatment suggestions. The AI system, known as ‘AI Consult’, was a large language model–based clinical decision support tool embedded directly within the existing electronic medical record system.
During consultations, the tool worked in the background by:
Clinicians retained full autonomy: they were not required to follow the AI’s advice and remained responsible for all diagnosis, prescribing and referral decisions. The AI interface was not visible to patients, helping preserve normal patient–clinician interaction.
Senior author Professor Bilal Mateen, Honorary Professor of Machine Learning for Health at the University of Birmingham, and Chief AI Officer at PATH, said: “This is one of the first studies to rigorously ask the hardest question about AI in healthcare: whether it actually improves outcomes for patients.
“What we found is reassuring but also sobering. The technology appears safe and clearly improves aspects of clinical decision-making, but translating those gains into measurable patient benefit is much more challenging, particularly in everyday primary care.”
Serious outcomes such as hospitalisation or death are rare in primary care, meaning extremely large studies – potentially involving more than 100,000 patients – would be needed to detect modest effects.
Professor Alastair Denniston, co-author, Professor of Regulatory Science and Innovation at the University of Birmingham and lead for health data research at the NIHR Biomedical Research Centre: Birmingham, said: “A large part of primary care is to deal with common conditions, including those that are self-limiting, where many patients require low levels of healthcare intervention. In that context, even meaningful improvements in clinical reasoning may only result in small changes in patient outcomes that are very difficult to measure.
“What this study shows is that AI can be integrated safely into real clinical workflows, without undermining patient trust or clinician autonomy – which is a critical foundation for any future impact.”
Findings: safety, quality and costs
Researchers found no statistically significant difference in treatment failure within 14 days between patients seen with AI-supported care and those receiving standard care (2.2% vs 2.0%). The study found no evidence of harm, with similar rates of hospitalisation and death in both groups.
While the AI tool did not produce measurable improvements in short-term patient outcomes, it significantly improved the quality of clinical documentation and treatment planning, as assessed by an independent panel of experienced clinicians who were blinded to whether AI had been used.
Patient satisfaction was the same in both groups, suggesting that AI support did not alter patients’ experience of care.
The study also found that, although overall antibiotic prescribing rates were similar, antibiotic‑related costs were lower in the AI‑supported group, due to more cost-conscious prescribing choices.
Although the trial was conducted in Kenya, the researchers emphasise that the findings have global relevance, including for high-income health systems.
Professor Richard Riley, Professor of Biostatistics at the University of Birmingham and senior author, said: “Robust trials like this are so important to establish the real impact of using AI in practice. They help set realistic expectations of what AI can actually contribute within existing care pathways, and help guide where future investment and research effort should be focused. Generalisability of our findings to higher-income settings, where baseline standards of care are already high, needs to be evaluated.”
The study was funded by the Gates Foundation, sponsored by PATH, and conducted with collaborators from the London School of Hygiene and Tropical Medicine and the KEMRI-Wellcome Trust Research Programme, Kenya.
Journal: Nature Medicine
Method of research: Randomized controlled/clinical trial
Subject of research: People
Article title: Generative AI-enabled clinical decision support system in primary care: a pragmatic, cluster randomized trial
Article publication date: 26-Jun-2026