The Lancet Digital Health: First systematic review and meta-analysis suggests artificial intelligence may be as effective as health professionals at diagnosing disease

September 24, 2019

Artificial intelligence (AI) appears to detect diseases from medical imaging with similar levels of accuracy as health-care professionals, according to the first systematic review and meta-analysis synthesising all the available evidence from the scientific literature, published in The Lancet Digital Health journal.

Nevertheless, only a few studies were of sufficient quality to be included in the analysis, and the authors caution that the true diagnostic power of the AI technique known as deep learning--the use of algorithms, big data, and computing power to emulate human learning and intelligence--remains uncertain because of the lack of studies that directly compare the performance of humans and machines, or that validate AI's performance in real clinical environments.

"We reviewed over 20,500 articles, but less than 1% of these were sufficiently robust in their design and reporting that independent reviewers had high confidence in their claims. What's more, only 25 studies validated the AI models externally (using medical images from a different population), and just 14 studies actually compared the performance of AI and health professionals using the same test sample," explains Professor Alastair Denniston from University Hospitals Birmingham NHS Foundation Trust, UK, who led the research. [1]

"Within those handful of high-quality studies, we found that deep learning could indeed detect diseases ranging from cancers to eye diseases as accurately as health professionals. But it's important to note that AI did not substantially out-perform human diagnosis." [1]

With deep learning, computers can examine thousands of medical images to identify patterns of disease. This offers enormous potential for improving the accuracy and speed of diagnosis. Reports of deep learning models outperforming humans in diagnostic testing have generated much excitement and debate, and more than 30 AI algorithms for healthcare have already been approved by the US Food and Drug Administration.
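For readers curious how such a model is typically structured, the sketch below shows a minimal convolutional image classifier in Python using PyTorch. The layer sizes, input dimensions, and two-class output are illustrative assumptions for the example only and are not taken from any of the studies reviewed.

```python
# Minimal sketch of a deep-learning image classifier of the kind used in
# diagnostic-imaging research. All sizes and labels are illustrative assumptions.
import torch
import torch.nn as nn

class TinyDiagnosticCNN(nn.Module):
    """Toy convolutional network: medical image in, disease/no-disease scores out."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # greyscale scan -> 16 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # pool each feature map to one value
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = TinyDiagnosticCNN()
dummy_scan = torch.randn(1, 1, 224, 224)  # one synthetic 224x224 greyscale image
print(model(dummy_scan))                  # unnormalised scores for each class
```

In practice such a network would be trained on many thousands of labelled images before its diagnostic accuracy could be compared with that of clinicians.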

Despite strong public interest and market forces driving the rapid development of these technologies, concerns have been raised about whether study designs are biased in favour of machine learning, and the degree to which the findings are applicable to real-world clinical practice.

To provide more evidence, researchers conducted a systematic review and meta-analysis of all studies published between January 2012 and June 2019 that compared the performance of deep learning models and health professionals in detecting diseases from medical imaging. They also evaluated study design, reporting, and clinical value.

In total, 82 articles were included in the systematic review. Data were analysed for 69 articles which contained enough data to calculate test performance accurately. Pooled estimates from 25 articles that validated the results in an independent subset of images were included in the meta-analysis.

Analysis of data from 14 studies comparing the performance of deep learning with humans in the same sample found that, at best, deep learning algorithms can correctly detect disease in 87% of cases (sensitivity), compared with 86% achieved by health-care professionals.

The ability to accurately exclude patients who don't have disease was also similar for deep learning algorithms (93% specificity) compared to health-care professionals (91%).
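As a concrete illustration of what those figures mean, the short Python sketch below computes sensitivity (the proportion of diseased cases correctly detected) and specificity (the proportion of disease-free cases correctly excluded) from a confusion matrix. The counts are hypothetical and chosen only to reproduce the headline percentages; they are not data from the study.

```python
# Illustrative calculation of sensitivity and specificity from hypothetical counts.
def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical test set: 100 patients with disease, 100 without.
sens, spec = sensitivity_specificity(tp=87, fp=7, fn=13, tn=93)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")  # sensitivity=87%, specificity=93%
```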

Importantly, the authors note several limitations in the methodology and reporting of AI-diagnostic studies included in the analysis. Deep learning was frequently assessed in isolation in a way that does not reflect clinical practice. For example, only four studies provided health professionals with additional clinical information that they would normally use to make a diagnosis in clinical practice. Additionally, few prospective studies were done in real clinical environments, and the authors say that to determine diagnostic accuracy requires high-quality comparisons in patients, not just datasets. Poor reporting was also common, with most studies not reporting missing data, which limits the conclusions that can be drawn.

"There is an inherent tension between the desire to use new, potentially life-saving diagnostics and the imperative to develop high-quality evidence in a way that can benefit patients and health systems in clinical practice," says Dr Xiaoxuan Liu from the University of Birmingham, UK. "A key lesson from our work is that in AI--as with any other part of healthcare--good study design matters. Without it, you can easily introduce bias which skews your results. These biases can lead to exaggerated claims of good performance for AI tools which do not translate into the real world. Good design and reporting of these studies is a key part of ensuring that the AI interventions that come through to patients are safe and effective." [1]

"Evidence on how AI algorithms will change patient outcomes needs to come from comparisons with alternative diagnostic tests in randomised controlled trials," adds Dr Livia Faes from Moorfields Eye Hospital, London. "So far, there are hardly any such trials where diagnostic decisions made by an AI algorithm are acted upon to see what then happens to outcomes which really matter to patients, like timely treatment, time to discharge from hospital, or even survival rates." [1]

Writing in a linked Comment, Dr Tessa Cook from the University of Pennsylvania, USA, discusses whether AI can be effectively compared to the human physician working in the real world, where data are "messy, elusive, and imperfect". She writes: "Perhaps the better conclusion is that, in the narrow public body of work comparing AI to human physicians, AI is no worse than humans, but the data are sparse and it may be too soon to tell."
-end-
NOTES TO EDITORS

This study received no funding. It was conducted by researchers from University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; University of Birmingham, Birmingham, UK; Moorfields Eye Hospital NHS Foundation Trust, London, UK; Cantonal Hospital of Lucerne, Lucerne, Switzerland; NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK; Ludwig Maximilian University of Munich, Munich, Germany; DeepMind, London, UK; Scripps Research Translational Institute, La Jolla, California; and Medignition, Zurich, Switzerland.

[1] Quote direct from author and cannot be found in the text of the Article.

The labels have been added to this press release as part of a project run by the Academy of Medical Sciences seeking to improve the communication of evidence. For more information, please see: http://www.sciencemediacentre.org/wp-content/uploads/2018/01/AMS-press-release-labelling-system-GUIDANCE.pdf. If you have any questions or feedback, please contact The Lancet press office: pressoffice@lancet.com

Peer-reviewed / Systematic review and meta-analysis / People

The Lancet
