
AI language models sharpen chest CT diagnoses, speeding surgical decisions

07.30.25 | FAR Publishing Limited


Interpreting the fine print of a chest CT report can make or break a patient’s surgical plan, yet radiologists worldwide face ballooning workloads and widening expertise gaps. A new study from Zhujiang Hospital of Southern Medical University analyzed 13,489 real-world chest CT reports and found that state-of-the-art LLMs can shoulder much of that burden—when asked the right way.

“We discovered that modern language models can act as a dependable second set of eyes for radiologists,” said Dr. Peng Luo, lead author and physician at Zhujiang Hospital. “With carefully worded multiple-choice prompts, GPT-4 reached a 75 percent accuracy rate across 13 common chest diseases, ranging from COPD to aortic atherosclerosis.”

The team compared GPT-4, Claude-3.5-Sonnet, Qwen-Max, Gemini-Pro and GPT-3.5-Turbo using two question styles: open-ended and multiple choice. Across all models, multiple-choice prompts boosted accuracy and consistency, underscoring the power of prompt engineering. GPT-4, Claude-3.5 and Qwen-Max topped the charts, while GPT-3.5-Turbo and Gemini-Pro lagged.
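The contrast between the two question styles can be sketched as prompt templates. The wording, option set, and sample report below are illustrative assumptions, not the study’s actual prompts:

```python
# Illustrative prompt templates contrasting the two question styles
# compared in the study. The phrasing, options, and sample report are
# assumptions for illustration, not the authors' actual materials.

def open_ended_prompt(report: str) -> str:
    """Free-text question: the model must name diseases unprompted."""
    return (
        "You are a radiology assistant. Read the chest CT report below "
        "and list any diseases it supports.\n\n"
        f"Report:\n{report}"
    )

def multiple_choice_prompt(report: str, disease: str) -> str:
    """Constrained question with fixed answer options, the style the
    study found improved accuracy and consistency across models."""
    return (
        "You are a radiology assistant. Based on the chest CT report "
        f"below, does the patient have {disease}?\n"
        "Answer with exactly one option: (A) Yes (B) No (C) Cannot determine\n\n"
        f"Report:\n{report}"
    )

report = "Emphysematous changes in both upper lobes; airway wall thickening."
print(multiple_choice_prompt(report, "COPD"))
```

Constraining the answer space this way makes responses easier to score automatically, which also matters when comparing thousands of reports across five models.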

To probe whether weaker models could catch up, the researchers fine-tuned GPT-3.5-Turbo on 200 high-performing cases. “Fine-tuning turned a 42 percent system into a 65 percent system overnight for tough pulmonary cases,” Dr. Luo said. “That's a game-changer for hospitals that rely on cost-effective models.”

Beyond raw accuracy, the study evaluated each model’s area under the ROC curve (AUC) for every disease. GPT-4 excelled at gallstone and pleural effusion detection, while Qwen-Max showed unusual strength in COPD discrimination. However, no single model dominated every condition, suggesting a tailored, disease-specific deployment strategy.
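AUC captures how reliably a model’s per-disease confidence scores rank positive cases above negative ones, independent of any single decision threshold. A minimal pure-Python sketch of the metric, using made-up labels and scores rather than the study’s data:

```python
# Minimal AUC computation via pairwise ranking: AUC equals the
# probability that a randomly chosen positive case receives a higher
# score than a randomly chosen negative case (ties count as half).
# The labels and scores below are illustrative, not from the study.

def roc_auc(labels, scores):
    """labels: 1 = disease present, 0 = absent; scores: model confidence."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical confidences for one disease across six reports.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.91, 0.78, 0.42, 0.55, 0.30, 0.12]
print(f"AUC = {roc_auc(labels, scores):.3f}")  # 8 of 9 pairs ranked correctly
```

Because AUC is computed per disease, one model can lead on pleural effusion while another leads on COPD, which is exactly the pattern behind the authors’ disease-specific deployment suggestion.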

The authors caution that LLM outputs still require expert oversight, especially when a model expresses high confidence in borderline cases. Future work will integrate explainable-AI tools to reveal how models weigh radiologic clues and to set dynamic confidence thresholds.

International Journal of Surgery

10.1097/JS9.0000000000002582

Performance analysis of large language models in multi-disease detection from chest computed tomography reports: a comparative study.

5-Jun-2025

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Contact Information

Chris Zhou
FAR Publishing Limited
editorial@fargroups.com

Source

How to Cite This Article

APA:
FAR Publishing Limited. (2025, July 30). AI language models sharpen chest CT diagnoses, speeding surgical decisions. Brightsurf News. https://www.brightsurf.com/news/1GR5R9X8/ai-language-models-sharpen-chest-ct-diagnoses-speeding-surgical-decisions.html
MLA:
"AI language models sharpen chest CT diagnoses, speeding surgical decisions." Brightsurf News, Jul. 30 2025, https://www.brightsurf.com/news/1GR5R9X8/ai-language-models-sharpen-chest-ct-diagnoses-speeding-surgical-decisions.html.