LLM treatment advice agrees with physician recommendations in early-stage HCC, but falls short in late stage

Large language models (LLMs) can generate treatment recommendations for straightforward cases of hepatocellular carcinoma (HCC) that align with clinical guidelines but fall short in more complex cases, according to a new study by Ji Won Han from The Catholic University of Korea and colleagues publishing January 13th in the open-access journal PLOS Medicine.

Choosing the most appropriate treatment for patients with liver cancer is complicated. While international treatment guidelines provide recommendations, clinicians must tailor their treatment choice based on cancer stage and liver function as well as other factors such as comorbidities.

To assess whether LLMs can provide treatment recommendations for HCC that reflect real-world clinical practice, researchers compared suggestions generated by three LLMs (ChatGPT, Gemini, and Claude) with the actual treatments received by more than 13,000 newly diagnosed patients with HCC in South Korea.

They found that, in patients with early-stage HCC, higher agreement between LLM recommendations and actual treatments was associated with improved survival. The inverse was seen in patients with advanced-stage disease, where higher agreement between LLM recommendations and actual practice was associated with worse survival. LLMs placed greater emphasis on tumor factors, such as tumor size and number of tumors, while physicians prioritized liver function.

Overall, the findings suggest that LLMs may help to support straightforward treatment decisions, particularly in early-stage disease, but are not presently suitable for guiding care decisions for more complex cases that require nuanced clinical judgment. Regardless of stage, LLM advice should be used with caution and considered as a supplement to clinical expertise.

The authors add, “Our study shows that large language models can help support treatment decisions for early-stage liver cancer, but their performance is more limited in advanced disease. This highlights the importance of using LLMs as a complement to, rather than a replacement for, clinical expertise.”

In your coverage, please use this URL to provide access to the freely available paper in PLOS Medicine: https://plos.io/48VHQcm

Citation: Yang K, Lee J, Jang JW, Sung PS, Han JW (2026) Evaluating the clinical utility of large language models for hepatocellular carcinoma treatment recommendations: A nationwide retrospective registry study. PLoS Med 23(1): e1004855. https://doi.org/10.1371/journal.pmed.1004855

Author countries: Republic of Korea

Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (RS-2025-23525359 to J.W.H.) funded by the Ministry of Health & Welfare, Republic of Korea.


Competing interests: The authors have declared that no competing interests exist.
