Large language models (LLMs) have emerged as transformative tools in healthcare, offering potential value in oncology for information retrieval, clinical decision support, and patient communication. However, the dynamic nature of oncological knowledge—including evolving treatment guidelines and diagnostic standards—raises questions about how LLMs’ performance holds up over time, especially as these models are relied on for increasingly nuanced clinical tasks.
This study, conducted in adherence to PRISMA guidelines, systematically collected relevant literature through 2025 from PubMed, Google Scholar, and Web of Science databases. The research focused on three prominent LLMs: ChatGPT-3.5, ChatGPT-4, and Gemini. Researchers analyzed 614 oncology questions spanning common malignancies (e.g., lung, breast, colorectal cancer) and rare tumors (e.g., glioma, multiple myeloma), using both original study scoring criteria and a standardized five-point Likert scale to assess response accuracy.
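To make the scoring approach concrete, here is a minimal illustrative sketch (not the study's actual analysis code) of how five-point Likert accuracy ratings might be aggregated per model; the ratings shown are hypothetical, and only the model names come from the study.

```python
# Illustrative sketch only: pooling hypothetical five-point Likert
# accuracy ratings (1 = inaccurate, 5 = fully accurate) per model,
# the kind of summary statistic a meta-analysis standardizes across studies.
from statistics import mean

# Hypothetical ratings for a handful of oncology questions;
# model names follow those compared in the study.
ratings = {
    "ChatGPT-3.5": [3, 4, 2, 3, 4],
    "ChatGPT-4":   [4, 5, 4, 4, 5],
    "Gemini":      [3, 4, 3, 4, 3],
}

# Mean Likert score per model, rounded for reporting.
summary = {model: round(mean(scores), 2) for model, scores in ratings.items()}
print(summary)
```

In practice a meta-analysis would also weight each study's scores by sample size and heterogeneity; this sketch shows only the basic per-model averaging step.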
Key findings reveal clearly divergent temporal trends across the models.
Subjective questions—those requiring complex analysis, integration of clinical context, and nuanced judgment—were far more susceptible to temporal performance degradation than objective, fact-based queries. This disparity highlights the unique challenges LLMs face in applying evolving clinical knowledge to real-world oncology scenarios, where flexibility and alignment with the latest standards are critical.
The study’s results provide vital guidance for the responsible deployment of LLMs in oncology. As healthcare systems increasingly adopt these AI tools to support patient care and clinical decision-making, ongoing performance monitoring, standardized evaluation protocols, and strategies to integrate up-to-date clinical data will be essential to ensure safety and reliability.
Journal: Journal of Translational Medicine
Article type: Meta-analysis
Subject of research: People
Article title: Temporal Evolution of Large Language Models (LLMs) in Oncology
Publication date: 4-Nov-2025
COI statement: The authors declare that they have no competing interests.