Clinical interviewing is one of the most important skills physicians develop during their training. It forms the foundation for accurate diagnosis and effective patient care. However, evaluating these skills is often time-intensive, requiring repeated observations and detailed feedback from experienced clinicians. As medical education continues to expand, this growing assessment burden has become a significant challenge. The incorporation of generative artificial intelligence (AI) has the potential to significantly improve the assessment of interviewing skills; however, its efficiency compared to standard evaluation systems is not well understood.
To fill this gap, researchers from Japan explored whether artificial intelligence could ease this burden by evaluating medical interview transcripts. Their findings were published on February 17, 2026, in Volume 12 of the journal JMIR Medical Education. The research team, led by Dr. Hiromizu Takahashi (corresponding author) and Professor Toshio Naito, both from the Department of General Medicine, Juntendo University Faculty of Medicine, Japan, examined whether AI-based assessment (ABA) could match traditional human-based assessment (HBA).
“Our central message is that AI may help make medical training fairer, faster, and more scalable,” explains Prof. Naito.
To compare ABA with HBA, the researchers designed a cross-sectional validation study using a virtual patient system. Seven participants, including medical students, resident physicians, and attending physicians, conducted clinical interviews with an AI-simulated patient presenting with bilateral leg weakness. These conversations were automatically recorded and converted into transcripts. The transcripts were then evaluated using the Master Interview Rating Scale, a standardized tool that assesses various aspects of clinical communication, such as information gathering, organization, and empathy. For ABA, two AI models, GPT-o1 Pro and GPT-5 Pro, assessed the transcripts. For HBA, five experienced clinical instructors independently evaluated the same transcripts.
According to the researchers, ABA showed strong agreement with clinician evaluations, with only minimal differences in scores. At the same time, AI demonstrated greater consistency across repeated evaluations. Importantly, the use of AI also reduced the time required to assess each transcript by more than half, highlighting its potential to ease the workload of educators. “Rather than replacing teachers, this research suggests a practical ‘AI-first, faculty-verified’ model in which AI handles the first pass and educators focus their time on coaching, judgment, and high-stakes decisions,” says Dr. Takahashi.
These results have important implications for medical education. In many training programs, delays in feedback can limit opportunities for students to improve their communication skills. By providing rapid and consistent evaluations, AI could make repeated practice more accessible, particularly in settings with limited faculty resources. “Students could interview an AI-simulated patient and receive feedback almost immediately instead of waiting days or weeks,” Prof. Naito adds, highlighting the potential for more timely learning experiences.
At the same time, the researchers emphasize that AI should be used with care. While AI performed well, the study was based on a small number of participants and a single clinical scenario. In addition, transcript-based evaluation cannot capture nonverbal cues, tone, or cultural nuances that are often important in real-world patient interactions. Prof. Naito and Dr. Takahashi caution, “AI should be used with human oversight, because text-only scoring can miss nuances such as tone, nonverbal communication, and cultural context.”
Overall, this study highlights the growing role of AI in medical education. By combining the speed and consistency of AI with the expertise and judgment of clinicians, it may be possible to create more efficient and scalable training systems. As the demand for high-quality medical education continues to rise, such approaches could help ensure that future clinicians receive the best training while reducing the burden on educators.
***
Reference
DOI: 10.2196/81673
Authors: Hiromizu Takahashi 1, Kiyoshi Shikino 2, Takeshi Kondo 3,4, Yuji Yamada 5, Yoshitaka Tomoda 6, Minoru Kishi 7, Yuki Aiyama 8, Sho Nagai 9, Akiko Enomoto 9, Yoshinori Tokushima 10, Takahiro Shinohara 11, Fumiaki Sano 1, Takeshi Matsuura 12, Rikiya Watanabe 13, and Toshio Naito 1
Affiliations
1 Department of General Medicine, Faculty of Medicine, Juntendo University, Tokyo, Japan
2 Department of Community-Oriented Medical Education, Graduate School of Medicine, Chiba University, Chiba, Japan
3 Center for Postgraduate Clinical Training and Career Development, Nagoya University Hospital, Nagoya, Japan
4 The School of Health Professions Education, Maastricht University, Maastricht, The Netherlands
5 Brookdale Department of Geriatrics and Palliative Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
6 Department of General Internal Medicine, Itabashi Chuo Medical Center, Tokyo, Japan
7 Department of Internal Medicine, Nishiwaki Municipal Hospital, Hyogo, Japan
8 Anesthesiology and Critical Care Medicine, Tenri Hospital, Nara, Japan
9 Department of Nursing, School of Nursing, University of Human Environments, Aichi, Japan
10 Department of General Medicine, Saga University Hospital, Saga, Japan
11 Department of General Medicine, Graduate School of Medical and Dental Sciences, Institute of Science Tokyo, Tokyo, Japan
12 Department of General Medicine, Bibai City Hospital, Hokkaido, Japan
13 Department of General Internal Medicine, Kita-Harima Medical Center, Hyogo, Japan
About Professor Toshio Naito
Dr. Toshio Naito, MD, PhD, MBA, is a Professor in the Department of General Medicine at Juntendo University Faculty of Medicine, Tokyo, Japan. With over 30 years of clinical and academic experience, his research focuses on general medicine, infectious diseases, HIV, and medical education. He has authored 112 original articles and 4 review articles, achieving an h-index of 23 and 1,799 citations. His contributions have significantly advanced both clinical practice and medical training.
Journal: JMIR Medical Education
Method of research: Observational study
Subject of research: People
Article title: AI- vs Human-Based Assessment of Medical Interview Transcripts in a Generative AI–Simulated Patient System: Cross-Sectional Validation Study
Article publication date: 17-Feb-2026