Accepted for/Published in: JMIR Medical Education
Date Submitted: Apr 25, 2024
Open Peer Review Period: Apr 25, 2024 - Jun 20, 2024
Date Accepted: Sep 23, 2024
Performance Comparison of Junior Residents and ChatGPT in OSCE for Medical History Taking and Chart Writing: A Simulation-Based Evaluation
ABSTRACT
Background:
This study explores the cutting-edge abilities of large language models (LLMs) such as ChatGPT in medical history taking and medical chart documentation, with a focus on their practical effectiveness in clinical settings—an area vital for the progress of medical artificial intelligence.
Objective:
The aim was to assess the capability of ChatGPT versions 3.5 and 4.0 in performing medical history taking and chart documentation in simulated clinical environments. The study compared the performance of non-medical individuals using ChatGPT with that of junior medical residents.
Methods:
A simulation involving standardized patients was designed to mimic authentic medical history-taking interactions. Five non-medical participants utilized ChatGPT versions 3.5 and 4.0 to conduct medical histories and document charts, mirroring the tasks performed by five junior residents in identical scenarios. A total of ten diverse scenarios were examined.
Results:
Evaluation of the medical documentation created by laypersons with ChatGPT assistance and by junior residents was conducted by two senior emergency physicians, employing audio recordings and the final charts. The assessment used the Objective Structured Clinical Examination (OSCE) benchmarks in Taiwan as a reference. ChatGPT 4.0 exhibited substantial enhancements over its predecessor and met or exceeded the performance of human counterparts in terms of both checklist and global assessment scores. Although the overall quality of human consultations remained higher, ChatGPT 4.0's proficiency in medical documentation was notably promising.
Conclusions:
The performance of ChatGPT 4.0 was on par with human participants in OSCE evaluations, signifying its potential in medical history documentation and chart writing. Despite this, the superiority of human consultations in terms of quality was evident. The study underscores both the promise and the current limitations of LLMs in the realm of clinical practice.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.