Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Nov 15, 2023
Date Accepted: Apr 29, 2024
Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: A Comparative Analysis of GPT-3.5 and GPT-4
ABSTRACT
Background:
Artificial intelligence (AI), particularly chatbot systems, is becoming an instrumental tool in healthcare, aiding clinical decision-making and patient engagement.
Objective:
To analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in healthcare decision-making.
Methods:
Four specialist physicians formulated 176 real-world clinical questions. Senior physicians and residents rated the answers generated by GPT-3.5 and GPT-4 on a 1-5 scale in 5 categories: accuracy, relevance, clarity, benefit, and completeness.
Results:
Both GPT models received high scores (4.4 ± 0.8 for GPT-4 and 4.1 ± 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, and seniors consistently rated responses higher than residents did for both models. Specifically, seniors rated GPT-4 as more beneficial and complete than residents did (4.6 vs 4.0 and 4.6 vs 4.1, respectively; p<0.001), with a similar pattern for GPT-3.5 (4.1 vs 3.7 and 3.9 vs 3.5; p<0.001). Ethical queries received the highest ratings for both models, with mean scores consistent across the accuracy and completeness criteria. Differences among question types were significant, particularly for GPT-4's completeness across emergency, internal, and ethical questions (4.2 ± 1.0, 4.3 ± 0.8, and 4.5 ± 0.7, respectively; p<0.001), and for GPT-3.5's accuracy, benefit, and completeness ratings.
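The results above are reported as mean ± SD on the 1-5 rating scale. The abstract does not describe the authors' analysis code or name the statistical test behind the p-values, so the following is only a minimal sketch, under stated assumptions, of how such per-dimension summaries could be computed from individual ratings. The names `DIMENSIONS`, `mean_sd`, `summarize`, `fmt` and `TOY_RATINGS` are hypothetical, the "overall" entry (pooling every score across all five dimensions) is an assumed way of forming an overall score, and the toy ratings are invented for illustration and are not study data. Significance testing is deliberately left out.

```python
from statistics import mean, stdev

# Hypothetical dimension labels mirroring the five rating categories in the abstract.
DIMENSIONS = ("accuracy", "relevance", "clarity", "benefit", "completeness")


def mean_sd(scores):
    """Return (mean, sample SD) of 1-5 scores, each rounded to one decimal place."""
    if len(scores) < 2:
        raise ValueError("need at least two scores for a sample SD")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must lie on the 1-5 scale")
    return round(mean(scores), 1), round(stdev(scores), 1)


def summarize(ratings):
    """Summarize ratings given as one dict per rated answer (dimension -> 1-5 score).

    Returns a (mean, SD) pair per dimension plus an 'overall' entry that pools
    all scores across dimensions (an assumption, not the study's definition).
    """
    summary = {dim: mean_sd([r[dim] for r in ratings]) for dim in DIMENSIONS}
    summary["overall"] = mean_sd([r[dim] for r in ratings for dim in DIMENSIONS])
    return summary


def fmt(pair):
    """Format a (mean, SD) pair in the 'mean ± SD' style used in the abstract."""
    m, s = pair
    return f"{m:.1f} ± {s:.1f}"


# Toy ratings for illustration only; these are NOT data from the study.
TOY_RATINGS = [
    {"accuracy": 5, "relevance": 5, "clarity": 4, "benefit": 4, "completeness": 5},
    {"accuracy": 4, "relevance": 5, "clarity": 5, "benefit": 3, "completeness": 4},
    {"accuracy": 5, "relevance": 4, "clarity": 5, "benefit": 5, "completeness": 4},
]

if __name__ == "__main__":
    # Print one 'mean ± SD' line per dimension, then the pooled overall line.
    for name, pair in summarize(TOY_RATINGS).items():
        print(f"{name}: {fmt(pair)}")
```

Running the script prints one "mean ± SD" line per dimension followed by the pooled overall line. Comparing groups (for example, seniors vs residents or one question type vs another) would additionally require a significance test, which the abstract does not specify.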
Conclusions:
ChatGPT shows promise in assisting physicians with medical issues, with prospects for enhancing diagnosis, treatment, and ethical decision-making. Although integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments. Clinical Trial: N/A
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.