Accepted for/Published in: JMIR Medical Education
Date Submitted: Jan 26, 2024
Date Accepted: Oct 7, 2024
Evaluating AI Competence in Specialized Medicine: A Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist Exam in Spain
ABSTRACT
Background:
With the rapid advancement of artificial intelligence (AI) in various fields, evaluating its application in specialized medical contexts becomes crucial. ChatGPT, a large language model developed by OpenAI, has shown potential in diverse applications, including medicine.
Objective:
This study aims to compare the performance of ChatGPT with that of attending neurologists in a real neurology specialist examination conducted in the Valencian Community, Spain, to assess the AI's capabilities and limitations in medical knowledge.
Methods:
We conducted a comparative analysis using the 2022 neurology specialist exam results from 120 neurologists and responses generated by ChatGPT versions 3.5 and 4. The exam consisted of 80 multiple-choice questions focused on clinical neurology and health legislation. Questions were classified according to Bloom's Taxonomy. We performed a statistical analysis of performance, including the kappa coefficient as a measure of response consistency.
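As an illustration of the consistency measure mentioned above, the following is a minimal sketch of Cohen's kappa computed between two sets of answers to the same questions. The abstract does not say which kappa variant was used or which response sets were compared, so the choice of Cohen's kappa, the function name `cohen_kappa`, and the sample answers are all assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(run_a, run_b):
    """Cohen's kappa between two equal-length sequences of categorical answers."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(run_a)
    # Observed agreement: share of questions answered identically in both runs.
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Chance agreement: product of the marginal answer frequencies, summed over options.
    counts_a, counts_b = Counter(run_a), Counter(run_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both runs gave one and the same answer throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical answers to six questions from two runs (not data from the study).
first_run = ["A", "B", "A", "C", "A", "B"]
second_run = ["A", "B", "B", "C", "A", "A"]
kappa = cohen_kappa(first_run, second_run)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is often preferred over a simple percentage of matching answers.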
Results:
Human participants had a median score of 5.91, and 32 neurologists failed the exam. ChatGPT-3.5 ranked 116th out of 122, answering 54.5% of questions correctly (score 3.94). ChatGPT-4 showed marked improvement, ranking 17th and answering 81.8% of questions correctly (score 7.57), surpassing several human specialists. No significant differences were observed between performance on lower-order and higher-order questions. Additionally, ChatGPT-4 demonstrated increased inter-rater reliability, as reflected by a higher kappa coefficient of 0.73, compared with 0.69 for ChatGPT-3.5.
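The abstract does not state how percentages of correct answers were converted into scores. One rule that happens to be roughly consistent with the reported figures is a 0 to 10 scale with a penalty of one third of a point per wrong answer, as is common for four-option multiple-choice exams. The sketch below assumes that rule, four options per question, and no unanswered questions; all three are assumptions, and `penalized_score` is a hypothetical helper, not the authors' method.

```python
def penalized_score(fraction_correct, n_options=4, scale=10.0):
    """Score on a 0..scale range with a guessing penalty (assumed rule)."""
    if not 0.0 <= fraction_correct <= 1.0:
        raise ValueError("fraction_correct must be between 0 and 1")
    # Assumption: every question was answered, so the rest are wrong.
    fraction_wrong = 1.0 - fraction_correct
    # Assumption: each wrong answer costs 1 / (n_options - 1) of a correct one.
    penalty = fraction_wrong / (n_options - 1)
    return scale * (fraction_correct - penalty)


# Fractions of correct answers reported in the Results.
score_gpt35 = penalized_score(0.545)
score_gpt4 = penalized_score(0.818)
```

Under these assumptions the two values come out at approximately 3.93 and 7.57, close to the reported 3.94 and 7.57; the small gap for ChatGPT-3.5 would be explained by rounding of the reported percentage.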
Conclusions:
This study underscores the evolving capabilities of AI in medical knowledge assessment, particularly in specialized fields. ChatGPT-4's performance, which surpassed the median human score in a rigorous neurology exam, marks a notable advancement and suggests its potential as an effective tool in specialized medical education and assessment.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.