
Accepted for/Published in: JMIR Medical Education

Date Submitted: Sep 29, 2023
Open Peer Review Period: Sep 16, 2023 - Nov 11, 2023
Date Accepted: Feb 26, 2024

The final, peer-reviewed published version of this preprint can be found here:

Cherif H, Moussa C, Missaoui AM, Mokaddem S, Dhahri B. Appraisal of ChatGPT’s Aptitude for Medical Education: Comparative Analysis With Third-Year Medical Students in a Pulmonology Examination. JMIR Med Educ 2024;10:e52818. DOI: 10.2196/52818. PMID: 39042876. PMCID: 11303904.

An Appraisal of ChatGPT's Aptitude for Medical Education: A Comparative Analysis with Human Medical Students in a Pulmonology Examination

  • Hela Cherif; 
  • Chirine Moussa; 
  • Abdel Mouhaymen Missaoui; 
  • Salma Mokaddem; 
  • Besma Dhahri

ABSTRACT

Background:

The rapid evolution of ChatGPT has generated substantial interest and triggered extensive discussions in both public and academic spheres, notably within the realm of medical education.

Objective:

Our objective was to evaluate ChatGPT's performance in a pulmonology examination and compare it with that of human medical students.

Methods:

In this cross-sectional study, we conducted a comparative analysis with two distinct groups. The first group comprised 244 third-year medical students who had previously taken our institution's 2020 pulmonology examination, conducted in French. The second group involved two variants of ChatGPT-3.5: ChatGPT-V1 (lacking contextualization) and ChatGPT-V2 (enhanced with contextual information). Both ChatGPT versions received the same set of questions administered to the students.

Results:

ChatGPT-V1 demonstrated exceptional proficiency in radiology, microbiology, and thoracic surgery, surpassing the majority of medical students in these domains. However, it faced substantial challenges in pathology, pharmacology, and clinical pneumology. In contrast, ChatGPT-V2 consistently delivered more accurate responses across diverse question categories, regardless of specialization. ChatGPT exhibited suboptimal performance in multiple-choice questions compared to human candidates. Notably, ChatGPT-V2 excelled in responding to structured open-ended questions. Both versions, particularly ChatGPT-V2, outperformed students in addressing questions of low and intermediate difficulty. Interestingly, students showcased enhanced proficiency when confronted with highly challenging questions. ChatGPT-V1 fell short of passing the examination, yet its score surpassed that of 40.6% of students. Conversely, ChatGPT-V2 successfully achieved examination success, outperforming 62.1% of human candidates.

Conclusions:

Despite its access to a comprehensive online dataset, ChatGPT's performance aligns closely with that of an average medical student. Its outcomes are shaped by question format, item complexity, and contextual nuances. It encounters difficulties in medical contexts that require information synthesis, advanced analytical aptitude, and refined clinical judgment. Furthermore, its performance noticeably diminishes when assessments are delivered in a language other than English or draw on data diverging from mainstream internet sources.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.