Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Assessment of Large Language Model Performance on Medical School Essay-Style Concept Appraisal Questions
Seysha Mehta;
Eliot Haddad;
Indira Bhavsar Burke;
Alana Majors;
Rie Maeda;
Sean M Burke;
Abhishek Deshpande;
Amy Nowacki;
Christina Lindenmeyer;
Neil Mehta
ABSTRACT
Microsoft Copilot, a ChatGPT 4.0-based large language model, performed comparably to medical students in answering essay-style CAPPs, and assessors struggled to distinguish AI-generated responses from human ones. These results highlight the need to prepare students and educators for a future shaped by AI by fostering reflective learning practices and critical thinking.
Citation
Please cite as:
Mehta S, Haddad E, Burke IB, Majors A, Maeda R, Burke SM, Deshpande A, Nowacki A, Lindenmeyer C, Mehta N
Assessment of Large Language Model Performance on Medical School Essay-Style Concept Appraisal Questions: Exploratory Study