Accepted for/Published in: JMIR Medical Education

Date Submitted: May 20, 2023
Date Accepted: Oct 20, 2023

The final, peer-reviewed published version of this preprint can be found here:

ChatGPT Versus Consultants: Blinded Evaluation on Answering Otorhinolaryngology Case–Based Questions

Buhr CR, Smith H, Huppertz T, Bahr-Hamm K, Matthias C, Blaikie A, Kelsey T, Kuhn S, Eckrich J

JMIR Med Educ 2023;9:e49183

DOI: 10.2196/49183

PMID: 38051578

PMCID: 10731554

ChatGPT vs. Consultants: A Pilot Study on Answering Otorhinolaryngology Case-Based Questions

  • Christoph Raphael Buhr; 
  • Harry Smith; 
  • Tilman Huppertz; 
  • Katharina Bahr-Hamm; 
  • Christoph Matthias; 
  • Andrew Blaikie; 
  • Tom Kelsey; 
  • Sebastian Kuhn; 
  • Jonas Eckrich

ABSTRACT

Background:

Large language models (LLMs) such as ChatGPT are increasingly used in medicine and supplement standard search engines as sources of information. As a result, people increasingly "consult" LLMs about personal medical symptoms.

Objective:

This study aims to evaluate ChatGPT's performance in answering clinical case-based questions in otorhinolaryngology (ORL) in comparison to ORL consultants' answers.

Methods:

We used 41 case-based questions from established ORL textbooks and past German state examinations for physicians. Both ORL consultants and ChatGPT 3 answered the questions. The ORL consultants rated all responses except their own for medical adequacy, conciseness, coherence, and comprehensibility on a 6-point Likert scale, and indicated whether they believed each answer was written by an ORL consultant or by ChatGPT. In addition, the character counts of the answers were compared.

Results:

Ratings in all categories were significantly higher for the ORL consultants. Although ChatGPT scored below the consultants throughout, its scores were relatively higher in the semantic categories (conciseness, coherence, and comprehensibility) than in medical adequacy. The ORL consultants correctly identified ChatGPT as the source in over 95% of cases. ChatGPT's answers also had a significantly higher character count than the consultants' answers.
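To illustrate the kind of comparison reported above, the sketch below contrasts two groups of 6-point Likert ratings with a Mann-Whitney U statistic, a common choice for ordinal rating data. This is not the authors' analysis code; the rating values and the choice of test are purely illustrative assumptions.

```python
# Illustrative sketch (hypothetical data): comparing 6-point Likert ratings
# for consultant vs. ChatGPT answers. All values below are invented and do
# not come from the study.
from statistics import mean

consultant = [6, 5, 6, 5, 6, 4, 5, 6]  # hypothetical "medical adequacy" ratings
chatgpt = [4, 3, 5, 4, 3, 4, 2, 4]

def mann_whitney_u(a, b):
    """U statistic: count of (a_i, b_j) pairs with a_i > b_j; ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

u = mann_whitney_u(consultant, chatgpt)
print(f"mean consultant = {mean(consultant):.2f}, mean ChatGPT = {mean(chatgpt):.2f}")
print(f"U = {u} out of {len(consultant) * len(chatgpt)} possible pairs")
```

A U close to the maximum number of pairs indicates that one group's ratings almost uniformly exceed the other's; in practice one would obtain a p value from the U distribution (e.g., via a statistics package) rather than from this bare statistic.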

Conclusions:

While ChatGPT provided longer answers to the medical problems, its medical adequacy and conciseness were rated significantly lower than those of the ORL consultants' answers. LLMs have potential as augmentative tools in medical care, but "consulting" them about medical problems carries a high risk of misinformation, as their high semantic quality may mask contextual deficits. Clinical Trial: In written correspondence of March 3, 2023, the ethics committee of the regional medical association Rhineland-Palatinate determined that no specific ethical approval was needed because anonymous, text-based questions were used.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.