A Multidisciplinary Assessment of ChatGPT’s Knowledge of Amyloidosis: An Observational Study
ABSTRACT
Background:
Amyloidosis, a rare multisystem condition, requires multidisciplinary care. Its low prevalence underscores the importance of patient education for better outcomes. The large language model (LLM) ChatGPT offers a potential avenue for disseminating accurate, reliable, accessible educational resources.
Objective:
We performed a multidisciplinary assessment of the accuracy and reproducibility of ChatGPT in answering questions related to amyloidosis.
Methods:
A total of 98 amyloidosis questions related to cardiology, gastroenterology, and neurology were curated from medical societies, institutions, and amyloidosis Facebook support groups, then entered into GPT-3.5 and GPT-4. Cardiology- and gastroenterology-related responses were independently graded by a cardiologist and a gastroenterologist who specialize in amyloidosis; disagreements were resolved by discussion. Neurology-related responses were graded by a neurologist who specializes in amyloidosis. Reviewers used the following grading scale: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect. Questions were stratified by category for further analysis. Reproducibility was assessed by entering each question twice into each model.
Results:
GPT-4 provided 93/98 (94.9%) responses with accurate information, 82/98 (83.7%) of which were comprehensive. GPT-3.5 provided 74/83 (89.2%) responses with accurate information, 66/83 (79.5%) of which were comprehensive. When examined by question category, GPT-4 and GPT-3.5 provided 53/56 (94.6%) and 48/56 (85.7%) comprehensive responses, respectively, to "general questions". When examined by subject, GPT-4 and GPT-3.5 performed best on cardiology questions, with both models producing 10/12 (83.3%) comprehensive responses. For gastroenterology, GPT-4 received comprehensive grades for 9/15 (60.0%) of responses and GPT-3.5 for 8/15 (53.3%). Overall, 97/98 (99.0%) of responses for GPT-4 and 78/83 (94.0%) for GPT-3.5 were reproducible.
Conclusions:
LLMs have potential as a supplemental tool in disseminating vital health education to patients living with amyloidosis. Prior to widespread implementation, the technology’s limitations and ethical implications must be further explored.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.