Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Mar 27, 2023
Date Accepted: Aug 17, 2023
The potential of ChatGPT as a self-diagnostic tool in common orthopedic diseases: An exploratory study
ABSTRACT
Background:
Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on healthcare delivery. To date, no study has evaluated the accuracy and precision of ChatGPT's ability to support "self-diagnosis".
Objective:
To evaluate the accuracy and precision of ChatGPT's ability to "self-diagnose" common orthopedic diseases.
Methods:
Over a 5-day period, the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as "correct", "incorrect", or a "differential diagnosis", and the accuracy, precision, and percentage of correct answers were calculated. Answers were further subcategorized by each disease's name and as a "differential diagnosis". Intra- and inter-examiner variability was calculated using the Fleiss kappa coefficient. Answers recommending that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study; because different phrases were used, the percentage of each phrase was obtained.
Results:
The percentage of correct answers was 100%, 4%, 96%, 64%, and 68% for CTS, CM, LSS, KOA, and HOA, respectively. The percentage of incorrect answers was 92% for CM and 0% for all other conditions. Intra-rater variability was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively; inter-rater variability was 1.0, 0.1, 0.64, -0.12, and 0.04, respectively. The phrases "essential", "recommended", "best", and "important" occurred in answers that recommended seeking medical attention: "essential" in 3.2% of answers, "recommended" in 9.6%, "best" in 6.4%, and "important" in 75.2%. Approximately 5.6% of the answers contained no recommendation to seek medical attention.
Conclusions:
The accuracy and precision of ChatGPT's "self-diagnosis" of 5 common orthopedic conditions were inconsistent. Accuracy could potentially be improved by adding symptoms that readily identify a specific anatomical location. Only a few answers (12.8%) contained a strong recommendation to seek medical attention by our study's standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in its ability to provide an accurate "self-diagnosis". Given the risk of harm from "self-diagnosis" without medical follow-up, it would be prudent for an NLP tool to include direct language alerting patients to seek an expert opinion. We hope to shed further light on the use of AI in a future clinical study.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.