Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 27, 2023
Date Accepted: Aug 17, 2023

The final, peer-reviewed published version of this preprint can be found here:

The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study

Kuroiwa T, Sarcon A, Ibara T, Yamada E, Yamamoto A, Tsukamoto K, Fujita K

The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study

J Med Internet Res 2023;25:e47621

DOI: 10.2196/47621

PMID: 37713254

PMCID: 10541638

The potential of ChatGPT as a self-diagnostic tool in common orthopedic diseases: An exploratory study

  • Tomoyuki Kuroiwa
  • Aida Sarcon
  • Takuya Ibara
  • Eriku Yamada
  • Akiko Yamamoto
  • Kazuya Tsukamoto
  • Koji Fujita

ABSTRACT

Background:

Artificial intelligence (AI) has gained tremendous popularity recently, particularly through natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of holding natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. To date, no study has evaluated the accuracy and precision of ChatGPT when used for "self-diagnosis".

Objective:

To evaluate ChatGPT’s ability to accurately and precisely "self-diagnose" common orthopedic diseases.

Methods:

Over a 5-day course, the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as "correct", "incorrect", or "differential diagnosis". The accuracy, precision, and percentage of correct answers were calculated. Answers were subcategorized into each disease’s name and as a "differential diagnosis". Intra- and inter-examiner variability were calculated using the Fleiss kappa coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study. Because the answers used different phrases for this recommendation, the percentage of answers containing each phrase was calculated.
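The Fleiss kappa coefficient mentioned above can be illustrated with a minimal sketch. The function name `fleiss_kappa`, the example labels, and the layout (query days as items, three raters) are assumptions made for illustration and are not the study's data or code. The handling of the degenerate case where only one category is ever used is likewise an assumed convention.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss kappa for a list of items, each rated by the same number of raters.

    ratings[i] holds the category labels given to item i, one per rater.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean share of agreeing rater pairs per item.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    if p_expected == 1.0:
        # Only one category was ever used, so kappa is 0/0. Returning 1.0 is
        # a convention assumed for this sketch, not taken from the study.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels: 5 query days (items) x 3 raters.
example = [
    ["correct", "correct", "correct"],
    ["correct", "correct", "differential"],
    ["differential", "differential", "differential"],
    ["correct", "correct", "correct"],
    ["correct", "differential", "differential"],
]
print(round(fleiss_kappa(example), 3))
```

Values near 1 indicate agreement well above chance, values near 0 indicate chance-level agreement, and negative values indicate agreement below chance.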

Results:

The percentages of correct answers were 100%, 4%, 96%, 64%, and 68% for CTS, CM, LSS, KOA, and HOA, respectively. The percentage of incorrect answers was 92% for CM and 0% for all other conditions. Intra-rater variability was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively; inter-rater variability was 1.0, 0.1, 0.64, -0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. The phrases “essential”, “recommended”, “best”, and “important” occurred in answers that recommended seeking medical attention: “essential” in 3.2% of the answers, “recommended” in 9.6%, “best” in 6.4%, and “important” in 75.2%. The remaining 5.6% of the answers did not recommend seeking medical attention.

Conclusions:

The accuracy and precision of ChatGPT in “self-diagnosing” 5 common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. Only a few answers (12.8%) contained a strong recommendation to seek medical attention by our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in the accuracy of “self-diagnosis”. Given the risk of harm from “self-diagnosis” without medical follow-up, it would be prudent for an NLP tool to include direct language alerting patients to seek an expert opinion. We hope to shed further light on the use of AI in a future clinical study.
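As a quick consistency check on the reported figures, the phrase percentages in the Results can be summed. The sketch below assumes that the 12.8% of "strong" recommendations corresponds to the “essential” and “recommended” phrases combined; the abstract does not state this mapping, and it is inferred here only because the two shares add up to that figure.

```python
from decimal import Decimal

# Shares of answers by recommendation phrase, as reported in the Results (%).
phrase_share = {
    "essential": Decimal("3.2"),
    "recommended": Decimal("9.6"),
    "best": Decimal("6.4"),
    "important": Decimal("75.2"),
    "no recommendation": Decimal("5.6"),
}

total = sum(phrase_share.values())
# Assumption: "strong" means the "essential" and "recommended" tiers.
strong = phrase_share["essential"] + phrase_share["recommended"]

print(f"total = {total}%, strong = {strong}%")
```

Under that assumption, the five shares account for all answers and the strong tier matches the 12.8% quoted in the Conclusions.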



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.