Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jul 15, 2023
Date Accepted: Jan 30, 2024
Comparative Evaluation of ChatGPT and DingXiangYuan Forum for Online Orthopaedic Consultations: A Study on Quality and Dependability
ABSTRACT
Background:
Artificial intelligence (AI) software, exemplified by ChatGPT, is evolving swiftly across a variety of domains. Meanwhile, the rapid development of the Internet has given rise to manual (human-staffed) online medical consultation services that meet the demand for remote consultation.
Objective:
The objective of this research is to determine whether ChatGPT 4.0 has the dependability and usability necessary to serve as software for providing online medical consultation services.
Methods:
We obtained 82 online orthopaedic consultations from the "DOCTOR DINGXIANG" (DingXiangYuan) website. Responses from the website's real-name certified physicians served as the control group, while ChatGPT's responses to the same patient queries served as the experimental group. Three qualified orthopaedic surgeons, blinded to group assignment, assessed both sets of responses on seven scoring criteria: "Logical reasoning", "Internal information", "External information", "Guiding function", "Therapeutic effect", "Medical knowledge popularisation education", and "Overall satisfaction". Fleiss' kappa was then utilised to assess the consistency of the three raters.
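For reference, Fleiss' kappa for n raters, N rated items, and k rating categories takes the standard form below; in this design, n = 3 surgeons, N is the number of consultations, and the categories are the possible score levels on each criterion.

\[
\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e},
\qquad
\bar{P} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{n(n-1)} \left( \sum_{j=1}^{k} n_{ij}^{2} - n \right),
\qquad
\bar{P}_e = \sum_{j=1}^{k} \left( \frac{1}{Nn} \sum_{i=1}^{N} n_{ij} \right)^{2}
\]

where n_{ij} is the number of raters assigning item i to category j; \kappa = 1 indicates perfect agreement, and \kappa \le 0 indicates agreement no better than chance.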
Results:
After excluding consultations in which the website concealed crucial picture information, 73 enquiries remained. After statistical scoring, we found that the ChatGPT group's internal information score (4.61±0.52 vs 4.66±0.49) and therapeutic effect score (4.43±0.75 vs 4.55±0.62) were lower than those of the control group, but the differences were not statistically significant (p>0.05). The ChatGPT group performed better on the logical reasoning score (4.81±0.36 vs 4.75±0.39), external information score (4.06±0.72 vs 3.92±0.77), and guiding function score (4.73±0.51 vs 4.72±0.54); these differences were likewise not statistically significant (p>0.05). However, the ChatGPT group's medical knowledge popularisation score was higher than that of the control group (4.49±0.67 vs 3.87±1.01), and this difference was statistically significant (p<0.05). For overall satisfaction, the difference between groups was not statistically significant (8.35±1.38 vs 8.37±1.24, p>0.05). Under the conventional Landis-Koch interpretation of Fleiss' kappa (fair = 0.21-0.40, moderate = 0.41-0.60, substantial = 0.61-0.80), six of the control group's scoring criteria were classified as "fair agreement" and one as "substantial agreement" (all p<0.001); in the experimental group, three criteria were classified as "fair agreement" and four as "moderate agreement" (p<0.001).
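As an illustration only, interrater agreement of this kind can be computed with the statsmodels Python library; the scores below are hypothetical placeholders, not the study's data.

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical 1-5 scores from three raters for six consultations
# (rows = consultations, columns = raters); not the study's data.
scores = np.array([
    [5, 4, 5],
    [4, 4, 4],
    [3, 4, 3],
    [5, 5, 4],
    [4, 3, 4],
    [5, 5, 5],
])

# Convert rater labels into a subjects-by-categories count table,
# then compute Fleiss' kappa on that table.
table, categories = aggregate_raters(scores)
kappa = fleiss_kappa(table, method="fleiss")
print(f"Fleiss' kappa = {kappa:.3f}")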
Conclusions:
In comparison with replies from certified doctors, ChatGPT 4.0 exhibited comparable logical reasoning, use of internal and external knowledge, diagnostic accuracy, therapeutic effect, and overall satisfaction when replying to online orthopaedic enquiries. In addition, ChatGPT 4.0 showed a stronger capability for medical knowledge popularisation. ChatGPT 4.0 could therefore be promoted to the general public as a substitute for manual online health consultation. Clinical Trial: Because this study is a head-to-head comparative study of software applications and does not involve ethical issues, the Ethics Committee of the First Affiliated Hospital of Jinan University exempted it from ethical review.
Citation
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.