
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 20, 2023
Date Accepted: Apr 2, 2024

The final, peer-reviewed published version of this preprint can be found here:

Physician Versus Large Language Model Chatbot Responses to Web-Based Questions From Autistic Patients in Chinese: Cross-Sectional Comparative Analysis

He W, Zhang W, Jin Y, Zhou Q, Zhang H, Xia Q

Physician Versus Large Language Model Chatbot Responses to Web-Based Questions From Autistic Patients in Chinese: Cross-Sectional Comparative Analysis

J Med Internet Res 2024;26:e54706

DOI: 10.2196/54706

PMID: 38687566

PMCID: 11094593

Comparative Analysis of Responses to Online Autistic Patients' Questions in Chinese: Physicians vs. Large Language Model Chatbots

  • Wenjie He; 
  • Wenyan Zhang; 
  • Ya Jin; 
  • Qiang Zhou; 
  • Huadan Zhang; 
  • Qing Xia

ABSTRACT

Background:

There is a dearth of feasibility assessments regarding the use of large language models for responding to inquiries from autistic patients in a Chinese-language context. Although Chinese is one of the most widely spoken languages globally, research on the application of these models in the medical field has focused predominantly on English-speaking populations.

Objective:

To assess the effectiveness of large language model (LLM) chatbots, specifically ChatGPT and ERNIE Bot, in addressing inquiries from individuals with autism in a Chinese-language setting.

Methods:

A total of 100 patient consultation samples, comprising 239 questions, were randomly selected from publicly available autism-related records on DXY spanning January 2018 to August 2023. To maintain objectivity, both the original questions and the responses were anonymized and presented in randomized order. An evaluation team of three chief physicians assessed the responses across four dimensions: relevance, accuracy, usefulness, and empathy, yielding 717 evaluations in total. For each question, the team first identified the best response and then rated every response on a 5-point Likert scale, with each point representing a distinct level of quality. Finally, a comparative analysis was conducted across the response sources.

Results:

Among the 717 evaluations conducted, 46.86% (95% CI, 43.21%–50.51%) of assessors preferred responses from physicians, 34.87% (95% CI, 31.38%–38.36%) favored ChatGPT, and 18.27% (95% CI, 15.44%–21.10%) favored ERNIE Bot. The average relevance scores for physicians, ChatGPT, and ERNIE Bot were 3.75 (95% CI, 3.69–3.82), 3.69 (95% CI, 3.63–3.74), and 3.41 (95% CI, 3.35–3.46), respectively. Regarding accuracy ratings, physicians (3.66, 95% CI, 3.60–3.73) and ChatGPT (3.73, 95% CI, 3.69–3.77) outperformed ERNIE Bot (3.52, 95% CI, 3.47–3.57). In terms of usefulness scores, physicians (3.54, 95% CI, 3.47–3.62) received higher ratings than ChatGPT (3.40, 95% CI, 3.34–3.47) and ERNIE Bot (3.05, 95% CI, 2.99–3.12). Finally, on the empathy dimension, ChatGPT (3.64, 95% CI, 3.57–3.71) outperformed physicians (3.13, 95% CI, 3.04–3.21) and ERNIE Bot (3.11, 95% CI, 3.04–3.18).

Conclusions:

In this cross-sectional study, physicians' responses were superior overall in the present Chinese-language context. Nonetheless, LLMs can provide valuable medical guidance to patients with autism and may even surpass physicians in demonstrating empathy. However, further optimization and research are prerequisites before LLMs can be effectively integrated into clinical settings across diverse linguistic environments.

Clinical Trial: The study was registered on chictr.org (ChiCTR2300074655).




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.