JMIR Preprints #73226: Effectiveness of Large Language Models in Stroke Rehabilitation Health Education: A Comparative Study of ChatGPT-4, MedGo, Qwen, and ERNIE Bot

Effectiveness of Large Language Models in Stroke Rehabilitation Health Education: A Comparative Study of ChatGPT-4, MedGo, Qwen, and ERNIE Bot

Shiqi Qiang;
Haitao Zhang;
Yang Liao;
Yue Zhang;
Yanfen Gu;
Yiyan Wang;
Zehui Xu;
Hui Shi;
Nuo Han;
Haipin Yu

ABSTRACT

Background:

Stroke is a leading cause of disability and death worldwide, and home-based rehabilitation plays a crucial role in improving patient prognosis and quality of life. Traditional health education often lacks precision, personalization, and accessibility. In contrast, large language models (LLMs) are gaining attention for their potential in medical health education, owing to their advanced natural language processing capabilities. However, the effectiveness of LLMs in home-based stroke rehabilitation remains uncertain.

Objective:

This study evaluates the effectiveness of four LLMs (ChatGPT-4, MedGo, Qwen, and ERNIE Bot) in home-based stroke rehabilitation. The models were selected for their diversity in model type, clinical relevance, and accessibility at the time of study design. The aim is to offer stroke patients more precise and secure health education pathways while exploring the feasibility of using LLMs to guide health education.

Methods:

In the first phase of this study, a literature review and expert interviews identified 15 common questions and 2 clinical cases relevant to stroke patients in home-based rehabilitation. These were input into the four LLMs for simulated consultations. Six medical experts (2 clinicians, 2 nursing specialists, and 2 rehabilitation therapists) rated the LLM-generated responses on a 5-point Likert scale for accuracy, completeness, readability, safety, and humanity. In the second phase, the two top-performing models from phase one were selected, and thirty stroke patients undergoing home-based rehabilitation were recruited. Each patient asked both models 3 questions and rated the responses using a satisfaction scale. Readability, text length, and recommended reading age of the responses were assessed using a Chinese readability analysis tool. Data were analyzed using one-way ANOVA, post hoc Tukey HSD tests, and paired t-tests.
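As a rough illustration of two of the tests named above, the following sketch computes a one-way ANOVA F statistic and a paired t statistic using only the Python standard library. It is not the authors' analysis code. The function names and all ratings are hypothetical values invented for the example, and P values are omitted because they require the F and t distributions (in practice a statistics package such as SciPy would typically supply these, along with the Tukey HSD comparisons).

```python
from math import sqrt
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of rating lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def paired_t(a, b):
    """Return (t, df) for paired ratings a[i], b[i] from the same rater."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical 5-point Likert ratings for three models (not study data)
demo_groups = [[5, 4, 5, 4], [4, 4, 3, 4], [3, 3, 4, 2]]
f_demo, df_b_demo, df_w_demo = one_way_anova_f(demo_groups)

# Hypothetical satisfaction ratings from five patients for two models
model_a = [5, 4, 4, 5, 3]
model_b = [4, 4, 3, 4, 3]
t_demo, df_t_demo = paired_t(model_a, model_b)

print(f"F({df_b_demo}, {df_w_demo}) = {f_demo:.2f}")
print(f"t({df_t_demo}) = {t_demo:.2f}")
```

The one-way ANOVA compares the models on a single rating dimension, while the paired t-test suits the second phase because each patient rated both models.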

Results:

The four models differed significantly in all five dimensions: accuracy (P = .002), completeness (P < .001), readability (P = .04), safety (P = .007), and humanity (P < .001). ChatGPT-4 outperformed the other models in each dimension, with scores for accuracy (M = 4.28, SD = 0.84), completeness (M = 4.35, SD = 0.75), readability (M = 4.28, SD = 0.85), safety (M = 4.38, SD = 0.81), and humanity (M = 4.65, SD = 0.66). MedGo performed well in accuracy (M = 4.06, SD = 0.78) and completeness (M = 4.06, SD = 0.74). Qwen and ERNIE Bot scored significantly lower than ChatGPT-4 and MedGo across all five dimensions. ChatGPT-4 generated the longest responses (M = 1338.35, SD = 236.03) and had the highest readability score (M = 12.88). In the second phase, ChatGPT-4 performed the best overall, while MedGo provided the clearest responses.

Conclusions:

LLMs, particularly ChatGPT-4 and MedGo, demonstrated promising performance in home-based stroke rehabilitation education. However, discrepancies between expert and patient evaluations highlight the need for improved alignment with patient comprehension and expectations. Enhancing clinical accuracy, readability, and oversight mechanisms will be essential for future real-world integration.

Citation

Please cite as:

Qiang S, Zhang H, Liao Y, Zhang Y, Gu Y, Wang Y, Xu Z, Shi H, Han N, Yu H

Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study

J Med Internet Res 2025;27:e73226

DOI: 10.2196/73226

PMID: 40694436

PMCID: 12306586

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Feb 28, 2025

Date Accepted: May 13, 2025
