Currently submitted to: Journal of Medical Internet Research
Date Submitted: Feb 12, 2026
Open Peer Review Period: Feb 13, 2026 - Apr 10, 2026
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Impact of Large Language Model-Generated versus Clinician-Generated Advice on Resuscitation Preferences and Chinese-Language Readability in Advanced Cancer Patients in the Emergency Department: A Randomized Controlled Trial
ABSTRACT
Background:
For patients with advanced cancer in the emergency department (ED), decisions regarding life-sustaining treatments (LST) are critical and hinge on clear communication of complex prognoses. While large language models (LLMs) can synthesize clinical information, their comparative effectiveness against clinicians in shaping real patient preferences, and the readability of their outputs, remain unproven.
Objective:
This study aimed to determine if LLM-generated advice is non-inferior to clinician-generated advice in changing patient resuscitation preferences. Secondarily, we compared the Chinese-language readability of the advice using a validated formula with a clinical cutoff and assessed patient satisfaction.
Methods:
We conducted a three-arm, parallel, randomized controlled non-inferiority trial. A total of 189 adult patients with advanced cancer in the ED were randomly assigned to review structured advice generated by (1) a senior clinician, (2) ChatGPT-5.0 Mini, or (3) DeepSeek. The primary outcome was the change in score on the Cancer Advanced Care Preferences Scale. Secondary outcomes included text readability (assessed with a validated Chinese health literacy formula) and patient satisfaction.
Results:
A total of 189 participants were enrolled and completed the study. In the primary non-inferiority analysis, the change in resuscitation preference scores in the DeepSeek group was non-inferior to that in the clinician group (mean difference: -0.095 points, 95% CI: -0.750 to 0.560; lower limit above the -1.7 margin). Similarly, ChatGPT-5.0 Mini was non-inferior to the clinician group (mean difference: 0.349 points, 95% CI: -0.237 to 0.935; lower limit above the -1.7 margin). Among secondary outcomes, readability differed significantly across the three groups (Kruskal-Wallis H(2)=129.36, p<0.001). Post-hoc comparisons showed that DeepSeek texts had the highest median readability score (7.53, IQR: 7.39-7.62), followed by ChatGPT-5.0 Mini (5.93, IQR: 5.60-6.23) and clinician-generated texts (5.51, IQR: 5.29-5.74); all pairwise differences were significant (p<0.001). However, patient satisfaction did not differ significantly across groups (H(2)=1.10, p=0.578).
Conclusions:
LLM-generated advice was non-inferior to clinician advice in influencing resuscitation preferences. Its superior readability, together with comparable patient satisfaction, highlights the potential of LLMs as a scalable tool to support complex decision-making in time-pressured ED settings.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.