
Accepted for/Published in: JMIR Perioperative Medicine

Date Submitted: Dec 13, 2024
Date Accepted: Apr 8, 2025

The final, peer-reviewed published version of this preprint can be found here:

Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini

Liu Y, Li H, Ouyang J, Xue Z, Wang M, He H, Song B, Zheng X, Gan W

JMIR Perioper Med 2025;8:e70047

DOI: 10.2196/70047

PMID: 40505086

PMCID: 12178570

Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: A Comparative Study of Claude, GPT, and Gemini

  • Yukang Liu; 
  • Hua Li; 
  • Jianfeng Ouyang; 
  • Zhaowen Xue; 
  • Min Wang; 
  • Hebei He; 
  • Bin Song; 
  • Xiaofei Zheng; 
  • Wenyi Gan

ABSTRACT

Background:

Large language models (LLMs) are transforming natural language processing and are increasingly applied in clinical settings to enhance preoperative patient education.

Objective:

To evaluate the effectiveness and applicability of various LLMs in preoperative patient education by analyzing their responses to superior capsular reconstruction (SCR)-related inquiries.

Methods:

During an online meeting, 10 sports medicine clinical experts formulated 11 SCR-related issues and developed preoperative patient education strategies, and 12 text commands were input into Claude-3-Opus, GPT-4-Turbo, and Gemini-1.5-Pro. Three experts assessed the language models' responses for correctness, completeness, logic, potential harm, and overall satisfaction. The preoperative education documents were evaluated with the DISCERN and PEMAT-P instruments and reviewed by five postoperative patients for readability and educational value; the readability of all responses was also analyzed using the cntext package.
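The readability analysis mentioned above relies on standard readability formulas. As an illustration of the kind of metric involved (not the authors' actual cntext pipeline, and with an invented sample sentence), here is a minimal stdlib-only sketch of the classic Flesch Reading Ease score, using a rough vowel-group heuristic for syllable counting:

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text
    (roughly 60-70 corresponds to plain English)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n_sent, n_words = max(1, len(sentences)), max(1, len(words))
    return 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (syllables / n_words)

# Hypothetical patient-education sentence, for illustration only
sample = ("Superior capsular reconstruction repairs a large tear in the "
          "shoulder. Most patients go home on the same day.")
score = flesch_reading_ease(sample)
```

Production analyses would use a vetted implementation (the study used the cntext package) rather than this heuristic syllable counter, which miscounts words with silent vowels.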

Results:

Between July 1 and August 17, sports medicine experts evaluated 33 responses and 3 preoperative patient education documents generated by the three language models regarding SCR surgery. For the 11 query responses, clinicians rated Gemini significantly higher than Claude in all categories (P<.05) and higher than GPT in completeness, risk avoidance, and overall rating (P<.05). For the 3 educational documents, Gemini's PEMAT-P score significantly exceeded Claude's (P=.034), and patients rated Gemini's materials superior in all aspects, with significant differences in educational quality versus Claude (P=.017) and in overall satisfaction versus both Claude (P=.009) and GPT (P=.012). GPT had significantly higher readability than Claude on three R-based metrics (P<.01). Interrater agreement was high among clinicians and fair among patients.
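The abstract does not state which statistical test produced these P values; for ordinal Likert-style ratings, a rank-based comparison such as the Mann-Whitney U test is a common choice. A stdlib-only sketch of the U statistic, with illustrative ratings that are not the study's data:

```python
from itertools import product

def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for two independent samples.
    Each pair (x, y) contributes 1 if x > y, 0.5 if tied, 0 otherwise.
    Small-sample sketch; real analyses should use a vetted library
    (e.g., scipy.stats.mannwhitneyu) for exact or corrected p-values."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(a, b))

# Hypothetical 5-point clinician ratings for two models' responses
gemini_ratings = [5, 5, 4, 5, 4, 5]
claude_ratings = [4, 3, 4, 3, 4, 4]

u = mann_whitney_u(gemini_ratings, claude_ratings)
# U near len(a) * len(b) means a's ratings tend to exceed b's
```

The two one-sided U statistics always sum to len(a) * len(b), which is a quick sanity check on any implementation.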

Conclusions:

Claude-3-Opus, GPT-4-Turbo, and Gemini-1.5-Pro effectively generated readable preoperative education materials but lacked citations and failed to discuss alternative treatments or the risks of forgoing SCR surgery, highlighting the need for expert oversight when these LLMs are used in patient education. Clinical Trial: Not available.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.