Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 22, 2024
Date Accepted: Mar 31, 2025

The final, peer-reviewed published version of this preprint can be found here:

Comparing Artificial Intelligence–Generated and Clinician-Created Personalized Self-Management Guidance for Patients With Knee Osteoarthritis: Blinded Observational Study

Du K, Li A, Zuo QH, Zhang CY, Guo R, Chen P, Du WS, Li SM

J Med Internet Res 2025;27:e67830

DOI: 10.2196/67830

PMID: 40332991

PMCID: 12096024

Comparing AI-Generated and Clinician-Created Personalized Self-Management Guidance for Knee Osteoarthritis Patients: A Blinded Observational Study

  • Kai Du; 
  • Ao Li; 
  • Qi-Heng Zuo; 
  • Chen-Yu Zhang; 
  • Ren Guo; 
  • Ping Chen; 
  • Wei-Shuai Du; 
  • Shu-Ming Li

ABSTRACT

Background:

Knee osteoarthritis (OA) is a prevalent, chronic musculoskeletal disorder that impairs mobility and quality of life. Personalized patient education is essential for improving self-management and adherence, yet its delivery is often limited by time constraints, clinician workload, and the heterogeneity of patient needs. Recent advances in large language models (LLMs) offer potential solutions. GPT-4, distinguished by its long-context reasoning and adoption in clinical AI research, has emerged as a leading candidate for personalized health communication. However, its application in generating condition-specific educational guidance remains underexplored, and concerns about misinformation, personalization limits, and ethical oversight persist.

Objective:

This study aims to evaluate GPT-4’s ability to generate individualized self-management guidance for knee OA patients in comparison with clinician-created content.

Methods:

This two-phase, double-blind, observational study used data from 50 patients previously enrolled in a registered randomized trial. In phase one, two orthopedic clinicians each generated personalized education materials for 25 patient profiles using anonymized clinical data, including history, symptoms, and lifestyle. In phase two, the same datasets were processed by GPT-4 using standardized prompts. All content was anonymized and evaluated by two independent, blinded clinical experts using validated scoring systems. Evaluation criteria included: (1) efficiency; (2) readability (Flesch-Kincaid, Gunning Fog, Coleman-Liau, SMOG); (3) accuracy; (4) personalization; and (5) comprehensiveness and safety. Disagreements between reviewers were resolved through consensus or third-party adjudication.
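Of the four readability indices listed above, the Coleman-Liau Index is the simplest to reproduce because it depends only on letter, word, and sentence counts. The following is a minimal sketch of its published formula, CLI = 0.0588 L - 0.296 S - 15.8, where L is letters per 100 words and S is sentences per 100 words. The function name, the word tokenizer, and the sentence-splitting rule are assumptions made for illustration; the abstract does not state which software the study used to compute readability.

```python
import re


def coleman_liau_index(text: str) -> float:
    """Approximate Coleman-Liau grade level of an English text.

    Formula: 0.0588 * L - 0.296 * S - 15.8
      L = average number of letters per 100 words
      S = average number of sentences per 100 words
    Tokenization below is a simple heuristic (an assumption, not the
    study's tooling): words are runs of ASCII letters, optionally joined
    by an apostrophe or hyphen; a sentence ends at ., !, or ?.
    """
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    if not words:
        raise ValueError("text contains no words")

    # Count alphabetic characters only, so hyphens and apostrophes are excluded.
    letters = sum(ch.isalpha() for word in words for ch in word)

    # Treat each run of terminal punctuation as one sentence boundary;
    # fall back to a single sentence if none is found.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))

    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = sentences / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    clinical = "Osteoarthritis management requires individualized rehabilitation."
    print(round(coleman_liau_index(simple), 2))
    print(round(coleman_liau_index(clinical), 2))
```

Because the index rewards short words and short sentences, text dense with long clinical terms scores higher (harder to read) even when its sentences are brief. Note that this heuristic would miscount sentences in text containing decimals or abbreviations, so a dedicated readability tool is preferable for real evaluation.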

Results:

GPT-4 generated content significantly faster than clinicians (530.03 vs. 37.29 words per minute, P < 0.001). GPT-4 content was more readable, with lower grade-level scores, on the Flesch-Kincaid (11.56 ± 1.08 vs. 12.67 ± 0.95), Gunning Fog (12.47 ± 1.36 vs. 14.56 ± 0.93), and SMOG (13.33 ± 1.00 vs. 13.81 ± 0.69) indices (all P < 0.001), although it scored slightly higher, indicating greater complexity, on the Coleman-Liau Index (15.90 ± 1.03 vs. 15.15 ± 0.91). GPT-4 also outperformed clinicians in accuracy (5.31 ± 1.73 vs. 4.76 ± 1.10, P = 0.047), personalization (54.32 ± 6.21 vs. 33.20 ± 5.40, P < 0.001), comprehensiveness (51.74 ± 6.47 vs. 35.26 ± 6.66, P < 0.001), and safety (median 61 vs. 50, P < 0.001).

Conclusions:

GPT-4 demonstrated the capacity to generate personalized self-management guidance for knee OA with greater efficiency, accuracy, personalization, comprehensiveness, and safety than clinician-generated content, as assessed using standardized, guideline-aligned evaluation frameworks. These findings underscore the potential of LLMs to support scalable, high-quality patient education in chronic disease management. The observed lexical complexity suggests the need to refine outputs for populations with limited health literacy. As an exploratory, single-center study, these results warrant confirmation in larger, multicenter cohorts with diverse demographic profiles. Future implementation should be guided by ethical and operational safeguards, including data privacy, transparency, and the delineation of clinical responsibility. Hybrid models integrating AI-generated content with clinician oversight may offer a pragmatic path forward.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.