Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 9, 2026
Date Accepted: Mar 20, 2026

The final, peer-reviewed published version of this preprint can be found here:

Beyond GPT-4: The Rapidly Evolving Potential of Large Language Models for Clinical Guideline Improvement

Nelson SD, Wright A

J Med Internet Res 2026;28:e95004

DOI: 10.2196/95004

PMID: 41962130

Beyond GPT-4: The Rapidly Evolving Potential of Large Language Models for Clinical Guideline Improvement

  • Scott D Nelson
  • Adam Wright

ABSTRACT

This commentary reviews the study by Jones et al., which evaluated whether GPT-4 could improve the readability of injectable medication guidelines while preserving important safety information. The study found that GPT-4 produced modest readability gains comparable to manual revision, but also introduced omissions and meaning changes in a minority of sections. These findings highlight both the potential and limitations of early large language models (LLMs) in clinical contexts. However, this study reflects the capabilities of a specific model in a rapidly evolving domain. Since the release of GPT-4, advances in multi-step reasoning, model-critique workflows, and structured validation have substantially improved the ability of newer systems to detect omissions, maintain factual fidelity, and support controlled editing. As a result, some documented limitations may stem from the constraints of a single-model, single-pass workflow rather than intrinsic flaws in LLM-assisted guideline revision. This commentary highlights the need for evaluation frameworks that can keep pace with LLM progress and emphasizes that clinical oversight and user-centered testing remain essential. Updated research using contemporary models is needed to determine how emerging architectures can more safely support clarity, consistency, and maintenance of clinical guidelines.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.