JMIR Preprints #103273: Automation of Living Clinical Practice Guideline Development: A Scoping Review.

Automation of Living Clinical Practice Guideline Development: A Scoping Review.

Abdullah Altammami;
Olga Ares-Rufo;
Abdullah M. Alkhotani;
Dinesh Mital;
Anas H Alzahrani;
Jose F Florez-Arango

ABSTRACT

Background:

Clinical practice guidelines (CPGs) translate research evidence into actionable recommendations, but their development faces mounting challenges. With dozens of trials and systematic reviews published daily, the traditional 18–24-month guideline development cycle struggles to incorporate emerging evidence. Manual processes for literature screening, evidence synthesis, and quality assessment consume thousands of person-hours per guideline. These challenges create a knowledge translation pipeline that is slow, inconsistent, and rapidly outdated.

Objective:

This scoping review aimed to map and characterize the range of automated technologies supporting CPG development, focusing on how artificial intelligence (AI) and computational methods can enable "living guidelines" that continuously incorporate new evidence.

Methods:

This scoping review followed the Joanna Briggs Institute (JBI) framework for scoping reviews [1], with reporting adhering to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist [2]. We searched PubMed/MEDLINE from January 1, 2001, through December 31, 2025, combining guideline terms with automation and AI concepts. Studies were included if they described computational methods for automating guideline development processes (retrieval, extraction, encoding, or evaluation), reported original research with empirical testing, and provided sufficient methodological detail for assessment. Two reviewers independently screened records and performed data extraction. We organized findings into four pillars: retrieval, extraction, encoding, and evaluation.

Results:

Forty-three studies met eligibility criteria, spanning 2001 to 2025, with a marked increase in publications after 2015 and a notable surge in large language model (LLM)-related work during 2024–2025. In the Retrieval pillar (n=5), automated literature search and screening tools substantially reduced manual workload, with machine learning classifiers achieving 70%–80% precision. In the Extraction pillar (n=12), tools showed varied success: rule-based natural language processing (NLP) systems achieved up to 81% accuracy for structured clinical actions, while hybrid ontology methods covered 86% of eligibility criteria. LLMs, however, demonstrated mixed results, excelling at summarization but failing to perform independent, reliable literature retrieval. Machine learning classifiers for screening reduced burden by 20%–50% with high recall, but semantic search tools often traded recall for higher specificity (98%). In the Encoding pillar (n=23), studies traced the evolution from formalisms such as GLIF3 and GEM to modern hybrid approaches. Encoding demonstrated unique value in logic verification, with systems capable of detecting temporal inconsistencies and coverage gaps that human reviewers missed. Semi-automated tools reduced encoding time by approximately 40%, and novel "living" frameworks allowed guidelines to inherit unchanged logic, requiring updates only for modified text. In the Evaluation pillar (n=12), automated systems achieved moderate to substantial agreement with expert assessments (κ=0.44–0.79). While LLMs could rank-order guideline quality effectively, they exhibited a systematic upward scoring bias of approximately 12%–13% on AGREE II evaluations compared to human experts.
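
For readers less familiar with the agreement and bias figures reported above, the following is a minimal illustrative sketch of how Cohen's kappa and a mean relative scoring bias could be computed. It is not the method of any included study. The function names, the "high"/"low" quality labels, and the scores are hypothetical, and defining bias as the mean of (automated − expert) / expert is an assumption made here for illustration only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_relative_bias(auto_scores, expert_scores):
    """Mean of (auto - expert) / expert, as a percentage (assumed definition)."""
    diffs = [(a - e) / e for a, e in zip(auto_scores, expert_scores)]
    return 100.0 * sum(diffs) / len(diffs)


# Hypothetical data: quality labels from an automated tool and an expert panel.
tool_labels = ["high", "high", "low", "low", "high", "low", "high", "low"]
expert_labels = ["high", "low", "low", "low", "high", "low", "high", "high"]
kappa = cohens_kappa(tool_labels, expert_labels)

# Hypothetical percentage scores for three guidelines.
bias = mean_relative_bias([56.0, 84.0, 70.0], [50.0, 75.0, 62.5])
print(f"kappa={kappa:.2f}, bias={bias:.1f}%")
```

Kappa corrects raw agreement for agreement expected by chance, which is why it is generally preferred over percent agreement when comparing automated and expert assessments.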

Conclusions:

Automated technologies have progressed from proof-of-concept demonstrations to practical tools approaching human performance on specific structured tasks. However, the field remains fragmented. Current automation excels at structured, data-intensive tasks while struggling with contextual interpretation. Realizing the vision of continuously updated living guidelines requires systematic integration across extraction-encoding-evaluation workflows, particularly using standards like FHIR and GLIF to bridge the gap between narrative text and computable logic. The consistent positioning of automation as augmentative rather than autonomous reflects the irreducibly human aspects of guideline development.

Citation

Please cite as:

Altammami A, Ares-Rufo O, Alkhotani AM, Mital D, Alzahrani AH, Florez-Arango JF

Automation of Living Clinical Practice Guideline Development: A Scoping Review.

JMIR Preprints. 02/06/2026:103273

DOI: 10.2196/preprints.103273

URL: https://preprints.jmir.org/preprint/103273


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.


Currently submitted to: JMIR AI

Date Submitted: Jun 2, 2026

Open Peer Review Period: Jun 9, 2026 - Aug 4, 2026

(currently open for review)
