JMIR Preprints #70257: Rule-based natural language processing in oncology clinical decision support: a systematic review and meta-analysis

Rule-based natural language processing in oncology clinical decision support: a systematic review and meta-analysis

Stephen Ali;
Garikai Kungwengwe;
Dafydd Hughes;
Thomas Dobbs;
Hayley Hutchings;
Iain Whitaker

Background:

The exponential growth of healthcare data has driven increased use of natural language processing (NLP) for clinical decision support (CDS) in oncology. Although machine learning models have received considerable attention, the diagnostic performance and clinical utility of rule-based NLP systems remain less explored, particularly when compared with human assessments.

Objective:

This systematic review and meta-analysis evaluates the diagnostic accuracy of rule-based NLP systems for oncology CDS. We aimed to: (i) compute pooled sensitivity, specificity, and AUC; (ii) compare performance across tumour types and clinical tasks; and (iii) benchmark rule-based algorithms against clinician assessments. We hypothesised that rule-based systems achieve high accuracy in structured tasks but vary by tumour type.

Methods:

We searched EMBASE, MEDLINE, CINAHL, the Cochrane Library, Web of Science, and the Collection of Computer Science Bibliographies for studies published up to 13 April 2020. Eligible studies applied rule-based NLP to cancer-related CDS and included a human comparator. Two reviewers independently screened records, extracted data via Covidence, and assessed study quality against TRIPOD criteria. A bivariate random-effects meta-analysis estimated pooled sensitivity, specificity, and AUC. Subgroup analyses compared performance across tumour types and clinical tasks. Univariate meta-regressions evaluated the influence of publication year and dataset size, and Deeks’ regression test assessed publication bias.
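To illustrate what pooling a diagnostic accuracy measure involves, the sketch below pools a proportion such as sensitivity on the logit scale with a DerSimonian-Laird random-effects estimator. It is a simplified univariate stand-in, not the bivariate model used in this review, which pools sensitivity and specificity jointly. The function name, the 0.5 continuity correction, and the study counts are all hypothetical and chosen for illustration only.

```python
import math


def pool_logit_random_effects(events, totals):
    """Pool proportions (e.g. sensitivity = TP / (TP + FN)) across studies.

    Univariate DerSimonian-Laird random-effects pooling on the logit scale.
    Assumption: a 0.5 continuity correction is added to every cell so that
    studies with 0% or 100% sensitivity remain usable.
    Returns (pooled estimate, lower 95% bound, upper 95% bound).
    """
    effects, variances = [], []
    for e, n in zip(events, totals):
        a = e + 0.5          # corrected count of events (e.g. true positives)
        b = n - e + 0.5      # corrected count of non-events (e.g. false negatives)
        effects.append(math.log(a / b))       # logit of the proportion
        variances.append(1.0 / a + 1.0 / b)   # approximate variance of the logit

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    k = len(effects)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights and pooled logit.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Hypothetical studies: true positives and total diseased cases per study.
true_positives = [95, 180, 47]
diseased = [100, 200, 50]
estimate, lower, upper = pool_logit_random_effects(true_positives, diseased)
print(f"pooled sensitivity {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Specificity can be pooled the same way by passing true negatives and the number of non-diseased cases. A bivariate model additionally estimates the correlation between the two measures, which this sketch ignores.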

Results:

Of 3,223 screened records, 89 studies met inclusion criteria, spanning publication years 1993–2020 and analysing over 1.2 million patient records. Breast cancer was the most frequently studied tumour type (24.7%), followed by multiple cancer types (14.6%), colorectal (12.4%), lung (10.1%), and prostate cancer (9.0%). Meta-analysis of 35 studies yielded a pooled sensitivity of 0.96 (95% CI: 0.93–0.97), specificity of 0.98 (95% CI: 0.95–0.99), and an AUC of 0.99. Subgroup analysis showed breast cancer algorithms achieved sensitivity and specificity of 0.98, while malignant melanoma algorithms reported lower sensitivity (0.49) but high specificity (0.93). Pancreatic cancer algorithms performed well (sensitivity 0.97, specificity 0.98). Most studies used retrospective designs, relying on electronic health records and pathology reports. Quality assessment scores ranged from 48% to 92% adherence to TRIPOD criteria. Risk of bias assessment rated 6 studies (6.7%) as high quality, 57 (64.0%) as fair, and 26 (29.2%) as low. Heterogeneity was moderate to high (I²: 59%–74%), with no significant association between performance and publication year or dataset size. Deeks’ test showed no evidence of publication bias (p = 0.73).
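For readers unfamiliar with the I² statistic reported above, the following minimal sketch shows the standard Higgins-Thompson definition, I² = max(0, (Q − df) / Q) × 100%, where Q is Cochran's heterogeneity statistic and df is the number of studies minus one. The function names and input values are hypothetical; the review's own figures may come from a bivariate model and are not reproduced here.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, k):
    """I² as a percentage: share of total variation attributed to
    between-study heterogeneity rather than chance."""
    df = k - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical logit-sensitivities and their variances from four studies.
effects = [2.9, 3.4, 1.8, 3.1]
variances = [0.10, 0.15, 0.08, 0.20]
q = cochran_q(effects, variances)
print(f"Q = {q:.2f}, I² = {i_squared(q, len(effects)):.1f}%")
```

By common convention, values around 50% are read as moderate heterogeneity and values around 75% as high, which matches the "moderate to high" description of the 59%–74% range.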

Conclusions:

Rule-based NLP systems exhibit high diagnostic accuracy for oncology CDS, though performance varies by tumour type and clinical context. Limitations include study heterogeneity, variable reporting standards, and reliance on retrospective data. These findings highlight the need for standardised reporting, direct comparisons with machine learning approaches, and prospective validation to enhance clinical applicability.

ClinicalTrial:

PROSPERO (CRD42020180676).

Citation

Please cite as:

Ali S, Kungwengwe G, Hughes D, Dobbs T, Hutchings H, Whitaker I

Rule-based natural language processing in oncology clinical decision support: a systematic review and meta-analysis

Journal of Medical Internet Research. 17/04/2025:70257 (forthcoming/in press)

DOI: 10.2196/70257

URL: https://preprints.jmir.org/preprint/70257

Currently accepted at: Journal of Medical Internet Research

Date Submitted: Dec 18, 2024

Date Accepted: Apr 17, 2025