JMIR Preprints #47736: Evaluating Stroke Risk Models: A Retrospective Cohort Study Comparing Explainable Machine Learning Approaches with Traditional Statistical Methods

Evaluating Stroke Risk Models: A Retrospective Cohort Study Comparing Explainable Machine Learning Approaches with Traditional Statistical Methods

Sermkiat Lolak;
Ratchainant Thammasudjarit;
John Attia;
Gareth J. McKay;
Ammarin Thakkinstian

ABSTRACT

Background: Stroke has multiple modifiable and unmodifiable risk factors and is a leading cause of death globally. This study aims to compare risk factors for stroke occurrence in a real-world cohort dataset using explainable machine learning (ML) approaches and conventional statistical models.

Methods: We assembled a retrospective cohort of high-risk patients who were treated at Ramathibodi Hospital, Thailand, from January 2010 to December 2020. Prediction models were developed using logistic regression (LR) and ML models including XGBoost, Explainable Boosting Machine (EBM), Bayesian Network, and Tree-Augmented Naive Bayes. Model performance was compared using C-statistics and F1 scores.

Results: XGBoost and EBM had high predictive accuracy, with C-statistics of 0.89 and 0.87, respectively, whereas LR and EBM models yielded similar C-statistics of 0.80. Atrial fibrillation (AF), hypertension (HT), and antihypertensive medication were common significant factors, with AF being the most potent factor in both LR and XGBoost models. In contrast, plasma glucose was the strongest predictor in EBM models.

Conclusion: This study highlights the benefits of using ML approaches for the prediction of stroke in at-risk patients. These ML approaches should undergo further external validation before they are considered for adoption in routine clinical practice.
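For readers unfamiliar with the two performance measures named in the Methods, the following is a minimal sketch of how a C-statistic and an F1 score can be computed from predicted risks. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight-patient toy dataset are all assumptions made for illustration, and none of the numbers come from the study.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a randomly chosen case receives a higher predicted
    risk than a randomly chosen non-case (ties count as one half).
    For a binary outcome this equals the area under the ROC curve."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 1 = stroke, 0 = no stroke (not from the study).
Y_TRUE = [1, 1, 1, 0, 0, 0, 0, 0]
Y_SCORE = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

# Assumed decision threshold of 0.5 to turn risks into class labels.
Y_PRED = [1 if s >= 0.5 else 0 for s in Y_SCORE]

print(f"C-statistic: {c_statistic(Y_TRUE, Y_SCORE):.3f}")
print(f"F1 score:    {f1_score(Y_TRUE, Y_PRED):.3f}")
```

The C-statistic depends only on how the model ranks patients, whereas the F1 score depends on the chosen threshold, which is one reason the two measures are often reported together.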

Citation

Please cite as:

Lolak S, Thammasudjarit R, Attia J, McKay GJ, Thakkinstian A

Comparing Explainable Machine Learning Approaches With Traditional Statistical Methods for Evaluating Stroke Risk Models: Retrospective Cohort Study

JMIR Cardio 2023;7:e47736

DOI: 10.2196/47736

PMID: 37494080

PMCID: 10413234

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Cardio

Date Submitted: Mar 30, 2023

Date Accepted: Jun 15, 2023
