Evaluating Stroke Risk Models: A Retrospective Cohort Study Comparing Explainable Machine Learning Approaches with Traditional Statistical Methods
Sermkiat Lolak;
Ratchainant Thammasudjarit;
John Attia;
Gareth J. McKay;
Ammarin Thakkinstian
ABSTRACT
Background
Stroke has multiple modifiable and unmodifiable risk factors and represents a leading cause of death globally. This study aims to compare risk factors for stroke occurrence in a real-world cohort dataset using explainable machine learning (ML) approaches and conventional statistical models.
Methods
We assembled a retrospective cohort of high-risk patients who were treated at Ramathibodi Hospital, Thailand, from January 2010 to December 2020. Prediction models were developed using logistic regression (LR) and ML models, including XGBoost, Explainable Boosting Machine (EBM), Bayesian Network, and Tree-Augmented Naive Bayes. Model performance was compared using the C-statistic and F1 score.
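As a minimal sketch of the two performance measures named above, the following Python computes a C-statistic (the proportion of event and non-event pairs in which the event receives the higher predicted risk, with ties counted as one half) and an F1 score (the harmonic mean of precision and recall). The function names, the 0.5 classification threshold, and the six toy observations are illustrative assumptions; they are not the study's data or code.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance: share of (event, non-event) pairs ranked correctly.

    Ties in predicted risk count as half a concordant pair.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e, n in product(events, non_events)
    )
    return concordant / (len(events) * len(non_events))


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): 1 = stroke, 0 = no stroke.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]  # predicted risks from some model

# Assumed threshold of 0.5 to turn risks into class labels for F1.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

print(f"C-statistic: {c_statistic(y_true, y_score):.3f}")
print(f"F1 score:    {f1_score(y_true, y_pred):.3f}")
```

The C-statistic depends only on how the model ranks patients, whereas the F1 score depends on the chosen threshold, so the two measures can order models differently.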
Results
XGBoost and EBM had high predictive accuracy, with C-statistics of 0.89 and 0.87, respectively, whereas LR and EBM models yielded similar C-statistics of 0.80. Atrial fibrillation (AF), hypertension (HT), and antihypertensive medication were common significant factors, with AF being the strongest factor in both LR and XGBoost models. In contrast, plasma glucose was the strongest predictor in EBM models.
Conclusion
This study highlights the benefits of using ML approaches for the prediction of stroke in at-risk patients. These ML approaches should undergo further external validation before they are considered for adoption in routine clinical practice.
Citation
Please cite as:
Lolak S, Thammasudjarit R, Attia J, McKay GJ, Thakkinstian A
Comparing Explainable Machine Learning Approaches With Traditional Statistical Methods for Evaluating Stroke Risk Models: Retrospective Cohort Study