Accepted for/Published in: JMIR Cardio

Date Submitted: Mar 30, 2023
Date Accepted: Jun 15, 2023

The final, peer-reviewed published version of this preprint can be found here:

Comparing Explainable Machine Learning Approaches With Traditional Statistical Methods for Evaluating Stroke Risk Models: Retrospective Cohort Study

Lolak S, Thammasudjarit R, Attia J, McKay GJ, Thakkinstian A

JMIR Cardio 2023;7:e47736

DOI: 10.2196/47736

PMID: 37494080

PMCID: 10413234

Evaluating Stroke Risk Models: A Retrospective Cohort Study Comparing Explainable Machine Learning Approaches with Traditional Statistical Methods

  • Sermkiat Lolak
  • Ratchainant Thammasudjarit
  • John Attia
  • Gareth J. McKay
  • Ammarin Thakkinstian

ABSTRACT

Background: Stroke has multiple modifiable and nonmodifiable risk factors and is a leading cause of death globally. This study aims to compare risk factors for stroke occurrence in a real-world cohort dataset using explainable machine learning (ML) approaches and conventional statistical models.

Methods: We assembled a retrospective cohort of high-risk patients treated at Ramathibodi Hospital, Thailand, from January 2010 to December 2020. Prediction models were developed using logistic regression (LR) and ML models, including XGBoost, Explainable Boosting Machine (EBM), Bayesian Network, and Tree-Augmented Naive Bayes. Model performance was compared using C-statistics and F1 scores.

Results: XGBoost and EBM had high predictive accuracy, with C-statistics of 0.89 and 0.87, respectively, whereas LR and EBM models yielded similar C-statistics of 0.80. Atrial fibrillation (AF), hypertension (HT), and antihypertensive medication were significant factors common across models, with AF being the strongest factor in both the LR and XGBoost models. In contrast, plasma glucose was the strongest predictor in the EBM models.

Conclusion: This study highlights the benefits of using ML approaches to predict stroke in at-risk patients. These ML approaches should be externally validated before they are considered for adoption in routine clinical practice.
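The two performance measures named in the Methods can be illustrated with a short Python sketch. This is not the authors' analysis code: the toy labels and risk scores are invented for illustration, and the 0.5 classification threshold used for the F1 score is an assumption, since the abstract does not state which threshold was used. The C-statistic is computed as the proportion of (stroke, no-stroke) patient pairs in which the patient with stroke receives the higher predicted risk, with ties counted as one half; for a binary outcome this equals the area under the ROC curve.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance (C) statistic for a binary outcome.

    Over every (case, non-case) pair, count how often the case has the
    higher predicted risk; tied scores count as half a concordant pair.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            concordant += 1.0
        elif case_score == control_score:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


def f1_score(y_true, y_score, threshold=0.5):
    """F1 = harmonic mean of precision and recall.

    Scores at or above `threshold` (an assumed cut-off) are classified
    as predicted strokes.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = stroke, 0 = no stroke.
y = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.6, 0.2, 0.8]
print(round(c_statistic(y, scores), 3))  # 0.889 (8 of 9 pairs concordant)
print(round(f1_score(y, scores), 3))     # 0.667 (precision = recall = 2/3)
```

The pairwise loop is quadratic in cohort size and is meant only to make the definition explicit; for a large cohort, a rank-based implementation such as scikit-learn's `roc_auc_score` and `f1_score` would be the usual choice.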

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.