Accepted for/Published in: JMIR Cardio

Date Submitted: Mar 16, 2022
Open Peer Review Period: Mar 16, 2022 - May 11, 2022
Date Accepted: Aug 9, 2022

The final, peer-reviewed published version of this preprint can be found here:

The Impact of Time Horizon on Classification Accuracy: Application of Machine Learning to Prediction of Incident Coronary Heart Disease

Simon S, Mandair D, Albakri A, Fohner A, Simon N, Lange L, Biggs M, Mukamal K, Psaty B, Rosenberg M

JMIR Cardio 2022;6(2):e38040

DOI: 10.2196/38040

PMID: 36322114

PMCID: 9669890

The Impact of Time Horizon on Classification Accuracy: Application of Machine Learning to Prediction of Incident Coronary Heart Disease

  • Steven Simon; 
  • Divneet Mandair; 
  • Abdel Albakri; 
  • Alison Fohner; 
  • Noah Simon; 
  • Leslie Lange; 
  • Mary Biggs; 
  • Ken Mukamal; 
  • Bruce Psaty; 
  • Michael Rosenberg

ABSTRACT

Background:

Many machine-learning (ML) approaches are limited to classifying outcomes rather than making longitudinal predictions. One strategy for using ML in clinical risk prediction is to classify outcomes over a given time horizon. However, it is not well known how to identify the optimal time horizon for risk prediction.

Objective:

Here, we aimed to identify an optimal time horizon for classifying incident myocardial infarction using ML approaches applied to outcomes defined over increasing time horizons.

Methods:

We analyzed data from a single clinic visit of 5201 participants in the Cardiovascular Health Study. We examined 61 variables collected at this baseline exam, including demographic and biologic data, medical history, medications, serum biomarkers, and electrocardiographic and echocardiographic data. We compared several ML methods (random forest, L1 regression, gradient boosted decision tree, support vector machine, and k-nearest neighbor) trained to predict incident myocardial infarction (MI) occurring within time horizons ranging from 500 to 10,000 days of follow-up. Models were compared on a 20% held-out test set using the area under the receiver operating characteristic curve (AUC). Variable importance was assessed for the random forest and L1 regression models across time points. We compared results with the sex-specific Framingham coronary heart disease Cox proportional hazards regression functions.
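The evaluation loop described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the study's pipeline: the record layout, the helper names (`horizon_label`, `split_indices`, `auc_by_horizon`, `first_feature_model`), and the rule that drops participants censored before a horizon without an event are all hypothetical, since the abstract does not say how censoring was handled. The sketch holds out 20% of participants once, then relabels, refits, and scores the held-out set at each horizon, computing AUC as the probability that a randomly chosen case outscores a randomly chosen non-case.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical record layout: (features, day of incident MI or None, last follow-up day)
Record = Tuple[List[float], Optional[int], int]
Scorer = Callable[[List[float]], float]


def horizon_label(event_day: Optional[int], followup_day: int, horizon: int) -> Optional[int]:
    """1 = MI within horizon, 0 = followed past horizon without MI, None = excluded (assumption)."""
    if event_day is not None and event_day <= horizon:
        return 1
    return 0 if followup_day >= horizon else None


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_indices(n: int, test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle participant indices once; return (train, test) with test_fraction held out."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(test_fraction * n)
    return idx[n_test:], idx[:n_test]


def labelled(records: List[Record], ids: List[int], horizon: int) -> List[Tuple[List[float], int]]:
    """Label the selected participants at one horizon, dropping those with unknown status."""
    out = []
    for i in ids:
        x, event_day, followup_day = records[i]
        y = horizon_label(event_day, followup_day, horizon)
        if y is not None:
            out.append((x, y))
    return out


def auc_by_horizon(
    records: List[Record],
    train_ids: List[int],
    test_ids: List[int],
    horizons: Sequence[int],
    fit: Callable[[List[List[float]], List[int]], Scorer],
) -> Dict[int, float]:
    """Keep the split fixed; relabel, refit, and score the held-out set at each horizon."""
    results: Dict[int, float] = {}
    for h in horizons:
        train, test = labelled(records, train_ids, h), labelled(records, test_ids, h)
        model = fit([x for x, _ in train], [y for _, y in train])
        results[h] = auc([model(x) for x, _ in test], [y for _, y in test])
    return results


def first_feature_model(X: List[List[float]], y: List[int]) -> Scorer:
    # Hypothetical stand-in for a real learner (e.g. L1 logistic regression): it ignores
    # the training data and scores each participant by their first feature.
    return lambda x: x[0]


# Toy records for illustration only (not study data).
toy = [([0.9], 400, 400), ([0.2], 1800, 1800), ([0.3], None, 3000),
       ([0.1], None, 3000), ([0.5], None, 1000)]
print(auc_by_horizon(toy, [], [0, 1, 2, 3, 4], [500, 2000], first_feature_model))
# {500: 1.0, 2000: 0.75}
```

In practice `fit` would wrap a real learner such as an L1-penalized logistic regression or a random forest, the split would come from `split_indices`, and the pairwise AUC loop, which is quadratic in the test-set size, would usually be replaced by a rank-based or library implementation. Note how the last toy participant drops out at the 2000-day horizon: longer horizons change not only the labels but also who can be labeled.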

Results:

A total of 4190 participants were included in the analysis (60.2% female; mean age 72.6 years). Over 10,000 days of follow-up, there were 813 incident myocardial infarction events. The ML models were most predictive over moderate follow-up time horizons (1500-2500 days). Overall, L1 (lasso) logistic regression demonstrated the strongest classification accuracy across all time horizons; this model was most predictive at 1500 days of follow-up, with an AUC of 0.71. The most influential variables differed by follow-up time and model, although gender was the most important feature for the L1 regression, and weight for the random forest, across all time frames. Compared with the Framingham Cox function, the L1 and random forest models performed better at all time frames beyond 1500 days.
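Comparing a Cox proportional hazards function with classifiers requires converting it into a probability of an event within a fixed horizon, which can then be scored with the same AUC. The abstract does not give the exact form used; a standard Framingham-style conversion, assumed here, uses the baseline survival $S_0(t)$ at horizon $t$, the coefficient vector $\boldsymbol{\beta}$, a participant's covariates $\mathbf{x}$, and the cohort covariate means $\bar{\mathbf{x}}$:

$$
\hat{P}(T \le t \mid \mathbf{x}) = 1 - S_0(t)^{\exp\left(\boldsymbol{\beta}^\top \mathbf{x} - \boldsymbol{\beta}^\top \bar{\mathbf{x}}\right)}
$$

Within a single sex-specific function, and assuming $0 < S_0(t) < 1$, the predicted risk increases monotonically with the linear predictor $\boldsymbol{\beta}^\top \mathbf{x}$, so the ranking of participants (and hence the AUC) does not depend on $t$. When the female and male functions are pooled, their baseline survival curves and covariate means differ, so the ranking across sexes, and therefore the pooled AUC, can change with the horizon.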

Conclusions:

In a population free of coronary heart disease, ML techniques can be used to predict incident myocardial infarction at varying time horizons with reasonable accuracy, with the strongest prediction accuracy at moderate follow-up periods. Validation in additional populations is needed to confirm a role for this approach in risk prediction.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.