Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Sep 10, 2024
Date Accepted: Apr 26, 2025
A Responsible Framework for Assessing, Selecting, and Explaining Machine Learning Models in Cardiovascular Disease Outcomes Among People with Type 2 Diabetes: Methodology and Validation
ABSTRACT
Background:
Building machine learning models that are interpretable, explainable, and fair is critical for ensuring their trustworthiness in clinical practice. Yet most model development, selection, and validation frameworks focus primarily on maximizing predictive accuracy.
Objective:
This study proposes a responsible framework to assess, select, and explain interpretable machine learning models in terms of both predictive performance and fairness. We then demonstrate the practicality of this framework by applying it to the assessment of myocardial infarction (MI) and stroke risks among people with type 2 diabetes (T2D).
Methods:
We extracted participant data from the ACCORD dataset (N=9,635), including demographic, clinical, and biomarker records. We applied hold-out cross-validation to develop several interpretable machine learning models (linear, tree-based, and ensemble) to predict the risks of MI and stroke among people with T2D. Finally, we proposed model selection criteria based on the models' predictive performance and fairness, along with a unified model explanation approach to investigate the relationship between features and model outputs.
Results:
Our proposed framework demonstrates that the GLMnet model offers the best balance between predictive performance and fairness for both MI and stroke. For MI, GLMnet achieves the highest RPPS (0.979 for gender and 0.967 for race), indicating minimal performance disparities, while maintaining a high AUC of 0.705. For stroke, GLMnet has a relatively high AUC of 0.705 and the second-highest RPPS (0.961 for gender and 0.979 for race), suggesting it is effective across demographic subgroups. Our model explanation method further highlights that history of cardiovascular disease and age are the key predictors of MI, while HbA1c and systolic blood pressure (SBP) significantly influence stroke classification.
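The selection criterion above combines overall discrimination (AUC) with cross-subgroup parity (RPPS). As a minimal illustrative sketch, assuming RPPS is computed as the ratio of the lowest to the highest subgroup AUC (1.0 indicating identical performance across groups; the paper's exact definition may differ), the two quantities can be estimated as follows:

```python
import numpy as np

# A minimal sketch of the fairness-aware evaluation idea described above.
# Assumption (not taken from the paper): "RPPS" is computed here as the ratio
# of the lowest to the highest subgroup AUC, so 1.0 means identical
# predictive performance across subgroups.

def auc(y_true, y_score):
    """Rank-based (Mann-Whitney) estimate of the AUC."""
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score), dtype=float)
    ranks[order] = np.arange(1, len(y_score) + 1)
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def rpps(y_true, y_score, group):
    """Ratio of subgroup AUCs (assumed RPPS definition): min / max."""
    aucs = [auc(y_true[group == g], y_score[group == g])
            for g in np.unique(group)]
    return min(aucs) / max(aucs)

# Toy demonstration with synthetic risk scores for two gender subgroups;
# the signal is identical in both groups, so RPPS should be close to 1.0.
rng = np.random.default_rng(0)
n = 2000
group = rng.choice(["F", "M"], size=n)
y = rng.integers(0, 2, size=n)
score = 0.3 * y + rng.normal(0.0, 0.25, size=n)
print(f"overall AUC = {auc(y, score):.3f}, RPPS = {rpps(y, score, group):.3f}")
```

A model with a high AUC but a low RPPS would be flagged by such a criterion as accurate overall yet inequitable across subgroups, which is the trade-off the Results section quantifies for each candidate model.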
Conclusions:
This study establishes a responsible framework for deploying interpretable machine learning models in healthcare and yields three key insights. First, simple models can perform comparably to complex ensembles; second, models exhibiting strong overall predictive accuracy may harbor substantial gender or racial biases; and third, explainability methods effectively reveal the relationships between features and MI and stroke risks. In summary, our results underscore the need for holistic approaches that consider accuracy, fairness, and explainability in interpretable model design, selection, and operation, potentially enhancing healthcare technology adoption.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.