Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Feb 6, 2025
Open Peer Review Period: Feb 6, 2025 - Apr 3, 2025
Date Accepted: Apr 25, 2025
Evaluation of Machine Learning Model Performance in a Diabetic Foot Ulcer Retrospective Cohort: A Framework Proposition for Future Research
ABSTRACT
Background:
Machine learning (ML) has shown great potential in recognizing complex disease patterns and supporting clinical decision-making. Diabetic foot ulcers (DFUs) represent a significant multifactorial medical problem with high incidence and severe outcomes, providing an ideal example for a comprehensive framework that encompasses all essential steps for implementing ML in a clinically relevant fashion.
Objective:
This article aims to provide a framework for the proper use of ML algorithms to predict clinical outcomes of multifactorial diseases and their treatments.
Methods:
ML models were compared on a DFU dataset. Patient characteristics associated with wound healing were selected on the basis of statistical tests (ANOVA and the chi-squared test) and validated against expert recommendations. Missing values were imputed with Midas Touch, and class imbalance was addressed with Adaptive Synthetic Sampling (ADASYN). Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Bayesian Additive Regression Trees (BART), and Artificial Neural Networks (ANN) were trained, cross-validated, and optimized using random sampling on the patient dataset.
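As an illustration of the chi-squared screening step named above, the following minimal sketch computes the Pearson chi-squared statistic for a contingency table of a categorical patient characteristic against the healing outcome. The function names and the counts are hypothetical and are not taken from the study; the closed-form p-value shown applies only to a 2 x 2 table (one degree of freedom).

```python
import math


def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Survival function of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts (NOT study data):
# rows = characteristic present / absent, columns = healed / not healed.
example_table = [[30, 50], [60, 40]]
stat, dof = chi_squared_statistic(example_table)
p_value = p_value_one_dof(stat)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.4f}")
```

A characteristic whose p-value falls below a chosen threshold would be a candidate for retention, subject to the expert review described above.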
Results:
The exploratory dataset consisted of 700 patient records with 199 variables. After dataset cleaning, the variables used for model training included age, smoking status, toe systolic pressure, blood pressure, oxygen saturation, hemoglobin, HbA1c, estimated glomerular filtration rate, wound location, diabetes type, Texas wound classification, neuropathy, and wound area measurement. The SVM achieved a stable accuracy of 0.853 (95% CI 0.789-0.917) with an AUC of 0.922 (95% CI 0.872-0.972). RF and XGBoost achieved accuracies of 0.838 (95% CI 0.770-0.905) and 0.815 (95% CI 0.745-0.885), respectively, with AUCs of 0.917 (95% CI 0.866-0.969) for RF and 0.889 (95% CI 0.829-0.948) for XGBoost.
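The abstract does not state how the accuracy, AUC, or confidence intervals were computed. As a hedged sketch of one common approach, the code below computes a normal-approximation (Wald) 95% interval for a proportion and a rank-based AUC. The function names, the test-set size, and the example labels and scores are assumptions for illustration only, not values from the study.

```python
import math


def wald_interval(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion p observed on n cases."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical inputs (NOT study data).
lower, upper = wald_interval(p=0.85, n=100)
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
example_auc = auc_from_scores(example_labels, example_scores)
print(f"95% CI: {lower:.3f}-{upper:.3f}, AUC: {example_auc:.3f}")
```

The Wald interval is simple but can be unreliable for small samples or proportions near 0 or 1; bootstrap or Wilson intervals are frequently used alternatives.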
Conclusions:
Handling missing values, selecting features, and addressing class imbalance are critical steps in developing ML applications for clinical research. Seven models, each representing a different branch of ML, were compared on their ability to predict complete wound healing. In this initial DFU dataset, used as an example, the SVM achieved the best performance in predicting clinical outcomes, followed by RF and XGBoost.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.