Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 6, 2024
Date Accepted: Dec 3, 2024
A machine learning algorithm for predicting diabetic retinopathy in patients with type 2 diabetes: Derivation and validation in two independent cohorts in South Korea
ABSTRACT
Background:
Diabetic retinopathy (DR) is the leading cause of preventable blindness worldwide. Machine learning (ML) systems show potential to enhance DR screening in community-based settings. However, studies assessing the usability and predictive performance of such models are scarce.
Objective:
This study used data from three university hospitals in Korea to provide a simple and accurate ML-based assessment of the risk of developing DR that can be applied universally to adults with type 2 diabetes mellitus (T2DM).
Methods:
This study predicted DR using data from two independent electronic medical record-based cohorts: a discovery cohort (one hospital, n=68,009) and a validation cohort (two hospitals, n=18,895). The primary outcome was the presence or absence of DR at three years. Several ML-based models were selected through hyperparameter tuning in the discovery cohort, and their performance was evaluated using the area under the receiver operating characteristic curve in the validation cohort.
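As an illustration of the evaluation metric named above, the area under the receiver operating characteristic curve (AUROC) can be read as the probability that a randomly chosen patient with DR receives a higher predicted risk than a randomly chosen patient without DR. The following is a minimal sketch of that rank-based definition only. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study, whose actual software pipeline is not described in this abstract.

```python
def auroc(labels, scores):
    """Rank-based AUROC: fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = DR at three years, 0 = no DR.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.90]
toy_auc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of patients, so for cohorts of the size reported here a sort-based implementation from an established library would be the more practical choice.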
Results:
Among 68,009 patients screened for inclusion, 14,694 (21.61%) were eligible for study analysis, and 348 (2.37%) patients were referred for DR. For DR, the XGBoost system had an accuracy of 73.10% (95% confidence interval [CI], 71.27–74.93), with a sensitivity of 72.71% (71.03–74.39) and a specificity of 73.11% (71.27–74.94) in the original dataset. In the validation dataset, XGBoost had an accuracy of 66.86%, a sensitivity of 67.15%, and a specificity of 66.84%. The most common feature in the XGBoost model was dyslipidemia, followed by cancer, hypertension, chronic kidney disease, neuropathy, and cardiovascular disease.
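For readers less familiar with the metrics reported above, the sketch below shows how accuracy, sensitivity, and specificity follow from a confusion matrix, and it recomputes the two cohort percentages from the counts given in the text. The confusion-matrix counts are hypothetical, since the abstract does not report them, and the normal-approximation (Wald) interval is an assumption: the abstract does not state how the 95% CIs were derived. The helper names `proportion_ci` and `screening_metrics` are illustrative only.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Proportion with a Wald (normal-approximation) 95% CI, clipped to [0, 1].
    Assumption: the study's own CI method is not stated in the abstract."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": proportion_ci(tp, tp + fn),  # TP / all with DR
        "specificity": proportion_ci(tn, tn + fp),  # TN / all without DR
        "accuracy": proportion_ci(tp + tn, tp + fp + tn + fn),
    }


# Cohort percentages recomputed from the counts reported in the text.
eligible_pct = round(100 * 14694 / 68009, 2)  # eligible among screened
dr_pct = round(100 * 348 / 14694, 2)          # DR among eligible

# Hypothetical confusion matrix, for illustration only.
example = screening_metrics(tp=73, fp=270, tn=730, fn=27)
for name, (p, lo, hi) in example.items():
    print(f"{name}: {100 * p:.2f}% (95% CI {100 * lo:.2f}-{100 * hi:.2f})")
```

With a DR prevalence of roughly 2%, the interval for sensitivity rests on far fewer patients than the interval for specificity, so under this approximation it would be noticeably wider for the same point estimate.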
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.