Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Predictors of glycaemic response to sulphonylurea therapy in type 2 diabetes: A comparative analysis of linear regression and machine learning models
ABSTRACT
Background:
Sulphonylureas are commonly prescribed for managing type 2 diabetes, yet treatment responses vary significantly among individuals. Although advances in machine learning (ML) may enhance predictive capabilities compared to traditional statistical methods, their practical utility in real-world clinical environments remains uncertain.
Objective:
This study aimed to compare the performance of linear regression models with that of several ML approaches in predicting glycaemic response to sulphonylurea therapy from routine clinical data.
Methods:
A cohort of 7,557 individuals with type 2 diabetes who initiated sulphonylurea therapy was analysed, with all patients followed for one year. Linear and logistic regression models were used as baseline comparisons. A range of ML models was trained to predict the continuous change in HbA1c levels and the achievement of HbA1c <58 mmol/mol at follow-up. These models included Random Forest, XGBoost, Support Vector Machines (SVM), a conventional feedforward neural network (NN), and Bayesian Additive Regression Trees (BART). Model performance was assessed using standard metrics including R² and RMSE for regression tasks, and AUROC for classification.
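As an illustration of the evaluation metrics named above, the following is a minimal sketch of R², RMSE, and AUROC in plain Python. It is not the authors' code. The function names and the small example arrays are hypothetical, and the AUROC uses the standard rank-sum (Mann-Whitney) formulation with average ranks for ties.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the outcome."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def auroc(labels, scores):
    """AUROC from the rank sum of the positive class (ties share an average rank)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical values, for illustration only (not study data).
observed_change = [-12.0, -5.0, -20.0, -8.0]   # change in HbA1c, mmol/mol
predicted_change = [-10.0, -7.0, -17.0, -9.0]
reached_target = [0, 0, 1, 1]                  # 1 = HbA1c < 58 mmol/mol at follow-up
predicted_prob = [0.1, 0.4, 0.35, 0.8]

print(r_squared(observed_change, predicted_change))
print(rmse(observed_change, predicted_change))
print(auroc(reached_target, predicted_prob))
```

In practice these metrics would be computed on held-out data for every model, so that the regression baselines and the ML models are compared on the same footing.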
Results:
All models performed similarly, and the ML techniques showed no significant advantage over the regression baselines. For the continuous outcome, BART had the highest R² (0.445) and lowest RMSE (0.105), though differences among models were minimal. For the binary outcome, XGBoost achieved the highest AUROC (0.712), with confidence intervals overlapping those of the other models. Across all models, baseline HbA1c was consistently the primary predictor, explaining the majority of the variance. Sensitivity analyses and hyperparameter tuning did not significantly improve model performance.
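The abstract does not state how the confidence intervals were obtained. One common option is a percentile bootstrap, sketched below under that assumption. The function names, the synthetic scores, and the settings (300 resamples, 95% level) are all hypothetical and chosen only for illustration.

```python
import random

def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Synthetic data: two hypothetical models with the same weak signal plus noise.
rng = random.Random(42)
labels = [rng.randint(0, 1) for _ in range(120)]
scores_a = [y * 0.8 + rng.gauss(0, 1) for y in labels]
scores_b = [y * 0.8 + rng.gauss(0, 1) for y in labels]

ci_a = bootstrap_ci(labels, scores_a, seed=1)
ci_b = bootstrap_ci(labels, scores_b, seed=2)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

Overlapping intervals are only a rough indication that two models perform similarly. A stricter check would bootstrap the paired difference in AUROC on the same resamples.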
Conclusions:
In this real-world cohort, ML models did not outperform traditional regression in predicting glycaemic response to sulphonylureas. The limited improvement over linear and logistic regression may reflect a lack of strong non-linear effects or interactions among predictors in the available clinical data. It is also possible that the clinical features used do not capture enough biological heterogeneity for complex modelling techniques to exploit.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.