Currently accepted at: JMIR AI
Date Submitted: Nov 19, 2025
Open Peer Review Period: Dec 3, 2025 - Jan 28, 2026
Date Accepted: Mar 4, 2026
This paper has been accepted and is currently in production.
It will appear shortly on 10.2196/87819
Applications of AutoML in Diabetes Risk Prediction: A Rapid Review of Methodological Approaches and Reported Performance (2015–2025)
ABSTRACT
Background:
Type 2 diabetes (T2D) is a complex chronic condition that imposes a substantial burden on healthcare systems. Prevention and early detection are critical to mitigating its impact. Automated machine learning (AutoML) models have the potential to predict individual risk and guide personalized interventions. However, their clinical deployment remains limited due to the retrospective nature of most datasets, lack of external validation, and heterogeneity in variable selection.
Objective:
To map AutoML approaches applied to T2D risk prediction, with a specific focus on their ability to integrate clinical, behavioral, environmental, and genomic data.
Methods:
A PRISMA-guided rapid review was conducted across six databases (PubMed, Scopus, Web of Science, IEEE Xplore, Google Scholar, and EMBASE) to identify empirical studies (2015–2025) that used AutoML tools for T2D prediction based on at least two data types (e.g., clinical, behavioral, environmental, genomic). Screening, data extraction, and synthesis were performed systematically by two independent reviewers, with arbitration by a third reviewer, an AI tool (ChatGPT).
Results:
Thirteen studies met the inclusion criteria. Methodological diversity ranged from conventional machine learning with manual feature selection to partially or fully automated pipelines using tools such as TPOT, H2O AutoML, or Azure ML. Reported performance varied (AUC 0.75–0.99), but external validation was uncommon. Behavioral and environmental data were only partially integrated, and no study incorporated genomic data despite its recognized potential. Most studies lacked transparency and reproducibility, with no public code or pipeline sharing.
Conclusions:
AutoML holds significant promise for improving T2D risk prediction through automation and model explainability. Yet, to support clinical adoption and generalizability, future AutoML pipelines must be developed using prospective, multicenter datasets, integrate diverse and harmonized data types, including genomics, and adhere to open science principles of transparency, reproducibility, and interpretability.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.