Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: May 11, 2020
Date Accepted: Dec 19, 2020
Machine learning-based risk stratification for early detection of diabetes and hypertension in resource-limited settings
ABSTRACT
Background:
The impending scale-up of non-communicable disease screening programs in low- and middle-income countries, coupled with limited health resources, requires that such programs be as accurate as possible at identifying high-risk patients.
Objective:
To develop machine learning-based risk stratification algorithms for diabetes and hypertension that are tailored for the at-risk population served by community-based screening programs in low-resource settings.
Methods:
We trained and tested our models using data from 2278 patients collected by community health workers through door-to-door and camp-based screenings in the urban slums of Hyderabad, India, between July 14, 2015 and April 21, 2018. We determined the best model for predicting short-term (2-month) risk of diabetes and of hypertension, and compared those models to previously developed risk scores from the USA and UK. Prediction accuracy was characterized by the area under the receiver operating characteristic curve (AUC) and the number of false negatives.
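As an illustration of the AUC comparison described above, the following is a minimal sketch in plain Python. It is not the authors' code: the toy labels, the two score lists, and the names `auc`, `model_a`, and `model_b` are hypothetical. The AUC is computed through its rank-sum interpretation, the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The relative difference at the end is only one possible way to express an improvement in AUC, and is an assumption rather than the definition used in the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Returns the fraction of (positive, negative) pairs in which the positive
    case has the higher score, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed the disease, 0 = did not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1]  # stand-in for a locally trained model
model_b = [0.6, 0.3, 0.2, 0.5, 0.4, 0.3, 0.1, 0.1]  # stand-in for an external risk score

auc_a = auc(model_a, labels)
auc_b = auc(model_b, labels)

# Assumed definition of improvement: relative difference in AUC.
relative_gain = (auc_a - auc_b) / auc_b
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, relative gain = {relative_gain:.1%}")
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few thousand; a sort-based rank computation would scale better on larger data sets.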
Results:
We found that a random forest model had the highest prediction accuracy for both diseases and outperformed the USA and UK risk scores in terms of AUC by 35.5% for diabetes and 13.5% for hypertension. For a fixed screening specificity of 0.9, the random forest model reduced the expected number of false negatives by 620 patients per 1000 screenings for diabetes and 220 patients per 1000 screenings for hypertension. This improvement reduces the cost of incorrect risk stratification by US $1.99 (or 35%) per screening for diabetes and US $1.60 (or 21%) per screening for hypertension.
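The fixed-specificity comparison can be sketched as follows. This is an illustrative example under stated assumptions, not the study's procedure: the toy scores and the helper names `threshold_for_specificity` and `false_negatives_per_1000` are hypothetical, and a patient is assumed to be flagged as high risk when the score is at or above the threshold. For each model, the sketch picks the lowest threshold that reaches a specificity of 0.9 and then counts the missed cases, scaled to 1000 screenings.

```python
def threshold_for_specificity(scores, labels, target=0.9):
    """Lowest threshold whose specificity is at least `target`.

    Specificity is the share of disease-free patients scoring below the threshold.
    """
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not neg:
        raise ValueError("need at least one negative case")
    # Specificity is non-decreasing in the threshold, so scan upward.
    for t in sorted(set(scores)) + [float("inf")]:
        true_neg = sum(1 for s in neg if s < t)
        if true_neg / len(neg) >= target:
            return t


def false_negatives_per_1000(scores, labels, threshold):
    """Missed positive cases (score below threshold) per 1000 patients screened."""
    missed = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return 1000.0 * missed / len(labels)


# Hypothetical toy cohort: 5 patients with the disease, 10 without.
labels = [1] * 5 + [0] * 10
model_a = [0.9, 0.8, 0.7, 0.6, 0.3,
           0.65, 0.4, 0.35, 0.3, 0.25, 0.2, 0.2, 0.15, 0.1, 0.05]
model_b = [0.7, 0.5, 0.4, 0.3, 0.2,
           0.6, 0.45, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

t_a = threshold_for_specificity(model_a, labels, target=0.9)
t_b = threshold_for_specificity(model_b, labels, target=0.9)
fn_a = false_negatives_per_1000(model_a, labels, t_a)
fn_b = false_negatives_per_1000(model_b, labels, t_b)
print(f"False negatives avoided per 1000 screenings: {fn_b - fn_a:.1f}")
```

Holding specificity fixed keeps the false-positive burden the same for both models, so any difference in false negatives reflects ranking quality rather than a different operating point.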
Conclusions:
Over the next decade, health systems in many countries plan to spend significant resources on non-communicable disease screening programs. Our study demonstrates that these programs can use machine learning models to improve risk stratification and thereby make more effective use of limited resources.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.