Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 11, 2020
Date Accepted: Dec 19, 2020

The final, peer-reviewed published version of this preprint can be found here:

Risk Stratification for Early Detection of Diabetes and Hypertension in Resource-Limited Settings: Machine Learning Analysis

Boutilier JJ, Chan TC, Ranjan M, Deo S

J Med Internet Res 2021;23(1):e20123

DOI: 10.2196/20123

PMID: 33475518

PMCID: 7862003

Machine learning-based risk stratification for early detection of diabetes and hypertension in resource-limited settings

  • Justin J. Boutilier; 
  • Timothy C.Y. Chan; 
  • Manish Ranjan; 
  • Sarang Deo

ABSTRACT

Background:

The impending scale-up of non-communicable disease screening programs in low- and middle-income countries, coupled with limited health resources, requires that such programs be as accurate as possible at identifying high-risk patients.

Objective:

To develop machine learning-based risk stratification algorithms for diabetes and hypertension that are tailored for the at-risk population served by community-based screening programs in low-resource settings.

Methods:

We trained and tested our models using data from 2278 patients, collected by community health workers through door-to-door and camp-based screenings in the urban slums of Hyderabad, India, between July 14, 2015, and April 21, 2018. We determined the best model for predicting short-term (2-month) risk of diabetes and of hypertension separately, and compared those models with previously developed risk scores from the USA and UK. Prediction accuracy was characterized by the area under the receiver operating characteristic curve (AUC) and by the number of false negatives.
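The AUC used to compare models has a direct probabilistic reading: it is the probability that a randomly chosen patient with the disease receives a higher risk score than a randomly chosen patient without it. The following is a minimal stdlib sketch of that definition, not the authors' code. The function name `auc` and the toy scores for two models are hypothetical, and the pairwise O(P·N) loop is chosen for clarity rather than speed.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve via its Mann-Whitney interpretation.

    scores: predicted risk scores (higher means higher risk)
    labels: 1 if the patient has the disease, 0 otherwise
    A tie between a positive and a negative counts as half a correct ordering.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    # Compare every (diseased, non-diseased) pair of patients.
    for p, n in product(positives, negatives):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical toy data: two models scoring the same five patients.
labels = [1, 0, 1, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.3, 0.2]
model_b = [0.7, 0.8, 0.2, 0.3, 0.4]
print(auc(model_a, labels), auc(model_b, labels))
```

Here `model_a` orders 5 of the 6 diseased/non-diseased pairs correctly and `model_b` only 2, so evaluating both models on the same test set ranks them by this single number.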

Results:

We found that a random forest model had the highest prediction accuracy for both diseases and outperformed the USA and UK risk scores in terms of AUC by 35.5% for diabetes and 13.5% for hypertension. At a fixed screening specificity of 0.9, the random forest model reduced the expected number of false negatives by 620 patients per 1000 screenings for diabetes and 220 patients per 1000 screenings for hypertension. This improvement reduces the cost of incorrect risk stratification by US $1.99 (35%) per screening for diabetes and US $1.60 (21%) per screening for hypertension.
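The false-negative comparison fixes specificity first and then counts the diseased patients who fall below the resulting cut-off. Below is a minimal sketch of that procedure, not the authors' implementation. It assumes each model outputs a continuous risk score, that a patient is flagged as high risk when the score exceeds the cut-off, and that "per 1000 screenings" means normalizing by everyone screened (the abstract does not state the exact normalization). The function names and toy cohort are hypothetical.

```python
import math


def threshold_for_specificity(scores, labels, target_spec=0.9):
    """Pick a cut-off t so that flagging patients with score > t gives
    specificity >= target_spec on the supplied labelled data."""
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not negatives:
        raise ValueError("need at least one negative case")
    # Number of negatives that must fall at or below the cut-off;
    # round() guards against floating-point noise in target_spec * n.
    k = math.ceil(round(target_spec * len(negatives), 9))
    k = min(len(negatives), max(1, k))
    return negatives[k - 1]


def screening_summary(scores, labels, threshold):
    """Return (specificity, false negatives per 1000 people screened)."""
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    n_neg = sum(1 for y in labels if y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    return tn / n_neg, 1000 * fn / len(labels)


# Hypothetical toy cohort: 10 people without the disease, 3 with it.
neg_scores = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.9]
pos_scores = [0.95, 0.6, 0.3]
scores = neg_scores + pos_scores
labels = [0] * len(neg_scores) + [1] * len(pos_scores)

t = threshold_for_specificity(scores, labels, target_spec=0.9)
spec, fn_rate = screening_summary(scores, labels, t)
print(t, spec, round(fn_rate, 1))  # 0.45 0.9 76.9
```

Running both functions for two models on the same screened cohort and subtracting their false-negative rates gives a reduction per 1000 screenings of the same form as the figures reported above. Because the cut-off is tuned on labelled data, it would normally be chosen on a validation set rather than on the final test set.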

Conclusions:

In the next decade, health systems in many countries are planning to spend significant resources on non-communicable disease screening programs, and our study demonstrates that machine learning models can help these programs use limited resources more effectively by improving risk stratification.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.