Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 10, 2019
Date Accepted: Feb 7, 2020

The final, peer-reviewed published version of this preprint can be found here:

Ensemble Learning Models Based on Noninvasive Features for Type 2 Diabetes Screening: Model Development and Validation

Yang T, Zhang L, Yi L, Feng H, Li S, Chen H, Zhu J, Zhao J, Zeng Y, Liu H

JMIR Med Inform 2020;8(6):e15431

DOI: 10.2196/15431

PMID: 32554386

PMCID: 7333074

Ensemble learning models based on non-invasive features for type 2 diabetes screening: a case-control study

  • Tianzhou Yang
  • Li Zhang
  • Liwei Yi
  • Huawei Feng
  • Shimeng Li
  • Haoyu Chen
  • Junfeng Zhu
  • Jian Zhao
  • Yingyue Zeng
  • Hongsheng Liu

ABSTRACT

Background:

Early diabetes screening could effectively reduce the burden of disease. However, screening projects in the natural population require substantial resources. In this paper, diabetes prediction models for noninvasive, low-cost screening are built using the ensemble learning method.

Objective:

The dataset for building and evaluating the diabetes prediction model was extracted from the National Health and Nutrition Examination Survey (NHANES 2011-2016). After data cleaning and feature selection, the dataset was split into a training set (80% of the 2011-2014 data), a test set (20% of the 2011-2014 data), and a validation set (the 2015-2016 data).
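The split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the `cycle` field, the `split_by_cycle` helper, and the use of a seeded random shuffle are all assumptions, since the abstract does not say how the 80/20 partition was drawn.

```python
import random

def split_by_cycle(records, train_fraction=0.8, seed=0):
    """Split records into training, test, and validation sets.

    Records from the 2015-2016 cycle form the validation set. The remaining
    (2011-2014) records are shuffled and divided by train_fraction.
    The random shuffle is an assumption; the source does not describe it.
    """
    development = [r for r in records if r["cycle"] in ("2011-2012", "2013-2014")]
    validation = [r for r in records if r["cycle"] == "2015-2016"]
    rng = random.Random(seed)
    rng.shuffle(development)
    cut = int(len(development) * train_fraction)
    return development[:cut], development[cut:], validation

# Hypothetical toy records, not NHANES data.
cycles = ["2011-2012"] * 6 + ["2013-2014"] * 4 + ["2015-2016"] * 5
records = [{"id": i, "cycle": c} for i, c in enumerate(cycles)]

train, test, validation = split_by_cycle(records)
print(len(train), len(test), len(validation))
```

Keeping the most recent survey cycle out of the shuffle means the validation set never shares a collection period with the training data, which is what makes it usable as an external check.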

Methods:

Three simple machine learning methods (linear discriminant analysis, support vector machine, and random forest) and the easy ensemble method were used to build diabetes prediction models. Model performance was evaluated through 5-fold cross-validation and external validation. DeLong’s test (two-sided) was used to compare the performance of the models.
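The idea behind an easy ensemble can be sketched in a few lines: draw several balanced subsets by undersampling the majority class, fit one base learner per subset, and combine their outputs. The sketch below is only an illustration under stated assumptions. The one-feature centroid scorer is a toy stand-in for the base learners named above, averaging the scores is an assumed combination rule, and all function names and data are hypothetical.

```python
import random

def balanced_subsets(labels, n_subsets=5, seed=0):
    """Return index lists that pair every minority-class case with an
    equal-size random sample (without replacement) of the majority class."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((positives, negatives), key=len)
    rng = random.Random(seed)
    return [minority + rng.sample(majority, len(minority)) for _ in range(n_subsets)]

def fit_centroid_scorer(xs, ys):
    """Toy base learner on a single feature: score is the signed distance
    from the midpoint between the two class means."""
    mean_pos = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mean_neg = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    midpoint = (mean_pos + mean_neg) / 2
    direction = 1 if mean_pos >= mean_neg else -1
    return lambda x: direction * (x - midpoint)

def easy_ensemble_score(x, scorers):
    """Average the base learners' scores (assumed combination rule)."""
    return sum(s(x) for s in scorers) / len(scorers)

# Hypothetical imbalanced data: 3 positives, 9 negatives.
xs = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0, 1.5, 2.5, 3.5, 0.5, 4.0, 2.0]
ys = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

subsets = balanced_subsets(ys, n_subsets=4)
scorers = [
    fit_centroid_scorer([xs[i] for i in idx], [ys[i] for i in idx])
    for idx in subsets
]
print(easy_ensemble_score(6.5, scorers) > 0)  # high feature value scores positive
```

Because each base learner sees a balanced subset, no single model is dominated by the majority class, while the ensemble as a whole still uses many of the majority-class cases.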

Results:

A total of 8057 observations and 12 attributes were selected from the database. In the 5-fold cross-validation, the three simple methods yielded high predictive performance models with areas under the curve (AUCs) over 0.800, and the ensemble models significantly outperformed the simple models. The same trends were observed when the models were evaluated in the test set and validation set. The ensemble model of linear discriminant analysis yielded the best performance, with an AUC of 0.849, an accuracy of 0.730, a sensitivity of 0.819, and a specificity of 0.709 in the validation set.
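For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and AUC are computed from labels and scores. The labels, scores, and the 0.5 threshold are hypothetical and unrelated to the study's data. The AUC here uses the pairwise-ranking definition (the probability that a random positive case scores higher than a random negative case, with ties counted as one half).

```python
def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), and specificity
    (true negative rate) from binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = screening_metrics(y_true, y_pred)
print(metrics, auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why a model can have a high AUC alongside a more modest accuracy.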

Conclusions:

The study indicated that efficient screening using machine learning methods with non-invasive tests could be applied to a large population and achieve the secondary prevention objective.

Clinical Trial:

Null


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.