Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jul 10, 2019
Date Accepted: Feb 7, 2020
Ensemble learning models based on non-invasive features for type 2 diabetes screening: a case-control study
ABSTRACT
Background:
Early screening for diabetes could effectively reduce the burden of disease. However, screening projects in the general population require substantial resources. In this paper, diabetes prediction models based on the ensemble learning method are built to support non-invasive, low-cost screening.
Objective:
The dataset for building and evaluating the diabetes prediction models was extracted from the National Health and Nutrition Examination Survey (NHANES 2011-2016). After data cleaning and feature selection, the dataset was split into a training set (80%, 2011-2014), a test set (20%, 2011-2014), and a validation set (2015-2016).
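The split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `cycle` field, the function name `split_by_cycle`, the random seed, and the use of a plain (unstratified) shuffle are all hypothetical, and the records are toy data rather than NHANES data.

```python
import random

def split_by_cycle(records, dev_cycles, val_cycles, test_fraction=0.2, seed=0):
    """Split records into training, test, and external validation sets.

    Records from `dev_cycles` are shuffled and divided into training and
    test sets; records from `val_cycles` are held out entirely.
    """
    dev = [r for r in records if r["cycle"] in dev_cycles]
    validation = [r for r in records if r["cycle"] in val_cycles]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(dev)
    n_test = round(len(dev) * test_fraction)
    test, train = dev[:n_test], dev[n_test:]
    return train, test, validation

# Toy records; the "cycle" field name is an assumption for illustration.
records = (
    [{"id": i, "cycle": "2011-2012"} for i in range(50)]
    + [{"id": i, "cycle": "2013-2014"} for i in range(50, 100)]
    + [{"id": i, "cycle": "2015-2016"} for i in range(100, 130)]
)
train, test, validation = split_by_cycle(
    records, {"2011-2012", "2013-2014"}, {"2015-2016"}
)
print(len(train), len(test), len(validation))
```

Holding out a later survey cycle, rather than a random slice, is what makes the third set an external validation: no record from that period influences model fitting.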
Methods:
Three simple machine learning methods (linear discriminant analysis, support vector machine, and random forest) and the easy ensemble method were used to build diabetes prediction models. Model performance was evaluated through 5-fold cross-validation and external validation. DeLong's test (two-sided) was used to test the performance differences between the models.
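The idea behind the easy ensemble method can be sketched as follows: draw several balanced subsets by undersampling the majority class, fit one base learner per subset, and average their scores. The code below is a hedged illustration only. The names `fit_easy_ensemble` and `fit_centroid_scorer` are hypothetical, the centroid scorer is a toy stand-in for the base learners named above (linear discriminant analysis, support vector machine, random forest), and the sketch assumes that class 1 is the minority class. The number of subsets and the data are arbitrary.

```python
import random

def fit_centroid_scorer(X, y):
    """Toy base learner: score a point by its distance to the class means."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, label in zip(X, y) if label == 1])
    neg = mean([x for x, label in zip(X, y) if label == 0])
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # Higher score means the point lies closer to the positive-class mean.
    return lambda x: sq_dist(x, neg) - sq_dist(x, pos)

def fit_easy_ensemble(X, y, fit_base, n_subsets=10, seed=0):
    """Fit one base learner per balanced subset and average their scores.

    Assumes label 1 is the minority class and label 0 the majority class.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    scorers = []
    for _ in range(n_subsets):
        # Undersample the majority class to the size of the minority class.
        sampled = rng.sample(majority, len(minority))
        idx = minority + sampled
        scorers.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: sum(s(x) for s in scorers) / len(scorers)

# Toy imbalanced data: 200 negatives around (0, 0), 20 positives around (2, 2).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 200 + [1] * 20

score = fit_easy_ensemble(X, y, fit_centroid_scorer)
print(score([2.0, 2.0]) > score([0.0, 0.0]))
```

Because every subset keeps all minority cases and a different random sample of majority cases, each base learner trains on balanced data while the ensemble as a whole still sees most of the majority class.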
Results:
A total of 8057 observations and 12 attributes were selected from the database. In the 5-fold cross-validation, the three simple methods yielded models with high predictive performance, with areas under the curve (AUCs) over 0.800, and the ensemble models significantly outperformed the simple models. The same trends were observed when the models were evaluated in the test set and validation set. The ensemble model of linear discriminant analysis yielded the best performance, with an AUC of 0.849, an accuracy of 0.730, a sensitivity of 0.819, and a specificity of 0.709 in the validation set.
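For readers less familiar with the reported metrics, the following sketch shows how accuracy, sensitivity, specificity, and AUC are computed from labels and model scores. It is an illustration on made-up data, not a reproduction of the study's results. The function names and the 0.5 decision threshold are assumptions, since the cutoff used by the authors is not stated here.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and scores for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

m = confusion_metrics(y_true, y_pred)
a = auc(y_true, scores)
print(m, round(a, 3))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures are reported side by side.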
Conclusions:
The study indicated that efficient screening using machine learning methods with non-invasive tests could be applied to a large population and achieve the secondary prevention objective.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.