Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 29, 2020
Date Accepted: Jun 4, 2020

The final, peer-reviewed published version of this preprint can be found here:

Predicting Current Glycated Hemoglobin Levels in Adults From Electronic Health Records: Validation of Multiple Logistic Regression Algorithm

Alhassan Z, Budgen D, Alshammari R, Al Moubayed N

JMIR Med Inform 2020;8(7):e18963

DOI: 10.2196/18963

PMID: 32618575

PMCID: 7367516

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Replication Study: Use of Multiple Logistic Regression to Predict Current Glycated Hemoglobin Values in Adults.

  • Zakhriya Alhassan
  • David Budgen
  • Riyad Alshammari
  • Noura Al Moubayed

ABSTRACT

Background:

Electronic Health Record (EHR) systems generate large datasets that can significantly enrich the development of medical predictive models. Several attempts have been made to investigate the effect of Glycated Hemoglobin (HbA1c) elevation on the prediction of diabetes onset. However, there is still a need for validation of these models using EHR data collected from different populations.

Objective:

This replication study aims to validate and evaluate a predictive model that used multiple logistic regression with EHR data to forecast HbA1c levels, and to identify the strengths and weaknesses of replicating it. The original study used data from a population in the USA; this (differentiated) replication uses a population in Saudi Arabia.

Methods:

Three models are developed and compared with the model created in the original study. The models are trained and tested using a larger dataset from Saudi Arabia with 36,379 records. Model performance is measured using 10-fold cross-validation.
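The evaluation procedure described above can be illustrated with a minimal sketch of logistic regression scored by 10-fold cross-validation. It is not the authors' implementation: the data are synthetic stand-ins for the EHR predictors, the binary outcome is an assumed dichotomization of HbA1c (the abstract does not state the threshold), and all function names (`make_synthetic`, `fit_logistic`, `k_fold_indices`, `cross_val_accuracy`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n, seed=0):
    # Assumption: two standardized predictors and a linearly separable label.
    # Real EHR data would be noisier and have more predictors.
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    y = [1 if x0 + x1 > 0 else 0 for x0, x1 in X]
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=200):
    # Batch gradient descent on the mean log-loss; returns (weights, bias).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Classify as positive when the predicted probability is at least 0.5.
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def k_fold_indices(n, k, seed=0):
    # Shuffle the record indices, then deal them into k disjoint folds.
    if n < k:
        raise ValueError("need at least k records")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, k=10, seed=0):
    # Train on k-1 folds, test on the held-out fold, average the accuracies.
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [j for j in range(len(X)) if j not in held_out]
        w, b = fit_logistic([X[j] for j in train_idx],
                            [y[j] for j in train_idx])
        correct = sum(predict(w, b, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


if __name__ == "__main__":
    X, y = make_synthetic(200, seed=1)
    print("mean 10-fold accuracy:", cross_val_accuracy(X, y, k=10))
```

Each record is used for testing exactly once, so the averaged accuracy reflects performance on data the model did not see during fitting. In practice a library implementation would likely replace the hand-written optimizer.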

Results:

Applying the method employed in the original study achieved an accuracy of 73% to 74% on the dataset collected from Saudi Arabia, compared to 77% obtained with the population from the USA. The results also show a slightly different ranking of predictor importance between the original study and the replication. For our population, the order of the predictors from most to least important is: age, random blood sugar, estimated glomerular filtration rate, total cholesterol, non-high-density lipoprotein, and body mass index.

Conclusions:

This replication study shows that direct use of the models (calculators) created with multiple logistic regression to predict HbA1c levels may not be appropriate for all populations. The study reveals that the weighting of the predictors needs to be calibrated to the population used. However, the study does confirm that replicating the original study with a different population can help with predicting HbA1c levels using predictors that are routinely collected and stored in hospital EHR systems.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.