Currently submitted to: JMIR Medical Informatics

Date Submitted: Mar 22, 2026
Open Peer Review Period: Apr 2, 2026 - May 28, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Development of a Personal Health Management Service Using Clinical Data Warehouse Data: An Algorithm for Chronic Disease Prediction

  • Seol Whan Oh
  • Kihoon Kim
  • Yoon-Hee Choi
  • In Young Choi

ABSTRACT

Background:

The increasing burden of chronic diseases such as hypertension and diabetes necessitates a shift from reactive to proactive preventive care. This transition is now feasible through the convergence of large-scale health data, machine learning (ML), and patient-centered policies, such as South Korea’s MyData initiative.

Objective:

The objective of this study was to develop and validate ML models using routine health-screening data to predict the onset of hypertension and diabetes, thereby providing an evidence-based foundation for personalized, data-driven prevention.

Methods:

We constructed a cohort using data from the Clinical Data Warehouse (CDW) of Seoul St. Mary’s Hospital. Two distinct datasets were analyzed: 21,589 individuals for essential hypertension prediction and 22,255 individuals for type 2 diabetes mellitus prediction. Five ML models were trained to classify disease onset. The final models were selected based on a comprehensive evaluation of the area under the receiver operating characteristic curve (AUROC) and the F1 score. Finally, variable importance in the selected models was assessed using SHapley Additive exPlanations (SHAP) values.
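As a minimal sketch of the two selection metrics named above, the following computes AUROC and the F1 score from scratch. The labels, scores, and the 0.5 decision threshold are invented for illustration; the abstract does not report the threshold or the evaluation code the authors used.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, scores, threshold=0.5):
    """F1 = 2*TP / (2*TP + FP + FN) after thresholding the scores.
    The default threshold of 0.5 is an assumption, not taken from the study."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 1 = disease onset, 0 = no onset.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

print(f"AUROC = {auroc(y_true, scores):.3f}")
print(f"F1    = {f1_score(y_true, scores):.3f}")
```

AUROC is threshold-free and measures ranking quality, whereas F1 depends on the chosen cutoff, which is one reason to report the two together when comparing candidate models.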

Results:

Among the models tested, logistic regression was selected as the final model for both essential hypertension and type 2 diabetes mellitus. The models demonstrated high predictive performance, with an AUROC of 0.842 for hypertension and 0.954 for diabetes. SHAP analysis revealed that age was the most influential predictor of hypertension, whereas HbA1c was the most influential predictor of diabetes.
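For a logistic regression, SHAP values on the log-odds scale have a closed form when features are treated as independent: each feature contributes its coefficient times the deviation of its value from the background mean. The sketch below illustrates that identity only. The coefficients, intercept, and background rows are made-up numbers, not results from the study, and the abstract does not say which SHAP explainer the authors used.

```python
def log_odds(weights, intercept, x):
    """Linear predictor of a logistic regression (log-odds scale)."""
    return intercept + sum(w * xj for w, xj in zip(weights, x))


def linear_shap(weights, background, x):
    """SHAP values for a linear model under feature independence:
    phi_j = w_j * (x_j - mean_j), with means taken over the background set."""
    n = len(background)
    means = [sum(row[j] for row in background) / n for j in range(len(weights))]
    return [w * (xj - m) for w, xj, m in zip(weights, x, means)]


# Hypothetical model over two features (age in years, HbA1c in %).
feature_names = ["age", "hba1c"]
weights = [0.05, 1.2]      # invented coefficients
intercept = -10.0          # invented intercept
background = [[40, 5.0], [50, 5.5], [60, 6.0]]  # invented reference rows
x = [60, 6.5]              # individual to explain

phi = linear_shap(weights, background, x)
for name, value in zip(feature_names, phi):
    print(f"{name}: {value:+.2f} log-odds")

# Additivity check: contributions sum to f(x) minus the average prediction.
baseline = sum(log_odds(weights, intercept, row) for row in background) / len(background)
gap = log_odds(weights, intercept, x) - baseline
print(f"sum(phi) = {sum(phi):.2f}, f(x) - baseline = {gap:.2f}")
```

Ranking features by the mean absolute value of these contributions across a cohort is the usual way to arrive at statements such as "age was the most influential predictor".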

Conclusions:

We successfully developed prediction models for hypertension and diabetes that are applicable within MyData services. These models have the potential to empower individuals in data-driven self-management and to enhance personalized disease prevention.


 Citation

Please cite as:

Oh SW, Kim K, Choi YH, Choi IY

Development of a Personal Health Management Service Using Clinical Data Warehouse Data: An Algorithm for Chronic Disease Prediction

JMIR Preprints. 22/03/2026:95779

DOI: 10.2196/preprints.95779

URL: https://preprints.jmir.org/preprint/95779

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.