Accepted for/Published in: JMIR Medical Informatics

Date Submitted: May 8, 2021
Date Accepted: Sep 17, 2021

The final, peer-reviewed published version of this preprint can be found here:

Yang Y, Zheng J, Du Z, Li Y, Cai Y

Accurate Prediction of Stroke for Hypertensive Patients Based on Medical Big Data and Machine Learning Algorithms: Retrospective Study

JMIR Med Inform 2021;9(11):e30277

DOI: 10.2196/30277

PMID: 34757322

PMCID: 8663532

Accurate Prediction of Stroke for Hypertensive Patients Based on Medical Big Data and Machine Learning Algorithms: A Retrospective Study

  • Yujie Yang
  • Jing Zheng
  • Zhenzhen Du
  • Ye Li
  • Yunpeng Cai

ABSTRACT

Background:

Stroke risk assessment is an important means of primary prevention, but the applicability of existing stroke risk assessment scales to the Chinese population remains controversial. Prospective studies are a common method of medical research, but they are time-consuming and labor-intensive. Medical big data has been shown to promote the discovery of disease risk factors and prognosis, and has attracted broad research interest.

Objective:

We aimed to establish a high-precision stroke risk prediction model for hypertensive patients using existing historical electronic medical records and machine learning algorithms.

Methods:

Based on the Shen Health Information Big Data Platform, a total of 57,671 patients were screened from 250,788 registered hypertensive patients, of whom 9,421 had stroke onset during the three-year follow-up. In addition to baseline features and historical symptoms, we constructed several trend characteristics from multi-temporal medical records. Stratified sampling by gender ratio and age stratum was used to balance positive and negative cases, and the resulting 19,953 samples were then randomly divided into a training set and a test set at a ratio of 7:3. Four machine learning methods were adopted for modeling, and their predictive performance was compared with that of several traditional risk scales. We also analyzed the non-linear effects of continuous features on stroke onset.
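The sampling and feature steps described above can be illustrated with a minimal sketch. The abstract does not say how the trend characteristics were computed or how the strata were defined, so everything here is an assumption made for illustration: the least-squares slope as a trend feature, the 10-year age bands, the 1:1 positive-to-negative matching within each stratum, and the record fields `sex`, `age`, and `stroke`.

```python
import random
from collections import defaultdict


def trend_slope(times, values):
    """Least-squares slope of repeated measurements (hypothetical trend feature)."""
    n = len(times)
    if n < 2:
        return 0.0
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    denom = sum((t - mean_t) ** 2 for t in times)
    if denom == 0:
        return 0.0
    return sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / denom


def balanced_stratified_sample(records, rng, band_width=10):
    """Within each (sex, age band) stratum, draw equal numbers of cases and controls.

    The 1:1 ratio and the age band width are assumptions, not the study's settings.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for rec in records:
        key = (rec["sex"], rec["age"] // band_width)
        strata[key][rec["stroke"]].append(rec)
    sample = []
    for key in sorted(strata):
        positives, negatives = strata[key][1], strata[key][0]
        k = min(len(positives), len(negatives))
        sample.extend(rng.sample(positives, k))
        sample.extend(rng.sample(negatives, k))
    return sample


def split_7_to_3(samples, rng):
    """Shuffle a copy of the samples and split it 7:3 into training and test sets."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy records only; these are not data from the study.
rng = random.Random(0)
toy = [{"sex": "M", "age": 63, "stroke": 1} for _ in range(10)]
toy += [{"sex": "M", "age": 65, "stroke": 0} for _ in range(30)]
balanced = balanced_stratified_sample(toy, rng)
train, test = split_7_to_3(balanced, rng)
```

Passing a seeded `random.Random` instance keeps the sampling and the split reproducible, which matters when several models are later compared on the same test set.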

Results:

The tree-based ensemble method XGBoost achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.9220, surpassing the other three traditional machine learning methods. Compared with two traditional risk scales, the Framingham stroke risk profile and the Chinese Multi-provincial Cohort Study, our proposed model achieved higher performance on an independent validation set, with the AUC increasing by 0.17. Further analysis of non-linear effects revealed the importance of multi-temporal trend characteristics for stroke risk prediction, which is beneficial to the standardized management of hypertensive patients.
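For readers unfamiliar with the metric, the AUC used in this comparison equals the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name and the toy labels and scores are illustrative assumptions and are unrelated to the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    This pairwise form is quadratic in sample size and meant only as a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: two hypothetical scoring methods evaluated on the same labels.
labels = [0, 0, 1, 1]
model_scores = [0.1, 0.2, 0.7, 0.9]
scale_scores = [0.1, 0.4, 0.35, 0.8]
auc_gain = roc_auc(labels, model_scores) - roc_auc(labels, scale_scores)
```

Evaluating both scoring methods on the same held-out labels, as here, is what makes a difference in AUC interpretable.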

Conclusions:

A high-precision three-year stroke risk prediction model for hypertensive patients was established, and its performance was verified to exceed that of traditional risk scales. Multi-temporal trend characteristics play an important role in stroke onset. The model could be deployed to electronic health record systems to assist in more pervasive, preemptive screening of stroke risk, enabling more efficient early disease prevention and intervention.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.