Currently submitted to: JMIR Preprints

Date Submitted: Jun 13, 2025

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Applications and methods to develop artificial intelligence-based population-specific risk models for predicting first and recurrent cardio/cerebrovascular events: PowerAI-CVD Showcase

  • Haipeng Liu
  • Jeremy Man Ho Hui
  • Anselm Au
  • Quincy Lee
  • Carlin Chang
  • Mehrdad Shahmohammadi Beni
  • Gary Tse

ABSTRACT

Background:

Our team was the first in Hong Kong to develop machine learning-enhanced risk models for predicting first and recurrent cardiovascular disease events in predominantly Chinese subjects, using territory-wide data from the same region. More than 500 candidate risk variables were initially considered, spanning demographics (age, sex, source of admission, ethnicity, number of hospitalisations prior to the index date), physiological status (systolic blood pressure [SBP], diastolic blood pressure [DBP], mean blood pressure [MBP], and the variability of SBP, DBP and MBP), disease diagnoses from 18 systems/organs, laboratory test results (complete blood count, liver and renal function, lipids, glycemic tests), and medications (23 categories). The PowerAI-CVD model is a simpler model with 19 variables that requires less computational power yet retains high discriminative power, with a c-statistic of 0.89.
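As a reading aid for the c-statistic quoted above, the following is a minimal sketch of how a concordance statistic can be computed for a binary outcome. It is not the authors' code: the function name `c_statistic` and the scores and outcomes are hypothetical, and the sketch ignores censoring, which a survival-specific version such as Harrell's C would account for. The abstract does not state which variant was used.

```python
from itertools import product


def c_statistic(risk_scores, events):
    """Concordance (c-statistic) for a binary outcome.

    Returns the fraction of (event, non-event) pairs in which the subject
    who had the event received the higher predicted risk. Tied scores
    count as half a concordant pair.
    """
    cases = [s for s, e in zip(risk_scores, events) if e == 1]
    controls = [s for s, e in zip(risk_scores, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks and observed outcomes (1 = event occurred).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]
print(f"c-statistic: {c_statistic(scores, outcomes):.3f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of events from non-events.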

Objective:

This project gave rise to a series of graphical user interface (GUI)-based applications and tools for longitudinal analysis of routinely collected electronic health records from Hong Kong, which we termed the Open-source disease analyzer toolkit (ODAT).

Methods:

ODAT was developed in Python. It is publicly available at https://odat.info/ and is released under the GNU GPLv3 license on GitHub (https://github.com/ODAT-Project), making it free and open-source for research or commercial use.

Results:

ODAT contains three chapters. Chapter 1 covers data cleaning, processing and dataset creation. Chapter 2 automates data analysis and risk modelling using traditional Cox regression and machine learning methods (XGBoost, Gradient Boosting, Multilayer Perceptron, Random Forest, Naïve Bayes, Decision Tree, k-Nearest Neighbor, AdaBoost, and SVM-Sigmoid). Using the top-performing machine learning model (XGBoost) as a showcase, nonlinear terms can be fed into traditional Cox regression models to enhance risk prediction. Chapter 3 provides graphical outputs of risk estimates over 1-, 3-, 5-, 10- and 20-year periods, and interactive platforms that illustrate how the risk estimates change when treatment options are selected or deselected.
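The Chapter 3 idea of horizon-specific risks that respond to treatment choices can be illustrated with a minimal sketch under a standard Cox proportional hazards assumption, where the risk at time t is 1 - S0(t)^exp(linear predictor). This is not ODAT's implementation: the baseline survival values, coefficient names and log hazard ratios below are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical baseline survival S0(t) at each horizon (in years): the
# event-free probability for a subject whose linear predictor is 0.
BASELINE_SURVIVAL = {1: 0.99, 3: 0.97, 5: 0.95, 10: 0.90, 20: 0.80}

# Hypothetical log hazard ratios. Negative values represent protective
# treatments; none of these numbers come from the PowerAI-CVD model.
LOG_HAZARD_RATIOS = {
    "age_per_10_years": 0.50,  # per decade above a reference age
    "diabetes": 0.40,
    "statin": -0.25,
    "antihypertensive": -0.20,
}


def linear_predictor(covariates):
    """Sum of coefficient * covariate value over the supplied covariates."""
    return sum(LOG_HAZARD_RATIOS[name] * value for name, value in covariates.items())


def risk_by_horizon(covariates):
    """Cox-model risk per horizon: 1 - S0(t) ** exp(linear predictor)."""
    multiplier = math.exp(linear_predictor(covariates))
    return {t: 1.0 - s0 ** multiplier for t, s0 in BASELINE_SURVIVAL.items()}


# A hypothetical patient, first without and then with a statin selected.
patient = {"age_per_10_years": 1.0, "diabetes": 1, "statin": 0, "antihypertensive": 0}
untreated = risk_by_horizon(patient)
treated = risk_by_horizon({**patient, "statin": 1})

for t in BASELINE_SURVIVAL:
    print(f"{t:>2}-year risk: {untreated[t]:.3f} untreated, {treated[t]:.3f} with statin")
```

Toggling a treatment flag changes only the linear predictor, so every horizon's estimate shifts consistently. That is the kind of behaviour an interactive select/deselect display would show, although the toolkit's actual calculation may differ.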

Conclusions:

Our tools enable epidemiologists, public health practitioners and researchers to develop risk models through user-friendly GUIs, from database building through variable selection to model building.


 Citation

Please cite as:

Liu H, Hui JMH, Au A, Lee Q, Chang C, Beni MS, Tse G

Applications and methods to develop artificial intelligence-based population-specific risk models for predicting first and recurrent cardio/cerebrovascular events: PowerAI-CVD Showcase

JMIR Preprints. 13/06/2025:78954

DOI: 10.2196/preprints.78954

URL: https://preprints.jmir.org/preprint/78954

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.