Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Oct 14, 2020
Date Accepted: Dec 5, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Mortality Prediction of Patients with Cardiovascular Disease Using Medical Claims Data under Artificial Intelligence Architectures: Validation Study
ABSTRACT
Background:
According to the WHO, cardiovascular diseases (CVDs) are the leading cause of death globally: more people die annually from CVDs than from any other cause. An estimated 17.9 million people died from CVDs in 2016, representing 31% of all global deaths. Of these deaths, 85% were due to heart attack and stroke. In this study, we present a benchmark comparison of various artificial intelligence (AI) architectures for predicting mortality of patients with CVD using structured medical claims data.
Objective:
This study aims primarily to help clinicians accurately predict mortality among patients with CVD using only claims data before a clinic visit.
Methods:
The dataset was created by joining Medical Benefits Scheme (MBS) and Pharmaceutical Benefits Scheme (PBS) service information for the period between 2004 and 2014, released by the Department of Health Australia in 2016. It contains 346,201 records, one per patient. A total of five AI algorithms were developed and compared: four classical machine learning (ML) algorithms (logistic regression [LR], random forest [RF], extra trees [ET], and gradient boosting trees [GBT]) and one deep learning algorithm, a densely connected neural network (DNN). In addition, because ‘deceased’ patients form a minority class in the data set, a separate experiment used the Synthetic Minority Oversampling Technique (SMOTE) to enrich the data.
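The oversampling idea behind SMOTE can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy feature values are hypothetical, and the code only shows the core step of interpolating between a minority-class point and one of its nearest minority-class neighbours.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical toy features for four 'deceased' patients (not from the study).
deceased = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote(deceased, n_synthetic=6, k=2)
```

Because every synthetic row lies on a segment between two real minority rows, the technique enlarges the minority class without simply duplicating records. In practice, oversampling of this kind is applied to the training split only, so that evaluation data remain untouched.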
Results:
In terms of discrimination, GBT and RF achieved the highest area under the receiver operating characteristic curve (AUROC; 97.8% and 97.7%, respectively), followed by ET (96.8%) and LR (96.4%), while DNN was the least discriminative (95.3%). In terms of reliability, LR predictions were the least calibrated of the five algorithms. Although it increased training time, SMOTE further improved the performance of LR, whereas the other algorithms, especially GBT and DNN, worked well with class-imbalanced data.
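The two evaluation criteria named above, discrimination and calibration, can be made concrete with a short sketch. AUROC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The abstract does not say how calibration was measured, so the Brier score below is an assumption chosen for illustration, as are the function names and the toy labels and scores.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Hypothetical labels (1 = deceased) and predicted probabilities, not study data.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, probs))        # 0.75
print(brier_score(labels, probs))
```

A model can rank patients well (high AUROC) while still producing poorly calibrated probabilities, which is why the two properties are reported separately.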
Conclusions:
Compared with other studies in the clinical literature that use AI models on claims data to predict patient health outcomes, our models are more efficient: they use fewer features yet still achieve high performance. This study could help health professionals choose appropriate AI models to predict mortality among patients with CVD using only claims data before a clinic visit.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.