JMIR Preprints #47645: Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions

Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions

Yu-Qing Cai;
Yue Cai;
Li-Ying Tang;
Tian-Ci Jing;
Mengchun Gong;
Hui-Jun Li;
Wei Hu;
Xin-Gang Zhang;
Da-Xin Gong;
Guang-Wei Zhang

ABSTRACT

In recent years, artificial intelligence (AI) has developed explosively and has been widely applied in health care. As a typical AI technology, machine learning (ML) models have shown great potential in predicting cardiovascular diseases (CVDs) by leveraging large amounts of medical data for training and optimization, and they are expected to play a crucial role in reducing the incidence and mortality of CVDs. Although the field has become a research hotspot, there are still many pitfalls that researchers need to pay close attention to. These pitfalls may affect the predictive performance, credibility, reliability, and reproducibility of the studied models, ultimately reducing the value of the research and affecting the prospects for clinical application. Therefore, identifying and avoiding these pitfalls is a crucial task before implementing the research. However, a comprehensive summary of this topic is currently lacking. This viewpoint aims to analyze the existing problems in terms of data quality, dataset characteristics, model design and statistical methods, and clinical implications, and to provide possible solutions to these problems, such as gathering objective data, improving training, repeating measurements, increasing sample size, preventing overfitting using statistical methods, using specific AI algorithms to address targeted issues, standardizing outcomes and evaluation criteria, and enhancing fairness and replicability, with the goal of offering reference and assistance to researchers, algorithm developers, policy makers, and clinical practitioners.

Citation

Please cite as:

Cai YQ, Cai Y, Tang LY, Jing TC, Gong M, Li HJ, Hu W, Zhang XG, Gong DX, Zhang GW

Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions

J Med Internet Res 2024;26:e47645

DOI: 10.2196/47645

PMID: 38869157

PMCID: PMC11316160

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 29, 2023

Date Accepted: Jun 12, 2024

Date Submitted to PubMed: Jun 13, 2024
