Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 29, 2023
Date Accepted: Jun 12, 2024
Date Submitted to PubMed: Jun 13, 2024

The final, peer-reviewed published version of this preprint can be found here:

Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions

Cai YQ, Cai Y, Tang LY, Jing TC, Gong M, Li HJ, Hu W, Zhang XG, Gong DX, Zhang GW

J Med Internet Res 2024;26:e47645

DOI: 10.2196/47645

PMID: 38869157

PMCID: 11316160

Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions

  • Yu-Qing Cai
  • Yue Cai
  • Li-Ying Tang
  • Tian-Ci Jing
  • Mengchun Gong
  • Hui-Jun Li
  • Wei Hu
  • Xin-Gang Zhang
  • Da-Xin Gong
  • Guang-Wei Zhang

ABSTRACT

In recent years, artificial intelligence (AI) has developed explosively and has been widely applied in health care. Machine learning (ML) models, a typical AI technology, have shown great potential for predicting cardiovascular diseases (CVDs) by leveraging large amounts of medical data for training and optimization, and they are expected to play a crucial role in reducing the incidence and mortality rates of CVDs. Although the field has become a research hotspot, there are still many pitfalls to which researchers need to pay close attention. These pitfalls may affect the predictive performance, credibility, reliability, and reproducibility of the models studied, ultimately reducing the value of the research and limiting the prospects for clinical application. Identifying and avoiding these pitfalls is therefore a crucial task before the research is carried out. However, a comprehensive summary of this topic is currently lacking. This viewpoint analyzes the existing problems in data quality, dataset characteristics, model design, statistical methods, and clinical implications, and proposes possible solutions, such as gathering objective data, improving training, repeating measurements, increasing sample size, preventing overfitting with statistical methods, using specific AI algorithms to address targeted issues, standardizing outcomes and evaluation criteria, and enhancing fairness and replicability. The goal is to offer reference and assistance to researchers, algorithm developers, policy makers, and clinical practitioners.

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.