Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Mar 29, 2023
Date Accepted: Jun 12, 2024
Date Submitted to PubMed: Jun 13, 2024
Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions
ABSTRACT
In recent years, artificial intelligence (AI) has developed explosively and has been widely applied in health care. As a typical AI technology, machine learning (ML) has shown great potential for predicting cardiovascular diseases (CVDs) by leveraging large amounts of medical data for training and optimization, and ML models are expected to play a crucial role in reducing the incidence and mortality of CVDs. Although the field has become a research hotspot, many pitfalls still demand researchers' close attention. These pitfalls may affect the predictive performance, credibility, reliability, and reproducibility of the models under study, ultimately reducing the value of the research and its prospects for clinical application. Identifying and avoiding these pitfalls is therefore a crucial task before research is carried out. However, a comprehensive summary of this topic is currently lacking. This viewpoint analyzes the existing problems in terms of data quality, dataset characteristics, model design and statistical methods, and clinical implications, and it proposes possible solutions, such as gathering objective data, improving training, repeating measurements, increasing sample size, preventing overfitting using statistical methods, using specific AI algorithms to address targeted issues, standardizing outcomes and evaluation criteria, and enhancing fairness and replicability. The goal is to offer reference and assistance to researchers, algorithm developers, policy makers, and clinical practitioners.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.