Beyond Comparing Machine Learning and Logistic Regression in Clinical Prediction Modelling: Shifting from Model Debate to Data Quality
Yanan Hu; Xin Zhang; Valerie Slavin; Yitayeh Belsti; Sofonyas Abebaw Tiruneh; Emily Callander; Joanne Enticott
ABSTRACT
The rapid uptake of supervised machine learning (ML) in clinical prediction modelling, particularly for binary outcomes based on tabular data, has sparked debate about its comparative advantage over traditional statistical logistic regression (LR). While ML has demonstrated superiority in unstructured data domains, its performance gains in structured, tabular clinical datasets remain inconsistent and context-dependent. This viewpoint synthesises recent comparative studies and simulation findings to argue that there is no universally best modelling approach. Model performance depends heavily on dataset characteristics (e.g. linearity, sample size, number of candidate predictors, and minority class proportion) and data quality (e.g. completeness and accuracy). Consequently, we argue that efforts to improve data quality, rather than to increase model complexity, are more likely to enhance the reliability and real-world utility of clinical prediction models.
Citation
Please cite as:
Hu Y, Zhang X, Slavin V, Belsti Y, Tiruneh SA, Callander E, Enticott J
Beyond Comparing Machine Learning and Logistic Regression in Clinical Prediction Modelling: Shifting from Model Debate to Data Quality