Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Oct 7, 2019
Date Accepted: Oct 24, 2020
Comparison of multivariable logistic regression and other machine learning algorithms for prognostic prediction studies in pregnancy care: systematic review and meta-analysis
ABSTRACT
Background:
Predictions in pregnancy care are complex and long-term tasks that can be resolved by machine learning algorithms. However, how machine learning models can be applied in pregnancy care remains unclear.
Objective:
The objective of this study was to review machine learning models that have been developed and/or validated for prognostic predictions in pregnancy care, and to conduct a meta-analysis of their predictive performance, to inform clinicians’ decision making.
Methods:
Research articles from MEDLINE, EMBASE, Scopus, Web of Science, and Google Scholar were identified, screened, assessed, and included by following PRISMA guidelines. Studies were primarily framed as PICOTS: 1) population: men or women in procreative management, pregnant women, or fetuses/newborns; 2) index: prognostic machine learning for classification; 3) comparator: other machine learning models in each outcome; 4) outcomes: pregnancy-related outcomes of procreation, or pregnancy outcomes for pregnant women or fetuses/newborns; 5) timing: pre-, inter-, or peri-pregnancy periods (predictors); at the pregnancy, delivery, or puerperal/neonatal period (outcome); and short- or long-term prognosis (time interval); and 6) setting: primary care or hospital. We used PROBAST guidelines for study appraisal. Results were synthesized by random-effects modeling with pooled interval estimates of the area under the receiver operating characteristic curve (AUROC; including the 95% prediction interval [PI]), Cochran’s Q, and I² statistics.
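The synthesis step described above can be illustrated with a minimal sketch. The abstract does not state which random-effects estimator was used or on which scale AUROCs were pooled, so this sketch assumes the DerSimonian-Laird estimator applied to untransformed AUROCs, with the 95% PI computed as the pooled estimate plus or minus a t critical value (k-2 degrees of freedom) times the square root of the between-study variance plus the squared standard error of the pooled estimate. The function name, its arguments, and the example inputs are hypothetical and are not data from the review.

```python
import math


def pool_random_effects(estimates, std_errors, t_crit):
    """Pool study-level estimates (e.g. AUROCs) with a random-effects model.

    Assumption: DerSimonian-Laird estimator; the source does not name one.
    t_crit is the two-sided 97.5% t quantile with k-2 degrees of freedom,
    supplied by the caller (e.g. 3.182 for k=5 studies).
    """
    k = len(estimates)
    if k < 3 or k != len(std_errors):
        raise ValueError("need >= 3 studies and one standard error per study")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    df = k - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights, pooled estimate, and its standard error.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    # 95% prediction interval for the estimate in a new study.
    half_width = t_crit * math.sqrt(tau2 + se_pooled ** 2)
    return {
        "pooled": pooled,
        "q": q,
        "tau2": tau2,
        "i2": i2,
        "pi_low": pooled - half_width,
        "pi_high": pooled + half_width,
    }


# Hypothetical inputs for illustration only (not taken from the review).
example = pool_random_effects(
    estimates=[0.70, 0.80, 0.90],
    std_errors=[0.05, 0.05, 0.05],
    t_crit=12.706,  # 97.5% t quantile with k-2 = 1 degree of freedom
)
```

Because this sketch pools on the untransformed scale, nothing constrains the prediction interval to the range of 0 to 1; pooling on a logit scale and back-transforming is a common alternative.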
Results:
Of the 42 included studies, most predicted outcomes in the perinatal period (n=22, 52.4%), and 12 (28.6%) predicted prematurity as the cause of neonatal death in pregnancy care. More than one-third of the studies were at low risk of bias (ROB; n=16, 38.1%). Most of the studies with a high ROB had problems with the sufficiency of events per variable, calibration plots, or bootstrapped cross-validation. Model comparisons were conducted among machine learning models for predicting fetal distress, premature birth, embryo implantation, or central nervous system anomalies. Among the low-ROB studies, the pooled estimate of the AUROC was 0.77 (95% PI 0.35-1.19; n=5; heterogeneity P<.001; I²=99.84%). There was evidence of heterogeneity and inconsistency because the predicted outcomes were highly diverse. More studies are needed to allow a meta-analysis of models predicting the same outcome.
Conclusions:
Models developed by machine learning algorithms did not demonstrate a superior prediction performance. Development of prognostic prediction models should address problems in studies with a high ROB.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.