Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jun 18, 2025
Date Accepted: Oct 13, 2025

The final, peer-reviewed published version of this preprint can be found here:

Methods for Addressing Missingness in Electronic Health Record Data for Clinical Prediction Models: Comparative Evaluation

Digitale J, Franzon D, Pletcher MJ, McCulloch CE, Gennatas ED

JMIR Med Inform 2025;13:e79307

DOI: 10.2196/79307

PMID: 41237368

PMCID: 12617989

Methods for Addressing Missingness in Electronic Health Record Data for Clinical Prediction Models: Comparative Evaluation

  • Jean Digitale; 
  • Deborah Franzon; 
  • Mark J. Pletcher; 
  • Charles E. McCulloch; 
  • Efstathios D. Gennatas

ABSTRACT

Background:

Missing data are a common challenge in electronic health record (EHR)-based prediction modeling. Traditional imputation methods may not suit prediction or machine learning models, and real-world use requires workflows that can be implemented for both model development and real-time prediction.

Objective:

We evaluated methods for handling missing data when using EHR data to build clinical prediction models in pediatric intensive care unit (PICU) patients.

Methods:

Using EHR data containing missing values from an academic medical center PICU, we generated a synthetic complete dataset. From this, we created 300 datasets with missing data under varying mechanisms and proportions of missingness for the outcomes of 1) successful extubation (binary) and 2) blood pressure (continuous). We assessed strategies for addressing missing data, including simple methods (e.g., last observation carried forward [LOCF]), complex methods (e.g., random forest multiple imputation), and native support for missing values in the outcome prediction models.
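
The masking-and-imputation design described above can be illustrated with a small sketch. This is not the authors' pipeline: it assumes long-format records of (patient_id, time, value) already sorted by time within each patient, masks values completely at random at a chosen proportion (only one of several possible missingness mechanisms), and compares LOCF with mean imputation by the MSE on the masked entries. The function names and the rule of falling back to the observed mean when a patient has no earlier observation are hypothetical choices made for illustration.

```python
import random
from statistics import mean


def mask_mcar(records, proportion, seed=0):
    """Set each value to None with probability `proportion`, independent of the data (MCAR)."""
    rng = random.Random(seed)
    return [(pid, t, None if rng.random() < proportion else v) for pid, t, v in records]


def mean_impute(records):
    """Replace every missing value with the overall mean of the observed values."""
    observed_mean = mean(v for _, _, v in records if v is not None)
    return [(pid, t, observed_mean if v is None else v) for pid, t, v in records]


def locf_impute(records):
    """Carry the last observed value forward within each patient.

    Assumes records are sorted by (patient_id, time). A gap with no earlier
    observation for that patient falls back to the overall observed mean
    (a hypothetical rule for this sketch).
    """
    observed_mean = mean(v for _, _, v in records if v is not None)
    last_seen = {}
    filled = []
    for pid, t, v in records:
        if v is None:
            v = last_seen.get(pid, observed_mean)
        else:
            last_seen[pid] = v
        filled.append((pid, t, v))
    return filled


def imputation_mse(truth, imputed, masked):
    """MSE computed only over the entries that were artificially masked."""
    errors = [(t[2] - i[2]) ** 2 for t, i, m in zip(truth, imputed, masked) if m[2] is None]
    return mean(errors)
```

In use, one would mask a complete dataset with `mask_mcar`, impute it with each method, and compare `imputation_mse` against the complete values. LOCF needs only the previous observation per patient, which is one reason it is cheap to apply at prediction time.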

Results:

Across 886 patients and 1,220 intubation events, 18.2% of original data were missing. LOCF had the lowest imputation error, followed by random forest imputation (average mean squared error [MSE] improvement over mean imputation: 0.41 [range: 0.30, 0.50] and 0.33 [0.21, 0.43], respectively). LOCF generally outperformed other imputation methods across outcome metrics and models (mean improvement: 1.28% [range: -0.07%, 7.2%]). Imputation methods showed more performance variability for the binary outcome (balanced accuracy coefficient of variation [CV]: 0.042) than the continuous outcome (MSE CV: 0.001).
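
The summary metrics above rest on standard definitions, sketched below: balanced accuracy is the mean of sensitivity and specificity, and the coefficient of variation is the standard deviation divided by the mean. The abstract does not state how "MSE improvement over mean imputation" was computed; the relative reduction 1 - MSE_method / MSE_mean used here is an assumption, as is the use of the sample (rather than population) standard deviation in the CV.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)


def relative_mse_improvement(mse_method, mse_mean_imputation):
    """Assumed definition: fractional MSE reduction relative to mean imputation."""
    return 1 - mse_method / mse_mean_imputation
```

Under this assumed definition, an improvement of 0.41 would mean the method's imputation MSE was 59% of that of mean imputation.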

Conclusions:

Traditional imputation methods for inferential statistics, such as multiple imputation, may not be optimal for prediction models. The amount of missingness influenced performance more than the missingness mechanism did. In datasets with frequent measurements, LOCF and native support for missing values in machine learning models offer reasonable performance for handling missingness at minimal computational cost in predictive analyses.
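
"Native support for missing values" in tree-based learners typically means that each split learns which child the missing values should follow, so no imputation step is needed. Libraries such as scikit-learn's `HistGradientBoostingClassifier`/`HistGradientBoostingRegressor` and XGBoost provide this behavior. The toy regression stump below only illustrates the idea under simplifying assumptions (one feature, squared-error loss, exhaustive threshold search). It is not the implementation of any particular library or of the study's models, and `fit_stump`/`predict_stump` are hypothetical names.

```python
from statistics import mean


def _sse(ys):
    """Sum of squared errors around the mean of ys (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_stump(xs, ys):
    """Fit a one-split regression stump where x may be None (missing).

    For each threshold, try routing missing rows to the left and to the right
    child, and keep the combination with the lowest total squared error.
    """
    observed = sorted({x for x in xs if x is not None})
    if len(observed) < 2:
        raise ValueError("need at least two distinct observed feature values")
    missing_ys = [y for x, y in zip(xs, ys) if x is None]
    best = None
    for thr in observed[:-1]:  # split: x <= thr goes left, x > thr goes right
        left = [y for x, y in zip(xs, ys) if x is not None and x <= thr]
        right = [y for x, y in zip(xs, ys) if x is not None and x > thr]
        for missing_left in (True, False):
            lhs = left + missing_ys if missing_left else left
            rhs = right if missing_left else right + missing_ys
            loss = _sse(lhs) + _sse(rhs)
            if best is None or loss < best[0]:
                best = (loss, thr, missing_left, mean(lhs), mean(rhs))
    _, thr, missing_left, left_value, right_value = best
    return {"threshold": thr, "missing_left": missing_left,
            "left": left_value, "right": right_value}


def predict_stump(stump, x):
    """Route missing x along the learned default direction; otherwise compare to the threshold."""
    goes_left = stump["missing_left"] if x is None else x <= stump["threshold"]
    return stump["left"] if goes_left else stump["right"]
```

Because the routing of missing values is learned during training and reused unchanged at prediction time, the same model handles incomplete real-time inputs without a separate imputation step.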


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.