
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 14, 2020
Date Accepted: Sep 2, 2021

The final, peer-reviewed published version of this preprint can be found here:

On Missingness Features in Machine Learning Models for Critical Care: Observational Study

Singh J, Sato M, Ohkuma T


JMIR Med Inform 2021;9(12):e25022

DOI: 10.2196/25022

PMID: 34889756

PMCID: 8701717

On Missingness Features in Machine Learning Models for Critical Care: Observational Study

  • Janmajay Singh
  • Masahiro Sato
  • Tomoko Ohkuma

ABSTRACT

Background:

Missing data in electronic health records (EHRs) are inevitable and are considered to be non-random. Several studies have found that features indicating patterns of missing data (missingness) encode useful information about a patient's health, and they advocate including these features in clinical prediction models. However, their effectiveness has not been exhaustively evaluated.
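To make "missingness features" concrete, the sketch below derives two commonly used ones for a single clinical variable: a binary observation mask and the time elapsed since the variable was last measured. This is an illustrative assumption, not necessarily the feature set used in this study; the function name `missingness_features` and the example pH values are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def missingness_features(
    values: List[Optional[float]], times: List[float]
) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: build a mask and time-since-last-measurement.

    values: one variable's measurements per time step (None = not measured).
    times:  timestamp of each step, e.g. hours since ICU admission.
    """
    mask: List[int] = []
    hours_since_last: List[float] = []
    last_seen: Optional[float] = None
    for value, t in zip(values, times):
        if value is not None:
            # Measured at this step: mask is 1 and the gap resets to 0.
            mask.append(1)
            hours_since_last.append(0.0)
            last_seen = t
        else:
            # Not measured: gap grows; inf if never measured so far.
            mask.append(0)
            hours_since_last.append(math.inf if last_seen is None else t - last_seen)
    return mask, hours_since_last


# Hypothetical hourly pH series with gaps.
ph = [None, 7.2, None, None, 7.4]
hours = [0.0, 1.0, 2.0, 3.0, 4.0]
mask, gap = missingness_features(ph, hours)
print(mask)  # [0, 1, 0, 0, 1]
print(gap)   # [inf, 0.0, 1.0, 2.0, 0.0]
```

In a model that uses missingness, these two sequences would typically be fed alongside the (imputed) measurement values at each time step; a model "relying only on pathological features" would see the values alone.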

Objective:

To study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes, and to explore the robustness of these features across patient subgroups and task settings.

Methods:

A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, and mortality, length of stay (LOS), and sepsis were chosen as outcomes. The 2019 dataset was multicenter, allowing external validation. Gated recurrent units (GRUs) were used to learn sequential patterns in the data and to classify or predict labels of interest. Models were evaluated on several criteria assessing discriminative ability and calibration, as well as across population subgroups.
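Discriminative ability is reported as the area under the receiver operating characteristic curve (AUROC). The minimal sketch below computes AUROC by its rank interpretation (the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counted as half) and, as one assumed example of a calibration measure, the Brier score; the abstract does not state which calibration metrics the study used. The labels and probabilities are made up for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical outcomes (1 = event) and predicted probabilities.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, p))  # 0.75
print(brier_score(y, p))
```

The pairwise loop is O(positives x negatives), which is fine for illustration; for large cohorts a sort-based rank computation or a library implementation would be preferable.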

Results:

Including missingness features generally improved model performance in retrospective tasks. The extent of improvement depended on the outcome of interest (AUROC improved by 1.2% to 7.7%) and even on the patient subgroup. However, missingness features did not display utility in a prospective setting, where the model using them was outperformed (0.9% difference in AUROC) by the model relying only on pathological features. This occurred even though including these features led to earlier detection of disease (true positives), because it also caused a concomitant rise in false-positive detections.

Conclusions:

This study exhaustively evaluated the effectiveness of missingness features in machine learning models. A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings, especially for administrative tasks such as LOS prediction, where they present the greatest benefit. Although missingness features, which reflect healthcare processes, vary greatly owing to intra- and interhospital factors, they may still be used in prediction models for clinically relevant outcomes. However, their use in prospective models that produce frequent predictions needs to be explored further.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.