JMIR Preprints #25022: On Missingness Features in Machine Learning Models for Critical Care: Observational Study

Janmajay Singh;
Masahiro Sato;
Tomoko Ohkuma

ABSTRACT

Background:

Missing data in electronic health records is inevitable and is generally considered to be non-random. Several studies have found that features indicating missing patterns (missingness) encode useful information about a patient's health and advocate their inclusion in clinical prediction models. However, their effectiveness has not been exhaustively evaluated.

Objective:

To study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes and to explore the robustness of these features across patient subgroups and task settings.

Methods:

A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, and mortality, length of stay (LOS), and sepsis were chosen as outcomes. The latter dataset was multicenter, allowing external validation. Gated recurrent units were used to learn sequential patterns in the data and to classify or predict labels of interest. Models were evaluated on criteria measuring discriminative ability and calibration, as well as across population subgroups.
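
The abstract does not state how the missingness features were constructed. As an illustrative sketch only, one common construction for sequence models pairs each clinical variable with a binary observation mask and a counter of time steps since the last measurement. The function name `missingness_features`, the forward-fill imputation, and the 0.0 placeholder before the first observation are all assumptions made for this example, not details taken from the study.

```python
import math


def missingness_features(series):
    """Derive illustrative missingness features for one variable.

    series: list of floats, one per time step, with NaN marking a step
            at which the variable was not measured.
    Returns three lists of the same length as `series`:
      filled: forward-filled values (0.0 before the first observation,
              an assumed placeholder)
      mask:   1 where the value was observed, 0 where it was missing
      delta:  time steps elapsed since the last observation
              (0 at an observed step; counts from the record start
              when nothing has been observed yet)
    """
    filled, mask, delta = [], [], []
    last_value = 0.0   # assumed placeholder before any observation
    steps_since = 0
    for x in series:
        if math.isnan(x):
            steps_since += 1
            mask.append(0)
        else:
            last_value = x
            steps_since = 0
            mask.append(1)
        filled.append(last_value)
        delta.append(steps_since)
    return filled, mask, delta


if __name__ == "__main__":
    nan = float("nan")
    heart_rate = [nan, 80.0, nan, nan, 92.0]  # made-up example values
    print(missingness_features(heart_rate))
```

In a setup like this, the `mask` and `delta` sequences would be concatenated with the imputed values as extra input channels to the recurrent model, so that a comparison with and without them isolates the contribution of the missingness information.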

Results:

Including missingness features generally improved model performance in retrospective tasks. The extent of improvement depended on the outcome of interest (AUROC improved by 1.2% to 7.7%) and even on the patient subgroup. However, missingness features did not display utility in a prospective setting, where they were outperformed (0.9% difference in AUROC) by the model relying only on pathological features. This was despite their leading to earlier detection of disease (true positives), since including these features also led to a concomitant rise in false positive detections.

Conclusions:

This study exhaustively evaluated the effectiveness of missingness features in machine learning models. A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings, especially for administrative tasks such as LOS prediction, where they present the greatest benefit. While missingness features, which are representative of healthcare processes, vary greatly due to intra- and inter-hospital factors, they may still be used in prediction models for clinically relevant outcomes. However, their use in prospective models producing frequent predictions needs to be explored further.

Citation

Please cite as:

Singh J, Sato M, Ohkuma T

On Missingness Features in Machine Learning Models for Critical Care: Observational Study

JMIR Med Inform 2021;9(12):e25022

DOI: 10.2196/25022

PMID: 34889756

PMCID: 8701717

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 14, 2020

Date Accepted: Sep 2, 2021
