JMIR Preprints #8960: Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis

Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis

Brett K Beaulieu-Jones;
Daniel R Lavage;
John W Snyder;
Jason H Moore;
Sarah A Pendergrass;
Christopher R Bauer

Background:

Missing data is a challenge for all studies, and especially for electronic health record (EHR)-based analyses. Failure to appropriately consider missing data can lead to biased results. While there has been extensive theoretical work on imputation, and many sophisticated methods are now available, it remains quite challenging for researchers to implement these methods appropriately. Here, we provide detailed procedures for when and how to conduct imputation of EHR laboratory results.

Objective:

The objective of this study was to demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered.
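One simple first check on the mechanism of missingness compares a fully observed covariate between records where a lab value is missing and records where it is observed. The sketch below is illustrative, not the authors' procedure: the record layout, the hypothetical helper `compare_by_missingness`, the `age` and `ldl` field names, and the toy values are all assumptions. A clear difference argues against missing completely at random (MCAR). It cannot by itself separate missing at random (MAR) from missing not at random (MNAR), because MNAR depends on the unobserved values themselves.

```python
from statistics import mean


def compare_by_missingness(records, target, covariate):
    """Return (mean covariate where target is missing, mean where target is observed).

    Hypothetical helper: each record is a dict and a missing lab value is None.
    """
    missing = [r[covariate] for r in records if r[target] is None]
    observed = [r[covariate] for r in records if r[target] is not None]
    if not missing or not observed:
        raise ValueError("need both missing and observed records")
    return mean(missing), mean(observed)


# Toy data (hypothetical): LDL is missing mostly in younger patients.
records = [
    {"age": 25, "ldl": None},
    {"age": 30, "ldl": None},
    {"age": 40, "ldl": None},
    {"age": 55, "ldl": 120.0},
    {"age": 60, "ldl": 130.0},
    {"age": 65, "ldl": 140.0},
]

m_missing, m_observed = compare_by_missingness(records, "ldl", "age")
print(f"mean age, LDL missing:  {m_missing:.1f}")
print(f"mean age, LDL observed: {m_observed:.1f}")
```

In practice a formal test (or a model predicting the missingness indicator from observed variables) would replace the raw comparison of means.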

Methods:

We analyzed clinical laboratory measures from 602,366 patients in the EHR of Geisinger Health System in Pennsylvania, USA. Using these data, we constructed a representative set of complete cases and assessed the performance of 12 different imputation methods on missing data simulated under 4 mechanisms of missingness (missing completely at random, missing not at random, missing at random, and real data modelling).
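The evaluation design described in the Methods (mask known values in complete cases, impute them, and score the imputations against the held-out truth) can be sketched as follows. This is a minimal illustration, not the authors' code: it uses only the Python standard library, two hypothetical lab columns `x` and `y`, column-mean imputation as a stand-in for the 12 methods, and masking rules chosen for illustration. Under MCAR each `y` is masked with a fixed probability; under MAR the probability depends on the observed `x`; under MNAR it depends on the hidden `y` itself. The real data modelling mechanism is not reproduced here.

```python
import math
import random


def make_mask(rows, mechanism, rate, rng):
    """Return booleans, True where y is masked. Illustrative rules; assumes rate <= 0.5."""
    xs = sorted(r[0] for r in rows)
    ys = sorted(r[1] for r in rows)
    x_med = xs[len(xs) // 2]
    y_med = ys[len(ys) // 2]
    mask = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = rate                              # independent of any value
        elif mechanism == "MAR":
            p = 2 * rate if x > x_med else 0.0    # depends on observed x only
        elif mechanism == "MNAR":
            p = 2 * rate if y > y_med else 0.0    # depends on the hidden y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        mask.append(rng.random() < p)
    return mask


def mean_impute_rmse(rows, mask):
    """Fill masked y with the mean of observed y; return RMSE on the masked cells."""
    observed = [y for (_, y), m in zip(rows, mask) if not m]
    fill = sum(observed) / len(observed)
    errors = [(y - fill) ** 2 for (_, y), m in zip(rows, mask) if m]
    if not errors:
        raise ValueError("nothing was masked")
    return math.sqrt(sum(errors) / len(errors))


rng = random.Random(0)
# Hypothetical complete cases: y is correlated with x plus noise.
rows = []
for _ in range(2000):
    x = rng.gauss(0, 1)
    rows.append((x, 2 * x + rng.gauss(0, 1)))

results = {}
for mech in ("MCAR", "MAR", "MNAR"):
    mask = make_mask(rows, mech, rate=0.3, rng=rng)
    results[mech] = mean_impute_rmse(rows, mask)
    print(f"{mech}: {sum(mask)} masked, RMSE = {results[mech]:.2f}")
```

Because column-mean imputation ignores the relationship between `x` and `y`, its error tends to grow when masking depends on the data (MAR or MNAR); a regression-based method such as MICE can use `x` to recover part of the signal under MAR.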

Results:

Our results showed that several methods, including variations of Multivariate Imputation by Chained Equations (MICE) and softImpute, consistently imputed missing values with low error; however, only a subset of the MICE methods was suitable for multiple imputation.
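Multiple imputation produces m completed datasets; the analysis is run on each and the results are combined, conventionally with Rubin's rules. The sketch below uses a hypothetical helper, `pool_rubin`, with toy numbers (not the authors' code or results) to pool a scalar estimate: the pooled estimate is the mean of the m estimates, and the total variance is T = W + (1 + 1/m) B, where W is the mean within-imputation variance and B is the sample variance of the m estimates. One general reason a method can be unsuitable for multiple imputation is that it returns the same values on every run: B then collapses to zero and the uncertainty due to missingness is ignored.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least 2 imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance W
    b = variance(estimates)        # between-imputation variance B (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance T
    return q_bar, t


# Hypothetical results from analyzing m = 5 imputed datasets.
est = [1.10, 1.25, 0.95, 1.15, 1.05]
var = [0.04, 0.05, 0.04, 0.05, 0.04]
q, t = pool_rubin(est, var)
print(f"pooled estimate = {q:.3f}, total variance = {t:.4f}, SE = {t ** 0.5:.4f}")
```

For these toy numbers, W = 0.044, B = 0.0125, and T = 0.044 + 1.2 × 0.0125 = 0.059.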

Conclusions:

The analyses we describe provide an outline of considerations for dealing with missing EHR data, steps that researchers can perform to characterize missingness within their own data, and an evaluation of methods that can be applied to impute clinical data. While the performance of methods may vary between datasets, the process we describe can be generalized to the majority of structured data types that exist in EHRs, and all of our methods and code are publicly available.

Citation

Please cite as:

Beaulieu-Jones BK, Lavage DR, Snyder JW, Moore JH, Pendergrass SA, Bauer CR

Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis

JMIR Med Inform 2018;6(1):e11

DOI: 10.2196/medinform.8960

PMID: 29475824

PMCID: 5845101

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 14, 2017

Open Peer Review Period: Jan 4, 2018 - Nov 16, 2017

Date Accepted: Dec 24, 2017



Per the author's request the PDF is not available.
