JMIR Preprints #38557: Lifting Hospital EHR Data Treasures: Challenges and Opportunities

Lifting Hospital EHR Data Treasures: Challenges and Opportunities

Alexander Maletzky;
Carl Böck;
Thomas Tschoellitsch;
Theresa Roland;
Helga Ludwig;
Stefan Thumfart;
Michael Giretzlehner;
Sepp Hochreiter;
Jens Meier

ABSTRACT

Background:

Electronic health records have been successfully employed in data science and machine learning projects in the past. Most of these data, however, are collected for clinical use rather than for retrospective analysis. As a result, researchers typically face many different issues when trying to access and prepare the data for secondary use.

Objective:

The main goal of this paper is to create awareness that preparation of routinely acquired medical data remains a challenge despite an ever-growing set of software tools.

Methods:

We report our experience and findings from a large-scale data science project analyzing routinely acquired, retrospective data from the Kepler University Hospital in Linz, Austria. The project involves data from more than 150,000 patients collected over a period of ten years. The data preparation process includes exporting the data from the hospital's data warehouses, de-identifying the data, detecting and correcting errors and inconsistencies therein, transforming them into a format suitable for machine learning, and extracting clinically meaningful labels for supervised learning.

Results:

Raw electronic health record data can be corrupted in many unexpected ways that demand thorough manual inspection and highly individualized data cleaning solutions. Specific problems encountered include variable names or codes that differ between wards or change over time; matching data distributed across several disparate data sources; artifacts in waveform signals and challenges related to the way they are internally represented; and extracting surrogate labels for supervised learning from retrospective data that lack explicit label information.

Conclusions:

Only a few of the data preparation issues encountered in our project are addressed by the generic medical data preprocessing tools that have been proposed recently. We propose a ‘checklist’ for guiding practitioners through retrospective medical data science projects and helping them avoid the most common pitfalls. This checklist may also offer valuable insights for setting up prospective data acquisition strategies for subsequent data analysis projects.

Citation

Please cite as:

Maletzky A, Böck C, Tschoellitsch T, Roland T, Ludwig H, Thumfart S, Giretzlehner M, Hochreiter S, Meier J

Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities

JMIR Med Inform 2022;10(10):e38557

DOI: 10.2196/38557

PMID: 36269654

PMCID: 9636533

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 7, 2022

Open Peer Review Period: Apr 7, 2022 - Jun 2, 2022

Date Accepted: Sep 7, 2022
