Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 7, 2022
Open Peer Review Period: Apr 7, 2022 - Jun 2, 2022
Date Accepted: Sep 7, 2022

The final, peer-reviewed published version of this preprint can be found here:

Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities

Maletzky A, Böck C, Tschoellitsch T, Roland T, Ludwig H, Thumfart S, Giretzlehner M, Hochreiter S, Meier J

JMIR Med Inform 2022;10(10):e38557

DOI: 10.2196/38557

PMID: 36269654

PMCID: 9636533

Lifting Hospital EHR Data Treasures: Challenges and Opportunities

  • Alexander Maletzky
  • Carl Böck
  • Thomas Tschoellitsch
  • Theresa Roland
  • Helga Ludwig
  • Stefan Thumfart
  • Michael Giretzlehner
  • Sepp Hochreiter
  • Jens Meier

ABSTRACT

Background:

Electronic health records (EHRs) have been used successfully in past data science and machine learning projects. However, most of these data are collected for clinical use rather than for retrospective analysis, so researchers typically face many different issues when trying to access and prepare them for secondary use.

Objective:

The main goal of this paper is to raise awareness that preparing routinely acquired medical data remains challenging despite an ever-growing set of software tools.

Methods:

We report our experience and findings from a large-scale data science project analyzing routinely acquired, retrospective data from the Kepler University Hospital in Linz, Austria. The project involves data from more than 150,000 patients collected over a period of ten years. The data preparation process includes exporting the data from the hospital's data warehouses, de-identifying the data, detecting and correcting errors and inconsistencies therein, transforming them into a format suitable for machine learning, and extracting clinically meaningful labels for supervised learning.
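To illustrate the de-identification step only, the following Python sketch shows one common approach: replacing patient identifiers with keyed-hash pseudonyms and shifting all timestamps of a patient by the same deterministic offset, so that time intervals between that patient's events are preserved. The record layout (`patient_id`, `timestamp`, `variable`, `value`), the secret key, and all function names are hypothetical assumptions; the abstract does not state which de-identification technique the project actually used.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

# Hypothetical secret key (assumption); it would be stored apart from the exported data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a patient ID to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def date_shift_days(patient_id: str, key: bytes = SECRET_KEY, max_days: int = 365) -> int:
    """Derive a deterministic per-patient offset in [-max_days, max_days] days."""
    digest = hmac.new(key, b"shift:" + patient_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days


def deidentify(record: dict) -> dict:
    """Pseudonymize and date-shift one record, keeping only allow-listed fields."""
    pid = record["patient_id"]
    offset = timedelta(days=date_shift_days(pid))
    return {
        "pseudonym": pseudonymize(pid),
        "timestamp": record["timestamp"] + offset,  # same shift for all records of this patient
        "variable": record["variable"],
        "value": record["value"],
    }


# Hypothetical usage: the direct identifier "name" is dropped because it is not allow-listed.
example = {"patient_id": "P001", "name": "Jane Doe",
           "timestamp": datetime(2015, 3, 1, 8, 0), "variable": "HR", "value": 80}
print(deidentify(example))
```

Copying only explicitly listed fields (an allow-list) means any unexpected identifying column in a raw export is dropped by default rather than leaked.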

Results:

Raw electronic health record data can be corrupted in many unexpected ways that demand thorough manual inspection and highly individualized data cleaning solutions. Specific problems we encountered include variable names or codes that differ between wards or change over time; matching data distributed across several disparate data sources; artifacts in waveform signals and challenges related to the way these signals are internally represented; and extracting surrogate labels for supervised learning from retrospective data that lack explicit label information.
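One way to handle variable codes that differ between wards or change over time is an explicit, manually curated mapping table in which each entry carries a validity period. The Python sketch below is an assumption for illustration, not the authors' implementation; the ward names, local codes, canonical names, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeMapping:
    ward: str                        # ward whose system uses the local code (hypothetical)
    local_code: str                  # code as it appears in that ward's export
    canonical: str                   # harmonized variable name used downstream
    valid_from: date
    valid_to: Optional[date] = None  # None means the code is still in use


# Hypothetical mapping table; in practice it would be built through manual inspection.
MAPPINGS = [
    CodeMapping("ICU1", "HF", "heart_rate", date(2010, 1, 1)),
    CodeMapping("ICU2", "HR_MON", "heart_rate", date(2010, 1, 1), date(2016, 6, 30)),
    CodeMapping("ICU2", "HR", "heart_rate", date(2016, 7, 1)),
]


def harmonize(ward: str, local_code: str, on: date) -> Optional[str]:
    """Return the canonical name for a ward-specific code on a given date, or None if unmapped."""
    for m in MAPPINGS:
        if (m.ward == ward and m.local_code == local_code
                and m.valid_from <= on
                and (m.valid_to is None or on <= m.valid_to)):
            return m.canonical
    return None


# Hypothetical usage: the same measurement recorded under different codes over time.
print(harmonize("ICU2", "HR_MON", date(2015, 1, 1)))
print(harmonize("ICU2", "HR", date(2017, 1, 1)))
```

Returning `None` instead of guessing makes unmapped or out-of-period codes visible, so they can be routed back to manual inspection rather than silently merged.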

Conclusions:

Only a few of the data preparation issues encountered in our project are addressed by the generic medical data preprocessing tools that have been proposed recently. We propose a ‘checklist’ for guiding practitioners through retrospective medical data science projects and helping them avoid the most common pitfalls. This checklist may also offer valuable insights for setting up prospective data acquisition strategies for subsequent data analysis projects.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.