Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 13, 2025
Open Peer Review Period: Mar 13, 2025 - May 8, 2025
Date Accepted: Jul 6, 2025

The final, peer-reviewed published version of this preprint can be found here:

Process for Quality Management of Electronic Medical Records–Based Data: Case Study Using Real Colorectal Cancer Data

Park N, Na K, Sunwoo W, Baek JH, Lee Y, Lee S, Woo H

JMIR Med Inform 2025;13:e73884

DOI: 10.2196/73884

PMID: 41232039

PMCID: 12614659

A Process for Quality Management of EMR-Based Data: A Case Study Using Real Colorectal Cancer Data

  • NaYoung Park
  • Kyungmin Na
  • Woongsang Sunwoo
  • Jeong-Heum Baek
  • Youngho Lee
  • Suehyun Lee
  • HyeKyung Woo

ABSTRACT

Background:

As data-driven medical research advances, vast amounts of medical data are being collected, giving researchers access to important information. However, issues such as heterogeneity, complexity, and incompleteness of datasets limit their practical use. Errors and missing data negatively affect artificial intelligence (AI)-based predictive models, undermining the reliability of clinical decision-making. Thus, it is important to develop a quality management process (QMP) for clinical data.

Objective:

We aimed to develop a rules-based QMP to address errors and impute missing values in real-world data (RWD), establishing high-quality data for clinical research.

Methods:

We used clinical data from 6,491 patients with colorectal cancer (CRC) collected at Gachon University Gil Medical Center between 2010 and 2022, drawing on the clinical library established within the Korea Clinical Data Use Network for Research Excellence (K-CURE). First, we conducted a literature review on the prognostic prediction of CRC to assess whether the data met our research purposes, comparing the variables selected in the literature with those available in the RWD. Then, a labeling process was implemented to extract key variables, which facilitated the creation of an automatic staging library. This library, combined with a rule-based process, allowed for systematic analysis and evaluation.
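The idea of filling a missing stage from labeled T, N, and M values can be illustrated with a minimal sketch. This is not the authors' staging library: the function names, the record keys (`t`, `n`, `m`, `stage`), and the coarse stage grouping (0 to IV, without substages) are assumptions made for illustration only.

```python
from typing import Optional


def _norm(value: Optional[str]) -> Optional[str]:
    """Normalize a label such as ' t4a ' to 'T4A'; empty or missing becomes None."""
    if not isinstance(value, str):
        return None
    value = value.strip().upper()
    return value or None


def derive_stage(t: Optional[str], n: Optional[str], m: Optional[str]) -> Optional[str]:
    """Return a coarse stage group ('0', 'I', 'II', 'III', 'IV') or None.

    Hypothetical simplified rules (no substages); None means the available
    labels are not sufficient to decide.
    """
    t, n, m = _norm(t), _norm(n), _norm(m)
    # Distant metastasis determines stage IV regardless of T and N.
    if m is not None and m.startswith("M1"):
        return "IV"
    # Every other stage group requires a confirmed M0.
    if m is None or not m.startswith("M0"):
        return None
    if n is None:
        return None
    # Any nodal involvement without metastasis is stage III.
    if n.startswith(("N1", "N2")):
        return "III"
    if not n.startswith("N0"):
        return None  # e.g. NX: cannot be assessed
    if t is None:
        return None
    if t.startswith("TIS"):
        return "0"
    if t.startswith(("T1", "T2")):
        return "I"
    if t.startswith(("T3", "T4")):
        return "II"
    return None


def fill_missing_stage(records: list) -> list:
    """Return copies of the records, deriving 'stage' only where it is missing."""
    filled = []
    for rec in records:
        rec = dict(rec)  # leave the input untouched
        if not rec.get("stage"):
            rec["stage"] = derive_stage(rec.get("t"), rec.get("n"), rec.get("m"))
        filled.append(rec)
    return filled
```

Records whose labels do not determine a stage (for example, `NX` with `M0`) are left as missing rather than guessed, which keeps the rule-based step conservative.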

Results:

Although the literature review identified the tumor, node, metastasis (TNM) stage as an important prognostic factor for CRC, it was not selected through feature selection on the RWD. After the QMP was applied, the rate of missing data fell from 75.26% to 35.73% for TNM and from 24.28% to 18.46% for Surveillance, Epidemiology, and End Results (SEER), confirming the system’s effectiveness. Variable importance analysis through feature selection showed that the TNM stage and detailed code variables, which were previously unselected, were included in the improved model.
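To put the reported missing-data rates in perspective, the following minimal sketch computes a missing rate for a column and the absolute and relative reduction between two rates. The helper names are hypothetical, and treating `None` and empty strings as missing is an assumption; only the four percentages come from the abstract.

```python
def missing_rate(values: list) -> float:
    """Percentage of entries that are missing (None or empty string)."""
    if not values:
        raise ValueError("values must not be empty")
    missing = sum(1 for v in values if v is None or v == "")
    return 100.0 * missing / len(values)


def reduction(before: float, after: float) -> tuple:
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative


# Rates reported in the abstract, before and after applying the QMP.
reported = [("TNM", 75.26, 35.73), ("SEER", 24.28, 18.46)]
for name, before, after in reported:
    abs_pts, rel = reduction(before, after)
    print(f"{name}: {abs_pts:.2f} percentage points, {rel:.1f}% relative")
```

With the reported figures, this gives a drop of 39.53 percentage points for TNM (about 52.5% relative) and 5.82 percentage points for SEER (about 24.0% relative).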

Conclusions:

In sum, we developed a rules-based QMP to address errors and impute missing values in K-CURE data, enhancing data quality. The applicability of the process to real-world datasets highlights its potential for broader use in clinical studies and cancer research.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.