JMIR Preprints #73884: A Process for Quality Management of EMR-Based Data: A Case Study Using Real Colorectal Cancer Data

A Process for Quality Management of EMR-Based Data: A Case Study Using Real Colorectal Cancer Data

NaYoung Park;
Kyungmin Na;
Woongsang Sunwoo;
Jeong-Heum Baek;
Youngho Lee;
Suehyun Lee;
HyeKyung Woo

ABSTRACT

Background:

As data-driven medical research advances, vast amounts of medical data are being collected, giving researchers access to important information. However, issues such as heterogeneity, complexity, and incompleteness of datasets limit their practical use. Errors and missing data negatively affect artificial intelligence (AI)-based predictive models, undermining the reliability of clinical decision-making. Thus, it is important to develop a quality management process (QMP) for clinical data.

Objective:

We aimed to develop a rules-based QMP to address errors and impute missing values in real-world data (RWD), establishing high-quality data for clinical research.

Methods:

We used clinical data from 6,491 colorectal cancer (CRC) patients collected at Gachon University Gil Medical Center between 2010 and 2022, drawing on the clinical library established within the Korea Clinical Data Use Network for Research Excellence (K-CURE). First, we conducted a literature review on the prognostic prediction of CRC to assess whether the data met our research purposes, comparing the selected variables with the RWD. Then, a labeling process was implemented to extract key variables, which facilitated the creation of an automatic staging library. This library, combined with a rules-based process, allowed for systematic analysis and evaluation.
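As a rough illustration of what a rules-based staging step could look like, the sketch below derives a coarse stage group from separately recorded T, N, and M categories and uses it to fill in records whose stage is missing. It is not the authors' staging library: the function names, the record layout (`t`, `n`, `m`, `stage`), and the sample records are all hypothetical, and the grouping is a simplified AJCC-style rule set (any M1 is stage IV; node-positive M0 is stage III; T3/T4 N0 M0 is stage II; T1/T2 N0 M0 is stage I; Tis N0 M0 is stage 0) that ignores substages.

```python
from typing import Optional


def derive_stage(t: Optional[str], n: Optional[str], m: Optional[str]) -> Optional[str]:
    """Return a coarse stage group from T, N, M categories, or None if undecidable.

    Simplified, AJCC-style grouping used for illustration only (no substages).
    """
    # Distant metastasis decides the stage regardless of T and N.
    if m is not None and m.upper().startswith("M1"):
        return "IV"
    # Every other rule needs all three components.
    if t is None or n is None or m is None:
        return None
    t, n, m = t.upper(), n.upper(), m.upper()
    if not m.startswith("M0"):
        return None  # e.g. "MX": cannot be staged by rule
    if n.startswith(("N1", "N2")):
        return "III"
    if not n.startswith("N0"):
        return None  # e.g. "NX"
    if t.startswith("TIS"):
        return "0"
    if t.startswith(("T1", "T2")):
        return "I"
    if t.startswith(("T3", "T4")):
        return "II"
    return None


def fill_missing_stage(records):
    """Copy the records, filling a missing 'stage' from T/N/M where a rule applies."""
    filled = []
    for rec in records:
        rec = dict(rec)  # leave the caller's data untouched
        if rec.get("stage") is None:
            rec["stage"] = derive_stage(rec.get("t"), rec.get("n"), rec.get("m"))
        filled.append(rec)
    return filled


def missing_rate(records, field="stage"):
    """Percentage of records whose `field` is missing."""
    return 100.0 * sum(r.get(field) is None for r in records) / len(records)


# Hypothetical sample records (not taken from the K-CURE data).
sample = [
    {"t": "T3", "n": "N0", "m": "M0", "stage": None},
    {"t": "T2", "n": "N1b", "m": "M0", "stage": None},
    {"t": None, "n": None, "m": None, "stage": None},
    {"t": "T4a", "n": "N2", "m": "M1a", "stage": "IV"},
]
cleaned = fill_missing_stage(sample)
print(missing_rate(sample), missing_rate(cleaned))
```

Records with no usable T, N, or M information stay missing rather than being guessed, which is one reason a rules-based pass can lower a missing rate without eliminating it.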

Results:

In the literature, tumor, node, metastasis (TNM) stage was identified as an important prognostic factor for CRC, but it was not selected by feature selection on the RWD. After the QMP was applied, the rate of missing data fell from 75.26% to 35.73% for TNM stage and from 24.28% to 18.46% for Surveillance, Epidemiology, and End Results (SEER) stage, confirming the system’s effectiveness. Variable importance analysis through feature selection showed that TNM stage and detailed code variables, which were previously unselected, were included in the improved model.
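To put the reported missing-data rates in perspective, the short calculation below converts each before/after pair into an absolute reduction (percentage points) and a relative reduction. Only the four percentages come from the abstract; the helper name `reduction` and the dictionary layout are illustrative.

```python
def reduction(before: float, after: float):
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative


# Missing-data rates (%) before and after the QMP, as reported in the abstract.
rates = {"TNM": (75.26, 35.73), "SEER": (24.28, 18.46)}

for name, (before, after) in rates.items():
    abs_pp, rel = reduction(before, after)
    print(f"{name}: {abs_pp:.2f} percentage points, {rel:.1f}% relative")
# TNM: 39.53 percentage points, 52.5% relative
# SEER: 5.82 percentage points, 24.0% relative
```

In other words, roughly half of the originally missing TNM values and about a quarter of the missing SEER values were recovered.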

Conclusions:

In sum, we developed a rules-based QMP to address errors and impute missing values in K-CURE data, enhancing data quality. The applicability of the process to real-world datasets highlights its potential for broader use in clinical studies and cancer research.

Citation

Please cite as:

Park N, Na K, Sunwoo W, Baek JH, Lee Y, Lee S, Woo H

Process for Quality Management of Electronic Medical Records–Based Data: Case Study Using Real Colorectal Cancer Data

JMIR Med Inform 2025;13:e73884

DOI: 10.2196/73884

PMID: 41232039

PMCID: 12614659

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 13, 2025

Open Peer Review Period: Mar 13, 2025 - May 8, 2025

Date Accepted: Jul 6, 2025
