JMIR Preprints #47862: Improving the EHR-based Clinical Prediction Model Under Label Deficiency: A Network-based Generative Adversarial Semisupervised Approach

Improving the EHR-based Clinical Prediction Model Under Label Deficiency: A Network-based Generative Adversarial Semisupervised Approach

Runze Li;
Yu Tian;
Zhuyi Shen;
Jin Li;
Jun Li;
Kefeng Ding;
Jingsong Li

ABSTRACT

Background:

Observational biomedical studies facilitate a new strategy for large-scale electronic health record (EHR) utilization to support precision medicine. However, data label inaccessibility is an increasingly important issue in clinical prediction, even when synthetic data generation and semisupervised learning are employed. Little research has aimed to uncover the underlying graphical structure of EHRs.

Objective:

A network-based generative adversarial semisupervised method is proposed. The objective is to train clinical prediction models on label-deficient EHRs and achieve learning performance comparable to that of supervised methods.

Methods:

Three public datasets and one colorectal cancer dataset gathered from the Second Affiliated Hospital of Zhejiang University are selected as benchmarks. The proposed models are trained on 5% to 25% labeled data and evaluated on classification metrics against conventional semisupervised and supervised methods. The data quality, model security, and memory scalability are also evaluated.
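The label-deficient setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper name `mask_labels`, the stratified sampling, the toy class balance, and the fixed seed are all assumptions made for illustration. It only shows how a fully labeled dataset might be reduced to a 5% to 25% labeled subset before a semisupervised model is trained.

```python
import random


def mask_labels(labels, labeled_fraction, seed=0):
    """Hide all but a per-class fraction of labels (hidden labels become None).

    Hypothetical helper: the abstract does not say how the labeled subset
    was drawn, so stratified sampling is an assumption here.
    """
    rng = random.Random(seed)

    # Group sample indices by class so every class keeps some labels.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    keep = set()
    for idxs in by_class.values():
        n_keep = max(1, round(labeled_fraction * len(idxs)))
        keep.update(rng.sample(idxs, n_keep))

    return [y if i in keep else None for i, y in enumerate(labels)]


# Toy binary outcome with an 80/20 class balance (illustrative only).
labels = [0] * 80 + [1] * 20
for fraction in (0.05, 0.10, 0.25):
    masked = mask_labels(labels, fraction)
    n_labeled = sum(y is not None for y in masked)
    print(f"{fraction:.0%} labeled -> {n_labeled} of {len(labels)} labels kept")
```

The masked list can then be passed to any semisupervised learner that treats `None` (or an equivalent sentinel) as "unlabeled", while the original labels are retained for evaluation.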

Results:

The proposed method for semisupervised classification outperforms related semisupervised methods under the same setup, with the average AUCs reaching 0.945, 0.673, 0.611, and 0.588, followed by graph-based semisupervised learning (0.450, 0.454, 0.425, and 0.5676) and label propagation (0.475, 0.344, 0.440, and 0.477). The average classification AUCs with 10% labeled data are 0.929, 0.719, 0.652, and 0.650, comparable to those of the supervised learning methods logistic regression (0.601, 0.670, 0.731, and 0.710), support vector machines (0.733, 0.720, 0.720, and 0.721), and random forests (0.982, 0.750, 0.758, and 0.740). The concerns regarding the secondary use of data and data security are alleviated by realistic data synthesis and robust privacy preservation.
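For readers less familiar with the metric reported above, the following minimal sketch computes the area under the ROC curve (AUC) through its pairwise-ranking interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy inputs are illustrative assumptions and are not drawn from the study's code or data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of samples, which is fine for a small example.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Under this definition, a value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.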

Conclusions:

Training clinical prediction models on label-deficient EHRs is indispensable in data-driven research. The proposed method has great potential to exploit the intrinsic structure of EHRs and achieve comparable learning performance to supervised methods.

Citation

Please cite as:

Li R, Tian Y, Shen Z, Li J, Li J, Ding K, Li J

Improving an Electronic Health Record–Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach

JMIR Med Inform 2023;11:e47862

DOI: 10.2196/47862

PMID: 37310778

PMCID: 10337516


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.


Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 4, 2023

Date Accepted: May 12, 2023
