Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 4, 2023
Date Accepted: May 12, 2023

The final, peer-reviewed published version of this preprint can be found here:

Li R, Tian Y, Shen Z, Li J, Li J, Ding K, Li J

Improving an Electronic Health Record–Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach

JMIR Med Inform 2023;11:e47862

DOI: 10.2196/47862

PMID: 37310778

PMCID: 10337516

Improving the EHR-based Clinical Prediction Model Under Label Deficiency: A Network-based Generative Adversarial Semisupervised Approach

  • Runze Li
  • Yu Tian
  • Zhuyi Shen
  • Jin Li
  • Jun Li
  • Kefeng Ding
  • Jingsong Li

ABSTRACT

Background:

Observational biomedical studies facilitate a new strategy for large-scale electronic health record (EHR) utilization to support precision medicine. However, data label inaccessibility is an increasingly important issue in clinical prediction, despite the use of synthetic data and semisupervised learning. Little research has aimed to uncover the underlying graphical structure of EHRs.

Objective:

This study proposes a network-based generative adversarial semisupervised method. The objective is to train clinical prediction models on label-deficient EHRs and achieve learning performance comparable to that of supervised methods.

Methods:

Three public datasets and one colorectal cancer dataset gathered from the Second Affiliated Hospital of Zhejiang University are selected as benchmarks. The proposed models are trained on 5% to 25% labeled data and evaluated on classification metrics against conventional semisupervised and supervised methods. The data quality, model security, and memory scalability are also evaluated.
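The label-deficient setup described above can be illustrated with a minimal sketch that hides all but a chosen fraction of the training labels. This is not the authors' code: the function name `mask_labels`, the per-class stratification, the seed, and the toy labels are all assumptions made for illustration only.

```python
import random


def mask_labels(labels, labeled_fraction, seed=0):
    """Return a copy of `labels` with all but `labeled_fraction` set to None.

    Hypothetical helper: sampling is stratified per class, and every class
    keeps at least one labeled example.
    """
    if not 0 < labeled_fraction <= 1:
        raise ValueError("labeled_fraction must be in (0, 1]")
    rng = random.Random(seed)

    # Group sample indices by class so each class is sampled separately.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    # Pick the indices whose labels stay visible to the learner.
    keep = set()
    for indices in by_class.values():
        n_keep = max(1, round(len(indices) * labeled_fraction))
        keep.update(rng.sample(indices, n_keep))

    return [y if i in keep else None for i, y in enumerate(labels)]


# Toy labels (made-up data, not from the study): 80 negatives, 20 positives.
toy_labels = [0] * 80 + [1] * 20
for fraction in (0.05, 0.10, 0.25):
    masked = mask_labels(toy_labels, fraction)
    n_labeled = sum(y is not None for y in masked)
    print(f"{fraction:.0%} labeled -> {n_labeled} of {len(masked)} labels kept")
```

A semisupervised learner would then receive the features of every sample but only the labels that are not `None`, while a supervised baseline would see the full label list.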

Results:

The proposed method for semisupervised classification outperforms related semisupervised methods under the same setup, with the average areas under the receiver operating characteristic curve (AUCs) reaching 0.945, 0.673, 0.611, and 0.588 on the four datasets, followed by graph-based semisupervised learning (0.450, 0.454, 0.425, 0.5676) and label propagation (0.475, 0.344, 0.440, 0.477). The average classification AUCs with 10% labeled data are 0.929, 0.719, 0.652, and 0.650, comparable to those of the supervised learning methods logistic regression (0.601, 0.670, 0.731, 0.710), support vector machines (0.733, 0.720, 0.720, 0.721), and random forests (0.982, 0.750, 0.758, 0.740). Concerns regarding the secondary use of data and data security are alleviated by realistic data synthesis and robust privacy preservation.
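For readers less familiar with the AUC metric reported above, the following sketch computes it from its pairwise definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy scores are assumptions for illustration, not the evaluation code used in the study.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores.

    Quadratic in the number of samples, so intended for small
    illustrative inputs only.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): 3 of the 4 positive/negative pairs
# are ordered correctly.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.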

Conclusions:

Training clinical prediction models on label-deficient EHRs is indispensable in data-driven research. The proposed method has great potential to exploit the intrinsic structure of EHRs and achieve learning performance comparable to that of supervised methods.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.