JMIR Preprints #68830: Autoencoder-Based Representation Learning for Similar Patients Retrieval from Electronic Health Records: A Comparative Study

Autoencoder-Based Representation Learning for Similar Patients Retrieval from Electronic Health Records: A Comparative Study

Mei Liu;
Deyi Li;
Aditi Shukla;
Sravani Chandaka;
Bradley Taylor;
Jie Xu

ABSTRACT

Background:

By analyzing electronic health record (EHR) snapshots of similar patients, physicians can proactively predict disease onset, customize treatment plans, and anticipate patient-specific trajectories. However, EHR data are inherently challenging to model due to high dimensionality, mixed feature types, noise, bias, and sparsity. Patient representation learning with autoencoders (AE) presents promising opportunities. A critical question remains: how do different AE designs and distance measures impact the quality of retrieved similar patient cohorts?

Objective:

This study aims to evaluate the performance of five common AE variants—vanilla AE (AE), denoising AE (DAE), contractive AE (CAE), sparse AE (SAE), and robust AE (RAE)—in retrieving similar patients. Additionally, it investigates the impact of different distance measures and hyperparameter configurations on model performance.
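As a rough illustration of how these variants differ, the sketch below writes four of the five training objectives for a single-layer sigmoid encoder with a linear decoder. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names, the Gaussian corruption used for the DAE, the L1 activation penalty used for the SAE, and the penalty weight `lam`. The RAE is omitted because the abstract does not say which robust formulation was used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recon_loss(X_in, X_target, W, b, V, c):
    """Mean squared reconstruction error; also returns the latent codes H."""
    H = sigmoid(X_in @ W.T + b)      # encoder: (n, d) -> (n, k)
    X_hat = H @ V.T + c              # linear decoder: (n, k) -> (n, d)
    return np.mean(np.sum((X_hat - X_target) ** 2, axis=1)), H

def ae_loss(X, params):
    # Vanilla AE: reconstruct the input from itself.
    loss, _ = recon_loss(X, X, *params)
    return loss

def dae_loss(X, params, rng, noise_std=0.1):
    # Denoising AE: encode a corrupted input, reconstruct the clean one.
    # Gaussian noise is an assumption; masking noise is another common choice.
    X_noisy = X + rng.normal(0.0, noise_std, size=X.shape)
    loss, _ = recon_loss(X_noisy, X, *params)
    return loss

def cae_loss(X, params, lam=1e-2):
    # Contractive AE: add the squared Frobenius norm of the encoder Jacobian.
    # For a sigmoid encoder this is sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2.
    W = params[0]
    loss, H = recon_loss(X, X, *params)
    jac = np.sum((H * (1.0 - H)) ** 2 * np.sum(W ** 2, axis=1), axis=1)
    return loss + lam * np.mean(jac)

def sae_loss(X, params, lam=1e-2):
    # Sparse AE: add an L1 penalty on the latent activations (assumed form;
    # a KL-divergence sparsity penalty is a common alternative).
    loss, H = recon_loss(X, X, *params)
    return loss + lam * np.mean(np.sum(np.abs(H), axis=1))
```

Here `params` is the tuple `(W, b, V, c)` with shapes `(k, d)`, `(k,)`, `(d, k)`, and `(d,)` for `d` input features and `k` latent units. In practice these losses would be minimized with an automatic-differentiation framework; the sketch only shows what each variant adds to the reconstruction term.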

Methods:

We tested the five AE variants on two real-world datasets from the University of Kansas Medical Center (KUMC, n = 13,752) and the Medical College of Wisconsin (MCW, n = 9,568) across 168 different hyperparameter configurations. Euclidean distance-based k-nearest neighbors (k-NN) and Mahalanobis distance-based k-NN were then applied on the latent representations to retrieve similar patients. Two prediction targets were evaluated: Acute Kidney Injury (AKI) onset and post-discharge 1-year mortality, with F1 score as the evaluation metric.
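The retrieval step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's code: `knn_predict` and `f1_score` are hypothetical helper names, the Mahalanobis covariance is estimated from the training representations, and the outcome of a query patient is predicted by majority vote among the retrieved neighbors.

```python
import numpy as np

def knn_predict(Z_train, y_train, Z_query, k=5, metric="euclidean"):
    """Retrieve the k most similar training patients for each query patient
    and predict a binary outcome by majority vote among them."""
    d = Z_train.shape[1]
    if metric == "mahalanobis":
        # Inverse covariance of the training representations (pseudo-inverse
        # guards against a singular covariance matrix).
        VI = np.linalg.pinv(np.atleast_2d(np.cov(Z_train, rowvar=False)))
    else:
        VI = np.eye(d)  # identity matrix reduces to squared Euclidean distance
    diff = Z_query[:, None, :] - Z_train[None, :, :]      # shape (q, n, d)
    d2 = np.einsum("qnd,de,qne->qn", diff, VI, diff)      # squared distances
    idx = np.argsort(d2, axis=1)[:, :k]                   # k nearest neighbors
    votes = y_train[idx].mean(axis=1)
    return (votes >= 0.5).astype(int), idx                # ties count as positive

def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for binary labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Passing raw feature vectors instead of latent representations as `Z_train` and `Z_query` gives the raw-data baseline; an odd `k` avoids tied votes.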

Results:

Our findings demonstrate that (1) DAE outperformed other AE variants when paired with Euclidean distance (P<.001), followed by vanilla AE and CAE; (2) learning rates significantly influenced the performance of AE variants; and (3) Mahalanobis distance-based k-NN frequently outperformed Euclidean distance-based k-NN when applied to latent representations. However, whether AE models yield better performance by transforming raw data into latent representations, compared to applying Mahalanobis distance-based k-NN directly to raw data, was observed to be data-dependent.

Conclusions:

This study provides a comprehensive analysis of the performance of different AE variants in retrieving similar patients and evaluates the impact of various hyperparameter configurations on model performance. It lays the groundwork for future development of AE-based patient similarity estimation and personalized medicine models.

Citation

Please cite as:

Liu M, Li D, Shukla A, Chandaka S, Taylor B, Xu J

Autoencoder-Based Representation Learning for Similar Patients Retrieval From Electronic Health Records: Comparative Study

JMIR Med Inform 2025;13:e68830

DOI: 10.2196/68830

PMID: 40706557

PMCID: 12289314

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 15, 2024

Date Accepted: May 4, 2025
