JMIR Preprints #70140: Deep phenotyping obesity using EHR data: Promise, Challenges, and Future Directions

Deep phenotyping obesity using EHR data: Promise, Challenges, and Future Directions

Xiaoyang Ruan;
Shuyu Lu;
Liwei Wang;
Andrew Wen;
Sameer Murali;
Hongfang Liu

ABSTRACT

Background:

Obesity affects approximately 40% of adults and 15–20% of children and adolescents in the U.S. and poses significant economic and psychosocial burdens. Currently, patient responses to any single anti-obesity medication (AOM) vary significantly, making deep phenotyping of obesity and the associated precision medicine important targets of investigation.

Objective:

To evaluate the potential of electronic health records (EHRs) as a primary data source for obesity deep phenotyping, we conducted an in-depth analysis of the data elements and data quality available for patients with obesity prior to pharmacotherapy, and applied a multimodal longitudinal deep autoencoder to investigate the feasibility, data requirements, clustering patterns, and challenges associated with EHR-based obesity deep phenotyping.

Methods:

We analyzed 53,688 pre-AOM periods from 32,969 patients with obesity or overweight who underwent medium- to long-term AOM treatment. A total of 92 lab and vital measurements, along with 79 ICD-derived Clinical Classifications Software (CCS) codes recorded within one year prior to AOM treatment, were used to train a gated recurrent unit with decay (GRU-D)-based longitudinal autoencoder (GRU-D-AE) to generate dense embeddings for each pre-AOM record. Principal component analysis (PCA) and Gaussian mixture modeling (GMM) were then applied to the embeddings to identify clusters.
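The "decay" in a GRU-D model refers to how missing measurements are handled between irregularly spaced visits. The sketch below illustrates the input-decay idea from the original GRU-D formulation for a single lab series: a missing value is filled with a blend of the last observed value and the variable's mean, and the weight on the last observation shrinks as the gap since that observation grows. It is only an illustration under stated assumptions, not the authors' implementation. The function names, the example values, and the fixed decay parameters w and b are hypothetical; in a trained model these parameters would be learned.

```python
import math


def decay_weight(delta, w, b):
    """Decay factor gamma = exp(-max(0, w * delta + b)), a value in (0, 1].

    delta is the time elapsed since the variable was last observed.
    w and b are hypothetical fixed parameters here (learned in a real model).
    """
    return math.exp(-max(0.0, w * delta + b))


def impute_series(values, times, mean, w=0.1, b=0.0):
    """Fill missing entries (None) of one measurement series.

    Observed values pass through unchanged. A missing value becomes
    gamma * last_observed + (1 - gamma) * mean, so it drifts from the
    last observation toward the mean as the gap grows. Before the first
    observation, the mean is used directly.
    """
    imputed = []
    last_value = None
    last_time = None
    for x, t in zip(values, times):
        if x is not None:
            imputed.append(x)
            last_value, last_time = x, t
        elif last_value is None:
            imputed.append(mean)
        else:
            gamma = decay_weight(t - last_time, w, b)
            imputed.append(gamma * last_value + (1.0 - gamma) * mean)
    return imputed


if __name__ == "__main__":
    # Hypothetical series: one lab measured on day 0 and day 330,
    # missing on days 30 and 300; the mean of 5.5 is made up.
    series = impute_series(
        values=[6.0, None, None, 7.0],
        times=[0, 30, 300, 330],
        mean=5.5,
        w=0.01,
    )
    print(series)
```

In this example the value imputed on day 30 stays close to the last observation, while the value imputed on day 300 has decayed almost entirely to the mean, which is the behavior the decay term is meant to capture.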

Results:

Our analysis identified at least nine clusters, five of which exhibited distinct and explainable clinical relevance. Certain clusters showed characteristics overlapping with phenotypes derived from traditional phenotyping strategies. Results from multiple training folds demonstrated stable clustering patterns in two-dimensional space and reproducible clinical significance. However, challenges persist regarding the stability of missing data imputation across folds, maintaining consistency in input features, and effectively visualizing complex diseases in low-dimensional spaces.

Conclusions:

In this proof-of-concept study, we demonstrated that longitudinal EHR data are a valuable resource for deep phenotyping the pre-AOM period at the per-patient-visit level. Our analysis revealed clusters with distinct clinical significance, which could have implications for AOM treatment options. Further research using larger, independent cohorts is necessary to validate the reproducibility and clinical relevance of these clusters and to uncover more detailed substructures and their corresponding AOM treatment responses.

Citation

Please cite as:

Ruan X, Lu S, Wang L, Wen A, Murali S, Liu H

Deep Phenotyping of Obesity: Electronic Health Record–Based Temporal Modeling Study

J Med Internet Res 2025;27:e70140

DOI: 10.2196/70140

PMID: 40834423

PMCID: 12373304

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 16, 2024

Date Accepted: May 8, 2025

Per the author's request the PDF is not available.