JMIR Preprints #65178: Unsupervised Deep Learning of Electronic Health Records Characterizes Heterogeneity Across Alzheimer’s Disease and Related Dementias

Unsupervised Deep Learning of Electronic Health Records Characterizes Heterogeneity Across Alzheimer’s Disease and Related Dementias

Matthew West;
You Cheng;
Yingnan He;
Yu Leng;
Colin Magdamo;
Bradley T. Hyman;
John R. Dickson;
Alberto Serrano-Pozo;
Deborah Blacker;
Sudeshna Das

ABSTRACT

Background:

Alzheimer's disease and related dementias (ADRD) exhibit prominent heterogeneity. Identifying clinically meaningful ADRD subtypes is essential for tailoring treatments to specific patient phenotypes.

Objective:

To employ unsupervised learning techniques on electronic health records (EHRs) from memory clinic patients to identify ADRD subtypes.

Methods:

We used pre-trained embeddings of non-ADRD International Classification of Diseases (ICD) diagnosis codes and large language model (LLM)-derived embeddings of clinical notes from patient EHRs. Hierarchical clustering of these embeddings was used to identify ADRD subtypes. Clusters were then characterized by their demographic and clinical features.
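As a rough illustration of the clustering step described above, the following minimal sketch pools per-code embeddings into one vector per patient and then applies agglomerative (hierarchical) clustering. The abstract does not state the pooling method, distance metric, linkage criterion, or how the number of clusters was chosen, so mean pooling, Euclidean distance, average linkage, and a fixed cut at three clusters are all assumptions here. The code names and two-dimensional embedding values are hypothetical toy data, not taken from the study.

```python
from itertools import combinations
from math import dist

# Hypothetical toy embeddings for made-up code names (not real ICD codes).
TOY_CODE_EMBEDDINGS = {
    "skin_a": [0.0, 1.0], "skin_b": [0.1, 0.9],
    "psych_a": [1.0, 0.0], "psych_b": [0.9, 0.1],
    "diab_a": [1.0, 1.0], "diab_b": [0.9, 0.9],
}

def mean_pool(vectors):
    """Average a patient's code embeddings into one vector (assumed pooling step)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def agglomerative(points, k):
    """Average-linkage agglomerative clustering, stopped at k clusters.

    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)

# Six hypothetical patients, each with a list of non-ADRD codes.
patients = [
    ["skin_a", "skin_b"], ["skin_a"],
    ["psych_a", "psych_b"], ["psych_b"],
    ["diab_a"], ["diab_a", "diab_b"],
]
patient_vectors = [
    mean_pool([TOY_CODE_EMBEDDINGS[code] for code in codes]) for codes in patients
]
clusters = agglomerative(patient_vectors, k=3)
print(clusters)
```

On this toy input the three well-separated groups are recovered as three clusters of two patients each. In practice a library routine such as SciPy's hierarchical clustering would replace the hand-written loop, which is quadratic per merge and is only intended to make the procedure explicit.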

Results:

We analyzed a cohort of 3,454 ADRD memory clinic patients at Massachusetts General Hospital, each with a specialist diagnosis. Clustering pre-trained embeddings of the non-ADRD diagnosis codes in patient EHRs revealed three patient subtypes: one with skin conditions, another with psychiatric disorders and an earlier age of onset, and a third with diabetes complications. Similarly, using LLM-derived embeddings of clinical notes, we identified three subtypes of patients: one with psychiatric manifestations and a higher prevalence of females (prevalence ratio: 1.59), another with cardiovascular and motor problems and a higher prevalence of males (prevalence ratio: 1.75), and a third with geriatric health disorders. Notably, we observed significant overlap between clusters from both data modalities.
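To make the prevalence ratios reported above concrete, here is a minimal sketch of one common way to compute such a ratio: the proportion of patients with a characteristic inside a cluster divided by the proportion among patients outside it. The abstract does not say which comparison group was used, so the "rest of the cohort" denominator is an assumption, and the counts below are hypothetical rather than study data.

```python
def prevalence_ratio(cluster_with, cluster_total, rest_with, rest_total):
    """Prevalence in the cluster divided by prevalence in the comparison group.

    Assumes the comparison group is all patients outside the cluster.
    """
    if cluster_total <= 0 or rest_total <= 0:
        raise ValueError("group sizes must be positive")
    if rest_with <= 0:
        raise ValueError("comparison group prevalence must be nonzero")
    cluster_prevalence = cluster_with / cluster_total
    rest_prevalence = rest_with / rest_total
    return cluster_prevalence / rest_prevalence

# Hypothetical counts: 60 of 100 cluster patients are female,
# versus 120 of 300 patients in the rest of the cohort.
ratio = prevalence_ratio(60, 100, 120, 300)
print(round(ratio, 2))
```

A ratio above 1 means the characteristic is more common in the cluster than in the comparison group; with these toy counts the prevalences are 0.60 and 0.40, giving a ratio of about 1.5.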

Conclusions:

By integrating ICD codes and LLM-derived embeddings, our analysis delineated two distinct ADRD subtypes with sex-specific comorbid and clinical presentations, offering insights for potential precision medicine approaches.

Citation

Please cite as:

West M, Cheng Y, He Y, Leng Y, Magdamo C, Hyman BT, Dickson JR, Serrano-Pozo A, Blacker D, Das S

Unsupervised Deep Learning of Electronic Health Records to Characterize Heterogeneity Across Alzheimer Disease and Related Dementias: Cross-Sectional Study

JMIR Aging 2025;8:e65178

DOI: 10.2196/65178

PMID: 40163031

PMCID: 11997524

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Aging

Date Submitted: Aug 7, 2024

Date Accepted: Mar 10, 2025
