West M, Cheng Y, He Y, Leng Y, Magdamo C, Hyman BT, Dickson JR, Serrano-Pozo A, Blacker D, Das S
Unsupervised Deep Learning of Electronic Health Records to Characterize Heterogeneity Across Alzheimer Disease and Related Dementias: Cross-Sectional Study
Unsupervised Deep Learning of Electronic Health Records Characterizes Heterogeneity Across Alzheimer’s Disease and Related Dementias
Matthew West; You Cheng; Yingnan He; Yu Leng; Colin Magdamo; Bradley T. Hyman; John R. Dickson; Alberto Serrano-Pozo; Deborah Blacker; Sudeshna Das
ABSTRACT
Background:
Alzheimer's disease and related dementias (ADRD) exhibit prominent heterogeneity. Identifying clinically meaningful ADRD subtypes is essential for tailoring treatments to specific patient phenotypes.
Objective:
We aimed to identify ADRD subtypes by applying unsupervised learning to the electronic health records (EHRs) of memory clinic patients.
Methods:
We used pre-trained embeddings of non-ADRD International Classification of Diseases (ICD) diagnosis codes and large language model (LLM)-derived embeddings of clinical notes from patient EHRs. We applied hierarchical clustering to these embeddings to identify ADRD subtypes and then characterized each cluster by its demographic and clinical features.
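As an illustration of the clustering step only, the sketch below runs a naive agglomerative (hierarchical) clustering over toy vectors that stand in for patient embeddings. The abstract does not state the distance metric, linkage rule, or embedding dimension, so cosine distance, average linkage, the 6-dimensional synthetic vectors, and the function names `cosine_distance`, `agglomerative_clusters`, and `make_toy_embeddings` are all assumptions made for this example, not the authors' implementation.

```python
import math
import random


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clusters(vectors, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Starts with one cluster per vector and repeatedly merges the two
    clusters with the smallest mean pairwise distance until n_clusters
    remain. Returns one integer label per input vector.
    """
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                mean = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or mean < best[0]:
                    best = (mean, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def make_toy_embeddings(n_per_group=10, dim=6, seed=0):
    """Synthetic stand-in for patient embeddings: three well-separated groups."""
    rng = random.Random(seed)
    vectors = []
    for group in range(3):
        for _ in range(n_per_group):
            v = [rng.gauss(0.0, 0.05) for _ in range(dim)]
            v[group] += 1.0  # each group points along its own axis
            vectors.append(v)
    return vectors


toy_embeddings = make_toy_embeddings()
labels = agglomerative_clusters(toy_embeddings, n_clusters=3)
print(labels)
```

On real data, a library routine (for example, the hierarchical clustering functions in SciPy) would normally replace the naive loop above, which is written out only to make the merge logic visible.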
Results:
We analyzed a cohort of 3,454 ADRD memory clinic patients at Massachusetts General Hospital, each with a specialist diagnosis. Clustering pre-trained embeddings of the non-ADRD diagnosis codes in patient EHRs revealed three patient subtypes: one with skin conditions, another with psychiatric disorders and an earlier age of onset, and a third with diabetes complications. Similarly, using LLM-derived embeddings of clinical notes, we identified three patient subtypes: one with psychiatric manifestations and a higher prevalence of females (prevalence ratio: 1.59), another with cardiovascular and motor problems and a higher prevalence of males (prevalence ratio: 1.75), and a third with geriatric health disorders. Notably, we observed significant overlap between the clusters from the two data modalities.
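To make the two summary statistics above concrete, the following sketch computes a prevalence ratio and a cross-tabulation of two cluster assignments on made-up data. The abstract does not say which comparison group the prevalence ratios use or how overlap was measured, so this example assumes a cluster-versus-rest comparison and a simple contingency count; the toy counts and the names `prevalence_ratio` and `cross_tabulate` are hypothetical and do not reproduce the study's figures.

```python
from collections import Counter


def prevalence_ratio(in_cluster, has_trait):
    """Prevalence of a trait inside a cluster divided by its prevalence
    among everyone else (assumed comparison group)."""
    inside = [t for c, t in zip(in_cluster, has_trait) if c]
    outside = [t for c, t in zip(in_cluster, has_trait) if not c]
    return (sum(inside) / len(inside)) / (sum(outside) / len(outside))


def cross_tabulate(labels_a, labels_b):
    """Count how many patients fall into each pair of cluster labels."""
    return Counter(zip(labels_a, labels_b))


# Toy cohort: 100 patients in the cluster (60 female), 200 outside (80 female).
in_cluster = [True] * 100 + [False] * 200
is_female = [True] * 60 + [False] * 40 + [True] * 80 + [False] * 120
ratio = prevalence_ratio(in_cluster, is_female)
print(round(ratio, 2))

# Toy cluster assignments for the same seven patients from two modalities.
icd_labels = [0, 0, 1, 1, 2, 2, 2]
note_labels = [0, 0, 1, 2, 2, 2, 2]
overlap = cross_tabulate(icd_labels, note_labels)
for pair, count in sorted(overlap.items()):
    print(pair, count)
```

Large counts concentrated in a few cells of the cross-tabulation would indicate that the two modalities group patients similarly; a formal agreement measure would be needed to call the overlap statistically significant.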
Conclusions:
By integrating ICD codes and LLM-derived embeddings, our analysis delineated two distinct ADRD subtypes with sex-specific comorbid and clinical presentations, offering insights for potential precision medicine approaches.