JMIR Preprints #64113: Identification of Patients with Congestive Heart Failure: Automated Electronic Health Records Phenotyping

Identification of Patients with Congestive Heart Failure: Automated Electronic Health Records Phenotyping

Daniel Sumsion;
Elijah Davis;
Marta Fernandes;
Ruoqi Wei;
Rebecca Milde;
Jet Veltink;
Wan-Yee Kong;
Yiwen Xiong;
Samvrit Rao;
Tara M Westover;
Lydia Petersen;
Niels Turley;
Arjun Singh;
Stephanie S Buss;
Shibani Mukerji;
Sahar Zafar;
Sudeshna Das;
Valdery Moura Junior;
Manohar Ghanta;
Aditya Gupta;
Jennifer A Kim;
Katie L Stone;
Emmanuel Mignot;
Dennis Hwang;
Lynn Marie Trotti;
Gari D Clifford;
Umakanth Katwa;
M Brandon Westover;
Haoqi Sun

ABSTRACT

Background:

Congestive heart failure (CHF) is a common cause of hospital admissions. Medical records contain valuable information about CHF, but manual chart review is time-consuming. Claims databases (ICD codes) provide a scalable alternative but are less accurate. Automated analysis of medical records through natural language processing (NLP) enables more efficient adjudication but has not yet been validated across multiple sites.

Objective:

We aimed to classify accurately whether patients have CHF, using structured and unstructured data for each patient (medications, ICD codes, and information extracted by NLP from provider notes), and to compare the effectiveness of several machine learning models.

Methods:

We developed an NLP model to identify CHF from medical records using electronic health record (EHR) data from two hospitals (Mass General Hospital and Beth Israel Deaconess Medical Center, 2010 to 2023), comprising 2800 clinical visit notes from 1821 patients. We trained and compared the performance of logistic regression, random forest, and RoBERTa models. We measured model performance using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The models were also externally validated by training on data from one hospital and testing on data from the other. In addition, an overall error rate was estimated from a completely random sample drawn from both hospitals.
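
The two metrics named above can be computed directly from model scores and chart-review labels. The following is a minimal sketch in plain Python, not the authors' code: the function names, the toy labels, and the toy scores are all hypothetical, and AUPRC is approximated by average precision (the mean precision at the rank of each true positive).

```python
# Illustrative sketch only: toy labels and scores, not the study's data or code.

def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise approximation of AUPRC: mean precision at each positive.
    Assumes no tied scores; tied scores keep their input order here."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical example: 1 = CHF confirmed by chart review, 0 = no CHF.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

print(f"AUROC: {auroc(labels, scores):.3f}")
print(f"AUPRC (average precision): {average_precision(labels, scores):.3f}")
```

For cross-site validation as described above, the same functions would be applied to scores from a model fitted on one hospital's notes and evaluated on the other's. In practice, library routines such as scikit-learn's `roc_auc_score` and `average_precision_score` compute comparable quantities.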

Results:

The average age was 67.3 years; 54.3% of patients were female. The logistic regression model achieved the best performance using a combination of ICD codes, medications, and notes, with an AUROC of 0.968 (0.940 – 0.982) and an AUPRC of 0.921 (0.835 – 0.969). Models that used only ICD codes or only medications performed worse. The estimated overall error rate in a random EHR sample was 1.6%. The model also showed high external validity when trained on Mass General Hospital (MGH) data and tested on Beth Israel Deaconess Medical Center (BIDMC) data (AUROC 0.927) and vice versa (AUROC 0.968).
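
The abstract does not state how the intervals in parentheses were derived. One common way to attach an interval to a metric such as AUROC is a percentile bootstrap, sketched below under that assumption. The function names, parameters, and toy data are hypothetical and are not taken from the study.

```python
# Illustrative sketch only: a percentile bootstrap is assumed here; the
# abstract does not say which interval method the authors used.
import random


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_interval(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC, resampling patients
    with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUROC is undefined unless both classes are present
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = CHF, 0 = no CHF.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1, 0.85, 0.4, 0.65, 0.55]

lower, upper = bootstrap_interval(labels, scores)
print(f"AUROC {auroc(labels, scores):.3f} (bootstrap interval {lower:.3f} to {upper:.3f})")
```

With only a handful of toy records the interval is wide; it narrows as the number of labeled patients grows.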

Conclusions:

The proposed CHF EHR phenotyping model achieved excellent performance and external validity, generalizing across two institutions. The model enables multiple downstream uses, paving the way for large-scale studies of CHF treatment effectiveness, comorbidities, outcomes, and mechanisms.

Citation

Please cite as:

Sumsion D, Davis E, Fernandes M, Wei R, Milde R, Veltink J, Kong WY, Xiong Y, Rao S, Westover TM, Petersen L, Turley N, Singh A, Buss SS, Mukerji S, Zafar S, Das S, Junior VM, Ghanta M, Gupta A, Kim JA, Stone KL, Mignot E, Hwang D, Trotti LM, Clifford GD, Katwa U, Westover MB, Sun H

Identification of Patients With Congestive Heart Failure From the Electronic Health Records of Two Hospitals: Retrospective Study

JMIR Med Inform 2025;13:e64113

DOI: 10.2196/64113

PMID: 40208662

PMCID: 12022513

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 11, 2024

Open Peer Review Period: Jul 22, 2024 - Sep 16, 2024

Date Accepted: Nov 17, 2024
