Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jul 11, 2024
Open Peer Review Period: Jul 22, 2024 - Sep 16, 2024
Date Accepted: Nov 17, 2024
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Identification of Patients with Congestive Heart Failure: Automated Electronic Health Records Phenotyping
ABSTRACT
Background:
Congestive heart failure (CHF) is a common cause of hospital admissions. Medical records contain valuable information about CHF, but manual chart review is time-consuming. Claims databases (ICD codes) provide a scalable alternative but are less accurate. Automated analysis of medical records through natural language processing (NLP) enables more efficient adjudication but has not yet been validated across multiple sites.
Objective:
We seek to accurately classify whether patients have CHF using structured and unstructured data for each patient, including medications, ICD codes, and information extracted through NLP of provider notes, and to compare the effectiveness of several machine learning models on this task.
Methods:
We developed an NLP model to identify CHF from medical records using electronic health record (EHR) data from two hospitals (Mass General Hospital [MGH] and Beth Israel Deaconess Medical Center [BIDMC], 2010 to 2023), comprising 2800 clinical visit notes from 1821 patients. We trained and compared the performance of logistic regression, random forest, and RoBERTa models. We measured model performance using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The models were also externally validated by training on data from one hospital and testing on data from the other. In addition, an overall error rate was estimated from a completely random sample drawn from both hospitals.
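As an illustration of the two evaluation metrics named above, the following is a minimal sketch in plain Python. It is not the authors' code: the function names `auroc` and `average_precision` and the toy labels and scores are assumptions made for this example, and average precision is used here as one common step-wise estimate of AUPRC, since the abstract does not state which estimator was used.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win
    (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at each rank where a positive appears.

    Assumes scores are untied; tied scores keep their input order.
    """
    if 1 not in labels:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 1 = CHF, 0 = no CHF; scores are model probabilities.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(round(auroc(toy_labels, toy_scores), 3))              # 0.778
print(round(average_precision(toy_labels, toy_scores), 3))  # 0.806
```

For cross-site validation as described, the same functions would be applied to scores from a model fitted on one hospital's notes and evaluated on the other's.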
Results:
The average age was 67.3 years; 54.3% of patients were female. The logistic regression model achieved the best performance using a combination of ICD codes, medications, and notes, with an AUROC of 0.968 (0.940 – 0.982) and an AUPRC of 0.921 (0.835 – 0.969). The models that used only ICD codes or medications had lower performance. The estimated overall error rate in a random EHR sample was 1.6%. The model also showed high external validity when trained on MGH and tested on BIDMC (AUROC 0.927) and vice versa (AUROC 0.968).
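The results above report an interval alongside each AUROC and AUPRC, but the abstract does not say how those intervals were computed. One common choice is a percentile bootstrap over patients; the sketch below shows that approach under that assumption. The function `bootstrap_auroc_ci`, its parameters, and the toy data are hypothetical and are not taken from the study.

```python
import random
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUROC (assumed method, see lead-in)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to compute AUROC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUROC is undefined when a resample has only one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data: 1 = CHF, 0 = no CHF.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1, 0.85, 0.4]
low, high = bootstrap_auroc_ci(toy_labels, toy_scores, n_boot=500, seed=42)
print(f"AUROC {auroc(toy_labels, toy_scores):.3f} ({low:.3f} - {high:.3f})")
```

With only a handful of cases the interval is very wide; on a test set of the size described in the Methods it would narrow considerably.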
Conclusions:
The proposed CHF EHR phenotyping model achieved excellent performance and external validity, generalizing across two institutions. The model enables multiple downstream uses, paving the way for large-scale studies of CHF treatment effectiveness, comorbidities, outcomes, and mechanisms.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.