Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Apr 23, 2025
Date Accepted: May 26, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Multi-Criteria Optimization of Language Models for HFpEF Symptom Detection in Spanish Electronic Health Records
ABSTRACT
Background:
Heart failure with preserved ejection fraction (HFpEF) is a major clinical manifestation of cardiac amyloidosis (CA), a condition frequently underdiagnosed due to its nonspecific symptomatology. Electronic health records (EHRs) offer a promising avenue for supporting early symptom detection through natural language processing (NLP). However, identifying relevant clinical cues within unstructured narratives, particularly in Spanish, remains a significant challenge owing to the scarcity of annotated corpora and domain-specific models.
Objective:
This study proposes and evaluates a Transformer-based NLP framework for the automated detection of HFpEF-related symptoms in Spanish EHRs. Our objective is to assess the feasibility of leveraging clinical narratives to support early identification of heart failure phenotypes suggestive of cardiac amyloidosis.
Methods:
A novel corpus was developed from over 15,000 Spanish clinical documents, manually annotated and validated by cardiology experts. Several Transformer architectures were benchmarked, including general-purpose, biomedical-specialized, and long-document models (Longformers), under three optimization strategies tailored to clinically relevant metrics: area under the curve (AUC), F1-score, and sensitivity.
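The abstract does not say how each optimization strategy was implemented. One plausible reading is that, for each target metric, the candidate model that scores highest on that metric on validation data is retained. The sketch below illustrates that reading only. The model names, the toy scores, the 0.5 decision threshold, and the helper `select_best` are all hypothetical and are not taken from the study.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 = symptom present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn probabilities into 0/1 predictions (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity(y_true: Sequence[int], scores: Sequence[float]) -> float:
    tp, _, fn, _ = confusion(y_true, binarize(scores))
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    tp, fp, fn, _ = confusion(y_true, binarize(scores))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return 0.0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


METRICS: Dict[str, Callable[[Sequence[int], Sequence[float]], float]] = {
    "auc": auc,
    "f1": f1_score,
    "sensitivity": sensitivity,
}


def select_best(candidates: Dict[str, List[float]], y_true: Sequence[int], metric: str) -> str:
    """Hypothetical helper: name of the candidate that maximizes the chosen metric."""
    score_fn = METRICS[metric]
    return max(candidates, key=lambda name: score_fn(y_true, candidates[name]))


# Toy validation labels and per-model predicted probabilities (invented for illustration).
y_true = [1, 1, 1, 0, 0, 0]
candidates = {
    "model_a": [0.9, 0.8, 0.4, 0.3, 0.2, 0.1],   # ranks perfectly, misses one positive at 0.5
    "model_b": [0.9, 0.7, 0.6, 0.8, 0.55, 0.1],  # catches every positive, more false alarms
}

for metric in METRICS:
    print(metric, "->", select_best(candidates, y_true, metric))
```

On this toy data the AUC and F1 criteria both favor `model_a`, while the sensitivity criterion favors `model_b`, which shows how the choice of target metric can change which model is kept.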
Results:
All models achieved strong performance (AUC > 0.940). The best-performing model, Longformer Biomedical-Clinical, reached an AUC of 0.987 and F1-score of 0.985. Sensitivity-optimized models achieved false negative rates below 3%, a critical threshold for clinical applicability.
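For readers relating the reported false negative rate to sensitivity, the standard definitions are given below, with TP and FN denoting true positive and false negative counts. Under these definitions, a false negative rate below 3% is equivalent to a sensitivity above 97%.

```latex
\[
\text{Sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{FNR} = \frac{FN}{TP + FN} = 1 - \text{Sensitivity},
\qquad
\text{FNR} < 0.03 \iff \text{Sensitivity} > 0.97 .
\]
```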
Conclusions:
Transformer-based models can robustly detect HFpEF-related symptoms from unstructured Spanish clinical texts, even in the presence of class imbalance and complex language patterns. Our findings highlight the importance of combining domain-specific pretraining, long-context modeling, and tailored optimization to enhance the performance of NLP systems in high-impact clinical applications.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.