JMIR Preprints #76433: Multi-Criteria Optimization of Language Models for HFpEF Symptom Detection in Spanish Electronic Health Records

Multi-Criteria Optimization of Language Models for HFpEF Symptom Detection in Spanish Electronic Health Records

Jacinto Mata;
Victoria Pachón;
Ana Manovel;
Manuel J. Maña;
Manuel de la Villa

ABSTRACT

Background:

Heart failure with preserved ejection fraction (HFpEF) is a major clinical manifestation of cardiac amyloidosis (CA), a condition frequently underdiagnosed due to its nonspecific symptomatology. Electronic health records (EHRs) offer a promising avenue for supporting early symptom detection through natural language processing (NLP). However, identifying relevant clinical cues within unstructured narratives, particularly in Spanish, remains a significant challenge owing to the scarcity of annotated corpora and domain-specific models.

Objective:

This study proposes and evaluates a Transformer-based NLP framework for the automated detection of HFpEF-related symptoms in Spanish EHRs. Our objective is to assess the feasibility of leveraging clinical narratives to support early identification of heart failure phenotypes suggestive of cardiac amyloidosis.

Methods:

A novel corpus was developed from over 15,000 Spanish clinical documents, manually annotated and validated by cardiology experts. Several Transformer architectures were benchmarked, including general-purpose, biomedical-specialized, and long-document models (Longformers), under three optimization strategies tailored to clinically relevant metrics: area under the curve (AUC), F1-score, and sensitivity.
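The idea of selecting a model under three different criteria can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the candidate dictionary are assumptions made purely for illustration. The sketch computes AUC, F1-score, and sensitivity from binary labels and predicted scores, then picks the candidate that maximizes whichever criterion is requested.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall on the positive class: tp / (tp + fn)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall: 2tp / (2tp + fp + fn)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in sample size, which is fine
    for a toy example but not for a large evaluation set.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute the three criteria; the 0.5 threshold is an assumption."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "auc": roc_auc(y_true, scores),
        "f1": f1_score(y_true, y_pred),
        "sensitivity": sensitivity(y_true, y_pred),
    }


def select_best(candidates: Dict[str, Sequence[float]],
                y_true: Sequence[int], criterion: str) -> str:
    """Return the name of the candidate with the highest value of `criterion`."""
    return max(candidates, key=lambda name: evaluate(y_true, candidates[name])[criterion])
```

The point of the sketch is that the "best" model depends on the criterion: a candidate that ranks every positive above every negative (highest AUC) can still miss positives at a fixed threshold, so a sensitivity-driven selection may prefer a different candidate.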

Results:

All models achieved strong performance (AUC > 0.940). The best-performing model, Longformer Biomedical-Clinical, reached an AUC of 0.987 and F1-score of 0.985. Sensitivity-optimized models achieved false negative rates below 3%, a critical threshold for clinical applicability.
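The false negative rate quoted above is the complement of sensitivity: FNR = FN / (TP + FN) = 1 - sensitivity, so a rate below 3% corresponds to a sensitivity above 97%. The short sketch below makes that arithmetic explicit. The counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def false_negative_rate(tp: int, fn: int) -> float:
    """Fraction of truly positive cases that the model misses."""
    if tp + fn == 0:
        raise ValueError("no positive cases")
    return fn / (tp + fn)


def meets_fnr_target(tp: int, fn: int, max_fnr: float = 0.03) -> bool:
    """Check a model against a maximum tolerated false negative rate."""
    return false_negative_rate(tp, fn) < max_fnr


# Hypothetical counts: 1,000 positive documents, 25 of them missed.
tp, fn = 975, 25
fnr = false_negative_rate(tp, fn)
sens = 1.0 - fnr
print(f"FNR = {fnr:.3f}, sensitivity = {sens:.3f}, "
      f"meets 3% target: {meets_fnr_target(tp, fn)}")
```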

Conclusions:

Transformer-based models can robustly detect HFpEF-related symptoms from unstructured Spanish clinical texts, even in the presence of class imbalance and complex language patterns. Our findings highlight the importance of combining domain-specific pretraining, long-context modeling, and tailored optimization to enhance the performance of NLP systems in high-impact clinical applications.

Citation

Please cite as:

Mata J, Pachón V, Manovel A, Maña MJ, de la Villa M

Multicriteria Optimization of Language Models for Heart Failure With Preserved Ejection Fraction Symptom Detection in Spanish Electronic Health Records: Comparative Modeling Study

J Med Internet Res 2025;27:e76433

DOI: 10.2196/76433

PMID: 40674251

PMCID: 12288768

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 23, 2025

Date Accepted: May 26, 2025
