Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 23, 2025
Date Accepted: May 26, 2025

The final, peer-reviewed published version of this preprint can be found here:

Multicriteria Optimization of Language Models for Heart Failure With Preserved Ejection Fraction Symptom Detection in Spanish Electronic Health Records: Comparative Modeling Study

Mata J, Pachón V, Manovel A, Maña MJ, de la Villa M

J Med Internet Res 2025;27:e76433

DOI: 10.2196/76433

PMID: 40674251

PMCID: 12288768

Multi-criteria Optimization of Language Models for HFpEF Symptom Detection in Spanish Electronic Health Records: Comparative Modeling Study

  • Jacinto Mata
  • Victoria Pachón
  • Ana Manovel
  • Manuel J. Maña
  • Manuel de la Villa

ABSTRACT

Background:

Heart failure with preserved ejection fraction (HFpEF) is a major clinical manifestation of cardiac amyloidosis (CA), a condition frequently underdiagnosed due to its nonspecific symptomatology. Electronic health records (EHRs) offer a promising avenue for supporting early symptom detection through natural language processing (NLP). However, identifying relevant clinical cues within unstructured narratives, particularly in Spanish, remains a significant challenge owing to the scarcity of annotated corpora and domain-specific models.

Objective:

This study proposes and evaluates a Transformer-based NLP framework for the automated detection of HFpEF-related symptoms in Spanish EHRs. Our objective is to assess the feasibility of leveraging clinical narratives to support early identification of heart failure phenotypes suggestive of cardiac amyloidosis.

Methods:

A novel corpus was developed from over 15,000 Spanish clinical documents, manually annotated and validated by cardiology experts. Several Transformer architectures were benchmarked, including general-purpose, biomedical-specialized, and long-document models (Longformers), under three optimization strategies tailored to clinically relevant metrics: area under the curve (AUC), F1-score, and sensitivity.
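As an illustration of why the choice of optimization criterion matters, the sketch below selects a model separately under each of the three metrics named above. It is not the authors' procedure, which this abstract does not detail. The function names, the fixed 0.5 decision threshold, and the toy labels and scores are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def sensitivity(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of positive cases scored at or above the threshold (recall)."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall at the given threshold."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking (ties count as half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_best(
    candidates: Dict[str, List[float]],
    y_true: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[float]], float],
) -> str:
    """Return the name of the candidate whose scores maximize the metric."""
    return max(candidates, key=lambda name: metric(y_true, candidates[name]))


# Hypothetical validation labels (1 = symptom present) and model scores.
y_true = [1, 1, 1, 0, 0, 0]
candidates = {
    "model_a": [0.9, 0.8, 0.4, 0.3, 0.2, 0.1],   # ranks perfectly, misses one case at 0.5
    "model_b": [0.9, 0.7, 0.6, 0.65, 0.2, 0.1],  # catches every case, one false alarm
}

best_by_auc = select_best(candidates, y_true, auc)
best_by_f1 = select_best(candidates, y_true, f1_score)
best_by_sensitivity = select_best(candidates, y_true, sensitivity)
```

On this toy data the AUC criterion prefers the model with the cleaner ranking, while the sensitivity criterion prefers the one that misses no positive case, so the three criteria need not agree on a single winner.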

Results:

All models achieved strong performance (AUC > 0.940). The best-performing model, Longformer Biomedical-Clinical, reached an AUC of 0.987 and F1-score of 0.985. Sensitivity-optimized models achieved false negative rates below 3%, a critical threshold for clinical applicability.
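The false negative rate is the complement of sensitivity (FNR = FN / (FN + TP) = 1 − sensitivity), so a rate below 3% corresponds to a sensitivity above 97%. The following minimal sketch shows one generic way to meet such a target by lowering the decision threshold. It is an assumption for illustration only, not the method reported in the paper, and the helper names and toy data are hypothetical.

```python
from typing import Optional, Sequence


def false_negative_rate(y_true: Sequence[int], y_score: Sequence[float], threshold: float) -> float:
    """Share of positive cases scored below the threshold (1 - sensitivity)."""
    positives = sum(1 for y in y_true if y == 1)
    if positives == 0:
        raise ValueError("FNR is undefined without positive examples")
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    return fn / positives


def highest_threshold_within_fnr(
    y_true: Sequence[int], y_score: Sequence[float], max_fnr: float
) -> Optional[float]:
    """Largest observed score usable as a threshold while keeping FNR <= max_fnr."""
    # FNR can only fall as the threshold drops, so the first hit is the largest.
    for t in sorted(set(y_score), reverse=True):
        if false_negative_rate(y_true, y_score, t) <= max_fnr:
            return t
    return None


# Hypothetical labels (1 = symptom present) and predicted probabilities.
toy_labels = [1, 1, 1, 1, 0, 0]
toy_scores = [0.95, 0.80, 0.60, 0.30, 0.40, 0.10]

loose = highest_threshold_within_fnr(toy_labels, toy_scores, max_fnr=0.25)
strict = highest_threshold_within_fnr(toy_labels, toy_scores, max_fnr=0.03)
```

With the stricter 3% target the threshold has to drop far enough to capture the lowest-scoring positive case, which typically admits more false positives in exchange for fewer missed cases.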

Conclusions:

Transformer-based models can robustly detect HFpEF-related symptoms from unstructured Spanish clinical texts, even in the presence of class imbalance and complex language patterns. Our findings highlight the importance of combining domain-specific pretraining, long-context modeling, and tailored optimization to enhance the performance of NLP systems in high-impact clinical applications.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.