Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jun 2, 2025
Date Accepted: Jan 19, 2026

The final, peer-reviewed published version of this preprint can be found here:

Enhanced Prediction of Atrial Fibrillation in Patients With Ischemic Stroke Through Electronic Medical Records and Text Mining: Algorithm Development and Validation

Chen YW, Sung SF, Hu YH, Yang YH

JMIR Med Inform 2026;14:e78117

DOI: 10.2196/78117

PMID: 41805733

Enhanced Prediction of Atrial Fibrillation in Ischemic Stroke Patients Through Electronic Medical Records and Text Mining: Algorithm Development and Validation

  • Yu-Wei Chen
  • Sheng-Feng Sung
  • Ya-Han Hu
  • Yu-Hsuan Yang

ABSTRACT

Background:

Stroke, a leading cause of death and disability worldwide, is exacerbated by atrial fibrillation (AF), which often goes undetected and is therefore left without appropriate preventive treatment.

Objective:

This study addresses the need for accurate, generalizable predictive models to identify high-risk individuals across healthcare settings. We focus on developing an AF risk model using electronic medical records (EMRs), integrating structured and unstructured data, and evaluating model generalizability through external validation and calibration.

Methods:

This study analyzed datasets from two hospitals: Landseed International Hospital (LIH) with 3,988 patients and Chia-Yi Christian Hospital (CYCH) with 5,821 patients. We applied five feature engineering techniques to extract features from unstructured EMR data, addressed data imbalance using six distinct resampling methods, and employed nine classification algorithms to compare model performance across both internal and external validation. Furthermore, the study identified the top 20 most important features from the best-performing models for both the LIH and CYCH datasets.
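
As an illustration of the resampling step, the following is a minimal sketch of random oversampling, one common way to address class imbalance. The abstract does not name the six resampling methods used, so this technique, the function name `random_oversample`, and the toy values are assumptions for illustration only, not the authors' implementation.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to duplicate for this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made-up values, not from the study): label 1 = AF, 0 = no AF
rows = [[72, 1.4], [65, 0.9], [80, 1.6], [58, 1.0], [77, 0.7]]
labels = [0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In a setting like the one described, resampling would normally be applied to the training split only, so that internal and external validation data keep their original class distribution.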

Results:

The optimal predictive model for LIH was based solely on structured data, whereas the optimal model for CYCH integrated both structured and unstructured data, with the text processed using term frequency-inverse document frequency (TF-IDF). Feature importance analysis consistently identified the ratio of E- to A-wave velocities (E/A), left atrium (LA), and age as the top three predictive factors across both datasets, underscoring their critical role in AF risk assessment among stroke patients.
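
For readers unfamiliar with TF-IDF, the sketch below computes the textbook form of the weighting on a few invented clinical phrases. The function name `tfidf`, the whitespace tokenizer, and the example notes are assumptions for illustration; the abstract does not describe the authors' tokenization or weighting variant, and library implementations typically use smoothed formulas that give slightly different numbers.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using
    tf = count / document length and idf = ln(N / document frequency)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


# Invented note fragments, not taken from the study's EMR data
notes = [
    "irregular rhythm noted",
    "sinus rhythm noted",
    "left atrium enlarged",
]

vectors = tfidf(notes)
for note, vec in zip(notes, vectors):
    print(note, {t: round(w, 3) for t, w in vec.items()})
```

Terms that appear in fewer notes (such as "irregular" here) receive higher weights than terms shared across notes (such as "noted"), which is what lets the resulting vectors serve as features alongside structured variables.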

Conclusions:

This study demonstrates the development of a predictive model for AF in stroke patients. Notably, the integration of unstructured data substantially improves the model’s predictive accuracy. Rigorous internal and external validation processes confirm the superior performance of ensemble learning-based machine learning models compared to alternative algorithms, underscoring the efficacy of this approach in AF risk prediction.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.