Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Nov 7, 2025
Open Peer Review Period: Nov 25, 2025 - Jan 20, 2026
Date Accepted: Apr 23, 2026
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
ECG-R1: A Multi-modal Vision-Language Model with Reinforcement Learning for Differentiating Ischemic from Non-ischemic T-wave Inversion
ABSTRACT
Background:
Distinguishing ischemic from non-ischemic T-wave inversion (TWI) on electrocardiograms (ECGs) is a critical diagnostic challenge in cardiology. Because TWI is non-specific, false-positive rates are high, leading to unnecessary, costly, and risky invasive procedures for patients. Existing deep learning models are often limited: they rely on a single modality and operate as uninterpretable "black boxes".
Objective:
This study aims to develop a novel diagnostic framework that accurately differentiates ischemic from non-ischemic TWI. By using a multi-modal Vision-Language Model trained with a Reinforcement Learning (RL) paradigm, we aim to improve diagnostic accuracy and provide interpretable reasoning.
Methods:
We develop ECG-R1, a multi-modal framework that uses the Qwen2-VL-2B Vision-Language Model to analyze both ECG waveform images and the associated clinical text. Instead of Supervised Fine-Tuning (SFT), the model is trained with an RL paradigm using the Group Relative Policy Optimization (GRPO) algorithm. It learns to generate a structured output containing an explicit reasoning trace followed by a final "Yes" or "No" answer. A two-component, rule-based reward function assesses both format adherence and diagnostic accuracy. Performance is compared against strong SFT baselines.
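The abstract does not specify the reward's exact form. The Python sketch below shows one plausible reading of the two-component, rule-based reward, together with the group-relative advantage that gives GRPO its name: several completions are sampled for the same case, each is scored, and each score is standardized against its group's mean and standard deviation. The output tags, the function names (`format_reward`, `accuracy_reward`, `total_reward`, `group_relative_advantages`), the binary scores and the equal weighting are all hypothetical choices for illustration, not the authors' implementation.

```python
import re
import statistics

# ASSUMPTION: the abstract says only that outputs contain a reasoning trace and a
# final "Yes"/"No" answer; these <think>/<answer> tags are hypothetical.
TEMPLATE = re.compile(
    r"^<think>.+?</think>\s*<answer>\s*(Yes|No)\s*</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Format component: 1.0 if the completion follows the template, else 0.0."""
    return 1.0 if TEMPLATE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, label: str) -> float:
    """Accuracy component: 1.0 if the parsed answer equals the label ("Yes"/"No")."""
    match = TEMPLATE.match(completion.strip())
    return 1.0 if match is not None and match.group(1) == label else 0.0


def total_reward(completion: str, label: str) -> float:
    # ASSUMPTION: the two components are summed with equal weight.
    return format_reward(completion) + accuracy_reward(completion, label)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # A hypothetical group of G = 4 completions sampled for one case labeled "Yes".
    group = [
        "<think>reasoning about lead morphology</think><answer>Yes</answer>",
        "<think>reasoning about lead morphology</think><answer>No</answer>",
        "Yes",  # no reasoning trace, so it fails the format check
        "<think>reasoning about clinical context</think>\n<answer>Yes</answer>",
    ]
    rewards = [total_reward(c, "Yes") for c in group]
    print(rewards)  # [2.0, 1.0, 0.0, 2.0]
    print([round(a, 2) for a in group_relative_advantages(rewards)])
```

Because each completion is judged only relative to its siblings from the same prompt, correct and well-formatted answers receive positive advantages and the rest negative ones, which is why GRPO can dispense with the learned value (critic) network used in PPO.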
Results:
On a multi-modal dataset of 12,917 cases with TWI, our GRPO model achieves an average accuracy of 74.07% and generalizes well, reaching 72.93% accuracy in cross-hospital validation. This is approximately 24 percentage points above the ~50% diagnostic accuracy of clinicians and 8.2% higher than the best SFT baseline, while using ~71% fewer parameters.
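Because the comparison is between two accuracies, the ~24% figure is an absolute difference in percentage points rather than a relative gain. Using only the values quoted above (74.07% for ECG-R1 and the approximate 50% for clinicians), the two readings work out as:

```latex
\Delta_{\text{abs}} = 74.07\% - 50\% \approx 24.1\ \text{percentage points},
\qquad
\Delta_{\text{rel}} = \frac{74.07 - 50}{50} \approx 0.48 \;(48\%).
```

Since the clinician baseline is itself approximate, both figures inherit its uncertainty.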
Conclusions:
The RL-based ECG-R1 framework successfully differentiates ischemic from non-ischemic TWI and demonstrates significantly better generalization than standard SFT methods. By enhancing diagnostic accuracy and providing interpretable reasoning, this approach offers a more robust and trustworthy tool to support clinical decision-making in cardiology.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.