JMIR Preprints #87227: ECG-R1: A Multi-modal Vision-Language Model with Reinforcement Learning for Differentiating Ischemic from Non-ischemic T-wave Inversion

Current Preprint Settings

(as selected by the authors)

1. When the manuscript is submitted, allow peer review from:

(a) Anybody (open community peer review)
(b) Editor-selected reviewers (closed peer review)

2. When the manuscript is submitted, display the preprint PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

3. When the manuscript is accepted, display the accepted manuscript PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

ECG-R1: A Multi-modal Vision-Language Model with Reinforcement Learning for Differentiating Ischemic from Non-ischemic T-wave Inversion

Yunzhang Cheng;
Zhongkai Wang;
Wen Zhang;
Qin Zhang;
Mingwei Zhang;
Songbin Cai;
Tianyi Zhang

ABSTRACT

Background:

The differentiation of ischemic from non-ischemic T-wave inversion (TWI) on electrocardiograms (ECGs) is a critical diagnostic challenge in cardiology. The non-specific nature of TWI leads to high false-positive rates, resulting in unnecessary, costly, and risky invasive procedures for patients. Existing deep learning models are often limited to a single modality and operate as "black boxes" that offer little insight into their decisions.

Objective:

This study aims to develop a diagnostic framework that accurately differentiates ischemic from non-ischemic TWI. By training a multi-modal Vision-Language Model with a Reinforcement Learning (RL) paradigm, we aim to improve diagnostic accuracy while providing interpretable reasoning.

Methods:

We develop ECG-R1, a multi-modal framework that uses the Qwen2-VL-2B Vision-Language Model to analyze both ECG waveform images and associated clinical text. Instead of Supervised Fine-Tuning (SFT), the model is trained with an RL paradigm using the Group Relative Policy Optimization (GRPO) algorithm. The model learns to generate a structured output containing an explicit reasoning trace followed by a final "Yes" or "No" answer. A two-component, rule-based reward function assesses both format adherence and diagnostic accuracy. Performance is compared against strong SFT baselines.
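The abstract describes the reward and training signal only at a high level. The sketch below illustrates one way a two-component, rule-based reward and GRPO's group-relative advantage could be computed. It assumes a `<think>...</think><answer>Yes|No</answer>` output template (the abstract does not name the tags), equal weighting of the format and accuracy components, and group mean/standard-deviation normalization as in the original GRPO formulation. All function names are hypothetical; this is not the authors' code.

```python
import re
import statistics

# Hypothetical output template (assumption; the exact tags are not stated):
# "<think>...reasoning trace...</think><answer>Yes|No</answer>"
FORMAT_PATTERN = re.compile(
    r"^\s*<think>.+?</think>\s*<answer>\s*(Yes|No)\s*</answer>\s*$",
    re.DOTALL,
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the reasoning + answer template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion) else 0.0


def accuracy_reward(completion: str, label: str) -> float:
    """Return 1.0 if the extracted Yes/No answer matches the ground-truth label, else 0.0."""
    match = FORMAT_PATTERN.match(completion)
    if match is None:
        return 0.0  # no parsable answer, so no accuracy credit
    return 1.0 if match.group(1) == label else 0.0


def total_reward(completion: str, label: str) -> float:
    """Sum of the two rule-based components (equal weighting is an assumption)."""
    return format_reward(completion) + accuracy_reward(completion, label)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each reward normalized against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one hypothetical case whose label is "Yes".
    group = [
        "<think>reasoning trace</think><answer>Yes</answer>",
        "<think>reasoning trace</think><answer>No</answer>",
        "Yes",  # missing reasoning trace: fails the format check
        "<think>reasoning trace</think><answer>Yes</answer>",
    ]
    rewards = [total_reward(c, "Yes") for c in group]
    print(rewards)  # [2.0, 1.0, 0.0, 2.0]
    print(group_relative_advantages(rewards))
```

Because advantages are computed relative to the other completions sampled for the same input, GRPO needs no separate value (critic) network: a completion is reinforced only to the extent that it scores better than its siblings in the group.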

Results:

On a multi-modal dataset of 12,917 cases with TWI, our GRPO model achieves an average accuracy of 74.07% and generalizes well, reaching 72.93% accuracy in cross-hospital validation. This is roughly 24 percentage points higher than the ~50% diagnostic accuracy of clinicians and 8.2% higher than the best SFT baseline, while using ~71% fewer parameters.

Conclusions:

The RL-based ECG-R1 framework successfully differentiates ischemic from non-ischemic TWI and demonstrates significantly better generalization than standard SFT methods. By enhancing diagnostic accuracy and providing interpretable reasoning, this approach offers a more robust and trustworthy tool to support clinical decision-making in cardiology.

Citation

Please cite as:

Cheng Y, Wang Z, Zhang W, Zhang Q, Zhang M, Cai S, Zhang T

Differentiating Ischemic From Nonischemic T-Wave Inversion Using a Multimodal Vision-Language Model With Reinforcement Learning (ECG-R1): Development and Validation Study

JMIR Med Inform 2026;14:e87227

DOI: 10.2196/87227

PMID: 42319812

PMCID: 13281817

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 7, 2025

Open Peer Review Period: Nov 25, 2025 - Jan 20, 2026

Date Accepted: Apr 23, 2026
