Maintenance Notice

Due to necessary scheduled maintenance, the JMIR Publications website will be unavailable from Wednesday, July 01, 2020 at 8:00 PM to 10:00 PM EST. We apologize in advance for any inconvenience this may cause you.

Who will be affected?

Accepted for/Published in: JMIR AI

Date Submitted: Mar 24, 2025
Date Accepted: Jul 9, 2025

The final, peer-reviewed published version of this preprint can be found here:

Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics: Diagnostic Accuracy Study

Engelstein H, Ramon Gonen R, Sabbag A, Klang E, Sudri K, Cohen-Shelly M, Barbash I

Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics: Diagnostic Accuracy Study

JMIR AI 2025;4:e74426

DOI: 10.2196/74426

PMID: 40845836

PMCID: 12375907

GPT-4o's Effectiveness in ECG Image Interpretation for Cardiac Diagnostics: Evaluation Study

  • Haya Engelstein; 
  • Roni Ramon Gonen; 
  • Avi Sabbag; 
  • Eyal Klang; 
  • Karin Sudri; 
  • Michal Cohen-Shelly; 
  • Israel Barbash

ABSTRACT

Background:

Recent progress has demonstrated the potential of deep learning models in analyzing ECG pathologies. However, this method is intricate, expensive to develop, and designed for specific purposes. Large language models show promise in medical image interpretation, yet their effectiveness in ECG analysis remains understudied. GPT-4o, a multimodal AI model, capable of processing images and text without task-specific training, may offer an accessible alternative.

Objective:

This study evaluates GPT-4o's effectiveness in interpreting 12-lead ECGs, assessing classification accuracy, and exploring methods to enhance its performance.

Methods:

Six common ECG diagnoses were evaluated: Normal ECG, STEMI, AF, RBBB, LBBB, and paced rhythm, with 30 Normal ECGs and 10 of each abnormal pattern, totaling 80 cases (n=80). De-identified ECGs were analyzed using OpenAI’s GPT-4o. Our study employed both zero-shot and few-shot learning methodologies to investigate three main scenarios: (1) ECG image recognition, (2) binary classification of normal versus abnormal ECGs, and (3) multiclass classification into six categories.

Results:

The model excelled in recognizing ECG images, achieving an accuracy of 100%. In the classification of normal/abnormal ECG cases, the Few-Shot learning approach improved GPT-4o’s accuracy by 27%, reaching 80%. However, multiclass classification for a specific pathology remained limited, achieving only 41% accuracy.

Conclusions:

GPT-4o effectively differentiates normal from abnormal ECGs, suggesting its potential as an accessible AI-assisted triage tool. Although limited in diagnosing specific cardiac conditions, GPT-4o’s capability to interpret ECG images without specialized training highlights its potential for preliminary ECG interpretation in clinical and remote settings.


 Citation

Please cite as:

Engelstein H, Ramon Gonen R, Sabbag A, Klang E, Sudri K, Cohen-Shelly M, Barbash I

Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics: Diagnostic Accuracy Study

JMIR AI 2025;4:e74426

DOI: 10.2196/74426

PMID: 40845836

PMCID: 12375907

Download PDF


Request queued. Please wait while the file is being generated. It may take some time.