Accepted for/Published in: JMIR AI

Date Submitted: Nov 3, 2025
Date Accepted: Mar 28, 2026

The final, peer-reviewed published version of this preprint can be found here:

Multimodal GPT-5 for Predicting Poor Functional Outcomes After Intracerebral Hemorrhage in the Emergency Department: Validation Study

Matsumoto K, Ishihara K, Tamba R, Fujiyoshi Y, Tokunaga K, Matsuda K, Nohara Y, Chen J, Yamashiro S, Nakashima N, Kamouchi M

JMIR AI 2026;5:e87062

DOI: 10.2196/87062

PMID: 42202259

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Multimodal GPT-5 for Predicting Poor Functional Outcomes After Intracerebral Hemorrhage in the Emergency Department: Validation Study

  • Koutarou Matsumoto
  • Kazuaki Ishihara
  • Ryota Tamba
  • Yusuke Fujiyoshi
  • Koki Tokunaga
  • Katsuhiko Matsuda
  • Yasunobu Nohara
  • Jenhui Chen
  • Shigeo Yamashiro
  • Naoki Nakashima
  • Masahiro Kamouchi

ABSTRACT

Background:

In the emergency department (ED), rapid prognostic assessment of patients with intracerebral hemorrhage (ICH) is essential for guiding treatment, even when stroke specialists are unavailable. Recent advances in large language models have accelerated the application of machine learning (ML) models in medical contexts.

Objective:

To evaluate the predictive performance of GPT-based models for poor functional outcomes after ICH using real-world multimodal data routinely available at ED presentation.

Methods:

Data from patients with ICH admitted to a tertiary hospital were analyzed. Using routinely collected clinical data and noncontrast computed tomography (CT) images obtained at admission, GPT-4.1 and GPT-5, both accessed via Azure OpenAI Service, were applied to predict poor functional outcomes, defined as a modified Rankin Scale (mRS) score of 3–6 at discharge. A conventional ML model was developed by combining deep learning-extracted features from Digital Imaging and Communications in Medicine (DICOM) CT data with clinical variables using L1-regularized logistic regression. The GPT models were evaluated using the same clinical dataset and JPEG-format CT images. Model performance was assessed through discrimination (area under the receiver operating characteristic curve [AUROC]), calibration, reproducibility (intraclass correlation coefficient [ICC]), and clinical utility (decision curve analysis [DCA]).
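As a rough illustration of the outcome definition and two of the metrics named above, the following Python sketch dichotomizes modified Rankin Scale scores, computes the AUROC by pairwise ranking, and evaluates the net benefit at a single threshold, which is the quantity plotted in a decision curve. The function names and any values passed to them are hypothetical; the abstract does not describe the authors' implementation, so this should be read as a minimal sketch rather than the study's method.

```python
from typing import Sequence


def poor_outcome(mrs: int) -> int:
    """Dichotomize a discharge modified Rankin Scale score: 3-6 is a poor outcome."""
    if not 0 <= mrs <= 6:
        raise ValueError("mRS must be an integer from 0 to 6")
    return 1 if mrs >= 3 else 0


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is ranked above
    a random negative case; tied pairs count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit at one threshold probability pt:
    TP/n - (FP/n) * pt / (1 - pt), treating p >= pt as a positive call."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies.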

Results:

The ML model achieved an AUROC of 0.85 (95% confidence interval, 0.79–0.90). Zero-shot GPT-4.1 and GPT-5 demonstrated strong discrimination (AUROC 0.83 and 0.86, respectively) with high reproducibility (ICC 0.91 and 0.95, respectively). Incorporating ML-derived information into model-informed prompts increased the AUROC to 0.85 and 0.87, respectively, with reproducibility remaining high (ICC 0.97 and 0.96, respectively). Calibration plots indicated that GPT models tended to underestimate probabilities; however, this bias improved after model-informed prompting. DCA showed a higher net benefit when ML-derived information was incorporated.
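Reproducibility across repeated GPT runs is summarized above with the ICC. The abstract does not state which ICC form was used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), with patients as rows and repeated runs as columns. The function name `icc_2_1` and the input layout are illustrative assumptions.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for an n x k table: n subjects (rows), k repeated runs (columns).

    Assumed form (not confirmed by the source):
    (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when the scores have no variance")
    return (ms_rows - ms_err) / denom
```

Here each row would hold one patient's predicted probabilities from repeated calls to the same model with the same prompt; values near 1 indicate that repeated runs give nearly identical predictions.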

Conclusions:

Zero-shot GPT models, particularly GPT-5, achieved predictive performance comparable to or exceeding that of conventional ML models using routinely available clinical data and CT images. Incorporating ML-derived outputs into GPT prompts further improved clinical utility, suggesting potential value for real-time decision support in emergency care.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.