JMIR Preprints #87062: Multimodal GPT-5 for Predicting Poor Functional Outcomes After Intracerebral Hemorrhage in the Emergency Department: Validation Study

Multimodal GPT-5 for Predicting Poor Functional Outcomes After Intracerebral Hemorrhage in the Emergency Department: Validation Study

Koutarou Matsumoto;
Kazuaki Ishihara;
Ryota Tamba;
Yusuke Fujiyoshi;
Koki Tokunaga;
Katsuhiko Matsuda;
Yasunobu Nohara;
Jenhui Chen;
Shigeo Yamashiro;
Naoki Nakashima;
Masahiro Kamouchi

ABSTRACT

Background:

In the emergency department (ED), rapid prognostic assessment of patients with intracerebral hemorrhage (ICH) is essential for guiding treatment, even when stroke specialists are unavailable. Recent advances in large language models have spurred wider application of machine learning (ML) models in medical contexts.

Objective:

To evaluate the predictive performance of GPT-based models for poor functional outcomes after ICH using real-world multimodal data routinely available at ED presentation.

Methods:

Data from patients with ICH admitted to a tertiary hospital were analyzed. Using routinely collected clinical data and noncontrast computed tomography (CT) images at admission, GPT-4.1 and GPT-5, both accessed via Azure OpenAI Service, were applied to predict poor functional outcomes, defined as a modified Rankin Scale score of 3–6 at discharge. A conventional ML model was developed by combining deep learning-extracted features from Digital Imaging and Communications in Medicine CT data with clinical variables using L1-regularized logistic regression. GPT models were evaluated using the same clinical dataset and JPEG-format CT images. Model performance was assessed through discrimination (area under the receiver operating characteristic curve [AUROC]), calibration, reproducibility (intraclass correlation coefficient [ICC]), and clinical utility (decision curve analysis [DCA]).
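As an illustration of two of the metrics named above, the following is a minimal sketch of how AUROC and the net benefit used in decision curve analysis can be computed from predicted probabilities. It is not the authors' code: the function names `auroc` and `net_benefit` and the toy labels and probabilities are hypothetical, and the study's actual implementation, thresholds, and confidence interval method are not described in this abstract.

```python
# Illustrative sketch only; names and toy data are assumptions, not from the study.

def auroc(y_true, y_score):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at a threshold probability pt:
    TP/n - (FP/n) * pt / (1 - pt), treating p >= pt as a positive call."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical example: 1 = poor outcome (mRS 3-6), 0 = good outcome.
    y = [1, 1, 0, 1, 0, 0]
    p = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    print("AUROC:", auroc(y, p))
    print("Net benefit at pt = 0.5:", net_benefit(y, p, 0.5))
```

A decision curve plots `net_benefit` across a range of thresholds and compares it with the treat-all and treat-none strategies; the pairwise AUROC loop is quadratic in sample size, which is acceptable for a sketch but not for large cohorts.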

Results:

The ML model achieved an AUROC of 0.85 (95% confidence interval, 0.79–0.90). Zero-shot GPT-4.1 and GPT-5 demonstrated strong discrimination (AUROC 0.83 and 0.86, respectively) with high reproducibility (ICC 0.91 and 0.95, respectively). Incorporating ML-derived information into model-informed prompts increased the AUROC to 0.85 and 0.87, respectively, with reproducibility remaining high (ICC 0.97 and 0.96, respectively). Calibration plots indicated that GPT models tended to underestimate probabilities; however, this bias improved after model-informed prompting. DCA showed a higher net benefit when ML-derived information was incorporated.
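The reproducibility figures above are intraclass correlation coefficients computed over repeated model outputs. The abstract does not say which ICC form was used, so the sketch below assumes a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1), with each patient as a row and each repeated GPT run as a column. The function name `icc_2_1` and the example values are hypothetical.

```python
# Illustrative sketch only; the ICC form is an assumption, not stated in the abstract.

def icc_2_1(ratings):
    """ICC(2,1) from an n x k table: n subjects (rows), k repeated runs (columns)."""
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")

    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-run mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical predicted probabilities for 4 patients over 3 repeated runs.
    runs = [
        [0.82, 0.80, 0.85],
        [0.15, 0.20, 0.18],
        [0.55, 0.60, 0.50],
        [0.95, 0.93, 0.96],
    ]
    print("ICC(2,1):", icc_2_1(runs))
```

Under this form, systematic shifts between runs lower the coefficient as well as random scatter, which is why absolute-agreement ICCs are a common choice when the same model is queried repeatedly.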

Conclusions:

Zero-shot GPT models, particularly GPT-5, achieved predictive performance comparable to or exceeding that of conventional ML models using routinely available clinical data and CT images. Incorporating ML-derived outputs into GPT prompts further improved clinical utility, suggesting potential value for real-time decision support in emergency care.

Citation

Please cite as:

Matsumoto K, Ishihara K, Tamba R, Fujiyoshi Y, Tokunaga K, Matsuda K, Nohara Y, Chen J, Yamashiro S, Nakashima N, Kamouchi M

Multimodal GPT-5 for Predicting Poor Functional Outcomes After Intracerebral Hemorrhage in the Emergency Department: Validation Study

JMIR AI 2026;5:e87062

DOI: 10.2196/87062

PMID: 42202259

PMCID: 13216710

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR AI

Date Submitted: Nov 3, 2025

Date Accepted: Mar 28, 2026
