JMIR Preprints #86630: Evaluation of GPT-5 for Esophageal Cancer Staging Using Fluorodeoxyglucose Positron Emission Tomography Maximum-Intensity Projection Images: A Comparative Pilot Study

Hiroki Maruyama;
Yoshitaka Toyama;
Yuya Araki;
Kentaro Takanami;
Masato Ito;
Yumi Nakajima;
Kei Takase;
Takashi Kamei

ABSTRACT

Background:

Accurate esophageal cancer staging relies on fluorodeoxyglucose positron emission tomography (FDG-PET), but its interpretation is complex and time-intensive. This diagnostic burden is exacerbated by significant workforce shortages in both radiology and surgery, creating a need for automated support systems. The emergence of advanced large language models (LLMs) has raised expectations for their potential to fulfill this role in complex medical tasks.

Objective:

We evaluated the diagnostic accuracy of LLMs for staging esophageal cancer from FDG-PET images, focusing on their ability to assess lymph node (LN) involvement (cN) and distant metastases (cM) for automated radiology reporting.

Methods:

This retrospective study included 120 consecutive adult patients who were diagnosed with esophageal squamous cell carcinoma (SCC) and underwent FDG-PET/computed tomography at Tohoku University Hospital between January 2019 and December 2021. Patients with prior treatment, non-SCC histology, or blood glucose levels ≥ 200 mg/dL were excluded. Frontal maximum-intensity projection PET images were extracted, standardized, and analyzed together with the tumor location. Six LLMs (ChatGPT-5, ChatGPT-4.5, ChatGPT-4.1, OpenAI o3, o1, and ChatGPT-4 Turbo) and four blinded human evaluators (a nuclear medicine specialist, a gastrointestinal surgeon, and two radiology residents) assessed the presence of thoracic and abdominal LN metastases and determined cN and cM staging. The models were queried through the application programming interface in a zero-shot setting. Diagnostic agreement and accuracy were evaluated using Cohen’s kappa, Cochran’s Q test, and post-hoc McNemar tests with Holm–Bonferroni correction; statistical significance was set at P < 0.05.
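For readers unfamiliar with the statistics named above, the following is a minimal Python sketch of three of them: Cohen's kappa, a paired McNemar test, and Holm–Bonferroni adjustment. It is not the authors' analysis code. The abstract does not state which software or which McNemar variant was used, so the exact binomial form shown here is an assumption, as are all function names. Cochran's Q is omitted to keep the example dependency-free.

```python
from math import comb


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of cases where both raters match.
    p_obs = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_exp = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if p_exp == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value from paired correct/incorrect flags.

    Assumption: the exact binomial variant; the source does not specify one.
    """
    # Discordant pairs: only A correct, and only B correct.
    only_a = sum(bool(x) and not y for x, y in zip(correct_a, correct_b))
    only_b = sum(bool(y) and not x for x, y in zip(correct_a, correct_b))
    n = only_a + only_b
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # Multiply by the number of hypotheses still in play, keep monotone.
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted


if __name__ == "__main__":
    # Made-up toy data, NOT study results: 1 = staging call was correct.
    reader = [1, 1, 1, 1, 0, 1, 1, 0]
    model = [1, 0, 0, 1, 0, 0, 1, 0]
    print("kappa:", cohen_kappa(reader, model))
    print("McNemar p:", mcnemar_exact(reader, model))
    print("Holm-adjusted:", holm_adjust([0.01, 0.04, 0.03]))
```

In a design like the one described, an omnibus test across all evaluators would typically come first, with pairwise McNemar tests and Holm adjustment applied afterward to control the family-wise error rate across comparisons.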

Results:

The average accuracy was 34–78% for LLMs and 60–85% for physicians, with physicians significantly more accurate for thoracic LN, abdominal LN, and cN assessments. Among the LLMs, GPT-5 demonstrated the highest overall accuracy. Newer LLMs approached physician-level performance in identifying abdominal LN metastases and in cM staging but were less consistent in cN staging.

Conclusions:

Although current LLMs have not yet reached physician-level accuracy in comprehensive staging, recent models show promise in assisting with specific diagnostic tasks.

Citation

Please cite as:

Maruyama H, Toyama Y, Araki Y, Takanami K, Ito M, Nakajima Y, Takase K, Kamei T

Evaluation of GPT-5 for Esophageal Cancer Staging Using Fluorodeoxyglucose Positron Emission Tomography Maximum-Intensity Projection Images: Comparative Pilot Study

JMIR Cancer 2026;12:e86630

DOI: 10.2196/86630

PMID: 41729569

PMCID: 12972682

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Cancer

Date Submitted: Oct 27, 2025

Date Accepted: Jan 30, 2026
