Currently submitted to: JMIR Medical Informatics

Date Submitted: Jun 4, 2026
Open Peer Review Period: Jun 17, 2026 - Aug 12, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Answering the Hard Questions in GIST: A Systematic Review of Artificial Intelligence Applications in Diagnosis, Risk Stratification, Prognosis, and Metastatic Prediction.

  • Song Zheng
  • Ropafadzo Tsepang Phebeni
  • Oscar Onayi Mandizadza
  • Kaibo Guo

ABSTRACT

Background:

In treating patients with gastrointestinal stromal tumors (GISTs), clinicians face significant challenges, particularly in accurate diagnosis, preoperative risk stratification, differentiation from other subepithelial lesions (SELs), and precise prediction of recurrence and metastasis. Traditional diagnostic and prognostic methods are often limited by subjectivity, variability, and insufficient accuracy, leading to diagnostic ambiguity, potential overtreatment, and suboptimal patient outcomes. Addressing these gaps is essential to improving clinical decision-making and advancing precision oncology in GIST. Hence, there is growing interest in data-driven approaches such as artificial intelligence to enhance diagnostic accuracy, risk stratification, and outcome prediction.

Objective:

This review aims to systematically evaluate current applications of artificial intelligence (AI), machine learning (ML), and deep learning (DL) in GIST, comparing the most commonly used algorithms, data modalities, prediction tasks, validation approaches, and reported performance across studies. We also aim to identify gaps in clinical applicability, external validation, and interpretability that may hinder the translation of these technologies into routine clinical practice.

Methods:

This systematic review identified 65 original research studies, published between 2011 and 2026, that developed artificial intelligence, machine learning, or deep learning models for GIST clinical applications. Deep learning architectures, particularly convolutional neural networks (ResNet, EfficientNet) and Vision Transformers, were most common for image analysis, while traditional machine learning (Random Forest, SVM, XGBoost) dominated radiomics-based approaches.

Results:

Diagnostic models achieved the highest performance, with EUS-based approaches reaching 86-96% accuracy, while risk stratification models showed more variable results, particularly for intermediate-risk categories (AUCs 0.64-0.78). Prognostic models demonstrated C-indices of 0.72-0.86, and metastasis prediction models achieved AUCs of 0.87-0.96. External validation was conducted in only 29 of 66 studies, with consistent performance degradation compared to internal validation. Most studies were conducted in Chinese populations (n=46), with limited geographic and ethnic diversity. Single-center studies (n=37) predominated over multi-center collaborations (n=23).

Conclusions:

While AI models demonstrate technical feasibility and promising performance in controlled settings, the evidence base lacks the validation rigor, prospective evaluation, population diversity, and clinical integration necessary for confident clinical deployment in GIST care. Major limitations included selection bias from retrospective designs (62 of 65 studies), technical heterogeneity in imaging protocols affecting reproducibility, and class imbalance, particularly affecting intermediate-risk predictions. There was a lack of prospective validation (only 1 of 65 studies), limited use of interpretability methods (SHAP/LIME used in only 16 studies), limited assessment of clinical utility beyond accuracy metrics such as sensitivity and specificity, and an absence of real-world implementation data. All 65 studies positioned their models as aids to clinical decision-making rather than replacements for physician judgment.

Clinical Trial: n/a


 Citation

Please cite as:

Zheng S, Phebeni RT, Mandizadza OO, Guo K

Answering the Hard Questions in GIST: A Systematic Review of Artificial Intelligence Applications in Diagnosis, Risk Stratification, Prognosis, and Metastatic Prediction.

JMIR Preprints. 04/06/2026:103547

DOI: 10.2196/preprints.103547

URL: https://preprints.jmir.org/preprint/103547


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.