JMIR Preprints #85654: Machine Learning for Predicting Stroke Risk Stratification using Multi-Omics Data: Systematic Review

Machine Learning for Predicting Stroke Risk Stratification using Multi-Omics Data: Systematic Review

Hae Young Yoo; Hyerim Shin; Eun-Jung Kim; Youn-Jung Son

ABSTRACT

Background:

Stroke is a complex, multidimensional disorder influenced by interacting inflammatory, immune, coagulation, endothelial, and metabolic pathways. Single-omics approaches seldom capture this complexity, whereas multi-omics techniques provide complementary insights but generate high-dimensional and correlated feature spaces. Machine learning (ML) offers strategies to manage these challenges; however, the predictive accuracy and reproducibility of multi-omics-based ML models for stroke remain poorly characterized.

Objective:

This review aimed to conduct a systematic evaluation and meta-analysis of ML models employing multi-omics data for stroke risk stratification, focusing on predictive performance, integration strategies, and validation practices.

Methods:

A comprehensive literature search was conducted following PRISMA 2020 recommendations. Studies published from January 2000 to July 2025 were identified across nine databases: PubMed, MEDLINE Ultimate, EMBASE, CINAHL, Web of Science, Scopus, Cochrane CENTRAL, ACM Digital Library, and IEEE Xplore. Eligible studies included adults, used ischemic, hemorrhagic, or unspecified stroke as the prediction target, applied at least 2 omics layers, and reported ML performance metrics. Risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST), while reporting quality was evaluated using CHARMS, MINimum Information for Medical AI Reporting (MINIMAR), and QUADOMICS. The primary outcome was the area under the receiver operating characteristic curve (AUC). Random-effects and multilevel meta-analyses were conducted, and heterogeneity was assessed using I².
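For readers unfamiliar with the pooling step, the following is a minimal sketch of a random-effects meta-analysis with an I² estimate. It assumes the DerSimonian-Laird estimator and a normal-approximation 95% CI on the raw AUC scale; the abstract does not state which estimator, transformation, or software the authors used, so the function name `dersimonian_laird` and every choice inside it are illustrative assumptions rather than the review's method.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Pool study-level estimates (e.g. AUCs) under a random-effects model.

    Assumption: DerSimonian-Laird between-study variance and a
    normal-approximation 95% CI. This is a sketch, not the review's method.
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures dispersion of the studies around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variability attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }
```

Calling `dersimonian_laird([0.7, 0.9], [0.01, 0.01])` with these two made-up inputs (not values from the included studies) shows how two precise but discordant estimates produce a wide pooled interval and a high I². Because this sketch pools on the raw scale, the upper confidence limit is not constrained to stay below 1.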

Results:

Eight studies (n=45,274) published between 2021 and 2025 fulfilled the inclusion criteria. All studies applied 2 omics layers, most frequently using middle-level integration. Four studies reported external validation. Reported AUCs ranged from 0.748 to 0.973, with the highest externally validated performance achieved by a support vector machine trained on a metabolomics–proteomics dyad. The primary meta-analysis, restricted to studies rated as low risk by PROBAST, yielded a pooled AUC of 0.83 (95% CI 0.66–1.01; I²=95%). Secondary and exploratory analyses, which included all eligible studies and estimates, yielded pooled AUCs between 0.87 and 0.90, with persistent heterogeneity. No significant moderating effects were detected for validation type, risk of bias, or sample size.

Conclusions:

Multi-omics ML models showed good-to-excellent accuracy for stroke risk stratification, particularly when integrating compact and biologically coherent biomarker panels. However, heterogeneity across analytic frameworks, limited reporting of calibration, and a lack of extensive external validation continue to hinder reproducibility and generalizability. To advance the field, future studies should adopt leakage-resistant evaluation frameworks, conduct site-specific external validations, and benchmark against both single-omics and clinical baselines to demonstrate incremental value. Well-designed, transparently reported investigations will be essential to move multi-omics ML models from exploratory promise toward clinically actionable tools in precision stroke care.

Clinical Trial:

PROSPERO (CRD420251089823); https://www.crd.york.ac.uk/PROSPERO/view/CRD420251089823.
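As one illustration of what a "leakage-resistant evaluation framework" in the Conclusions can mean in practice, the sketch below fits a preprocessing step (here, standardization of a single feature) on the training fold only and then applies it to the held-out fold. The helper names `kfold_indices`, `fit_scaler`, and `leakage_free_folds` are hypothetical, and the abstract does not prescribe this or any specific procedure; this is only a minimal example of the general principle.

```python
import statistics


def kfold_indices(n, k):
    """Yield (train, test) index lists for a simple interleaved k-fold split."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def fit_scaler(values):
    """Estimate mean and population SD; fall back to SD=1 for constant input."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0
    return mean, sd


def leakage_free_folds(feature, k):
    """Standardize each fold using statistics from its training part only.

    The held-out samples never contribute to the mean or SD, so no
    information from the test fold leaks into preprocessing.
    """
    for train, test in kfold_indices(len(feature), k):
        mean, sd = fit_scaler([feature[j] for j in train])
        scaled_train = [(feature[j] - mean) / sd for j in train]
        scaled_test = [(feature[j] - mean) / sd for j in test]
        yield train, test, scaled_train, scaled_test
```

The same pattern extends to feature selection, imputation, and hyperparameter tuning: anything learned from data is learned inside the training fold and only applied to the held-out fold.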

Citation

Please cite as:

Yoo HY, Shin H, Kim EJ, Son YJ

Machine Learning for Predicting Stroke Risk Stratification Using Multiomics Data: Systematic Review

J Med Internet Res 2026;28:e85654

DOI: 10.2196/85654

PMID: 41711384

PMCID: 12963974

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 10, 2025

Date Accepted: Jan 29, 2026
