Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 10, 2025
Date Accepted: Jan 29, 2026
Machine Learning for Predicting Stroke Risk Stratification using Multi-Omics Data: Systematic Review
ABSTRACT
Background:
Stroke is a complex, multidimensional disorder influenced by interacting inflammatory, immune, coagulation, endothelial, and metabolic pathways. Single-omics approaches seldom capture this complexity, whereas multi-omics techniques provide complementary insights but generate high-dimensional and correlated feature spaces. Machine learning (ML) offers strategies to manage these challenges; however, the predictive accuracy and reproducibility of multi-omics-based ML models for stroke remain poorly characterized.
Objective:
This review aimed to conduct a systematic evaluation and meta-analysis of ML models employing multi-omics data for stroke risk stratification, focusing on predictive performance, integration strategies, and validation practices.
Methods:
A comprehensive literature search was conducted following PRISMA 2020 recommendations. Studies published from January 2000 to July 2025 were identified across nine databases: PubMed, MEDLINE Ultimate, EMBASE, CINAHL, Web of Science, Scopus, Cochrane CENTRAL, ACM Digital Library, and IEEE Xplore. Eligible studies included adults with ischemic, hemorrhagic, or unspecified stroke as the prediction target, applied at least 2 omics layers, and reported ML performance metrics. Risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST), while reporting quality was evaluated using CHARMS, MINimum Information for Medical AI Reporting (MINIMAR), and QUADOMICS. The primary outcome was the area under the receiver operating characteristic curve (AUC). Random-effects and multilevel meta-analyses were conducted, and heterogeneity was assessed using I².
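As an illustration of the random-effects pooling and I² statistic named above, the following is a minimal sketch that assumes the DerSimonian-Laird estimator applied to AUCs on their raw scale. The abstract does not state which estimator or transformation the authors used, and the function name `random_effects_pool` and the input values in the usage example are hypothetical, not data from the review.

```python
import math


def random_effects_pool(estimates, std_errors):
    """Pool study-level estimates with a DerSimonian-Laird random-effects model.

    Assumption: this estimator is chosen for illustration only; the review
    may have used a different one (e.g. REML).
    """
    k = len(estimates)
    # Inverse-variance (fixed-effect) weights.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures dispersion of the estimates around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: percentage of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical AUCs and standard errors, for illustration only.
    result = random_effects_pool([0.75, 0.85, 0.95], [0.01, 0.01, 0.01])
    print(result)
```

Because this sketch pools on the raw AUC scale with a normal-approximation interval, its confidence bounds are not constrained to the 0 to 1 range; pooling logit-transformed AUCs and back-transforming is one common way to keep them within it.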
Results:
Eight studies (n=45,274) published between 2021 and 2025 fulfilled the inclusion criteria. All studies applied 2 omics layers, most frequently using middle-level integration. Four studies reported external validation. Reported AUCs ranged from 0.748 to 0.973, with the highest externally validated performance achieved by a support vector machine trained on a metabolomics–proteomics dyad. The primary meta-analysis, restricted to studies rated as low risk by PROBAST, yielded a pooled AUC of 0.83 (95% CI 0.66–1.01; I²=95%). Secondary and exploratory analyses, which included all eligible studies and estimates, yielded pooled AUCs between 0.87 and 0.90, with persistent heterogeneity. No significant moderating effects were detected for validation type, risk of bias, or sample size.
Conclusions:
Multi-omics ML models showed good-to-excellent accuracy for stroke risk stratification, particularly when integrating compact and biologically coherent biomarker panels. However, heterogeneity across analytic frameworks, limited reporting of calibration, and a lack of extensive external validation continue to hinder reproducibility and generalizability. To advance the field, future studies should adopt leakage-resistant evaluation frameworks, conduct site-specific external validations, and benchmark against both single-omics and clinical baselines to demonstrate incremental value. Well-designed, transparently reported investigations will be essential to move multi-omics ML models from exploratory promise toward clinically actionable tools in precision stroke care.
Clinical Trial:
PROSPERO (CRD420251089823); https://www.crd.york.ac.uk/PROSPERO/view/CRD420251089823.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.