Currently submitted to: JMIR Medical Informatics

Date Submitted: Mar 15, 2026
Open Peer Review Period: Mar 26, 2026 - May 21, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Development of a Machine Learning Model Based on Laboratory Biomarkers and Mendelian Randomization for Differentiating Intracerebral Hemorrhage and Acute Ischemic Stroke

  • Xiaoyan Hao; 
  • Juan Wang; 
  • Jing Wang; 
  • Yujiao Hu; 
  • Fengjuan Wang; 
  • Xinran Liu; 
  • Lei Zhou; 
  • Congxia Bai

ABSTRACT

Background:

Intracerebral hemorrhage (ICH) and acute ischemic stroke (IS) are life-threatening cerebrovascular disorders that can present with similar clinical features but require fundamentally different treatment approaches.

Objective:

This study aims to develop a machine learning (ML) model that uses patient laboratory data to rapidly differentiate ICH from IS.

Methods:

We retrospectively analyzed clinical and laboratory data from 12,213 patients hospitalized at Xijing Hospital between 2013 and 2023 (3,251 with ICH and 8,962 with IS), together with 2,893 hypertensive individuals as controls. An external validation cohort of 154 ICH and 342 IS patients admitted to Xijing Hospital from January to December 2024 was also constructed. The dataset was balanced using the BorderlineSMOTE-2 (BS2) technique. Three feature selection methods (RFECV-ADA, Lasso, and Boruta) were used to identify candidate biomarkers, and Spearman correlation was used to assess relationships between markers. Six supervised ML models were trained using 10-fold cross-validation. Model performance was evaluated by the area under the curve (AUC), sensitivity, and specificity. Feature contributions were interpreted using SHapley Additive exPlanations (SHAP) plots, and an interactive interface was implemented using PyQt5. Finally, we screened genetic instruments related to the candidate indicators and paired them with genome-wide association study data for ICH and IS to conduct Mendelian randomization analysis. Positive Mendelian randomization findings were then subjected to colocalization analysis.
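The three evaluation metrics named in the Methods can be illustrated with a short standard-library Python sketch. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    A positive/negative pair counts 1 if the positive scores higher,
    0.5 if the two scores tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity (true positive rate) and specificity (true negative rate).

    A sample is predicted positive when its score is >= threshold.
    The default threshold of 0.5 is an illustrative assumption.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = ICH, 0 = IS; scores are predicted P(ICH).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = roc_auc(toy_labels, toy_scores)
sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

AUC does not depend on a threshold, whereas sensitivity and specificity do, so the choice of cut-off should accompany any reported sensitivity or specificity. The pairwise loop is quadratic in sample size and is meant for clarity only; a rank-based implementation would be preferable for cohorts of the size described here.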

Results:

Ten features were selected for model training: white blood cell count, neutrophil percentage (NEUT%), cystatin C (CysC), uric acid (UA), total protein (TP), potassium (K+), sodium, chloride, fibrinogen degradation products, and D-dimer. In the test cohort, the BS2_LightGBM_V10 model demonstrated excellent performance in differentiating ICH from IS patients (AUC = 0.926), ICH patients from controls (AUC = 0.979), and IS patients from controls (AUC = 0.923). In the three-class classification task (distinguishing ICH patients, IS patients, and controls), model accuracy in the test cohort reached 79.07%. SHAP plots showed that NEUT%, D-dimer, CysC, and TP were the most influential features for model predictions. Mendelian randomization analysis indicated that UA (OR, 1.309 [95% CI, 1.117–1.534]) was causally associated with the risk of ICH onset, whereas UA (OR, 1.0008 [95% CI, 1.0001–1.0015]), CysC (OR, 1.0011 [95% CI, 1.0001–1.0020]), and K+ (OR, 1.0040 [95% CI, 1.0003–1.0076]) were causally associated with the risk of IS onset. Colocalization analysis identified 18 genes linked to UA in the context of ICH. Finally, potential gene-targeting drugs were screened.
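To make the reported odds ratios easier to read, the sketch below shows one common way a Mendelian randomization estimate and its 95% CI can be obtained from per-variant summary statistics. The abstract does not state which estimator was used, so the fixed-effect inverse-variance weighted (IVW) method here is an assumption, as are all function names and the hypothetical input values.

```python
import math
from typing import NamedTuple, Sequence

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal


class MREstimate(NamedTuple):
    beta: float        # pooled log odds ratio per unit of exposure
    se: float          # standard error of beta
    odds_ratio: float  # exp(beta)
    ci_low: float      # lower bound of the 95% CI for the odds ratio
    ci_high: float     # upper bound of the 95% CI for the odds ratio


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> MREstimate:
    """Fixed-effect IVW estimate from variant-level summary statistics.

    Each variant contributes a Wald ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2.
    """
    n = len(beta_exposure)
    if n == 0 or len(beta_outcome) != n or len(se_outcome) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(se <= 0 for se in se_outcome):
        raise ValueError("standard errors must be positive")
    total_weight = sum(
        bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)
    )
    if total_weight == 0:
        raise ValueError("at least one variant must affect the exposure")
    weighted_sum = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    beta = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return MREstimate(
        beta=beta,
        se=se,
        odds_ratio=math.exp(beta),
        ci_low=math.exp(beta - Z_95 * se),
        ci_high=math.exp(beta + Z_95 * se),
    )


def se_from_reported_ci(ci_low: float, ci_high: float) -> float:
    """Log-scale standard error implied by a reported 95% CI of an odds ratio.

    Assumes the interval was built as exp(beta +/- 1.96 * se).
    """
    return (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)


# Hypothetical summary statistics for three genetic instruments.
result = ivw_estimate(
    beta_exposure=[0.10, 0.20, 0.15],
    beta_outcome=[0.03, 0.05, 0.04],
    se_outcome=[0.01, 0.02, 0.01],
)
print(result)

# Interval reported in the abstract for UA and ICH risk.
print(se_from_reported_ci(1.117, 1.534))
```

Under the normal-approximation assumption, the interval reported for UA and ICH (1.117 to 1.534) implies a log-scale standard error of roughly 0.08. The same calculation applied to the IS estimates shows why odds ratios such as 1.0008 can still exclude 1: their standard errors are very small, although the effect size per unit of exposure is also very small.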

Conclusions:

This study developed a diagnostic model that utilizes ten routine laboratory indicators to accurately differentiate between ICH and IS. Among the considered biomarkers, UA was identified as a causal risk factor for both disorders.


 Citation

Please cite as:

Hao X, Wang J, Wang J, Hu Y, Wang F, Liu X, Zhou L, Bai C

Development of a Machine Learning Model Based on Laboratory Biomarkers and Mendelian Randomization for Differentiating Intracerebral Hemorrhage and Acute Ischemic Stroke

JMIR Preprints. 15/03/2026:95378

DOI: 10.2196/preprints.95378

URL: https://preprints.jmir.org/preprint/95378

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.