Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Nov 18, 2023
Date Accepted: Nov 11, 2024
Machine learning and deep learning for diagnosis of lumbar spinal stenosis: A systematic review and meta-analysis
ABSTRACT
Background:
Lumbar spinal stenosis (LSS) is a major cause of pain and disability in older individuals worldwide. Although an increasing number of studies have applied traditional machine learning (TML) and deep learning (DL) to the diagnosis of LSS and reported notable results, the performance of these models has not been analyzed systematically.
Objective:
This systematic review and meta-analysis aimed to pool the results and evaluate the heterogeneity of current studies that used TML or DL models to diagnose LSS, thereby providing more comprehensive information for further clinical application.
Methods:
This review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, using articles retrieved from the PubMed, EMBASE, and Cochrane Library databases. Studies that evaluated the value of DL or TML algorithms for diagnosing LSS were included, while those with duplicated or unavailable data were excluded. The Quality Assessment of Diagnostic Accuracy Studies 2 tool was used to estimate the risk of bias in each study. The MIDAS and METAPROP modules of STATA were used for data synthesis and statistical analyses.
Results:
A total of 8 studies with 8186 patients reported the diagnostic value of TML or DL models for LSS. The risk of bias assessment identified 3, 1, and 4 studies with a high, unclear, and completely low risk of bias, respectively. The pooled sensitivity and specificity were 0.80 (95% confidence interval [CI]: 0.76–0.84; I2 = 97.95%) and 0.91 (95% CI: 0.87–0.94; I2 = 97.43%), respectively. The diagnostic odds ratio, the positive likelihood ratio (LR+), and the negative likelihood ratio (LR-) were 42 (95% CI: 26–68), 9.0 (95% CI: 6.3–12.8), and 0.21 (95% CI: 0.17–0.27), respectively. The area under the summary receiver operating characteristic curve of TML or DL models for diagnosing LSS was 0.92 (95% CI: 0.89–0.94), indicating a high diagnostic value.
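As a rough illustration of how the summary measures above relate to one another, the following minimal sketch derives likelihood ratios and a diagnostic odds ratio from the pooled sensitivity and specificity using the standard textbook definitions. The function name `likelihood_ratios` is hypothetical, and this is not the bivariate model fitted by the MIDAS module, so the point estimates it produces are expected to be close to, but not identical to, the reported values.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-, DOR) from a single sensitivity/specificity pair.

    Standard definitions (not the pooled bivariate estimates in the paper):
      LR+ = sensitivity / (1 - specificity)
      LR- = (1 - sensitivity) / specificity
      DOR = LR+ / LR-
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg
    return lr_pos, lr_neg, dor


# Pooled point estimates taken from the Results above.
lr_pos, lr_neg, dor = likelihood_ratios(0.80, 0.91)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, DOR = {dor:.0f}")
```

With the pooled estimates of 0.80 and 0.91, this back-of-the-envelope calculation gives an LR+ of about 8.9, an LR- of about 0.22, and a diagnostic odds ratio of about 40, which is in line with the reported 9.0, 0.21, and 42.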
Conclusions:
This systematic review and meta-analysis emphasizes that, despite the generally satisfactory diagnostic performance of artificial intelligence systems in the field of LSS, further efforts in both development and validation are needed in future studies to bridge the gap between current TML or DL models and real-life clinical applications. Optimized model balance, widely accepted objective reference standards, multimodal strategies, large datasets for training and testing, external validation, and sufficient, rigorous reporting will help move these models toward clinical use.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.