Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Feb 14, 2022
Date Accepted: Dec 18, 2022
A Machine Learning Approach to Support Urgent Stroke Triage Using Administrative Data and Social Determinants of Health at Hospital Presentation: Retrospective Study
ABSTRACT
Background:
The key to effective stroke management is timely diagnosis and triage. Machine learning methods developed to assist in detecting stroke have focused on interpreting detailed clinical data such as clinical notes and diagnostic imaging results. However, such information may not be readily available when patients are initially triaged, particularly in rural and underserved communities.
Objective:
This study aimed to develop a highly sensitive machine learning stroke prediction algorithm based on data widely available at patients’ hospital presentations and to assess the added value of social determinants of health (SDoH) in stroke prediction.
Methods:
We conducted a retrospective study of emergency department (ED) and hospitalization records from all acute care hospitals in the state of Florida from 2012 to 2014, matched with SDoH data from the American Community Survey. A case-control design was adopted to construct the stroke and stroke-mimic cohorts. We compared the algorithm performance and feature importance measures of the machine learning models (i.e., gradient boosting machine and random forest) with those of logistic regression based on 3 sets of predictors. To provide insights into the predictions and ultimately assist care providers in decision-making, we used TreeSHAP for tree-based machine learning models to explain the stroke prediction for each patient.
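The comparison described in the Methods can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the data, the column groupings in `feature_sets`, and all hyperparameters are assumptions, and the area under the ROC curve is assumed as the comparison metric. The TreeSHAP explanation step is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 9 numeric columns, no real patient records.
X, y = make_classification(
    n_samples=2000, n_features=9, n_informative=6, n_redundant=0, random_state=0
)

# Hypothetical nested predictor sets; the column indices are illustrative only.
feature_sets = {
    "base": list(range(0, 3)),
    "base+individual_sdoh": list(range(0, 6)),
    "base+individual+community_sdoh": list(range(0, 9)),
}

# Benchmark logistic regression and the two tree-based classifiers.
models = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "random_forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": lambda: GradientBoostingClassifier(random_state=0),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit every model on every predictor set and record the held-out AUC.
results = {}
for set_name, cols in feature_sets.items():
    for model_name, make_model in models.items():
        model = make_model()
        model.fit(X_train[:, cols], y_train)
        scores = model.predict_proba(X_test[:, cols])[:, 1]
        results[(set_name, model_name)] = roc_auc_score(y_test, scores)

for key, auc in results.items():
    print(key, round(auc, 3))
```

Evaluating each classifier on the same held-out split for each predictor set keeps the comparison between models and between input combinations on equal footing.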
Results:
Our analysis included 143,203 hospital visits, of which 73% were confirmed to be stroke cases based on the principal diagnosis at discharge. The approach proposed in this paper has high sensitivity and is particularly effective in reducing the misdiagnosis of dangerous stroke chameleons (false-negative rate less than 4%). Machine learning classifiers consistently outperformed the benchmark logistic regression in all 3 input combinations. We found significant consistency across models regarding the features that explain the performance. The most important features were age, the number of chronic conditions on admission, and primary payer (e.g., Medicare or private insurance). While both the individual- and community-level SDoH features helped improve the predictive performance, the inclusion of the individual-level SDoH led to a much larger improvement (area under the curve [AUC] from 0.694 to 0.823) than the inclusion of the community-level SDoH (AUC from 0.823 to 0.829).
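To make the sensitivity and false-negative rate reported above concrete, the sketch below computes them from a confusion matrix. The counts are made up for illustration (chosen so that roughly 73% of visits are strokes and the false-negative rate is below 4%); they are not the study's results, and `triage_metrics` is a hypothetical helper name.

```python
def triage_metrics(tp, fn, tn, fp):
    """Return sensitivity, false-negative rate, and specificity from counts."""
    sensitivity = tp / (tp + fn)          # strokes correctly flagged
    false_negative_rate = fn / (tp + fn)  # strokes missed (chameleons)
    specificity = tn / (tn + fp)          # mimics correctly ruled out
    return {
        "sensitivity": sensitivity,
        "false_negative_rate": false_negative_rate,
        "specificity": specificity,
    }

# Hypothetical counts: 1000 stroke visits and 370 stroke-mimic visits.
m = triage_metrics(tp=970, fn=30, tn=200, fp=170)
print(m)
```

Sensitivity and the false-negative rate always sum to 1, so a false-negative rate below 4% corresponds to a sensitivity above 96%.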
Conclusions:
Using data widely available at patients’ hospital presentations, we developed a stroke prediction model with high sensitivity and reasonable specificity. The prediction algorithm uses variables that are routinely collected by providers and payers and can be particularly useful in under-resourced hospitals with limited availability of sensitive diagnostic tools or incomplete data gathering capabilities. The algorithm can also be integrated with other AI-enabled prediction models in the ED or decision support systems based on electronic health records.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.