Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Feb 14, 2022
Date Accepted: Dec 18, 2022

The final, peer-reviewed published version of this preprint can be found here:

A Machine Learning Approach to Support Urgent Stroke Triage Using Administrative Data and Social Determinants of Health at Hospital Presentation: Retrospective Study

Chen M, Tan X, Padman R

A Machine Learning Approach to Support Urgent Stroke Triage Using Administrative Data and Social Determinants of Health at Hospital Presentation: Retrospective Study

J Med Internet Res 2023;25:e36477

DOI: 10.2196/36477

PMID: 36716097

PMCID: 9926350

A Machine Learning Approach to Support Urgent Stroke Triage Using Administrative Data and Social Determinants of Health at Hospital Presentation: Retrospective Study

  • Min Chen
  • Xuan Tan
  • Rema Padman

ABSTRACT

Background:

The key to effective stroke management is timely diagnosis and triage. Machine learning methods developed to assist in detecting stroke have focused on interpreting detailed clinical data such as clinical notes and diagnostic imaging results. However, such information may not be readily available when patients are initially triaged, particularly in rural and underserved communities.

Objective:

This study aimed to develop a highly sensitive machine learning stroke prediction algorithm based on data widely available at patients’ hospital presentations and to assess the added value of social determinants of health (SDoH) in stroke prediction.

Methods:

We conducted a retrospective study of emergency department (ED) and hospitalization records from all acute care hospitals in the state of Florida from 2012 to 2014, matched with SDoH data from the American Community Survey. A case-control design was adopted to construct the stroke and stroke-mimic cohorts. We compared the performance and feature importance measures of 2 machine learning models (gradient boosting machine and random forest) with those of logistic regression, using 3 sets of predictors. To provide insight into the predictions and ultimately assist care providers in decision-making, we applied TreeSHAP to the tree-based models to explain the stroke prediction for each patient.
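The idea behind the per-patient explanations can be illustrated with a minimal sketch. TreeSHAP computes Shapley values efficiently for tree ensembles; the code below instead uses brute-force enumeration over feature orderings, which gives exact Shapley values only for very small feature sets. The scoring function `toy_score`, its thresholds and weights, the feature names, and the baseline patient are all hypothetical and are not taken from the study.

```python
from itertools import permutations
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(predict: Callable[[Features], float],
                   x: Features, baseline: Features) -> Dict[str, float]:
    """Exact Shapley attributions of predict(x) - predict(baseline).

    Brute force over all feature orderings; this is NOT TreeSHAP, only
    the quantity that TreeSHAP computes efficiently for tree models.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)          # start from the reference patient
        previous = predict(current)
        for name in order:
            current[name] = x[name]       # switch one feature to the patient's value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orders) for name, total in totals.items()}


def toy_score(f: Features) -> float:
    """Hypothetical risk score with one interaction term (illustration only)."""
    score = 0.2
    if f["age"] >= 65:
        score += 0.3
    if f["n_chronic"] >= 3:
        score += 0.2
    if f["age"] >= 65 and f["payer_medicare"] == 1:
        score += 0.1
    return score


patient = {"age": 72, "n_chronic": 4, "payer_medicare": 1}    # made-up patient
reference = {"age": 40, "n_chronic": 0, "payer_medicare": 0}  # made-up baseline
attributions = shapley_values(toy_score, patient, reference)
```

The attributions sum to the difference between the patient's score and the reference score, which is what allows each prediction to be broken down feature by feature. In this toy case the interaction term is split evenly between age and payer.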

Results:

Our analysis included 143,203 hospital visits, of which 73% were confirmed to be stroke cases based on the principal diagnosis at discharge. The proposed approach has high sensitivity and is particularly effective in reducing the misdiagnosis of dangerous stroke chameleons (false-negative rate below 4%). Machine learning classifiers consistently outperformed the benchmark logistic regression across all 3 sets of predictors. We found substantial consistency across models in the features that explain the performance. The most important features were age, the number of chronic conditions on admission, and primary payer (e.g., Medicare or private insurance). Although both individual- and community-level SDoH features improved predictive performance, including the individual-level SDoH produced a much larger gain (area under the curve [AUC] from 0.694 to 0.823) than including the community-level SDoH (AUC from 0.823 to 0.829).
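For readers less familiar with the metrics quoted above, the following is a minimal sketch of how they are defined. It assumes AUC refers to the area under the receiver operating characteristic curve, computed here with the rank-based (pairwise) formulation. The function names are illustrative, and the only study figures used are the AUC values quoted in the abstract.

```python
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """Share of true stroke cases that the model flags as stroke."""
    return tp / (tp + fn)


def false_negative_rate(tp: int, fn: int) -> float:
    """Share of true stroke cases that the model misses (1 - sensitivity)."""
    return fn / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of non-stroke (mimic) cases correctly flagged as non-stroke."""
    return tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of cases, so this
    is suitable only for small illustrations; it assumes at least one
    positive and one negative label.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# AUC gains implied by the figures quoted in the abstract.
gain_individual = round(0.823 - 0.694, 3)  # adding individual-level SDoH
gain_community = round(0.829 - 0.823, 3)   # adding community-level SDoH
```

With the reported values, the gain from individual-level SDoH is 0.129 and the gain from community-level SDoH is 0.006. A false-negative rate below 4% corresponds to a sensitivity above 96%, since the two sum to 1.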

Conclusions:

Using data widely available at patients’ hospital presentations, we developed a stroke prediction model with high sensitivity and reasonable specificity. The prediction algorithm uses variables that are routinely collected by providers and payers and can be particularly useful in under-resourced hospitals with limited availability of sensitive diagnostic tools or incomplete data gathering capabilities. The algorithm can also be integrated with other AI-enabled prediction models in the ED or decision support systems based on electronic health records.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.